History log of /llvm-project/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll (Results 1 – 25 of 89)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# 5a3299a6 26-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove some -verify-machineinstrs from tests (#117736)

We should leave these for EXPENSIVE_CHECKS builds. Some of these
were near the top of slowest tests.


Revision tags: llvmorg-19.1.4
# 6548b635 09-Nov-2024 Shilei Tian <i@tianshilei.me>

Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"

This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.


# ca33649a 08-Nov-2024 Shilei Tian <i@tianshilei.me>

Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"

This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both
hip and openmp buildbots.


# e215a1e2 08-Nov-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)


# 6d7e51de 05-Nov-2024 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Extend type support for update_dpp intrinsic (#114597)

We can split 64-bit DPP as a post-RA pseudo if control values are
supported, but cannot handle other types.


Revision tags: llvmorg-19.1.3
# 3277c7cd 21-Oct-2024 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)

MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for

[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)

MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.

show more ...


Revision tags: llvmorg-19.1.2, llvmorg-19.1.1
# 8632e8bd 23-Sep-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix implicit vcc def to vcc_lo on wave32 targets (#109514)


Revision tags: llvmorg-19.1.0
# e55d6f5e 11-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)

Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifyi

[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889)

Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.

show more ...


# 16cda01d 05-Sep-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] V_SET_INACTIVE optimizations (#98864)

Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically b

[AMDGPU] V_SET_INACTIVE optimizations (#98864)

Optimize V_SET_INACTIVE by allow it to run in WWM.
Hence WWM sections are not broken up for inactive lane setting.
WWM V_SET_INACTIVE can typically be lower to V_CNDMASK.
Some cases require use of exec manipulation V_MOV as previous code.
GFX9 sees slight instruction count increase in edge cases due to
smaller constant bus.

Additionally avoid introducing exec manipulation and V_MOVs where
a source of V_SET_INACTIVE is the destination.
This is a common pattern as WWM register pre-allocation often
assigns the same register.

show more ...


# 126d6f27 04-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108)

Use poison for an unused input to the permlanex16 intrinsic, to improve
register allocation and avoid an unnecessary v_mov ins

[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108)

Use poison for an unused input to the permlanex16 intrinsic, to improve
register allocation and avoid an unnecessary v_mov instruction.

show more ...


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1
# a8203291 26-Jul-2024 Changpeng Fang <changpeng.fang@amd.com>

[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711)

They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize
feature from GFX10: https://github.c

[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711)

They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize
feature from GFX10: https://github.com/llvm/llvm-project/pull/98400
The exception is when "target-features" are set to "+wavefrontsize32" or
"+wavefrontsize64", we still need to remove a wavefrontsize feature
before add a different one to make sure only one of them are present.

show more ...


Revision tags: llvmorg-20-init
# 229e1185 23-Jul-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Codegen support for constrained multi-dword sloads (#96163)

For targets that support xnack replay feature (gfx8+), the
multi-dword scalar loads shouldn't clobber any register that
holds the

[AMDGPU] Codegen support for constrained multi-dword sloads (#96163)

For targets that support xnack replay feature (gfx8+), the
multi-dword scalar loads shouldn't clobber any register that
holds the src address. The constrained version of the scalar
loads have the early clobber flag attached to the dst operand
to restrict RA from re-allocating any of the src regs for its
dst operand.

show more ...


# cf230e77 15-Jul-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Enable atomic optimizer for divergent i64 and double values (#96934)


# b1bcb7ca 15-Jul-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)

This reverts commit adaff46d087799

Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)

This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.

Drop the -O3 checks from default-attributes.hip. I don't know why they
are different on some bots but reverting this is far too disruptive.

show more ...


# adaff46d 15-Jul-2024 dyung <douglas.yung@sony.com>

Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)

This reverts commits 677cc15e0ff2e0

Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)

This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.

The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614

These bots have been broken for a day, so reverting to get everything
back to green.

show more ...


# 78bc1b64 14-Jul-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Move attributor into optimization pipeline (#83131)

Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.

AMDGPU: Move attributor into optimization pipeline (#83131)

Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.

Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.

show more ...


# 2a960716 08-Jul-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933)


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# 3cf539fb 04-Apr-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539)

Call generateWaitcnt unconditionally at the end of
SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to
ge

[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539)

Call generateWaitcnt unconditionally at the end of
SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to
generate a new waitcnt instruction it has the effect of combining or
removing redundant waitcnts that were already present. Tests show
various small improvements in waitcnt placement.

show more ...


Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9e9907f1 17-Jan-2024 Fangrui Song <i@maskray.me>

[AMDGPU,test] Change llc -march= to -mtriple= (#75982)

Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while

[AMDGPU,test] Change llc -march= to -mtriple= (#75982)

Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```

show more ...


# 48f36c6e 25-Dec-2023 Acim Maravic <Acim.Maravic@Syrmia.com>

[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158)

Update DAG ISel to support 64bit versions S_FF1_I32_B64 and
S_FLBIT_I32_B664

---------

Co-authored-by: Acim Maravic <Acim.Maravic

[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158)

Update DAG ISel to support 64bit versions S_FF1_I32_B64 and
S_FLBIT_I32_B664

---------

Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>

show more ...


# ef067f52 15-Dec-2023 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)

Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# fe8335ba 30-Oct-2023 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395)

This allows folding of 64-bit operands if fit into 32-bit. Fixes
https://github.com/llvm/llvm-project/issues/67781


Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
# f09360d2 30-Aug-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Reduction and Scan are implemented using `Iterative`
and `DPP` strategy for `float` type.

Reviewed By: arsenm, #amdgpu

Different

[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Reduction and Scan are implemented using `Iterative`
and `DPP` strategy for `float` type.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D156301

show more ...


Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 7fa7a08f 19-Jul-2023 Jay Foad <jay.foad@amd.com>

[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)

Differential Revision: https://reviews.llvm.org/D155681


# 7a101798 30-Jun-2023 Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Revert "[AMDGPU] Mark mbcnt as convergent"

This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb.

The output of mbcnt does not depend on other active lanes, and hence it is not
convergent. T

Revert "[AMDGPU] Mark mbcnt as convergent"

This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb.

The output of mbcnt does not depend on other active lanes, and hence it is not
convergent. The original change was made as a possible fix for

https://github.com/ROCm-Developer-Tools/HIP/issues/3172

But changing mbcnt does not fix that issue.

Reviewed By: ruiling, foad, yaxunl

Differential Revision: https://reviews.llvm.org/D153953

show more ...


1234