atomic_optimizations_local_pointer.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/AMDGPU/atomic_optimizations_local

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# 5a3299a6	26-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Remove some -verify-machineinstrs from tests (#117736) We should leave these for EXPENSIVE_CHECKS builds. Some of these were near the top of slowest tests.
Revision tags: llvmorg-19.1.4
# 6548b635	09-Nov-2024	Shilei Tian <i@tianshilei.me>	Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)" This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
# ca33649a	08-Nov-2024	Shilei Tian <i@tianshilei.me>	Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)" This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
# e215a1e2	08-Nov-2024	Shilei Tian <i@tianshilei.me>	[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
# 6d7e51de	05-Nov-2024	Stanislav Mekhanoshin <rampitec@users.noreply.github.com>	[AMDGPU] Extend type support for update_dpp intrinsic (#114597) We can split 64-bit DPP as a post-RA pseudo if control values are supported, but cannot handle other types.
Revision tags: llvmorg-19.1.3
# 3277c7cd	21-Oct-2024	Stanislav Mekhanoshin <rampitec@users.noreply.github.com>	[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for [AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765) MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's been identified this message is only really needed for VGPR limited kernels. A kernel becomes VGPR limited if a total number of VGPRs per SIMD / number of used VGPRs is more than a number of wave slots. show more ...
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1
# 8632e8bd	23-Sep-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix implicit vcc def to vcc_lo on wave32 targets (#109514)
Revision tags: llvmorg-19.1.0
# e55d6f5e	11-Sep-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifyi [AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (#107889) Always generate v_cndmask_b32 instead of modifying exec around v_mov_b32. This is expected to be faster because modifying exec generally causes pipeline stalls. show more ...
# 16cda01d	05-Sep-2024	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] V_SET_INACTIVE optimizations (#98864) Optimize V_SET_INACTIVE by allow it to run in WWM. Hence WWM sections are not broken up for inactive lane setting. WWM V_SET_INACTIVE can typically b [AMDGPU] V_SET_INACTIVE optimizations (#98864) Optimize V_SET_INACTIVE by allow it to run in WWM. Hence WWM sections are not broken up for inactive lane setting. WWM V_SET_INACTIVE can typically be lower to V_CNDMASK. Some cases require use of exec manipulation V_MOV as previous code. GFX9 sees slight instruction count increase in edge cases due to smaller constant bus. Additionally avoid introducing exec manipulation and V_MOVs where a source of V_SET_INACTIVE is the destination. This is a common pattern as WWM register pre-allocation often assigns the same register. show more ...
# 126d6f27	04-Sep-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108) Use poison for an unused input to the permlanex16 intrinsic, to improve register allocation and avoid an unnecessary v_mov ins [AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108) Use poison for an unused input to the permlanex16 intrinsic, to improve register allocation and avoid an unnecessary v_mov instruction. show more ...
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1
# a8203291	26-Jul-2024	Changpeng Fang <changpeng.fang@amd.com>	[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711) They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize feature from GFX10: https://github.c [AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711) They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize feature from GFX10: https://github.com/llvm/llvm-project/pull/98400 The exception is when "target-features" are set to "+wavefrontsize32" or "+wavefrontsize64", we still need to remove a wavefrontsize feature before add a different one to make sure only one of them are present. show more ...
Revision tags: llvmorg-20-init
# 229e1185	23-Jul-2024	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU] Codegen support for constrained multi-dword sloads (#96163) For targets that support xnack replay feature (gfx8+), the multi-dword scalar loads shouldn't clobber any register that holds the [AMDGPU] Codegen support for constrained multi-dword sloads (#96163) For targets that support xnack replay feature (gfx8+), the multi-dword scalar loads shouldn't clobber any register that holds the src address. The constrained version of the scalar loads have the early clobber flag attached to the dst operand to restrict RA from re-allocating any of the src regs for its dst operand. show more ...
# cf230e77	15-Jul-2024	Vikram Hegde <115221833+vikramRH@users.noreply.github.com>	[AMDGPU] Enable atomic optimizer for divergent i64 and double values (#96934)
# b1bcb7ca	15-Jul-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851) This reverts commit adaff46d087799 Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851) This reverts commit adaff46d087799072438dd744b038e6fd50a2d78. Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive. show more ...
# adaff46d	15-Jul-2024	dyung <douglas.yung@sony.com>	Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851) This reverts commits 677cc15e0ff2e0 Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851) This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and 78bc1b64a6dc3fb6191355a5e1b502be8b3668e7. The test CodeGenHIP/default-attributes.hip is failing on multiple bots even after the attempted fix including the following: - https://lab.llvm.org/buildbot/#/builders/3/builds/1473 - https://lab.llvm.org/buildbot/#/builders/65/builds/1380 - https://lab.llvm.org/buildbot/#/builders/161/builds/595 - https://lab.llvm.org/buildbot/#/builders/154/builds/1372 - https://lab.llvm.org/buildbot/#/builders/133/builds/1547 - https://lab.llvm.org/buildbot/#/builders/81/builds/755 - https://lab.llvm.org/buildbot/#/builders/40/builds/570 - https://lab.llvm.org/buildbot/#/builders/13/builds/748 - https://lab.llvm.org/buildbot/#/builders/12/builds/1845 - https://lab.llvm.org/buildbot/#/builders/11/builds/1695 - https://lab.llvm.org/buildbot/#/builders/190/builds/1829 - https://lab.llvm.org/buildbot/#/builders/193/builds/962 - https://lab.llvm.org/buildbot/#/builders/23/builds/991 - https://lab.llvm.org/buildbot/#/builders/144/builds/2256 - https://lab.llvm.org/buildbot/#/builders/46/builds/1614 These bots have been broken for a day, so reverting to get everything back to green. show more ...
# 78bc1b64	14-Jul-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Move attributor into optimization pipeline (#83131) Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. AMDGPU: Move attributor into optimization pipeline (#83131) Removing it from the codegen pipeline induces a lot of test churn because llc is no longer optimizing out implicit arguments to kernels. Mostly mechanical, but there are some creative test updates. I preferred to take the changes as-is in tests where the ABI isn't relevant. In cases where it's more relevant, or the optimize out logic was too ingrained in the test, I pre-run the optimization. Some cases manually add attributes to disable inputs. show more ...
# 2a960716	08-Jul-2024	Vikram Hegde <115221833+vikramRH@users.noreply.github.com>	[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933)
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# 3cf539fb	04-Apr-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539) Call generateWaitcnt unconditionally at the end of SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to ge [AMDGPU] Combine or remove redundant waitcnts at the end of each MBB (#87539) Call generateWaitcnt unconditionally at the end of SIInsertWaitcnts::insertWaitcntInBlock. Even if we don't need to generate a new waitcnt instruction it has the effect of combining or removing redundant waitcnts that were already present. Tests show various small improvements in waitcnt placement. show more ...
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9e9907f1	17-Jan-2024	Fangrui Song <i@maskray.me>	[AMDGPU,test] Change llc -march= to -mtriple= (#75982) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while [AMDGPU,test] Change llc -march= to -mtriple= (#75982) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ``` show more ...
# 48f36c6e	25-Dec-2023	Acim Maravic <Acim.Maravic@Syrmia.com>	[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158) Update DAG ISel to support 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B664 --------- Co-authored-by: Acim Maravic <Acim.Maravic [LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158) Update DAG ISel to support 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B664 --------- Co-authored-by: Acim Maravic <Acim.Maravic@amd.com> show more ...
# ef067f52	15-Dec-2023	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830) Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# fe8335ba	30-Oct-2023	Stanislav Mekhanoshin <rampitec@users.noreply.github.com>	[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395) This allows folding of 64-bit operands if fit into 32-bit. Fixes https://github.com/llvm/llvm-project/issues/67781
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
# f09360d2	30-Aug-2023	Pravin Jagtap <Pravin.Jagtap@amd.com>	[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. Reduction and Scan are implemented using `Iterative` and `DPP` strategy for `float` type. Reviewed By: arsenm, #amdgpu Different [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. Reduction and Scan are implemented using `Iterative` and `DPP` strategy for `float` type. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156301 show more ...
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 7fa7a08f	19-Jul-2023	Jay Foad <jay.foad@amd.com>	[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) Differential Revision: https://reviews.llvm.org/D155681
# 7a101798	30-Jun-2023	Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>	Revert "[AMDGPU] Mark mbcnt as convergent" This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb. The output of mbcnt does not depend on other active lanes, and hence it is not convergent. T Revert "[AMDGPU] Mark mbcnt as convergent" This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb. The output of mbcnt does not depend on other active lanes, and hence it is not convergent. The original change was made as a possible fix for https://github.com/ROCm-Developer-Tools/HIP/issues/3172 But changing mbcnt does not fix that issue. Reviewed By: ruiling, foad, yaxunl Differential Revision: https://reviews.llvm.org/D153953 show more ...
12 3 4