AMDGPUAtomicOptimizer.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init
# 8e702735	24-Jan-2025	Jeremy Morse <jeremy.morse@sony.com>	[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and sim [NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few). show more ...
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# 83cbb170	03-Dec-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Refine AMDGPUAtomicOptimizerImpl class. NFC. (#118302) Use references instead of pointers for most state and common up some of the initialization between the legacy and new pass manager pa [AMDGPU] Refine AMDGPUAtomicOptimizerImpl class. NFC. (#118302) Use references instead of pointers for most state and common up some of the initialization between the legacy and new pass manager paths. show more ...
Revision tags: llvmorg-19.1.4, llvmorg-19.1.3
# 922992a2	18-Oct-2024	Jay Foad <jay.foad@amd.com>	Fix typo "instrinsic" (#112899)
# 85c17e40	17-Oct-2024	Jay Foad <jay.foad@amd.com>	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsi [LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call. show more ...
Revision tags: llvmorg-19.1.2
# fa789dff	11-Oct-2024	Rahul Joshi <rjoshi@nvidia.com>	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is a [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation). show more ...
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 126d6f27	04-Sep-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108) Use poison for an unused input to the permlanex16 intrinsic, to improve register allocation and avoid an unnecessary v_mov ins [AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108) Use poison for an unused input to the permlanex16 intrinsic, to improve register allocation and avoid an unnecessary v_mov instruction. show more ...
Revision tags: llvmorg-19.1.0-rc4
# c8ba3170	23-Aug-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Remove comment outdated by #96933
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# cf230e77	15-Jul-2024	Vikram Hegde <115221833+vikramRH@users.noreply.github.com>	[AMDGPU] Enable atomic optimizer for divergent i64 and double values (#96934)
# ae63db78	13-Jul-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Re-enable atomic optimization of uniform fadd/fsub with result (#97604) Fix various problems to do with the first active lane of the result of optimized fp atomics, as explained in the com [AMDGPU] Re-enable atomic optimization of uniform fadd/fsub with result (#97604) Fix various problems to do with the first active lane of the result of optimized fp atomics, as explained in the comment. Fixes #97554 show more ...
# 5ab9e003	08-Jul-2024	Jie Fu <jiefu@tencent.com>	[AMDGPU] Fix -Wunused-variable in AMDGPUAtomicOptimizer.cpp (NFC) /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:688:18: error: unused variable 'TyBitWidth' [-Werror,-Wunused-variabl [AMDGPU] Fix -Wunused-variable in AMDGPUAtomicOptimizer.cpp (NFC) /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:688:18: error: unused variable 'TyBitWidth' [-Werror,-Wunused-variable] const unsigned TyBitWidth = DL->getTypeSizeInBits(Ty); ^ 1 error generated. show more ...
# 2a960716	08-Jul-2024	Vikram Hegde <115221833+vikramRH@users.noreply.github.com>	[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933)
# b76dd4ed	03-Jul-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Disable atomic optimization of fadd/fsub with result (#96479) An atomic fadd instruction like this should return %x: ; value at %ptr is %x %r = atomicrmw fadd ptr %ptr, float %y [AMDGPU] Disable atomic optimization of fadd/fsub with result (#96479) An atomic fadd instruction like this should return %x: ; value at %ptr is %x %r = atomicrmw fadd ptr %ptr, float %y After atomic optimization, if %y is uniform, the result is calculated as %r = %x + * %y * +0.0. This has a couple of problems: 1. If %y is Inf or NaN, this will return NaN instead of %x. 2. If %x is -0.0 and %y is positive, this will return +0.0 instead of -0.0. Avoid these problems by disabling the "%y is uniform" path if there are any uses of the result. show more ...
# 43b98882	02-Jul-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Use nan as the identity for atomicrmw fmax/fmin (#97411) atomicrmw fmax/fmin perform the same operation as llvm.maxnum/minnum which return the other operand if one operand is nan. This mea [AMDGPU] Use nan as the identity for atomicrmw fmax/fmin (#97411) atomicrmw fmax/fmin perform the same operation as llvm.maxnum/minnum which return the other operand if one operand is nan. This means that, in the presence of nan arguments, +/- inf is not an identity for these operations but nan is (at least if you don't care about nan payloads). show more ...
# 9df71d76	28-Jun-2024	Nikita Popov <npopov@redhat.com>	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, re [IR] Add getDataLayout() helpers to Function and GlobalValue (#96919) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern. show more ...
# 35f7b60a	26-Jun-2024	Vikram Hegde <115221833+vikramRH@users.noreply.github.com>	[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725) These are incremental changes over #89217 , with core logic being the same. This patch along wit [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725) These are incremental changes over #89217 , with core logic being the same. This patch along with #89217 and #91190 should get us ready to enable 64 bit optimizations in atomic optimizer. show more ...
# 5feb32ba	25-Jun-2024	Vikram Hegde <115221833+vikramRH@users.noreply.github.com>	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass t [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com> show more ...
# d75f9dd1	24-Jun-2024	Stephen Tozer <stephen.tozer@sony.com>	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27. show more ...
# 6481dc57	24-Jun-2024	Stephen Tozer <stephen.tozer@sony.com>	[IR][NFC] Update IRBuilder to use InsertPosition (#96497) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock [IR][NFC] Update IRBuilder to use InsertPosition (#96497) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators. show more ...
Revision tags: llvmorg-18.1.8
# 18ec885a	10-Jun-2024	Jay Foad <jay.foad@amd.com>	[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801) They have been superseded by llvm.amdgcn.raw.buffer.* and llvm.amdgcn.struct.buffer.*.
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6
# e2d17a05	09-May-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Build lane intrinsics in a mangling-agnostic way. NFC. (#91583) Use the form of CreateIntrinsic that takes an explicit return type and works out the mangling based on that and the types of [AMDGPU] Build lane intrinsics in a mangling-agnostic way. NFC. (#91583) Use the form of CreateIntrinsic that takes an explicit return type and works out the mangling based on that and the types of the arguments. The advantage is that this still works if intrinsics are changed to have type mangling, e.g. if readlane/readfirstlane/writelane are changed to work on any type. show more ...
Revision tags: llvmorg-18.1.5
# fcdb2203	18-Apr-2024	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[AMDGPU][AtomicOptimizer] Fix DT update for divergent values with Iterative strategy (#87605) We take the terminator from EntryBB and put it in ComputeEnd. Make sure we also move the DT edges, we p [AMDGPU][AtomicOptimizer] Fix DT update for divergent values with Iterative strategy (#87605) We take the terminator from EntryBB and put it in ComputeEnd. Make sure we also move the DT edges, we previously only did it assuming a non-conditional branch. Fixes SWDEV-453943 show more ...
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# e1a8120a	22-Mar-2024	Pravin Jagtap <prjagtap@amd.com>	[AMDGPU] Support double type in atomic optimizer. (#84307) Presently the atomic optimizer supports only 32-bit operations. Plan is to extend the atomic optimizer for 64-bit operations for compute a [AMDGPU] Support double type in atomic optimizer. (#84307) Presently the atomic optimizer supports only 32-bit operations. Plan is to extend the atomic optimizer for 64-bit operations for compute and graphics. This patch extends support for double type for `uniform values` only. Going forward, will extend the support for divergent values. Adding support for divergent values requires extending/legalizing readfirstlane, readlane, writelane, etc ops for 64-bit operations to avoid `bitcast` noise that we have currently. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> show more ...
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 3755ea93	13-Sep-2023	Pravin Jagtap <prjagtap@amd.com>	[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082) [D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be perform [AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082) [D156301](https://reviews.llvm.org/D156301) introduced atomic optimizations for FAdd/FSub. For FSub, reduction/scan needs to be performed using add operation (`not sub`) and memory location will be updated by reduced value using atomic sub later by only one lane. --------- Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> show more ...
# e54277fa	11-Sep-2023	Jeremy Morse <jeremy.morse@sony.com>	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and up [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468 show more ...
Revision tags: llvmorg-17.0.0-rc4
# edb9fab3	30-Aug-2023	Pravin Jagtap <Pravin.Jagtap@amd.com>	[AMDGPU] Support FMin/FMax in AMDGPUAtomicOptimizer. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D157388
12 3