History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp (Results 1 – 25 of 63)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init
# 8e702735 24-Jan-2025 Jeremy Morse <jeremy.morse@sony.com>

[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)

As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and sim

[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)

As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).

show more ...


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# 83cbb170 03-Dec-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Refine AMDGPUAtomicOptimizerImpl class. NFC. (#118302)

Use references instead of pointers for most state and common up some of
the initialization between the legacy and new pass manager pa

[AMDGPU] Refine AMDGPUAtomicOptimizerImpl class. NFC. (#118302)

Use references instead of pointers for most state and common up some of
the initialization between the legacy and new pass manager paths.

show more ...


Revision tags: llvmorg-19.1.4, llvmorg-19.1.3
# 922992a2 18-Oct-2024 Jay Foad <jay.foad@amd.com>

Fix typo "instrinsic" (#112899)


# 85c17e40 17-Oct-2024 Jay Foad <jay.foad@amd.com>

[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)

Convert many instances of:
Fn = Intrinsic::getOrInsertDeclaration(...);
CreateCall(Fn, ...)
to the equivalent CreateIntrinsi

[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)

Convert many instances of:
Fn = Intrinsic::getOrInsertDeclaration(...);
CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.

show more ...


Revision tags: llvmorg-19.1.2
# fa789dff 11-Oct-2024 Rahul Joshi <rjoshi@nvidia.com>

[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)

Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is a

[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)

Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).

show more ...


Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 126d6f27 04-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108)

Use poison for an unused input to the permlanex16 intrinsic, to improve
register allocation and avoid an unnecessary v_mov ins

[AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (#107108)

Use poison for an unused input to the permlanex16 intrinsic, to improve
register allocation and avoid an unnecessary v_mov instruction.

show more ...


Revision tags: llvmorg-19.1.0-rc4
# c8ba3170 23-Aug-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Remove comment outdated by #96933


Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# cf230e77 15-Jul-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Enable atomic optimizer for divergent i64 and double values (#96934)


# ae63db78 13-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Re-enable atomic optimization of uniform fadd/fsub with result (#97604)

Fix various problems to do with the first active lane of the result of
optimized fp atomics, as explained in the com

[AMDGPU] Re-enable atomic optimization of uniform fadd/fsub with result (#97604)

Fix various problems to do with the first active lane of the result of
optimized fp atomics, as explained in the comment.

Fixes #97554

show more ...


# 5ab9e003 08-Jul-2024 Jie Fu <jiefu@tencent.com>

[AMDGPU] Fix -Wunused-variable in AMDGPUAtomicOptimizer.cpp (NFC)

/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:688:18:
error: unused variable 'TyBitWidth' [-Werror,-Wunused-variabl

[AMDGPU] Fix -Wunused-variable in AMDGPUAtomicOptimizer.cpp (NFC)

/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:688:18:
error: unused variable 'TyBitWidth' [-Werror,-Wunused-variable]
const unsigned TyBitWidth = DL->getTypeSizeInBits(Ty);
^
1 error generated.

show more ...


# 2a960716 08-Jul-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933)


# b76dd4ed 03-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Disable atomic optimization of fadd/fsub with result (#96479)

An atomic fadd instruction like this should return %x:

; value at %ptr is %x
%r = atomicrmw fadd ptr %ptr, float %y

[AMDGPU] Disable atomic optimization of fadd/fsub with result (#96479)

An atomic fadd instruction like this should return %x:

; value at %ptr is %x
%r = atomicrmw fadd ptr %ptr, float %y

After atomic optimization, if %y is uniform, the result is calculated
as %r = %x + * %y * +0.0. This has a couple of problems:

1. If %y is Inf or NaN, this will return NaN instead of %x.
2. If %x is -0.0 and %y is positive, this will return +0.0 instead of
-0.0.

Avoid these problems by disabling the "%y is uniform" path if there are
any uses of the result.

show more ...


# 43b98882 02-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Use nan as the identity for atomicrmw fmax/fmin (#97411)

atomicrmw fmax/fmin perform the same operation as llvm.maxnum/minnum
which return the other operand if one operand is nan. This mea

[AMDGPU] Use nan as the identity for atomicrmw fmax/fmin (#97411)

atomicrmw fmax/fmin perform the same operation as llvm.maxnum/minnum
which return the other operand if one operand is nan. This means that,
in the presence of nan arguments, +/- inf is not an identity for these
operations but nan is (at least if you don't care about nan payloads).

show more ...


# 9df71d76 28-Jun-2024 Nikita Popov <npopov@redhat.com>

[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)

Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, re

[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)

Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.

show more ...


# 35f7b60a 26-Jun-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)

These are incremental changes over #89217 , with core logic being the
same. This patch along wit

[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)

These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.

show more ...


# 5feb32ba 25-Jun-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)

This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass t

[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)

This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel

---------

Co-authored-by: vikramRH <vikhegde@amd.com>

show more ...


# d75f9dd1 24-Jun-2024 Stephen Tozer <stephen.tozer@sony.com>

Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"

Reverts the above commit, as it updates a common header function and
did not update all callsites:

https://lab.llvm.org/buildbot

Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"

Reverts the above commit, as it updates a common header function and
did not update all callsites:

https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.

show more ...


# 6481dc57 24-Jun-2024 Stephen Tozer <stephen.tozer@sony.com>

[IR][NFC] Update IRBuilder to use InsertPosition (#96497)

Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock

[IR][NFC] Update IRBuilder to use InsertPosition (#96497)

Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.

show more ...


Revision tags: llvmorg-18.1.8
# 18ec885a 10-Jun-2024 Jay Foad <jay.foad@amd.com>

[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801)

They have been superseded by llvm.amdgcn.raw.buffer.* and
llvm.amdgcn.struct.buffer.*.


Revision tags: llvmorg-18.1.7, llvmorg-18.1.6
# e2d17a05 09-May-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Build lane intrinsics in a mangling-agnostic way. NFC. (#91583)

Use the form of CreateIntrinsic that takes an explicit return type and
works out the mangling based on that and the types of

[AMDGPU] Build lane intrinsics in a mangling-agnostic way. NFC. (#91583)

Use the form of CreateIntrinsic that takes an explicit return type and
works out the mangling based on that and the types of the arguments. The
advantage is that this still works if intrinsics are changed to have
type mangling, e.g. if readlane/readfirstlane/writelane are changed to
work on any type.

show more ...


Revision tags: llvmorg-18.1.5
# fcdb2203 18-Apr-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU][AtomicOptimizer] Fix DT update for divergent values with Iterative strategy (#87605)

We take the terminator from EntryBB and put it in ComputeEnd. Make sure
we also move the DT edges, we p

[AMDGPU][AtomicOptimizer] Fix DT update for divergent values with Iterative strategy (#87605)

We take the terminator from EntryBB and put it in ComputeEnd. Make sure
we also move the DT edges, we previously only did it assuming a
non-conditional branch.

Fixes SWDEV-453943

show more ...


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# e1a8120a 22-Mar-2024 Pravin Jagtap <prjagtap@amd.com>

[AMDGPU] Support double type in atomic optimizer. (#84307)

Presently the atomic optimizer supports only 32-bit operations. Plan is
to extend the atomic optimizer for 64-bit operations for compute a

[AMDGPU] Support double type in atomic optimizer. (#84307)

Presently the atomic optimizer supports only 32-bit operations. Plan is
to extend the atomic optimizer for 64-bit operations for compute and
graphics. This patch extends support for double type for `uniform
values` only. Going forward, will extend the support for divergent
values. Adding support for divergent values requires
extending/legalizing readfirstlane, readlane, writelane, etc ops for
64-bit operations to avoid `bitcast` noise that we have currently.

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>

show more ...


Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 3755ea93 13-Sep-2023 Pravin Jagtap <prjagtap@amd.com>

[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082)

[D156301](https://reviews.llvm.org/D156301) introduced atomic
optimizations for FAdd/FSub. For FSub, reduction/scan needs to be
perform

[AMDGPU] Fix scan of atomicFSub in AtomicOptimizer. (#66082)

[D156301](https://reviews.llvm.org/D156301) introduced atomic
optimizations for FAdd/FSub. For FSub, reduction/scan needs to be
performed using add operation (`not sub`) and memory location will be
updated by reduced value using atomic sub later by only one lane.

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>

show more ...


# e54277fa 11-Sep-2023 Jeremy Morse <jeremy.morse@sony.com>

[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder

This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and up

[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder

This patch adds a two-argument SetInsertPoint method to IRBuilder that
takes a block/iterator instead of an instruction, and updates many call
sites to use it. The motivating reason for doing this is given here [0],
we'd like to pass around more information about the position of debug-info
in the iterator object. That necessitates passing iterators around most of
the time.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152468

show more ...


Revision tags: llvmorg-17.0.0-rc4
# edb9fab3 30-Aug-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Support FMin/FMax in AMDGPUAtomicOptimizer.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D157388


123