#
f09360d2 |
| 30-Aug-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Reduction and Scan are implemented using `Iterative` and `DPP` strategy for `float` type.
Reviewed By: arsenm, #amdgpu
Different
[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Reduction and Scan are implemented using `Iterative` and `DPP` strategy for `float` type.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D156301
show more ...
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
597fb7fb |
| 22-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.
Atomic optimizer is turned on by default through D152649. This patch removes the usage of old command line option amdgpu-atomic
[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.
Atomic optimizer is turned on by default through D152649. This patch removes the usage of old command line option amdgpu-atomic-optimizations and transfer the responsibility to `amdgpu-atomic-optimizer-strategy`.
We can safely remove old option when LLPC remove its all usage.
Reviewed By: foad, arsenm, #amdgpu, cdevadas
Differential Revision: https://reviews.llvm.org/D153007
show more ...
|
#
8e1e871e |
| 21-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Preserve dom-tree analysis in atomic optimizer.
AMDGPUAtomicOptimizer updates the dominator tree whenever it modified the control flow. Therefore preserving the analysis similar to legacy P
[AMDGPU] Preserve dom-tree analysis in atomic optimizer.
AMDGPUAtomicOptimizer updates the dominator tree whenever it modified the control flow. Therefore preserving the analysis similar to legacy PM.
Reviewed By: arsenm, yassingh, #amdgpu
Differential Revision: https://reviews.llvm.org/D153349
show more ...
|
#
699addef |
| 20-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Use verify<domtree> instead of intra-pass asserts.
Verifying dominator tree is expensive using intra-pass asserts. Asserts added during D147408 are increasing the build time of libc signifi
[AMDGPU] Use verify<domtree> instead of intra-pass asserts.
Verifying dominator tree is expensive using intra-pass asserts. Asserts added during D147408 are increasing the build time of libc significantly. This change does the verification after the atomic optimizer pass and should fix the regression reported in D153232.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D153261
show more ...
|
Revision tags: llvmorg-16.0.6 |
|
#
f6c8a8e9 |
| 09-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.
An alternative implementation iterates over all active l
[AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.
An alternative implementation iterates over all active lanes of Wavefront using llvm.cttz and performs the following steps: 1. Read the value that needs to be atomically incremented using llvm.amdgcn.readlane intrinsic 2. Accumulate the result. 3. Update the scan result using llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D147408
show more ...
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
faa2c678 |
| 04-Apr-2023 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
[AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backen
[AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer.
In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions.
Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
show more ...
|
#
0c316f00 |
| 28-Apr-2023 |
Joshua Cao <cao.joshua@yahoo.com> |
[BBUtils][NFC] Delete SplitBlockAndInsertIfThen with DT.
The method is marked for deprecation. Delete the method and move all of its consumers to use the DomTreeUpdater version.
Reviewed By: foad
[BBUtils][NFC] Delete SplitBlockAndInsertIfThen with DT.
The method is marked for deprecation. Delete the method and move all of its consumers to use the DomTreeUpdater version.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D149428
show more ...
|
#
21a69bdb |
| 20-Apr-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[NewPM][AMDGPU] Port amdgpu-atomic-optimizer
Reviewed By: arsenm, sameerds, gandhi21299
Differential Revision: https://reviews.llvm.org/D148628
|
Revision tags: llvmorg-16.0.0 |
|
#
f90849df |
| 14-Mar-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Use UniformityAnalysis in AtomicOptimizer
Adds & uses a new `isDivergentUse` API in UA. UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it.
-
[AMDGPU] Use UniformityAnalysis in AtomicOptimizer
Adds & uses a new `isDivergentUse` API in UA. UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it.
-----
Original patch that adds `isDivergentUse` by @sameerds
The user of a temporally divergent value is marked as divergent in the uniformity analysis. But the same user may also have been marked divergent for other reasons, thus losing this information about temporal divergence. But some clients need to specificly check for temporal divergence. This change restores such an API, that already existed in DivergenceAnalysis.
Reviewed By: sameerds, foad
Differential Revision: https://reviews.llvm.org/D146018
show more ...
|
Revision tags: llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6 |
|
#
3830e4e5 |
| 16-Nov-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Create poison values instead of undef
These placeholders don't care about the finer points on the difference between the two.
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
#
bfcfd53b |
| 13-Jun-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were a
[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane.
Also use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D127662
show more ...
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
#
dc6e8dfd |
| 20-Sep-2021 |
Jacob Lambert <jacob.lambert@amd.com> |
[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
#
9d08f276 |
| 19-Mar-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Use reductions instead of scans in the atomic optimizer
If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a sca
[AMDGPU] Use reductions instead of scans in the atomic optimizer
If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (which are generally bad for performance) and one dpp operation.
Differential Revision: https://reviews.llvm.org/D98953
show more ...
|
#
5a5a5312 |
| 19-Mar-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove some redundant code. NFC.
This is redundant because we have already checked that we can't handle divergent 64-bit atomic operands.
|
#
5dd5ddcb |
| 18-Mar-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Skip building some IR if it won't be used. NFC.
|
#
c96dfe0d |
| 18-Mar-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC.
|
Revision tags: llvmorg-12.0.0-rc3 |
|
#
c3ce7bae |
| 02-Mar-2021 |
Piotr Sobczak <Piotr.Sobczak@amd.com> |
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency as the "strict" prefix
[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future.
The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions.
The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96257
show more ...
|
Revision tags: llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
#
560d7e04 |
| 20-Jan-2021 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
Revision tags: llvmorg-11.1.0-rc1 |
|
#
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5 |
|
#
0249df33 |
| 29-Sep-2020 |
Mirko Brkusanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic instructions in order to avoid making unnecessary instructions when
[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic instructions in order to avoid making unnecessary instructions when -amdgpu-atomic-optimizer is present.
Differential Revision: https://reviews.llvm.org/D88315
show more ...
|
Revision tags: llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
#
aad93654 |
| 30-May-2020 |
Christopher Tetreault <ctetreau@quicinc.com> |
[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU
Reviewers: efriedma, david-arm, fpetrogalli, arsenm
Reviewed By: david-arm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehn
[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU
Reviewers: efriedma, david-arm, fpetrogalli, arsenm
Reviewed By: david-arm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80328
show more ...
|
Revision tags: llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
#
5d3a69fe |
| 05-Mar-2020 |
Sebastian Neubauer <sebastian.neubauer@amd.com> |
[AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its
[AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes.
This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period.
Use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D65088
show more ...
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <rnk@google.com> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of reco
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
show more ...
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
eac23862 |
| 23-Aug-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] gfx10 atomic optimizer changes.
Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, crit
[AMDGPU] gfx10 atomic optimizer changes.
Summary: Add support for gfx10, where all DPP operations are confined to work within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65644
llvm-svn: 369745
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
dcb75324 |
| 29-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[DivergenceAnalysis] Add methods for querying divergence at use
Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is unif
[DivergenceAnalysis] Add methods for querying divergence at use
Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use.
This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis.
This might allow D63953 or other similar workarounds to be removed.
Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue
Reviewed By: nhaehnle
Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65141
llvm-svn: 367218
show more ...
|