History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp (Results 26 – 50 of 63)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# f09360d2 30-Aug-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Reduction and Scan are implemented using `Iterative`
and `DPP` strategy for `float` type.

Reviewed By: arsenm, #amdgpu

Different

[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.

Reduction and Scan are implemented using `Iterative`
and `DPP` strategy for `float` type.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D156301

show more ...


Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 597fb7fb 22-Jun-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.

Atomic optimizer is turned on by default through D152649. This patch
removes the usage of old command line option amdgpu-atomic

[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.

Atomic optimizer is turned on by default through D152649. This patch
removes the usage of old command line option amdgpu-atomic-optimizations
and transfer the responsibility to `amdgpu-atomic-optimizer-strategy`.

We can safely remove old option when LLPC remove its all usage.

Reviewed By: foad, arsenm, #amdgpu, cdevadas

Differential Revision: https://reviews.llvm.org/D153007

show more ...


# 8e1e871e 21-Jun-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Preserve dom-tree analysis in atomic optimizer.

AMDGPUAtomicOptimizer updates the dominator tree whenever
it modified the control flow. Therefore preserving the
analysis similar to legacy P

[AMDGPU] Preserve dom-tree analysis in atomic optimizer.

AMDGPUAtomicOptimizer updates the dominator tree whenever
it modified the control flow. Therefore preserving the
analysis similar to legacy PM.

Reviewed By: arsenm, yassingh, #amdgpu

Differential Revision: https://reviews.llvm.org/D153349

show more ...


# 699addef 20-Jun-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Use verify<domtree> instead of intra-pass asserts.

Verifying dominator tree is expensive using intra-pass
asserts. Asserts added during D147408 are
increasing the build time of libc signifi

[AMDGPU] Use verify<domtree> instead of intra-pass asserts.

Verifying dominator tree is expensive using intra-pass
asserts. Asserts added during D147408 are
increasing the build time of libc significantly. This change
does the verification after the atomic optimizer pass
and should fix the regression reported in D153232.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D153261

show more ...


Revision tags: llvmorg-16.0.6
# f6c8a8e9 09-Jun-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Iterative scan implementation for atomic optimizer.

This patch provides an alternative implementation to DPP for Scan Computations.

An alternative implementation iterates over all active l

[AMDGPU] Iterative scan implementation for atomic optimizer.

This patch provides an alternative implementation to DPP for Scan Computations.

An alternative implementation iterates over all active lanes of Wavefront
using llvm.cttz and performs the following steps:
1. Read the value that needs to be atomically incremented using
llvm.amdgcn.readlane intrinsic
2. Accumulate the result.
3. Update the scan result using llvm.amdgcn.writelane intrinsic
if intermediate scan results are needed later in the kernel.

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D147408

show more ...


Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1
# faa2c678 04-Apr-2023 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[AMDGPU] Add buffer intrinsics that take resources as pointers

In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backen

[AMDGPU] Add buffer intrinsics that take resources as pointers

In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547

show more ...


# 0c316f00 28-Apr-2023 Joshua Cao <cao.joshua@yahoo.com>

[BBUtils][NFC] Delete SplitBlockAndInsertIfThen with DT.

The method is marked for deprecation. Delete the method and move all of
its consumers to use the DomTreeUpdater version.

Reviewed By: foad

[BBUtils][NFC] Delete SplitBlockAndInsertIfThen with DT.

The method is marked for deprecation. Delete the method and move all of
its consumers to use the DomTreeUpdater version.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D149428

show more ...


# 21a69bdb 20-Apr-2023 Pravin Jagtap <Pravin.Jagtap@amd.com>

[NewPM][AMDGPU] Port amdgpu-atomic-optimizer

Reviewed By: arsenm, sameerds, gandhi21299

Differential Revision: https://reviews.llvm.org/D148628


Revision tags: llvmorg-16.0.0
# f90849df 14-Mar-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Use UniformityAnalysis in AtomicOptimizer

Adds & uses a new `isDivergentUse` API in UA.
UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it.

-

[AMDGPU] Use UniformityAnalysis in AtomicOptimizer

Adds & uses a new `isDivergentUse` API in UA.
UniformityAnalysis now requires CycleInfo as well as the new temporal divergence API can query it.

-----

Original patch that adds `isDivergentUse` by @sameerds

The user of a temporally divergent value is marked as divergent in the
uniformity analysis. But the same user may also have been marked divergent for
other reasons, thus losing this information about temporal divergence. But some
clients need to specificly check for temporal divergence. This change restores
such an API, that already existed in DivergenceAnalysis.

Reviewed By: sameerds, foad

Differential Revision: https://reviews.llvm.org/D146018

show more ...


Revision tags: llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6
# 3830e4e5 16-Nov-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Create poison values instead of undef

These placeholders don't care about the finer points on
the difference between the two.


Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# bfcfd53b 13-Jun-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic

Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were a

[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic

Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.

Also use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D127662

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4
# dc6e8dfd 20-Sep-2021 Jacob Lambert <jacob.lambert@amd.com>

[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.


Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 9d08f276 19-Mar-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] Use reductions instead of scans in the atomic optimizer

If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a sca

[AMDGPU] Use reductions instead of scans in the atomic optimizer

If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (which are generally bad for performance) and one dpp
operation.

Differential Revision: https://reviews.llvm.org/D98953

show more ...


# 5a5a5312 19-Mar-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] Remove some redundant code. NFC.

This is redundant because we have already checked that we can't handle
divergent 64-bit atomic operands.


# 5dd5ddcb 18-Mar-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] Skip building some IR if it won't be used. NFC.


# c96dfe0d 18-Mar-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC.


Revision tags: llvmorg-12.0.0-rc3
# c3ce7bae 02-Mar-2021 Piotr Sobczak <Piotr.Sobczak@amd.com>

[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm

* Introduce the new intrinsic amdgcn_strict_wwm
* Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix

[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm

* Introduce the new intrinsic amdgcn_strict_wwm
* Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257

show more ...


Revision tags: llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <daniil.fukalov@amd.com>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


Revision tags: llvmorg-11.1.0-rc1
# 6a87e9b0 25-Dec-2020 dfukalov <daniil.fukalov@amd.com>

[NFC][AMDGPU] Reduce include files dependency.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5
# 0249df33 29-Sep-2020 Mirko Brkusanin <Mirko.Brkusanin@amd.com>

[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer

Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when

[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer

Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.

Differential Revision: https://reviews.llvm.org/D88315

show more ...


Revision tags: llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# aad93654 30-May-2020 Christopher Tetreault <ctetreau@quicinc.com>

[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU

Reviewers: efriedma, david-arm, fpetrogalli, arsenm

Reviewed By: david-arm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehn

[SVE] Eliminate calls to default-false VectorType::get() from AMDGPU

Reviewers: efriedma, david-arm, fpetrogalli, arsenm

Reviewed By: david-arm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, tschuett, hiraditya, rkruppe, psnobl, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80328

show more ...


Revision tags: llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4
# 5d3a69fe 05-Mar-2020 Sebastian Neubauer <sebastian.neubauer@amd.com>

[AMDGPU] New llvm.amdgcn.ballot intrinsic

Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its

[AMDGPU] New llvm.amdgcn.ballot intrinsic

Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088

show more ...


Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 05da2fe5 13-Nov-2019 Reid Kleckner <rnk@google.com>

Sink all InitializePasses.h includes

This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of reco

Sink all InitializePasses.h includes

This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211

show more ...


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# eac23862 23-Aug-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] gfx10 atomic optimizer changes.

Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Reviewers: arsenm, sheredom, crit

[AMDGPU] gfx10 atomic optimizer changes.

Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65644

llvm-svn: 369745

show more ...


Revision tags: llvmorg-9.0.0-rc2
# dcb75324 29-Jul-2019 Jay Foad <jay.foad@gmail.com>

[DivergenceAnalysis] Add methods for querying divergence at use

Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is unif

[DivergenceAnalysis] Add methods for querying divergence at use

Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.

This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.

This might allow D63953 or other similar workarounds to be removed.

Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue

Reviewed By: nhaehnle

Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65141

llvm-svn: 367218

show more ...


123