History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp (Results 51 – 63 of 63)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-9.0.0-rc1
# 298500ae 22-Jul-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] Save some work when an atomic op has no uses

Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an

[AMDGPU] Save some work when an atomic op has no uses

Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.

Reviewers: arsenm, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64981

llvm-svn: 366667

show more ...


# 7d06ffff 19-Jul-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] Simplify the exclusive scan used for optimized atomics

Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3

[AMDGPU] Simplify the exclusive scan used for optimized atomics

Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:

1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
combine into an ADD3 instruction.

Because of #2 and #3 the end result is improved from this:

v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
v_add3_u32 v1, v4, v5, v1
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf

To this:

v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
s_nop 1
v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
s_nop 1
v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf

I.e. two fewer computational instructions, one extra nop where we could
schedule something else.

Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64411

llvm-svn: 366543

show more ...


Revision tags: llvmorg-10-init
# 70235c64 17-Jul-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] Optimize atomic AND/OR/XOR

Summary: Extend the atomic optimizer to handle AND, OR and XOR.

Reviewers: arsenm, sheredom

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t

[AMDGPU] Optimize atomic AND/OR/XOR

Summary: Extend the atomic optimizer to handle AND, OR and XOR.

Reviewers: arsenm, sheredom

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64809

llvm-svn: 366323

show more ...


# 17060f0a 16-Jul-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] Optimize atomic max/min

Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.

Reviewers: arsenm, sheredom, critson, rampit

[AMDGPU] Optimize atomic max/min

Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64328

llvm-svn: 366235

show more ...


Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4
# 38902350 08-Jul-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] Use a named predicate instead of a magic number.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, ll

[AMDGPU] Use a named predicate instead of a magic number.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64201

llvm-svn: 365294

show more ...


Revision tags: llvmorg-8.0.1-rc3
# 68a2fef9 13-Jun-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32

Differential Revision: https://reviews.llvm.org/D63301

llvm-svn: 363339


Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0
# 3a31b3f6 14-Mar-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't add unnecessary convergent attributes

These are redundant with the intrinsic declaration.

llvm-svn: 356143


Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4
# 9e3f7d8a 05-Mar-2019 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Fix DPP operand order in atomic optimizer

Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a9

[AMDGPU] Fix DPP operand order in atomic optimizer

Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30

Reviewers: sheredom, tpr

Reviewed By: sheredom

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58900

llvm-svn: 355394

show more ...


Revision tags: llvmorg-8.0.0-rc3
# 582c1601 11-Feb-2019 Benjamin Kramer <benny.kra@googlemail.com>

[AMDGPU] Remove unused variable

llvm-svn: 353704


# 8c10fa1a 11-Feb-2019 Neil Henning <neil.henning@amd.com>

[AMDGPU] Fix DPP sequence in atomic optimizer.

This commit fixes the DPP sequence in the atomic optimizer (which was
previously missing the row_shr:3 step), and works around a read_register
exec bug

[AMDGPU] Fix DPP sequence in atomic optimizer.

This commit fixes the DPP sequence in the atomic optimizer (which was
previously missing the row_shr:3 step), and works around a read_register
exec bug by using a ballot instead.

Differential Revision: https://reviews.llvm.org/D57737

llvm-svn: 353703

show more ...


Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# 2946cd70 19-Jan-2019 Chandler Carruth <chandlerc@gmail.com>

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the ne

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636

show more ...


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3
# 233a02d0 05-Nov-2018 Neil Henning <neil.henning@amd.com>

[AMDGPU] Fix the new atomic optimizer in pixel shaders.

The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes a

[AMDGPU] Fix the new atomic optimizer in pixel shaders.

The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.

To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.

I've added a test case that can cause derivatives, and exposes the
problem.

Differential Revision: https://reviews.llvm.org/D53930

llvm-svn: 346128

show more ...


Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# 66416574 08-Oct-2018 Neil Henning <neil.henning@amd.com>

[AMDGPU] Add an AMDGPU specific atomic optimizer.

This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding a

[AMDGPU] Add an AMDGPU specific atomic optimizer.

This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
whether the value is uniform/divergent use wavefront wide operations
(DPP in the divergent case) to calculate the total amount to be
atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
the wavefront, and calculate our individual lanes offset into the
final result.

Differential Revision: https://reviews.llvm.org/D51969

llvm-svn: 343973

show more ...


123