Revision tags: llvmorg-9.0.0-rc1 |
|
#
298500ae |
| 22-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] Save some work when an atomic op has no uses
Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an
[AMDGPU] Save some work when an atomic op has no uses
Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC.
Reviewers: arsenm, dstuttard, tpr
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64981
llvm-svn: 366667
show more ...
|
#
7d06ffff |
| 19-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3
[AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because:
1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction.
Because of #2 and #3 the end result is improved from this:
v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
To this:
v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
I.e. two fewer computational instructions, one extra nop where we could schedule something else.
Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64411
llvm-svn: 366543
show more ...
|
Revision tags: llvmorg-10-init |
|
#
70235c64 |
| 17-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.
Reviewers: arsenm, sheredom
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t
[AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.
Reviewers: arsenm, sheredom
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64809
llvm-svn: 366323
show more ...
|
#
17060f0a |
| 16-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] Optimize atomic max/min
Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract.
Reviewers: arsenm, sheredom, critson, rampit
[AMDGPU] Optimize atomic max/min
Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64328
llvm-svn: 366235
show more ...
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
#
38902350 |
| 08-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, ll
[AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64201
llvm-svn: 365294
show more ...
|
Revision tags: llvmorg-8.0.1-rc3 |
|
#
68a2fef9 |
| 13-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301
llvm-svn: 363339
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0 |
|
#
3a31b3f6 |
| 14-Mar-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't add unnecessary convergent attributes
These are redundant with the intrinsic declaration.
llvm-svn: 356143
|
Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4 |
|
#
9e3f7d8a |
| 05-Mar-2019 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Fix DPP operand order in atomic optimizer
Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.
Change-Id: I631d050e1c00a3b4bc7c11a9
[AMDGPU] Fix DPP operand order in atomic optimizer
Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.
Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30
Reviewers: sheredom, tpr
Reviewed By: sheredom
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58900
llvm-svn: 355394
show more ...
|
Revision tags: llvmorg-8.0.0-rc3 |
|
#
582c1601 |
| 11-Feb-2019 |
Benjamin Kramer <benny.kra@googlemail.com> |
[AMDGPU] Remove unused variable
llvm-svn: 353704
|
#
8c10fa1a |
| 11-Feb-2019 |
Neil Henning <neil.henning@amd.com> |
[AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug
[AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was previously missing the row_shr:3 step), and works around a read_register exec bug by using a ballot instead.
Differential Revision: https://reviews.llvm.org/D57737
llvm-svn: 353703
show more ...
|
Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
#
233a02d0 |
| 05-Nov-2018 |
Neil Henning <neil.henning@amd.com> |
[AMDGPU] Fix the new atomic optimizer in pixel shaders.
The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes a
[AMDGPU] Fix the new atomic optimizer in pixel shaders.
The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active.
To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results.
I've added a test case that can cause derivatives, and exposes the problem.
Differential Revision: https://reviews.llvm.org/D53930
llvm-svn: 346128
show more ...
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
#
66416574 |
| 08-Oct-2018 |
Neil Henning <neil.henning@amd.com> |
[AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by:
- Running through a function and finding a
[AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by:
- Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result.
Differential Revision: https://reviews.llvm.org/D51969
llvm-svn: 343973
show more ...
|