Revision tags: llvmorg-10.0.1-rc1 |
|
#
35e6a9c8 |
| 24-Apr-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Break read2/write2 search range on a memory fence
This is to fix performance regressions introduced by 86c944d790728891801778b8d98c2c65a83f36a5.
The old search would collect all potentially
AMDGPU: Break read2/write2 search range on a memory fence
This is to fix performance regressions introduced by 86c944d790728891801778b8d98c2c65a83f36a5.
The old search would collect all potentially mergeable instructions in the entire block. In this case, the same address is written in multiple places in the block on the other side of a fence. When sorted by offset, the two unmergeable, identical addresses would be next to each other and the merge would give up.
Break the search space when we encounter an instruction we won't be able to merge across. This will keep the identical addresses in different merge attempts.
This may also improve compile time by reducing the merge list size.
show more ...
|
#
6bffd0df |
| 24-Apr-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix redundant members
|
#
50128f8a |
| 24-Apr-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use Register
|
#
0337017a |
| 22-Apr-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Use SGPR instead of SReg classes
12994a70cf7 did this for 128-bit classes:
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP
[AMDGPU] Use SGPR instead of SReg classes
12994a70cf7 did this for 128-bit classes:
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good.
This patch extends it to all classes > 64 bits, for consistency.
Differential Revision: https://reviews.llvm.org/D78622
show more ...
|
#
f2334a7e |
| 01-Apr-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fix crash in SILoadStoreOptimizer
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and paired instruction to come in order and scans MBB from base to the paired instruction. An ori
[AMDGPU] Fix crash in SILoadStoreOptimizer
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and paired instruction to come in order and scans MBB from base to the paired instruction. An original order can be changed if there were a dependent instruction in between and base instruction was moved.
Fixed by bailing the optimization. In theory it might be possible still to perform a merge by swapping instructions, but on practice it bails anyway because it finds dependency on that same instruction which has resulted in the base move.
Differential Revision: https://reviews.llvm.org/D77245
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
#
87568691 |
| 29-Jan-2020 |
Sebastian Neubauer <sebastian.neubauer@amd.com> |
[AMDGPU] Add a16 feature to gfx10
Based on D72931
This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gf
[AMDGPU] Add a16 feature to gfx10
Based on D72931
This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gfx7/8.
Differential Revision: https://reviews.llvm.org/D73956
show more ...
|
#
0426c2d0 |
| 30-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 6a4acb9d809aaadb9304a7a2f3382d958a6c2adf.
|
#
6a4acb9d |
| 30-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 17dbc6611df9044d779d85b3d545bd37e5dd5200.
A test is failing on some bots
|
#
17dbc661 |
| 30-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup and fix SMRD offset handling
I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the off
AMDGPU: Cleanup and fix SMRD offset handling
I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the offsets are dword aligned before folding.
show more ...
|
#
cb297050 |
| 25-Jan-2020 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU/SILoadStoreOptimizer: Fix uninitialized variable error
This was introduced by 86c944d790728891801778b8d98c2c65a83f36a5 and caught by the sanitizer-x86_64-linux-fast bot.
|
#
86c944d7 |
| 24-Jan-2020 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets
Summary: This improves merging of sequences like:
store a, ptr + 4 store b, ptr + 8 store c, ptr + 12 store d, ptr + 16 store e,
AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets
Summary: This improves merging of sequences like:
store a, ptr + 4 store b, ptr + 8 store c, ptr + 12 store d, ptr + 16 store e, ptr + 20 store f, ptr
Prior to this patch the basic block was scanned in order to find instructions to merge and the above sequence would be transformed to:
store4 <a, b, c, d>, ptr + 4 store e, ptr + 20 store r, ptr
With this change, we now sort all the candidate merge instructions by their offset, so instructions are visited in offset order rather than in the order they appear in the basic block. We now transform this sequnce into:
store4 <f, a, b, c>, ptr store2 <d, e>, ptr + 16
Another benefit of this change is that since we have sorted the mergeable lists by offset, we can easily check if an instruction is mergeable by checking the offset of the instruction that becomes before or after it in the sorted list. Once we determine an instruction is not mergeable we can remove it from the list and avoid having to do the more expensive mergeablilty checks.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Reviewed By: arsenm, nhaehnle
Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65966
show more ...
|
Revision tags: llvmorg-11-init |
|
#
c3bc805f |
| 17-Dec-2019 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct
Summary: Modify CombineInfo to only store information about a single instruction. This is a little easier to work with and removes a lot of
AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct
Summary: Modify CombineInfo to only store information about a single instruction. This is a little easier to work with and removes a lot of duplicate initialization code.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm, nhaehnle
Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71045
show more ...
|
#
bf13a710 |
| 12-Dec-2019 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU/SILoadStoreOptimizer: Simplify function
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llv
AMDGPU/SILoadStoreOptimizer: Simplify function
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71044
show more ...
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
4a801170 |
| 20-Nov-2019 |
Piotr Sobczak <Piotr.Sobczak@amd.com> |
[AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores
Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kz
[AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores
Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69794
show more ...
|
#
4a308d30 |
| 19-Nov-2019 |
Michael Liao <michael.hliao@gmail.com> |
[AMDGPU] Keep consistent check of legal addressing mode.
Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl,
[AMDGPU] Keep consistent check of legal addressing mode.
Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70473
show more ...
|
#
d8f7c68e |
| 09-Oct-2019 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recently
Summary: We should check for same instruction class before checking whether they have the same base address, else we might iterate o
AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recently
Summary: We should check for same instruction class before checking whether they have the same base address, else we might iterate out of bounds of a MachineInstr operands list. The InstClass check is also cheaper.
This was introduced in SVN r373630.
Reviewers: tstellar
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68690
show more ...
|
#
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <rnk@google.com> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of reco
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
show more ...
|
#
c3d6f0dd |
| 03-Nov-2019 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
[SILoadStoreOptimizer] Fixed typo. NFCI.
|
#
02baaca7 |
| 16-Oct-2019 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] Extend the SI Load/Store optimizer
Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions.
When
[AMDGPU] Extend the SI Load/Store optimizer
Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions.
When the instructions of the same subclass differ only in dmask, merge them and update dmask accordingly.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64911
llvm-svn: 374984
show more ...
|
#
12994a70 |
| 10-Oct-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use SGPR_128 instead of SReg_128 for vregs
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating
AMDGPU: Use SGPR_128 instead of SReg_128 for vregs
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good.
llvm-svn: 374284
show more ...
|
#
00182683 |
| 09-Oct-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Relax register classes used
llvm-svn: 374254
|
#
165e4691 |
| 04-Oct-2019 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU][SILoadStoreOptimizer] NFC: Refactor code
Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes.
Introducing InstSubcla
[AMDGPU][SILoadStoreOptimizer] NFC: Refactor code
Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes.
Introducing InstSubclass will keep them separate and help extending InstClassEnum with other instruction types (e.g. MIMG) in the future.
This patch also makes getSubRegIdxs() more concise.
Reviewers: nhaehnle, arsenm, tstellar
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68384
llvm-svn: 373699
show more ...
|
#
e6f51713 |
| 03-Oct-2019 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions
Summary: This adds a pre-pass to this optimization that scans through the basic block and generates lists of mergeable instr
AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions
Summary: This adds a pre-pass to this optimization that scans through the basic block and generates lists of mergeable instructions with one list per unique address.
In the optimization phase instead of scanning through the basic block for mergeable instructions, we now iterate over the lists generated by the pre-pass.
The decision to re-optimize a block is now made per list, so if we fail to merge any instructions with the same address, then we do not attempt to optimize them in future passes over the block. This will help to reduce the time this pass spends re-optimizing instructions.
In one pathological test case, this change reduces the time spent in the SILoadStoreOptimizer from 0.2s to 0.03s.
This restructuring will also make it possible to implement further solutions in this pass, because we can now add less expensive checks to the pre-pass and filter instructions out early which will avoid the need to do the expensive scanning during the optimization pass. For example, checking for adjacent offsets is an inexpensive test we can move to the pre-pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65961
llvm-svn: 373630
show more ...
|
#
265e94e6 |
| 02-Oct-2019 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] Extend buffer intrinsics with swizzling
Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this
[AMDGPU] Extend buffer intrinsics with swizzling
Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR.
Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store
Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on.
The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged.
There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set.
Reviewers: arsenm, nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68200
llvm-svn: 373491
show more ...
|
#
004c7915 |
| 01-Oct-2019 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo
Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change th
AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo
Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Reviewed By: nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65496
llvm-svn: 373366
show more ...
|