Revision tags: llvmorg-15.0.7 |
|
#
6443c0ee |
| 12-Dec-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
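As a rough illustration of the C++17 feature this commit relies on (class template argument deduction), the sketch below builds the same pair with the helper function and with the constructor. The function names are hypothetical and are not taken from the patch.

```cpp
#include <string>
#include <tuple>
#include <utility>

// Pre-C++17 style: the helper function deduces the element types.
inline std::pair<int, std::string> makeWithHelper() {
  return std::make_pair(1, std::string("v0"));
}

// C++17 style: class template argument deduction lets the constructor
// deduce the same element types, so std::pair(...) can be written directly.
inline std::pair<int, std::string> makeWithCtor() {
  return std::pair(1, std::string("v0"));
}

// The same applies to std::tuple.
inline std::tuple<int, unsigned, bool> makeTuple() {
  return std::tuple(1, 2u, true);
}
```

Both spellings produce equal values here; the constructor form simply drops the helper.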
|
#
67819a72 |
| 13-Dec-2022 |
Fangrui Song <i@maskray.me> |
[CodeGen] llvm::Optional => std::optional
|
#
20cde154 |
| 03-Dec-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional.
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
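A minimal sketch of what the migrated spelling looks like with the standard library types. The helpers `checkedOffset` and `offsetOrZero` are hypothetical and only stand in for code that previously returned `None` from a function yielding `llvm::Optional`.

```cpp
#include <optional>

// Hypothetical helper: yields an offset only when it is non-negative.
inline std::optional<int> checkedOffset(int Offset) {
  if (Offset < 0)
    return std::nullopt; // spelled `None` before the migration
  return Offset;
}

// Hypothetical caller: falls back to 0 when no offset is available.
inline int offsetOrZero(int Offset) {
  return checkedOffset(Offset).value_or(0);
}
```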
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5 |
|
#
1b560e6a |
| 14-Nov-2022 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm
Differential Revision: https://reviews.llvm.org/D137783
|
#
7425077e |
| 07-Nov-2022 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137540
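The pattern described in this commit can be sketched with toy stand-ins. The `toy*` names, enums, and operand indices below are assumptions made for illustration; they are not the LLVM declarations of `getNamedOperandIdx` or `hasNamedOperand`.

```cpp
// Toy stand-ins, not the LLVM declarations.
enum ToyOpcode { TOY_LOAD, TOY_STORE };
enum ToyOperandName { TOY_VADDR, TOY_TFE };

// Returns the operand index, or -1 when the opcode has no such operand.
inline int toyGetNamedOperandIdx(ToyOpcode Opc, ToyOperandName Name) {
  if (Name == TOY_VADDR)
    return 1; // both toy opcodes carry an address operand
  if (Name == TOY_TFE && Opc == TOY_LOAD)
    return 4; // only the toy load carries a TFE operand
  return -1;
}

// Wrapper that states the intent directly instead of comparing with -1.
inline bool toyHasNamedOperand(ToyOpcode Opc, ToyOperandName Name) {
  return toyGetNamedOperandIdx(Opc, Name) != -1;
}
```

Call sites that only care about presence can then read `toyHasNamedOperand(Opc, TOY_TFE)` rather than `toyGetNamedOperandIdx(Opc, TOY_TFE) != -1`.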
|
Revision tags: llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1 |
|
#
693f8162 |
| 15-Sep-2022 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D133787
|
Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
de9d80c1 |
| 08-Aug-2022 |
Fangrui Song <i@maskray.me> |
[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
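For reference, a small sketch of the standard attribute that replaces the macro. The `classify` function and its scoring are hypothetical and exist only to show where the attribute goes.

```cpp
// Hypothetical classifier used only to demonstrate the attribute.
inline int classify(int Width) {
  int Score = 0;
  switch (Width) {
  case 4:
    Score += 2;
    [[fallthrough]]; // deliberate: width 4 also receives the width-2 bonus
  case 2:
    Score += 1;
    break;
  default:
    break;
  }
  return Score;
}
```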
|
Revision tags: llvmorg-15.0.0-rc1 |
|
#
4c4db816 |
| 30-Jul-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D130742
|
Revision tags: llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
#
33fb23f7 |
| 24-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since address cast is a no-op. A combined memory operation needs to be promoted to flat.
Differential Revision: https://reviews.llvm.org/D120431
|
#
517171ce |
| 24-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores
TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120351
|
#
3279e440 |
| 22-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Extend SILoadStoreOptimizer to handle global stores
TODO: merge flat load/stores. TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120346
|
#
cefa1c5c |
| 23-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fix combined MMO in load-store merge
Loads and stores can be out of order in the SILoadStoreOptimizer. When combining the MachineMemOperands of two instructions, the operands are passed to combineKnownAdjacentMMOs in IR order. At the moment it picks the first operand and just replaces its offset and size. This essentially loses alignment information and may generally result in an incorrect base pointer being used.
Use a base pointer in memory addresses order instead and only adjust size.
Differential Revision: https://reviews.llvm.org/D120370
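The idea of the fix can be sketched on a toy model. `ToyMemOperand` and `combineAdjacent` are hypothetical and do not mirror the `MachineMemOperand` class or the signature of `combineKnownAdjacentMMOs`; the sketch only assumes that the two accesses are adjacent and may arrive in either order.

```cpp
#include <cstdint>

// Toy model of a memory operand; not the LLVM MachineMemOperand class.
struct ToyMemOperand {
  std::int64_t Offset; // byte offset from a common base
  std::uint64_t Size;  // access size in bytes
  unsigned Align;      // known alignment in bytes
};

// Combine two adjacent accesses that may be given in either order.
inline ToyMemOperand combineAdjacent(const ToyMemOperand &A,
                                     const ToyMemOperand &B) {
  // Order the operands by memory address rather than by argument order.
  const ToyMemOperand &Lower = A.Offset <= B.Offset ? A : B;
  const ToyMemOperand &Upper = A.Offset <= B.Offset ? B : A;
  // Keep the lower operand's base and alignment; only the size changes.
  ToyMemOperand Combined = Lower;
  Combined.Size = Lower.Size + Upper.Size;
  return Combined;
}
```

Taking the first argument unconditionally would, in this toy, keep the offset and alignment of whichever access happened to come first in program order, which is the kind of loss the commit message describes.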
|
#
9e055c0f |
| 21-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.
TODO: merge global stores. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120285
|
#
ba17bd26 |
| 21-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Extend SILoadStoreOptimizer to handle global loads
There can be situations where global and flat loads and stores are not combined by the vectorizer, in particular if their address spaces differ in the IR but they end up as instructions of the same class after selection. For example, a divergent load from the constant address space ends up being the same global_load as a load from the global address space.
TODO: merge global stores. TODO: handle SADDR forms. TODO: merge flat load/stores. TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120279
|
#
dc098156 |
| 21-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Remove redundant check in the SILoadStoreOptimizer
Differential Revision: https://reviews.llvm.org/D120268
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
359a792f |
| 28-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases
Previously when combining two loads this pass would sink the first one down to the second one, putting the combined load where the second one was. It would also sink any intervening instructions which depended on the first load down to just after the combined load.
For example, if we started with this sequence of instructions (code flowing from left to right):
X A B C D E F Y
After combining loads X and Y into XY we might end up with:
A B C D E F XY
But if B D and F depended on X, we would get:
A C E XY B D F
Now if the original code had some short disjoint live ranges from A to B, C to D and E to F, in the transformed code these live ranges will be long and overlapping. In this way a single merge of two loads could cause an unbounded increase in register pressure.
To fix this, change the way that loads are moved in order to merge them so that:
- The second load is moved up to the first one. (But when merging stores, we still move the first store down to the second one.)
- Intervening instructions are never moved.
- Instead, if we find an intervening instruction that would need to be moved, give up on the merge. But this case should now be pretty rare because normal stores have no outputs, and normal loads only have address register inputs, but these will be identical for any pair of loads that we try to merge.
As well as fixing the unbounded register pressure increase problem, moving loads up and stores down seems like it should usually be a win for memory latency reasons.
Differential Revision: https://reviews.llvm.org/D119006
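The two placement schemes from this commit message can be replayed on the `X A B C D E F Y` example with a toy model. Everything here is an assumption made for illustration: instructions are plain strings, the dependence sets are passed in by hand, and `mergeBySinking` and `mergeByHoisting` are hypothetical names, not functions from the pass.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

using Seq = std::vector<std::string>;

// Old scheme: sink the first load down to the second one, together with
// every intervening instruction that depends on the first load.
inline Seq mergeBySinking(const Seq &Insts, const std::string &First,
                          const std::string &Second,
                          const std::set<std::string> &DependsOnFirst) {
  Seq Out, Sunk;
  bool SeenSecond = false;
  for (const std::string &I : Insts) {
    if (I == First)
      continue; // re-emitted below as part of the merged load
    if (I == Second) {
      Out.push_back(First + Second); // merged load sits where Second was
      Out.insert(Out.end(), Sunk.begin(), Sunk.end());
      SeenSecond = true;
    } else if (!SeenSecond && DependsOnFirst.count(I)) {
      Sunk.push_back(I); // must follow the merged load
    } else {
      Out.push_back(I);
    }
  }
  return Out;
}

// New scheme: hoist the second load up to the first one and never move
// intervening instructions; give up if one of them would have to move.
inline std::optional<Seq> mergeByHoisting(const Seq &Insts,
                                          const std::string &First,
                                          const std::string &Second,
                                          const std::set<std::string> &MustMove) {
  Seq Out;
  bool Between = false;
  for (const std::string &I : Insts) {
    if (I == First) {
      Out.push_back(First + Second); // merged load sits where First was
      Between = true;
    } else if (I == Second) {
      Between = false; // already emitted as part of the merged load
    } else {
      if (Between && MustMove.count(I))
        return std::nullopt; // abandon the merge instead of reordering
      Out.push_back(I);
    }
  }
  return Out;
}
```

With `B`, `D` and `F` depending on `X`, the sinking variant reproduces the `A C E XY B D F` order quoted above, while the hoisting variant leaves `A` through `F` in place behind `XY`, so the short live ranges from `A` to `B`, `C` to `D` and `E` to `F` stay short.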
|
#
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <Sebastian.Neubauer@amd.com> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
#
a456ace9 |
| 27-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.
Separate the function clearly into:
- Checks that can be done on CI and Paired before the loop.
- The loop over all instructions between CI and Paired.
- Checks that must be done on InstsToMove after the loop.
Previously these were mostly done inside the loop in a very confusing way.
Differential Revision: https://reviews.llvm.org/D118994
|
#
001cb431 |
| 04-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined
Only call offsetsCanBeCombined with Modify = true in cases where it will really do something. NFC.
|
#
00bbda07 |
| 28-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks
Also add a comment explaining the difference between class and subclass. NFCI.
|
#
33ef8bdf |
| 04-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr
Common up all the calls to CI.setMI. NFCI.
|
#
ca05edd9 |
| 04-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test
At this point CI represents the combined access (original CI combined with Paired) so it doesn't make any sense to add in Paired.width again. NFCI.
|
#
68e39462 |
| 27-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.
Differential Revision: https://reviews.llvm.org/D118384
|
#
4b133cee |
| 27-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable list seems cleaner than adding them to the list and rejecting them later.
Differential Revision: https://reviews.llvm.org/D118368
|
#
94a4594c |
| 27-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a cleaner solution than putting them all in the same list and then later refusing to merge instructions of different AGPR-ness.
Differential Revision: https://reviews.llvm.org/D118367
|