#
4010f894 |
| 09-Jul-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941)
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
|
Revision tags: llvmorg-18.1.8 |
|
#
db096adb |
| 12-Jun-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Remove SIWholeQuadMode pseudo wavemode optimization (#94133)
This does not work correctly in divergent control flow. Can be replaced
with a later exec mask manipulation optimizer.
This reverts commit a3646ec1bc662e221c2a1d182987257c50958789.
|
#
4b24c2df |
| 12-Jun-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis result (#95113)
`MachinePostDominators` version of #94571.
|
#
837dc542 |
| 11-Jun-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571)
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
|
#
df6750ea |
| 07-Jun-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (#93680)
Whole quad mode requires inserting a copy of the initial EXEC mask. In a
function that also uses llvm.amdgcn.init.exec, insert the COPY after
initializing EXEC.
|
#
4c6dd70e |
| 06-Jun-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQuadMode (#94452)
NFCI; this just preserves SI_INIT_EXEC and SI_INIT_EXEC_FROM_INPUT
instructions a little longer so that we can reliably identify them in
SIWholeQuadMode.
|
Revision tags: llvmorg-18.1.7 |
|
#
180448b1 |
| 29-May-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Reduce use of continue in SIWholeQuadMode. NFC. (#93659)
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5 |
|
#
f6d431f2 |
| 24-Apr-2024 |
Xu Zhang <simonzgx@gmail.com> |
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all call sites of these functions have been updated accordingly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
82e33d62 |
| 03-Jan-2024 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] Add VDSDIR instructions for GFX12 (#75197)
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
20e9e4f7 |
| 10-Nov-2023 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[AMDGPU] si-wqm: Skip only LiveMask COPY
si-wqm sometimes needs to save the LiveMask in the entry block. Later on, while looking for a place to enter WQM/WWM, it unconditionally skips over the first COPY instruction in the entry block. This is incorrect for functions where the LiveMask doesn't need to be saved, and therefore the first COPY is more likely a COPY from a function argument and might need to be in some non-exact mode.
This patch fixes the issue by also checking that the source of the COPY is the EXEC register.
This produces different code in 3 of the existing tests:
In wwm-reserved.ll, an SGPR copy is now inside the WWM area rather than outside. This is benign.
In wave32.ll, we end up with an extra register copy. This is because the first COPY in the block is now part of the WWM block, so si-pre-allocate-wwm-regs will allocate a new register for its destination (when it was outside of the WWM region, the register allocator could just re-use the same register). We might be able to improve this in si-pre-allocate-wwm-regs but I haven't looked into it.
The same thing happens in dual-source-blend-export.ll, but for that one it's harder to see because of the scheduling changes. I've uploaded the before/after si-wqm output for it here: https://reviews.llvm.org/differential/diff/553445/
Differential Revision: https://reviews.llvm.org/D158841
|
#
0eb51681 |
| 02-Nov-2023 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Remove dom tree requirements from SIWholeQuadMode pass (#71012)
SIWholeQuadMode preserves dominator and post dominator trees, but does
not require them.
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
da7892f7 |
| 16-Jun-2023 |
Jay Foad <jay.foad@amd.com> |
[MC] Use regunits instead of MCRegUnitIterator. NFC.
Differential Revision: https://reviews.llvm.org/D153122
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
aa2d0fbc |
| 21-May-2023 |
Sergei Barannikov <barannikov88@gmail.com> |
[MC] Add MCRegisterInfo::regunits for iteration over register units
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D152098
|
#
6afc4b06 |
| 06-Jun-2023 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] WQM: Ensure exact mode placement before branches
Fix for D151797 where the change accidentally allowed exit to exact mode between branch instructions.
Reviewed By: dstuttard
Differential Revision: https://reviews.llvm.org/D152228
|
#
3030c039 |
| 05-Jun-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Make use of MachineInstr::all_defs and all_uses. NFCI.
|
#
2e87ed80 |
| 02-Jun-2023 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] WQM: Allow insertion of exact mode transition as terminator
Allow WQM pass to insert transitions to exact mode among block terminators, instead of forcing them to occur before terminators.
This should not yield any functional change, but allows block splitting of control flow, such as that in D145329.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D151797
|
#
cafb0991 |
| 26-May-2023 |
Mikael Holmen <mikael.holmen@ericsson.com> |
[AMDGPU] Silence gcc warning [NFC]
Without the fix gcc complains with
  ../lib/Target/AMDGPU/SIWholeQuadMode.cpp:1543: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
   1542 | unsigned CopyOp = MI->getOperand(1).isReg()
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1543 | ? AMDGPU::COPY
        | ~~~~~~~~~~~~~~
   1544 | : TII->getMovOpcode(TRI->getRegClassForOperandReg(
        | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1545 | *MRI, MI->getOperand(0)));
        |
|
#
9283c43e |
| 22-May-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix lowering of @llvm.amdgcn.set.inactive(imm, poison)
If the second argument of V_SET_INACTIVE is undef/poison, SIWholeQuadMode lowered it to a COPY from the first argument, but that caused invalid MIR if the first argument was an immediate rather than a register.
Fix this by lowering to a V_MOV instruction instead of a COPY.
Fixes https://github.com/llvm/llvm-project/issues/62862
Differential Revision: https://reviews.llvm.org/D151105
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
5bc703f7 |
| 20-Dec-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by building a static mapping table from physical registers to base classes using TableGen.
Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D139422
|
#
6443c0ee |
| 12-Dec-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
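The C++17 feature this commit relies on is class template argument deduction. A minimal sketch of the before/after pattern follows; the helper names `oldStyle`, `newStyle`, and `newTuple` are hypothetical and are not taken from the patch itself.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical helpers for illustration only; not code from the LLVM patch.

// Pre-C++17 style: a helper function deduces the element types.
inline std::pair<int, std::string> oldStyle() {
  return std::make_pair(1, std::string("exec"));
}

// C++17 style: class template argument deduction lets the constructor
// deduce pair<int, std::string> directly.
inline std::pair<int, std::string> newStyle() {
  return std::pair(1, std::string("exec"));
}

// The same applies to tuple: this deduces tuple<int, char, bool>.
inline std::tuple<int, char, bool> newTuple() {
  return std::tuple(7, 'w', true);
}
```

Both spellings produce the same object for plain value arguments, so the change is purely syntactic, which is consistent with the NFC tag.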
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4 |
|
#
a3646ec1 |
| 28-Oct-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Add pseudo wavemode to optimize strict_wqm
Strict WQM does not require a WQM transition if it occurs within an existing WQM section. This occurs heavily in GFX11 pixel shaders with LDS_PARAM_LOAD, which leads to unnecessary EXEC mask manipulation.
To avoid these transitions, detect WQM -> Strict WQM -> WQM and substitute new ENTER_PSEUDO_WM/EXIT_PSEUDO_WM markers instead. These are treated similarly by the WWM register pre-allocation pass, but do not manipulate EXEC or use registers to save EXEC state.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D136813
|
Revision tags: llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
78341948 |
| 01-Aug-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
TableGen: Introduce generated getSubRegisterClass function
Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.
|
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
732eed40 |
| 09-Mar-2022 |
Ruiling Song <ruiling.song@amd.com> |
[AMDGPU] Mark GFX11 dual source blend export as strict-wqm
The instructions that generate the source of dual source blend export should run in strict-wqm. That is, if any lane in a quad is active, we need to enable all four lanes of that quad so that the shuffling operation before exporting to the dual source blend target works correctly.
Differential Revision: https://reviews.llvm.org/D127981
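The "any lane active enables the whole quad" rule can be sketched as a bitmask computation. This is an illustration only, assuming a 64-lane mask with quads formed from four adjacent lanes; the function name `wholeQuadMask` is hypothetical and is not the pass's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: expand a 64-lane EXEC-style mask so that every quad
// (4 adjacent lanes) with at least one active lane becomes fully active.
constexpr uint64_t wholeQuadMask(uint64_t Exec) {
  uint64_t Result = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    // The four bits that belong to this quad.
    const uint64_t QuadBits = uint64_t{0xF} << (Quad * 4);
    if (Exec & QuadBits)
      Result |= QuadBits;
  }
  return Result;
}

static_assert(wholeQuadMask(0x0) == 0x0, "no active lanes stay inactive");
static_assert(wholeQuadMask(0x1) == 0xF, "lane 0 enables quad 0");
static_assert(wholeQuadMask(0x120) == 0xFF0, "lanes 5 and 8 enable quads 1 and 2");
```

Quads with no active lane stay disabled, so the resulting mask is always a superset of the original one.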
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
#
29621c13 |
| 11-Mar-2021 |
Piotr Sobczak <Piotr.Sobczak@amd.com> |
[AMDGPU] Tag GFX11 LDS loads as using strict_wqm
LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad (if any pixel is enabled in the quad, data is written to all 4 pixels/threads in the quad).
Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm to enforce this and avoid lane clobbering issues. Note that only the instruction itself is tagged. The implicit uses of these do not need to be set to WQM. This reduces unnecessary WQM calculation of M0.
Differential Revision: https://reviews.llvm.org/D127977
|
#
4271a1ff |
| 18-Jun-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Call *set::insert without checking membership first (NFC)
|