SIWholeQuadMode.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp

Revision	Date	Author	Comments
# 4010f894	09-Jul-2024	paperchalice <liujunchang97@outlook.com>	[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941) - Add `SlotIndexesAnalysis`. - Add `SlotIndexesPrinterPass`. - Use `SlotIndexesWrapperPass` in legacy pass.
Revision tags: llvmorg-18.1.8
# db096adb	12-Jun-2024	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Remove SIWholeQuadMode pseudo wavemode optimization (#94133) This does not work correctly in divergent control flow. Can be replaced with a later exec mask manipulation optimizer. This reverts commit a3646ec1bc662e221c2a1d182987257c50958789.
# 4b24c2df	12-Jun-2024	paperchalice <liujunchang97@outlook.com>	[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis result (#95113) `MachinePostDominators` version of #94571.
# 837dc542	11-Jun-2024	paperchalice <liujunchang97@outlook.com>	[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571) Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.
# df6750ea	07-Jun-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (#93680) Whole quad mode requires inserting a copy of the initial EXEC mask. In a function that also uses llvm.amdgcn.init.exec, insert the COPY after initializing EXEC.
# 4c6dd70e	06-Jun-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQuadMode (#94452) NFCI; this just preserves SI_INIT_EXEC and SI_INIT_EXEC_FROM_INPUT instructions a little longer so that we can reliably identify them in SIWholeQuadMode.
Revision tags: llvmorg-18.1.7
# 180448b1	29-May-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Reduce use of continue in SIWholeQuadMode. NFC. (#93659)
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5
# f6d431f2	24-Apr-2024	Xu Zhang <simonzgx@gmail.com>	[CodeGen] Make the parameter TRI required in some functions. (#85968) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 82e33d62	03-Jan-2024	Mirko Brkušanin <Mirko.Brkusanin@amd.com>	[AMDGPU] Add VDSDIR instructions for GFX12 (#75197)
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5
# 20e9e4f7	10-Nov-2023	Diana Picus <Diana-Magda.Picus@amd.com>	[AMDGPU] si-wqm: Skip only LiveMask COPY si-wqm sometimes needs to save the LiveMask in the entry block. Later on, while looking for a place to enter WQM/WWM, it unconditionally skips over the first COPY instruction in the entry block. This is incorrect for functions where the LiveMask doesn't need to be saved, and therefore the first COPY is more likely a COPY from a function argument and might need to be in some non-exact mode. This patch fixes the issue by also checking that the source of the COPY is the EXEC register. This produces different code in 3 of the existing tests: In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than outside. This is benign. In wave32.ll, we end up with an extra register copy. This is because the first COPY in the block is now part of the WWM block, so si-pre-allocate-wwm-regs will allocate a new register for its destination (when it was outside of the WWM region, the register allocator could just re-use the same register). We might be able to improve this in si-pre-allocate-wwm-regs but I haven't looked into it. The same thing happens in dual-source-blend-export.ll, but for that one it's harder to see because of the scheduling changes. I've uploaded the before/after si-wqm output for it here: https://reviews.llvm.org/differential/diff/553445/ Differential Revision: https://reviews.llvm.org/D158841
# 0eb51681	02-Nov-2023	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Remove dom tree requirements from SIWholeQuadMode pass (#71012) SIWholeQuadMode preserves dominator and post dominator trees, but does not require them.
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# da7892f7	16-Jun-2023	Jay Foad <jay.foad@amd.com>	[MC] Use regunits instead of MCRegUnitIterator. NFC. Differential Revision: https://reviews.llvm.org/D153122
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# aa2d0fbc	21-May-2023	Sergei Barannikov <barannikov88@gmail.com>	[MC] Add MCRegisterInfo::regunits for iteration over register units Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152098
# 6afc4b06	06-Jun-2023	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] WQM: Ensure exact mode placement before branches Fix for D151797 where the change accidentally allowed exit to exact mode between branch instructions. Reviewed By: dstuttard Differential Revision: https://reviews.llvm.org/D152228
# 3030c039	05-Jun-2023	Jay Foad <jay.foad@amd.com>	[AMDGPU] Make use of MachineInstr::all_defs and all_uses. NFCI.
# 2e87ed80	02-Jun-2023	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] WQM: Allow insertion of exact mode transition as terminator Allow WQM pass to insert transitions to exact mode among block terminators, instead of forcing them to occur before terminators. This should not yield any functional change, but allows block splitting of control flow, such as that in D145329. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D151797
# cafb0991	26-May-2023	Mikael Holmen <mikael.holmen@ericsson.com>	[AMDGPU] Silence gcc warning [NFC] Without the fix gcc complains with ../lib/Target/AMDGPU/SIWholeQuadMode.cpp:1543: warning: enumeral and non-enumeral type in conditional expression [-Wextra] on the expression at lines 1542-1545: `unsigned CopyOp = MI->getOperand(1).isReg() ? AMDGPU::COPY : TII->getMovOpcode(TRI->getRegClassForOperandReg(*MRI, MI->getOperand(0)));`
# 9283c43e	22-May-2023	Jay Foad <jay.foad@amd.com>	[AMDGPU] Fix lowering of @llvm.amdgcn.set.inactive(imm, poison) If the second argument of V_SET_INACTIVE is undef/poison, SIWholeQuadMode lowered it to a COPY from the first argument, but that caused invalid MIR if the first argument was an immediate rather than a register. Fix this by lowering to a V_MOV instruction instead of a COPY. Fixes https://github.com/llvm/llvm-project/issues/62862 Differential Revision: https://reviews.llvm.org/D151105
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 5bc703f7	20-Dec-2022	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a static mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422
# 6443c0ee	12-Dec-2022	Jay Foad <jay.foad@amd.com>	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4
# a3646ec1	28-Oct-2022	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Add pseudo wavemode to optimize strict_wqm Strict WQM does not require a WQM transition if it occurs within an existing WQM section. This occurs heavily in GFX11 pixel shaders with LDS_PARAM_LOAD, which leads to unnecessary EXEC mask manipulation. To avoid these transitions, detect WQM -> Strict WQM -> WQM and substitute new ENTER_PSEUDO_WM/EXIT_PSEUDO_WM markers instead. These are treated similarly by the WWM register pre-allocation pass, but do not manipulate EXEC or use registers to save EXEC state. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D136813
Revision tags: llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2
# 78341948	01-Aug-2022	Matt Arsenault <Matthew.Arsenault@amd.com>	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 732eed40	09-Mar-2022	Ruiling Song <ruiling.song@amd.com>	[AMDGPU] Mark GFX11 dual source blend export as strict-wqm The instructions that generate the source of dual source blend export should run in strict-wqm. That is, if any lane in a quad is active, we need to enable all four lanes of that quad to make the shuffling operation before exporting to dual source blend target work correctly. Differential Revision: https://reviews.llvm.org/D127981
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 29621c13	11-Mar-2021	Piotr Sobczak <Piotr.Sobczak@amd.com>	[AMDGPU] Tag GFX11 LDS loads as using strict_wqm LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad (if any pixel is enabled in the quad, data is written to all 4 pixels/threads in the quad). Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm to enforce this and avoid lane clobbering issues. Note that only the instruction itself is tagged. The implicit uses of these do not need to be set to WQM. This reduces unnecessary WQM calculation of M0. Differential Revision: https://reviews.llvm.org/D127977
# 4271a1ff	18-Jun-2022	Kazu Hirata <kazu@google.com>	[llvm] Call *set::insert without checking membership first (NFC)