History log of /llvm-project/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp (Results 26 – 50 of 128)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 4010f894 09-Jul-2024 paperchalice <liujunchang97@outlook.com>

[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941)

- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.


Revision tags: llvmorg-18.1.8
# db096adb 12-Jun-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Remove SIWholeQuadMode pseudo wavemode optimization (#94133)

This does not work correctly in divergent control flow. Can be replaced
with a later exec mask manipulation optimizer.

This

[AMDGPU] Remove SIWholeQuadMode pseudo wavemode optimization (#94133)

This does not work correctly in divergent control flow. Can be replaced
with a later exec mask manipulation optimizer.

This reverts commit a3646ec1bc662e221c2a1d182987257c50958789.

show more ...


# 4b24c2df 12-Jun-2024 paperchalice <liujunchang97@outlook.com>

[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis result (#95113)

`MachinePostDominators` version of #94571.


# 837dc542 11-Jun-2024 paperchalice <liujunchang97@outlook.com>

[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571)

Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree v

[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571)

Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.

show more ...


# df6750ea 07-Jun-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (#93680)

Whole quad mode requires inserting a copy of the initial EXEC mask. In a
function that also uses llvm.amdgcn.init.exec, inser

[AMDGPU] Fix interaction between WQM and llvm.amdgcn.init.exec (#93680)

Whole quad mode requires inserting a copy of the initial EXEC mask. In a
function that also uses llvm.amdgcn.init.exec, insert the COPY after
initializing EXEC.

show more ...


# 4c6dd70e 06-Jun-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQuadMode (#94452)

NFCI; this just preserves SI_INIT_EXEC and SI_INIT_EXEC_FROM_INPUT
instructions a little longer so that we can r

[AMDGPU] Move INIT_EXEC lowering from SILowerControlFlow to SIWholeQuadMode (#94452)

NFCI; this just preserves SI_INIT_EXEC and SI_INIT_EXEC_FROM_INPUT
instructions a little longer so that we can reliably identify them in
SIWholeQuadMode.

show more ...


Revision tags: llvmorg-18.1.7
# 180448b1 29-May-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Reduce use of continue in SIWholeQuadMode. NFC. (#93659)


Revision tags: llvmorg-18.1.6, llvmorg-18.1.5
# f6d431f2 24-Apr-2024 Xu Zhang <simonzgx@gmail.com>

[CodeGen] Make the parameter TRI required in some functions. (#85968)

Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many

[CodeGen] Make the parameter TRI required in some functions. (#85968)

Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.

show more ...


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 82e33d62 03-Jan-2024 Mirko Brkušanin <Mirko.Brkusanin@amd.com>

[AMDGPU] Add VDSDIR instructions for GFX12 (#75197)


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5
# 20e9e4f7 10-Nov-2023 Diana Picus <Diana-Magda.Picus@amd.com>

[AMDGPU] si-wqm: Skip only LiveMask COPY

si-wqm sometimes needs to save the LiveMask in the entry block. Later
on, while looking for a place to enter WQM/WWM, it unconditionally
skips over the first

[AMDGPU] si-wqm: Skip only LiveMask COPY

si-wqm sometimes needs to save the LiveMask in the entry block. Later
on, while looking for a place to enter WQM/WWM, it unconditionally
skips over the first COPY instruction in the entry block. This is
incorrect for functions where the LiveMask doesn't need to be saved, and
therefore the first COPY is more likely a COPY from a function argument
and might need to be in some non-exact mode.

This patch fixes the issue by also checking that the source of the COPY
is the EXEC register.

This produces different code in 3 of the existing tests:

In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than
outside. This is benign.

In wave32.ll, we end up with an extra register copy. This is because
the first COPY in the block is now part of the WWM block, so
si-pre-allocate-wwm-regs will allocate a new register for its
destination (when it was outside of the WWM region, the register
allocator could just re-use the same register). We might be able to
improve this in si-pre-allocate-wwm-regs but I haven't looked into it.

The same thing happens in dual-source-blend-export.ll, but for that
one it's harder to see because of the scheduling changes. I've uploaded
the before/after si-wqm output for it here:
https://reviews.llvm.org/differential/diff/553445/

Differential Revision: https://reviews.llvm.org/D158841

show more ...


# 0eb51681 02-Nov-2023 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Remove dom tree requirements from SIWholeQuadMode pass (#71012)

SIWholeQuadMode preserves dominator and post dominator trees, but does
not require them.


Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# da7892f7 16-Jun-2023 Jay Foad <jay.foad@amd.com>

[MC] Use regunits instead of MCRegUnitIterator. NFC.

Differential Revision: https://reviews.llvm.org/D153122


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# aa2d0fbc 21-May-2023 Sergei Barannikov <barannikov88@gmail.com>

[MC] Add MCRegisterInfo::regunits for iteration over register units

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152098


# 6afc4b06 06-Jun-2023 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] WQM: Ensure exact mode placement before branches

Fix for D151797 where the change accidentally allowed exit to
exact mode between branch instructions.

Reviewed By: dstuttard

Differential

[AMDGPU] WQM: Ensure exact mode placement before branches

Fix for D151797 where the change accidentally allowed exit to
exact mode between branch instructions.

Reviewed By: dstuttard

Differential Revision: https://reviews.llvm.org/D152228

show more ...


# 3030c039 05-Jun-2023 Jay Foad <jay.foad@amd.com>

[AMDGPU] Make use of MachineInstr::all_defs and all_uses. NFCI.


# 2e87ed80 02-Jun-2023 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] WQM: Allow insertion of exact mode transition as terminator

Allow WQM pass to insert transitions to exact mode among block
terminators, instead of forcing them to occur before terminators.

[AMDGPU] WQM: Allow insertion of exact mode transition as terminator

Allow WQM pass to insert transitions to exact mode among block
terminators, instead of forcing them to occur before terminators.

This should not yield any functional change, but allows block
splitting of control flow, such as that in D145329.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D151797

show more ...


# cafb0991 26-May-2023 Mikael Holmen <mikael.holmen@ericsson.com>

[AMDGPU] Silence gcc warning [NFC]

Without the fix gcc complains with
../lib/Target/AMDGPU/SIWholeQuadMode.cpp:1543: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
15

[AMDGPU] Silence gcc warning [NFC]

Without the fix gcc complains with
../lib/Target/AMDGPU/SIWholeQuadMode.cpp:1543: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
1542 | unsigned CopyOp = MI->getOperand(1).isReg()
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1543 | ? AMDGPU::COPY
| ~~~~~~~~~~~~~~
1544 | : TII->getMovOpcode(TRI->getRegClassForOperandReg(
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1545 | *MRI, MI->getOperand(0)));
|

show more ...


# 9283c43e 22-May-2023 Jay Foad <jay.foad@amd.com>

[AMDGPU] Fix lowering of @llvm.amdgcn.set.inactive(imm, poison)

If the second argument of V_SET_INACTIVE is undef/poison,
SIWholeQuadMode lowered it to a COPY from the first argument, but that
cause

[AMDGPU] Fix lowering of @llvm.amdgcn.set.inactive(imm, poison)

If the second argument of V_SET_INACTIVE is undef/poison,
SIWholeQuadMode lowered it to a COPY from the first argument, but that
caused invalid MIR if the first argument was an immediate rather than a
register.

Fix this by lowering to a V_MOV instruction instead of a COPY.

Fixes https://github.com/llvm/llvm-project/issues/62862

Differential Revision: https://reviews.llvm.org/D151105

show more ...


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 5bc703f7 20-Dec-2022 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass

Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes usi

[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass

Accelerate finding the base class for a physical register by
building a statically mapping table from physical registers
to base classes using TableGen.

Replace uses of SIRegisterInfo::getPhysRegClass with
TargetRegisterInfo::getPhysRegBaseClass in order to use
the computed table.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D139422

show more ...


# 6443c0ee 12-Dec-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Stop using make_pair and make_tuple. NFC.

C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.l

[AMDGPU] Stop using make_pair and make_tuple. NFC.

C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828

show more ...


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4
# a3646ec1 28-Oct-2022 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Add pseudo wavemode to optimize strict_wqm

Strict WQM does not require a WQM transistion if it occurs within
an existing WQM section.
This occurs heavily in GFX11 pixel shaders with LDS_PAR

[AMDGPU] Add pseudo wavemode to optimize strict_wqm

Strict WQM does not require a WQM transistion if it occurs within
an existing WQM section.
This occurs heavily in GFX11 pixel shaders with LDS_PARAM_LOAD.
Which leads to unnecessary EXEC mask manipulation.

To avoid these transitions, detect WQM -> Strict WQM -> WQM
and substitute new ENTER_PSEUDO_WM/EXIT_PSEUDO_WM markers instead.
These are treat similarly by WWM register pre-allocation pass,
but do not manipulate EXEC or use registers to save EXEC state.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D136813

show more ...


Revision tags: llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2
# 78341948 01-Aug-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

TableGen: Introduce generated getSubRegisterClass function

Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
m

TableGen: Introduce generated getSubRegisterClass function

Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.

show more ...


Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 732eed40 09-Mar-2022 Ruiling Song <ruiling.song@amd.com>

[AMDGPU] Mark GFX11 dual source blend export as strict-wqm

The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we

[AMDGPU] Mark GFX11 dual source blend export as strict-wqm

The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we need to enable all four lanes of that quad to make the shuffling
operation before exporting to dual source blend target work correctly.

Differential Revision: https://reviews.llvm.org/D127981

show more ...


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 29621c13 11-Mar-2021 Piotr Sobczak <Piotr.Sobczak@amd.com>

[AMDGPU] Tag GFX11 LDS loads as using strict_wqm

LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).

Tag

[AMDGPU] Tag GFX11 LDS loads as using strict_wqm

LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).

Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm
to enforce this and avoid lane clobbering issues.
Note that only the instruction itself is tagged.
The implicit uses of these do not need to be set WQM.
The reduces unnecessary WQM calculation of M0.

Differential Revision: https://reviews.llvm.org/D127977

show more ...


# 4271a1ff 18-Jun-2022 Kazu Hirata <kazu@google.com>

[llvm] Call *set::insert without checking membership first (NFC)


123456