History log of /llvm-project/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp (Results 1 – 25 of 51)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init
# 96c4f978 20-Jan-2025 Akshat Oke <Akshat.Oke@amd.com>

[AMDGPU][NewPM] Port SIOptimizeExecMasking to NPM (#123572)


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 63fae3ed 17-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)


# abde52aa 10-Jul-2024 paperchalice <liujunchang97@outlook.com>

[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)

- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use

[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)

- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.

show more ...


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# 7c21495f 08-Mar-2024 AtariDreams <83477269+AtariDreams@users.noreply.github.com>

Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338)

This only converts the instances where all that is needed is to change
the variable type name.

Basically, anything that involves

Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338)

This only converts the instances where all that is needed is to change
the variable type name.

Basically, anything that involves a function that LiveRegUnits does not
directly have was skipped to play it safe.

Reverts
https://github.com/llvm/llvm-project/commit/7a0e222a17058a311b69153d0b6f1b4459414778

show more ...


Revision tags: llvmorg-18.1.1
# 7a0e222a 07-Mar-2024 Jay Foad <jay.foad@amd.com>

Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)"

This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c.

It was causing test failures on the expensive check builders.


# 2a13422b 06-Mar-2024 AtariDreams <83477269+AtariDreams@users.noreply.github.com>

Convert many LivePhysRegs uses to LiveRegUnits (#83905)


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5
# 18839aec 02-Nov-2023 Thomas Symalla <5754458+tsymalla@users.noreply.github.com>

[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293)

During the SIOptimizeExecMasking pass, we try to form V_CMPX
instructions by detecting S_AND_SAVEEXEC and V_

[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293)

During the SIOptimizeExecMasking pass, we try to form V_CMPX
instructions by detecting S_AND_SAVEEXEC and V_MOV instructions.
Generally, we require the input operand of the V_MOV, which is the input
operand to the to-be-formed V_CMPX, to be alive. This is forced by
clearing the kill flags on the operand after V_CMPX has been generated.

However, if we have a kill of a register set that contains said
register, this will not be detected by clearKillFlags.
With this change, possible additional kill-flag candidates will be
detected during the final call to findInstrBackwards and then, the kill
flag will be removed to keep all registers in the set alive.

Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>

show more ...


Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3
# 5ee0fb7e 20-Aug-2022 Thomas Symalla <github@thomassymalla.de>

[NFC][AMDGPU] Some cleanups in the SIOptimizeExecMasking pass.

Fix typos and remove an unused argument.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132292


Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# fd64a857 29-Jun-2022 Thomas Symalla <thomas.symalla@amd.com>

[AMDGPU] Combine s_or_saveexec, s_xor instructions.

This patch merges a consecutive sequence of

s_or_saveexec s_o, s_i
s_xor exec, exec, s_o

into a single

s_andn2_saveexec s_o, s_i instruction.
T

[AMDGPU] Combine s_or_saveexec, s_xor instructions.

This patch merges a consecutive sequence of

s_or_saveexec s_o, s_i
s_xor exec, exec, s_o

into a single

s_andn2_saveexec s_o, s_i instruction.
This patch also cleans up the SIOptimizeExecMasking pass a bit.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D129073

show more ...


# 86bd7e20 04-Jul-2022 Thomas Symalla <thomas.symalla@amd.com>

[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.

This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.

Reviewed By: foad

Differe

[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.

This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D129086

show more ...


# 7736ce1c 24-Jun-2022 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDGPU: Clear kill flags when optimizing vcmp save exec sequence

It was causing bad machine code for several blender scenes:
*** Bad machine code: Using an undefined physical register ***
- func

AMDGPU: Clear kill flags when optimizing vcmp save exec sequence

It was causing bad machine code for several blender scenes:
*** Bad machine code: Using an undefined physical register ***
- function: kernel_holdout_emission_blurring_pathtermination_ao
- basic block: %bb.28 if.end40.i (0x7f84861a2320)
- instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec
- operand 1: $vgpr3

Differential Revision: https://reviews.llvm.org/D127768

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 6d97ca69 08-Apr-2022 Thomas Symalla <thomas.symalla@amd.com>

[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.

We found that it might be beneficial to have the SIOptimizeExecMasking
pass detect more cases where v_cmp, s_and_saveexec patterns

[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.

We found that it might be beneficial to have the SIOptimizeExecMasking
pass detect more cases where v_cmp, s_and_saveexec patterns can be
transformed to s_mov, v_cmpx patterns. Currently, the search range
for finding a fitting v_cmp instruction is 5, however, this is doubled
to 10 here.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D123367

show more ...


# 1a6aa8b1 31-Mar-2022 Thomas Symalla <thomas.symalla@amd.com>

[AMDGPU] Add missing use check in SIOptimizeExecMasking pass.

Whenever a v_cmp, s_and_saveexec instruction sequence shall be
transformed to an equivalent s_mov, v_cmpx sequence, it needs
to be detec

[AMDGPU] Add missing use check in SIOptimizeExecMasking pass.

Whenever a v_cmp, s_and_saveexec instruction sequence shall be
transformed to an equivalent s_mov, v_cmpx sequence, it needs
to be detected if the v_cmp target register is used between
the two instructions as the v_cmp result gets omitted by
using the v_cmpx instruction, resulting in invalid code.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D122797

show more ...


# 3bd15c03 25-Mar-2022 Thomas Symalla <thomas.symalla@amd.com>

[AMDGPU] Fix adding modifiers when creating v_cmpx instructions.

Revision https://reviews.llvm.org/D122332 added a pattern transformation
where v_cmpx instructions are introduced. However, the modif

[AMDGPU] Fix adding modifiers when creating v_cmpx instructions.

Revision https://reviews.llvm.org/D122332 added a pattern transformation
where v_cmpx instructions are introduced. However, the modifiers are
not correctly inherited from the original operands. The patch
adds the source modifiers, if they are exist, or sets them to 0.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D122489

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init
# 718aec20 01-Feb-2022 Thomas Symalla <thomas.symalla@amd.com>

[AMDGPU] Improve v_cmpx usage on GFX10.3.

On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a

[AMDGPU] Improve v_cmpx usage on GFX10.3.

On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Same as D119696 including a buildbot and MIR test fix.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D122332

show more ...


# 7de6107d 21-Mar-2022 Thomas Symalla <thomas.symalla@amd.com>

Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."

This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and
e725e2afe02e18398525652c9bceda1eb055ea64.

Differential Revision: https://reviews.

Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."

This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and
e725e2afe02e18398525652c9bceda1eb055ea64.

Differential Revision: https://reviews.llvm.org/D122117

show more ...


# e725e2af 21-Mar-2022 Thomas Symalla <5754458+tsymalla@users.noreply.github.com>

[AMDGPU] [NFC] Fix missing include.


# 011c6419 01-Feb-2022 Thomas Symalla <thomas.symalla@amd.com>

[AMDGPU] Improve v_cmpx usage on GFX10.3.

On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a

[AMDGPU] Improve v_cmpx usage on GFX10.3.

On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Reviewed By: sebastian-ne, critson

Differential Revision: https://reviews.llvm.org/D119696

show more ...


# d86dcb7e 16-Feb-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Return better Changed status from SIOptimizeExecMasking

Differential Revision: https://reviews.llvm.org/D120024


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# c16f7760 10-Feb-2021 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Move kill lowering to WQM pass and add live mask tracking

Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Us

[AMDGPU] Move kill lowering to WQM pass and add live mask tracking

Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746

show more ...


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <daniil.fukalov@amd.com>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


Revision tags: llvmorg-11.1.0-rc1
# 6a87e9b0 25-Dec-2020 dfukalov <daniil.fukalov@amd.com>

[NFC][AMDGPU] Reduce include files dependency.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# fde83517 10-Nov-2020 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term

If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.

[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term

If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451

show more ...


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 0576f436 10-Sep-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't sometimes allow instructions before lowered si_end_cf

Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes
not emit the or to exec at the beginning of the block, where

AMDGPU: Don't sometimes allow instructions before lowered si_end_cf

Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.

show more ...


Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1
# 44211f20 23-Jul-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Optimize copies to exec with other insts after exec def

It's possible to have terminator instructions after a write to exec,
so skip over them to find it.


123