Revision tags: llvmorg-21-init |
|
#
96c4f978 |
| 20-Jan-2025 |
Akshat Oke <Akshat.Oke@amd.com> |
[AMDGPU][NewPM] Port SIOptimizeExecMasking to NPM (#123572)
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
63fae3ed |
| 17-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)
|
#
abde52aa |
| 10-Jul-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use
[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
7c21495f |
| 08-Mar-2024 |
AtariDreams <83477269+AtariDreams@users.noreply.github.com> |
Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338)
This only converts the instances where all that is needed is to change
the variable type name.
Basically, anything that involves
Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338)
This only converts the instances where all that is needed is to change
the variable type name.
Basically, anything that involves a function that LiveRegUnits does not
directly have was skipped to play it safe.
Reverts
https://github.com/llvm/llvm-project/commit/7a0e222a17058a311b69153d0b6f1b4459414778
show more ...
|
Revision tags: llvmorg-18.1.1 |
|
#
7a0e222a |
| 07-Mar-2024 |
Jay Foad <jay.foad@amd.com> |
Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)"
This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c.
It was causing test failures on the expensive check builders.
|
#
2a13422b |
| 06-Mar-2024 |
AtariDreams <83477269+AtariDreams@users.noreply.github.com> |
Convert many LivePhysRegs uses to LiveRegUnits (#83905)
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
18839aec |
| 02-Nov-2023 |
Thomas Symalla <5754458+tsymalla@users.noreply.github.com> |
[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293)
During the SIOptimizeExecMasking pass, we try to form V_CMPX
instructions by detecting S_AND_SAVEEXEC and V_
[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293)
During the SIOptimizeExecMasking pass, we try to form V_CMPX
instructions by detecting S_AND_SAVEEXEC and V_MOV instructions.
Generally, we require the input operand of the V_MOV, which is the input
operand to the to-be-formed V_CMPX, to be alive. This is forced by
clearing the kill flags on the operand after V_CMPX has been generated.
However, if we have a kill of a register set that contains said
register, this will not be detected by clearKillFlags.
With this change, possible additional kill-flag candidates will be
detected during the final call to findInstrBackwards and then, the kill
flag will be removed to keep all registers in the set alive.
Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
show more ...
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
#
5ee0fb7e |
| 20-Aug-2022 |
Thomas Symalla <github@thomassymalla.de> |
[NFC][AMDGPU] Some cleanups in the SIOptimizeExecMasking pass.
Fix typos and remove an unused argument.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D132292
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
fd64a857 |
| 29-Jun-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction. T
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction. This patch also cleans up the SIOptimizeExecMasking pass a bit.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D129073
show more ...
|
#
86bd7e20 |
| 04-Jul-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass.
Reviewed By: foad
Differe
[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D129086
show more ...
|
#
7736ce1c |
| 24-Jun-2022 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
It was causing bad machine code for several blender scenes: *** Bad machine code: Using an undefined physical register *** - func
AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
It was causing bad machine code for several blender scenes: *** Bad machine code: Using an undefined physical register *** - function: kernel_holdout_emission_blurring_pathtermination_ao - basic block: %bb.28 if.end40.i (0x7f84861a2320) - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec - operand 1: $vgpr3
Differential Revision: https://reviews.llvm.org/D127768
show more ...
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
6d97ca69 |
| 08-Apr-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.
We found that it might be beneficial to have the SIOptimizeExecMasking pass detect more cases where v_cmp, s_and_saveexec patterns
[AMDGPU] Increase detection range for s_mov, v_cmpx transformation.
We found that it might be beneficial to have the SIOptimizeExecMasking pass detect more cases where v_cmp, s_and_saveexec patterns can be transformed to s_mov, v_cmpx patterns. Currently, the search range for finding a fitting v_cmp instruction is 5, however, this is doubled to 10 here.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D123367
show more ...
|
#
1a6aa8b1 |
| 31-Mar-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Add missing use check in SIOptimizeExecMasking pass.
Whenever a v_cmp, s_and_saveexec instruction sequence shall be transformed to an equivalent s_mov, v_cmpx sequence, it needs to be detec
[AMDGPU] Add missing use check in SIOptimizeExecMasking pass.
Whenever a v_cmp, s_and_saveexec instruction sequence shall be transformed to an equivalent s_mov, v_cmpx sequence, it needs to be detected if the v_cmp target register is used between the two instructions as the v_cmp result gets omitted by using the v_cmpx instruction, resulting in invalid code.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D122797
show more ...
|
#
3bd15c03 |
| 25-Mar-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Fix adding modifiers when creating v_cmpx instructions.
Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modif
[AMDGPU] Fix adding modifiers when creating v_cmpx instructions.
Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modifiers are not correctly inherited from the original operands. The patch adds the source modifiers, if they are exist, or sets them to 0.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D122489
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
718aec20 |
| 01-Feb-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR.
An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison.
This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence.
It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets.
Same as D119696 including a buildbot and MIR test fix.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D122332
show more ...
|
#
7de6107d |
| 21-Mar-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64.
Differential Revision: https://reviews.
Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191ef9ccc6538d52f4b57f98f37d4ea36e and e725e2afe02e18398525652c9bceda1eb055ea64.
Differential Revision: https://reviews.llvm.org/D122117
show more ...
|
#
e725e2af |
| 21-Mar-2022 |
Thomas Symalla <5754458+tsymalla@users.noreply.github.com> |
[AMDGPU] [NFC] Fix missing include.
|
#
011c6419 |
| 01-Feb-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a
[AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ... s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR.
An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison.
This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence.
It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets.
Reviewed By: sebastian-ne, critson
Differential Revision: https://reviews.llvm.org/D119696
show more ...
|
#
d86dcb7e |
| 16-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Return better Changed status from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D120024
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
#
c16f7760 |
| 10-Feb-2021 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Us
[AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane tracking by updating a stored exec mask when lanes are killed. Use live lane tracking to enable early termination of shader at any point in control flow.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D94746
show more ...
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
#
560d7e04 |
| 20-Jan-2021 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D95036
|
Revision tags: llvmorg-11.1.0-rc1 |
|
#
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
fde83517 |
| 10-Nov-2020 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY.
Reviewed By: arsenm
Differential Revision: https://reviews.
[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D90451
show more ...
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
0576f436 |
| 10-Sep-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes not emit the or to exec at the beginning of the block, where
AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes not emit the or to exec at the beginning of the block, where it really has to be. If there is an instruction that defines one of the source operands, split the block and turn the si_end_cf into a terminator.
This avoids regressions when regalloc fast is switched to inserting reloads at the beginning of the block, instead of spills at the end of the block.
In a future change, this should always split the block.
show more ...
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1 |
|
#
44211f20 |
| 23-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Optimize copies to exec with other insts after exec def
It's possible to have terminator instructions after a write to exec, so skip over them to find it.
|