#
2bfcacf0 |
| 03-Jul-2020 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Insert PS early exit at end of control flow
Exit early if the exec mask is zero at the end of control flow. Mark the ends of control flow during control flow lowering and convert these to e
[AMDGPU] Insert PS early exit at end of control flow
Exit early if the exec mask is zero at the end of control flow. Mark the ends of control flow during control flow lowering and convert these to exits during the insert skips pass.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D82737
show more ...
|
Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
12a32439 |
| 07-Apr-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Limit endcf-collapase to simple if
We can only collapse adjacent SI_END_CF if outer statement belongs to a simple SI_IF, otherwise correct mask is not in the register we expect, but is an a
[AMDGPU] Limit endcf-collapase to simple if
We can only collapse adjacent SI_END_CF if outer statement belongs to a simple SI_IF, otherwise correct mask is not in the register we expect, but is an argument of an S_XOR instruction.
Even if SI_IF is simple it might be lowered using S_XOR because lowering is dependent on a basic block layout. It is not considered simple if instruction consuming its output is not an SI_END_CF. Since that SI_END_CF might have already been lowered to an S_OR isSimpleIf() check may return false.
This situation is an opportunity for a further optimization of SI_IF lowering, but that is a separate optimization. In the meanwhile move SI_END_CF post the lowering when we already know how the rest of the CFG was lowered since a non-simple SI_IF case still needs to be handled.
Differential Revision: https://reviews.llvm.org/D77610
show more ...
|
#
ddd2f4b9 |
| 06-Apr-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix inaccurate comments
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5 |
|
#
c262b69d |
| 13-Mar-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fix endcf collapse
Only collapse inner endcf if the outer one belongs to SI_IF. If it does belong to SI_ELSE then mask being restored in fact a partial inverse of what we need.
Differentia
[AMDGPU] Fix endcf collapse
Only collapse inner endcf if the outer one belongs to SI_IF. If it does belong to SI_ELSE then mask being restored in fact a partial inverse of what we need.
Differential Revision: https://reviews.llvm.org/D76154
show more ...
|
#
32e90cbc |
| 13-Mar-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our scopes are not as perfectly enclosed as I expected. Disable it for now.
Differential Revision: https://review
[AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our scopes are not as perfectly enclosed as I expected. Disable it for now.
Differential Revision: https://reviews.llvm.org/D76148
show more ...
|
Revision tags: llvmorg-10.0.0-rc4 |
|
#
360aff04 |
| 11-Mar-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Simplify nested SI_END_CF
This is to replace the optimization from the SIOptimizeExecMaskingPreRA. We have less opportunities in the control flow lowering because many VGPR copies are still
[AMDGPU] Simplify nested SI_END_CF
This is to replace the optimization from the SIOptimizeExecMaskingPreRA. We have less opportunities in the control flow lowering because many VGPR copies are still in place and will be removed later, but we know for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64 with EXEC.
The subsequent change needs to convert s_and_saveexec into s_and and address new TODO lines in tests, then code block guarded by the -amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer will be removed.
Differential Revision: https://reviews.llvm.org/D76033
show more ...
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
#
6e177082 |
| 27-Dec-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix SI_IF lowering when the save exec reg has terminator uses
Reverts part of 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f. Since that commit, the expansion was ignoring the actual save exec reg
AMDGPU: Fix SI_IF lowering when the save exec reg has terminator uses
Reverts part of 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f. Since that commit, the expansion was ignoring the actual save exec register produced by the instruction, and looking at other instructions. I do not understand why it was looking at other instructions, but relying on this scan was wrong.
Fixes verifier errors after SI_IF is tail duplicated, which should be correct to do. The results were fed into a phi, which was lowered to the S_MOV_B64_term instructions.
show more ...
|
#
e53a9d96 |
| 22-Jan-2020 |
cdevadas <cdevadas@amd.com> |
Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to h
Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
show more ...
|
#
a80291ce |
| 21-Jan-2020 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Me
Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd.
The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.
show more ...
|
#
0dc6c249 |
| 10-Jan-2020 |
cdevadas <cdevadas@amd.com> |
[AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an opt
[AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary.
This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
show more ...
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
008e65a7 |
| 18-Nov-2019 |
vpykhtin <valery.pykhtin@gmail.com> |
[AMDGPU] Fix emitIfBreak CF lowering: use temp reg to make register coalescer life easier.
Differential revision: https://reviews.llvm.org/D70405
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6 |
|
#
6524a7a2 |
| 17-Sep-2019 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Defferential Revision: https://reviews.llvm.org/D67101
Reviewers: rampitec, vpykhtin llvm-svn: 372086
|
#
9ff70132 |
| 13-Sep-2019 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
llvm-svn: 371873
|
Revision tags: llvmorg-9.0.0-rc5 |
|
#
c2d292f8 |
| 10-Sep-2019 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Reviewers: rampitec, vpykhtin
Differential Revision: https://reviews.llvm.org/D67101
llvm-svn: 371508
|
Revision tags: llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
4b7fc85c |
| 20-Aug-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"
This reverts r367500 and r369203. This is causing various test failures.
llvm-svn: 369417
|
#
479f3bdb |
| 18-Aug-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix iterator error when lowering SI_END_CF
If the instruction is the last in the block, there is no next instruction but the iteration still needs to look at the new block.
llvm-svn: 369203
|
#
0c476111 |
| 15-Aug-2019 |
Daniel Sanders <daniel_l_sanders@apple.com> |
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Re
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
2bea69bf |
| 01-Aug-2019 |
Daniel Sanders <daniel_l_sanders@apple.com> |
Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
|
#
d48324ff |
| 01-Aug-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Split block for si_end_cf"
This reverts commit r359363, reapplying r357634
llvm-svn: 367500
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
#
e3a676e9 |
| 24-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
CodeGen: Introduce a class for registers
Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set neede
CodeGen: Introduce a class for registers
Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg().
llvm-svn: 364191
show more ...
|
#
52500216 |
| 16-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden getRegClass().
Differential Revision: https://reviews.llvm.org/D63351
llvm-svn: 363513
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
#
76c5b629 |
| 27-Apr-2019 |
Mark Searles <m.c.searles@gmail.com> |
Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.
We discovered some internal test failures, so reverting for now.
Differential Revision: htt
Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.
We discovered some internal test failures, so reverting for now.
Differential Revision: https://reviews.llvm.org/D61213
llvm-svn: 359363
show more ...
|
#
396653f8 |
| 03-Apr-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgot
AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue which is likely to be forgotten.
Ideally this could be done earlier, but this doesn't work because of phis. Any other instruction can't be placed before them, so we have to accept the position being incorrect during SSA.
This avoids regressions in the fast register allocator rewrite from inverting the direction.
llvm-svn: 357634
show more ...
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4 |
|
#
87039773 |
| 05-Mar-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Preserve undef flag when expanding SI_IF
Fixes undefined value verifier error.
llvm-svn: 355426
|
Revision tags: llvmorg-8.0.0-rc3 |
|
#
476e26b5 |
| 22-Feb-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use removeAllRegUnitsForPhysReg
llvm-svn: 354686
|