#
4c0251da |
| 25-Oct-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Enable SGPR copy folding
That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IM
[AMDGPU] Enable SGPR copy folding
That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier.
Borrowed code from the verifier to check for RC legality.
Differential Revision: https://reviews.llvm.org/D69445
show more ...
|
#
c7dcacf1 |
| 25-Oct-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fixed asan failure in SIFoldOperands
Both tryFoldOMod() and tryFoldClamp() remove original instruction, so the check MI.modifiesRegister() may use a deleted MI.
Differential Revision: http
[AMDGPU] Fixed asan failure in SIFoldOperands
Both tryFoldOMod() and tryFoldClamp() remove original instruction, so the check MI.modifiesRegister() may use a deleted MI.
Differential Revision: https://reviews.llvm.org/D69448
show more ...
|
#
d4303b38 |
| 24-Oct-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fold AGPR reg_sequence initializers
Differential Revision: https://reviews.llvm.org/D69413
|
#
b2a65f0d |
| 23-Oct-2019 |
Michael Liao <michael.hliao@gmail.com> |
[AMDGPU] Skip additional folding on the same operand.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
[AMDGPU] Skip additional folding on the same operand.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69355
show more ...
|
#
61e7a61b |
| 21-Oct-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Allow folding of sgpr to vgpr copy
Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xe
[AMDGPU] Allow folding of sgpr to vgpr copy
Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xexec permutations.
Differential Revision: https://reviews.llvm.org/D69280
show more ...
|
#
48f57138 |
| 22-Oct-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Allow tied operand subreg folding
Turns out it makes sense, contrarily to what comment said.
Differential Revision: https://reviews.llvm.org/D69287
|
#
8ebbf25c |
| 21-Oct-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Erase redundant redefs of m0 in SIFoldOperands
Only handle simple inter-block redefs of m0 to the same value. This avoids interference from redefs of m0 in SILoadStoreOptimzer. I was initial
AMDGPU: Erase redundant redefs of m0 in SIFoldOperands
Only handle simple inter-block redefs of m0 to the same value. This avoids interference from redefs of m0 in SILoadStoreOptimzer. I was initially teaching that pass to ignore redefs of m0, but having them not exist beforehand is much simpler.
This is in preparation for deleting the current special m0 handling in SIFixSGPRCopies to allow the register coalescer to handle the difficult cases.
llvm-svn: 375449
show more ...
|
#
e5be543a |
| 20-Oct-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Increase vcc liveness scan threshold
Avoids a test regression in a future patch. Also add debug printing on this case, so I waste less time debugging folds in the future.
llvm-svn: 375367
|
#
f8bf7d7f |
| 09-Oct-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't fold copies to physregs
In a future patch, this will help cleanup m0 handling.
The register coalescer handles copies from a register that materializes an immediate, but doesn't handle
AMDGPU: Don't fold copies to physregs
In a future patch, this will help cleanup m0 handling.
The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy.
llvm-svn: 374257
show more ...
|
#
565b1d3d |
| 30-Sep-2019 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition
Reviewers: rampitec
Differential Revision: https://reviews.llvm.org/D67662
llvm-svn: 373221
|
#
d3b2b971 |
| 25-Sep-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx10 v_fmac_f16 operand folding
Fold immediates into v_fmac_f16.
Differential Revision: https://reviews.llvm.org/D68037
llvm-svn: 372906
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
8fe1245a |
| 23-Aug-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug
gfx908 ignores an mfma if SrcC is a literal.
Differential Revision: https://reviews.llvm.org/D66670
llvm-svn: 369816
|
#
78347c97 |
| 21-Aug-2019 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec
llvm-svn: 369532
|
#
0c476111 |
| 15-Aug-2019 |
Daniel Sanders <daniel_l_sanders@apple.com> |
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Re
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
show more ...
|
#
10db641a |
| 13-Aug-2019 |
Tim Renouf <tpr.llvm@botech.co.uk> |
[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'
That change (r363670) could leave a copy from vgpr to sgpr. Fixed.
Differential Revision: https://reviews.llvm.org/D66133
Change-Id: I00c3f
[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'
That change (r363670) could leave a copy from vgpr to sgpr. Fixed.
Differential Revision: https://reviews.llvm.org/D66133
Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4 llvm-svn: 368736
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
2bea69bf |
| 01-Aug-2019 |
Daniel Sanders <daniel_l_sanders@apple.com> |
Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init |
|
#
27ec195f |
| 12-Jul-2019 |
Jay Foad <jay.foad@gmail.com> |
[AMDGPU] Fix DPP combiner check for exec modification
Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it check
[AMDGPU] Fix DPP combiner check for exec modification
Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer.
Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64393
llvm-svn: 365910
show more ...
|
#
e67cc380 |
| 11-Jul-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx908 mfma support
Differential Revision: https://reviews.llvm.org/D64584
llvm-svn: 365824
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
#
2710171a |
| 25-Jun-2019 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Write LDS objects out as global symbols in code generation
Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then
AMDGPU: Write LDS objects out as global symbols in code generation
Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted.
Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader.
Some notes:
- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied.
The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now.
- We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check.
Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61494
llvm-svn: 364297
show more ...
|
#
60957cb7 |
| 24-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fold frame index into MUBUF
This matters for byval uses outside of the entry block, which appear as copies.
Previously, the only folding done was during selection, which could not see the u
AMDGPU: Fold frame index into MUBUF
This matters for byval uses outside of the entry block, which appear as copies.
Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset.
This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpfu for extreme frames.
llvm-svn: 364185
show more ...
|
#
4d000d24 |
| 19-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix folding immediate into readfirstlane through reg_sequence
The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conserva
AMDGPU: Fix folding immediate into readfirstlane through reg_sequence
The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded.
For some reason copies aren't making it through the reg_sequence, although they should.
llvm-svn: 363876
show more ...
|
#
f39f3bd0 |
| 18-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Change API for checking for exec modification
Invert the name and return value to better reflect the imprecise nature.
Force passing in the DefMI, since it's known in the 2 users and could
AMDGPU: Change API for checking for exec modification
Invert the name and return value to better reflect the imprecise nature.
Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg.
Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely.
Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order.
llvm-svn: 363675
show more ...
|
#
bcb5ea00 |
| 18-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fold readlane from copy of SGPR or imm
These may be inserted to assert uniformity somewhere.
llvm-svn: 363670
|
#
e75e197a |
| 18-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove unnecessary check for virtual register
The copy was found by searching the uses of a virtual register, so it's already known to be virtual.
llvm-svn: 363669
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
#
cfd0ca38 |
| 03-May-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch
llvm-svn: 359898
|