SIFoldOperands.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Revision	Date	Author	Comments
# 4c0251da	25-Oct-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Enable SGPR copy folding That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445
# c7dcacf1	25-Oct-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Fixed asan failure in SIFoldOperands Both tryFoldOMod() and tryFoldClamp() remove original instruction, so the check MI.modifiesRegister() may use a deleted MI. Differential Revision: https://reviews.llvm.org/D69448
# d4303b38	24-Oct-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Fold AGPR reg_sequence initializers Differential Revision: https://reviews.llvm.org/D69413
# b2a65f0d	23-Oct-2019	Michael Liao <michael.hliao@gmail.com>	[AMDGPU] Skip additional folding on the same operand. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355
# 61e7a61b	21-Oct-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Allow folding of sgpr to vgpr copy Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xexec permutations. Differential Revision: https://reviews.llvm.org/D69280
# 48f57138	22-Oct-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Allow tied operand subreg folding Turns out it makes sense, contrarily to what comment said. Differential Revision: https://reviews.llvm.org/D69287
# 8ebbf25c	21-Oct-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Erase redundant redefs of m0 in SIFoldOperands Only handle simple inter-block redefs of m0 to the same value. This avoids interference from redefs of m0 in SILoadStoreOptimzer. I was initially teaching that pass to ignore redefs of m0, but having them not exist beforehand is much simpler. This is in preparation for deleting the current special m0 handling in SIFixSGPRCopies to allow the register coalescer to handle the difficult cases. llvm-svn: 375449
# e5be543a	20-Oct-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Increase vcc liveness scan threshold Avoids a test regression in a future patch. Also add debug printing on this case, so I waste less time debugging folds in the future. llvm-svn: 375367
# f8bf7d7f	09-Oct-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't fold copies to physregs In a future patch, this will help cleanup m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy. llvm-svn: 374257
# 565b1d3d	30-Sep-2019	Alexander Timofeev <Alexander.Timofeev@amd.com>	[AMDGPU] SIFoldOperands should not fold register across the EXEC definition Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221
# d3b2b971	25-Sep-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] gfx10 v_fmac_f16 operand folding Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# 8fe1245a	23-Aug-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 llvm-svn: 369816
# 78347c97	21-Aug-2019	Alexander Timofeev <Alexander.Timofeev@amd.com>	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532
# 0c476111	15-Aug-2019	Daniel Sanders <daniel_l_sanders@apple.com>	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041
# 10db641a	13-Aug-2019	Tim Renouf <tpr.llvm@botech.co.uk>	[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm' That change (r363670) could leave a copy from vgpr to sgpr. Fixed. Differential Revision: https://reviews.llvm.org/D66133 Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4 llvm-svn: 368736
Revision tags: llvmorg-9.0.0-rc2
# 2bea69bf	01-Aug-2019	Daniel Sanders <daniel_l_sanders@apple.com>	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init
# 27ec195f	12-Jul-2019	Jay Foad <jay.foad@gmail.com>	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910
# e67cc380	11-Jul-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3
# 2710171a	25-Jun-2019	Nicolai Haehnle <nhaehnle@gmail.com>	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297
# 60957cb7	24-Jun-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fold frame index into MUBUF This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset. This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames. llvm-svn: 364185
# 4d000d24	19-Jun-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. llvm-svn: 363876
# f39f3bd0	18-Jun-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Change API for checking for exec modification Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. llvm-svn: 363675
# bcb5ea00	18-Jun-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fold readlane from copy of SGPR or imm These may be inserted to assert uniformity somewhere. llvm-svn: 363670
# e75e197a	18-Jun-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Remove unnecessary check for virtual register The copy was found by searching the uses of a virtual register, so it's already known to be virtual. llvm-svn: 363669
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# cfd0ca38	03-May-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Support shrinking add with FI in SIFoldOperands Avoids test regression in a future patch llvm-svn: 359898