|
Revision tags: llvmorg-21-init |
|
| #
11b04019 |
| 24-Jan-2025 |
Aaditya <115080342+easyonaadit@users.noreply.github.com> |
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same amount in the epilog.
Prolog:
sp += frameSize
Epilog:
sp -= frameSize
If a function has dynamic stack realignment,
Prolog:
sp += frameSize + max_alignment
Epilog:
sp -= frameSize + max_alignment
These calculations are not optimal in case of dynamic
stack realignment, and completely fail in case of
dynamic stack readjustment.
This patch uses the saved Frame Pointer to restore SP.
Prolog:
fp = sp
sp += frameSize
Epilog:
sp = fp
In case of dynamic stack realignment, SP is restored from
the saved Base Pointer.
Prolog:
fp = sp + (max_alignment - 1)
fp = fp & (-max_alignment)
bp = sp
sp += frameSize + max_alignment
Epilog:
sp = bp
(Note: The presence of BP has been enforced in case of any
dynamic stack realignment.)
---------
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
show more ...
|
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
230c13d5 |
| 24-Jan-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to a
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.
This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.
show more ...
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
| #
4d42e8b5 |
| 28-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc2
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
show more ...
|
| #
a496c8be |
| 26-Jul-2023 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
show more ...
|
|
Revision tags: llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
7a98f084 |
| 17-May-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which is an unproblematic case.
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
| #
f2c164c8 |
| 21-Jun-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in t
[AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns.
Differential Revision: https://reviews.llvm.org/D153537
show more ...
|
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
a3028239 |
| 21-Dec-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
|
| #
262c2c0f |
| 19-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Update some tests to use opaque pointers
vectorize-buffer-fat-pointer.ll required a manual check line fix. vector-alloca-addrspacecast.ll required a manual fixup of a check line. partial-reg
AMDGPU: Update some tests to use opaque pointers
vectorize-buffer-fat-pointer.ll required a manual check line fix. vector-alloca-addrspacecast.ll required a manual fixup of a check line. partial-regcopy-and-spill-missed-at-regalloc.ll required re-running update_mir_test_checks. The HSA metadata tests required avoiding the script touching the type name in the metadata.
annotate-noclobber.ll ran into one update script bug. It deleted a check line with a 0 offset GEP, moving the following -NEXT check logically up one line.
show more ...
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
40ba0942 |
| 14-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
| #
29247824 |
| 30-Sep-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPr
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require the exec bits flipping to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry and when the callee performs a stack realignment, they ended up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate.
To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134949
show more ...
|
| #
b25b4c0a |
| 13-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a commo
[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a common function used in both places to find the free VGPR lane. This patch eliminates that dependency to find the free VGPR by handling it separately for PEI. It is a prerequisite patch for a future work to allow SGPR spills to virtual VGPR lanes during SILowerSGPRSpills.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124195
show more ...
|
| #
7a846240 |
| 27-Sep-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make various vector undefs legal
Surprisingly these were getting legalized to something zero initialized.
This fixes an infinite loop when combining some vector types. Also fixes zero initi
AMDGPU: Make various vector undefs legal
Surprisingly these were getting legalized to something zero initialized.
This fixes an infinite loop when combining some vector types. Also fixes zero initializing some undef values.
SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking for the legality of the output undefs they are replacing unused operations with. This resulted in turning vectors into undefs that were later re-legalized back into zero vectors.
show more ...
|
| #
5cae8816 |
| 06-Jul-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
Differential Revision: https://reviews.llvm.org/D129295
show more ...
|
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
| #
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
c7b71ace |
| 19-Jan-2022 |
Konstantina <Konstantina.Mitropoulou@amd.com> |
[AMDGPU][NFC] Add autogenerated tests for vgpr-tuple-allocation.ll
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D117692
|
|
Revision tags: llvmorg-13.0.1-rc2 |
|
| #
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <Ron.Lieberman@amd.com> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
| #
9075009d |
| 22-Dec-2021 |
RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
729bf9b2 |
| 14-Aug-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on
AMDGPU: Enable fixed function ABI by default
Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway.
Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.
show more ...
|
| #
976f3b3c |
| 24-Nov-2021 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Only allow implicit WQM in pixel shaders
Implicit derivatives are only valid in pixel shaders, hence only implicitly enable WQM for pixel shaders. This avoids unintended WQM in other shader
[AMDGPU] Only allow implicit WQM in pixel shaders
Implicit derivatives are only valid in pixel shaders, hence only implicitly enable WQM for pixel shaders. This avoids unintended WQM in other shader types (e.g. compute) when image sampling instructions are used.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114414
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
| #
eebe841a |
| 26-Sep-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know a
RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will be needed to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point.
Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run.
One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work.
Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQB is not currently supported, so this also prevents using the unhandled allocator.
show more ...
|
| #
f8816c74 |
| 08-Jun-2021 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are required for a MIMG address operand.
Maintain _V8 instruction variants of
[AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are required for a MIMG address operand.
Maintain _V8 instruction variants of pseudo instructions allowing assembly prior to GFX10 to work as-is. Currently the validator can tell for GFX10 what the correct size is, so will disallow oversize address registers.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103672
show more ...
|
| #
5682ae2f |
| 25-Mar-2021 |
madhur13490 <Madhur.Amilkanthwar@amd.com> |
[AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken.
Once su
[AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken.
Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect functions calls are not used.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D99347
show more ...
|
| #
3c297a25 |
| 10-Feb-2021 |
madhur13490 <Madhur.Amilkanthwar@amd.com> |
Make fixed-abi default for AMD HSA OS
fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple.
Reviewed
Make fixed-abi default for AMD HSA OS
fixed-abi uses pre-defined and predictable SGPR/VGPRs for passing arguments. This patch makes this scheme default when HSA OS is specified in triple.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D96340
show more ...
|
| #
d28624a2 |
| 01-Dec-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.
Differential Revision: https://reviews.llvm.org/D92400
|
| #
58de4b20 |
| 29-Oct-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".
All the codegen changes are caused by
[AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".
All the codegen changes are caused by the post-RA scheduler no longer treating readlane/writelane as scheduling barriers due to having unmodelled side effects. (The pseudos are hasSideEffects = 0, but the real instructions are hasSideEffects = ? which TableGen conservatively treats as 1.)
Differential Revision: https://reviews.llvm.org/D90401
show more ...
|