Revision tags: llvmorg-21-init |
|
#
11b04019 |
| 24-Jan-2025 |
Aaditya <115080342+easyonaadit@users.noreply.github.com> |
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same amount in the epilog.
Prolog:
sp += frameSize
Epilog:
sp -= frameSize
If a function has dynamic stack realignment,
Prolog:
sp += frameSize + max_alignment
Epilog:
sp -= frameSize + max_alignment
These calculations are not optimal in case of dynamic
stack realignment, and completely fail in case of
dynamic stack readjustment.
This patch uses the saved Frame Pointer to restore SP.
Prolog:
fp = sp
sp += frameSize
Epilog:
sp = fp
In case of dynamic stack realignment, SP is restored from
the saved Base Pointer.
Prolog:
fp = sp + (max_alignment - 1)
fp = fp & (-max_alignment)
bp = sp
sp += frameSize + max_alignment
Epilog:
sp = bp
(Note: The presence of BP has been enforced in case of any
dynamic stack realignment.)
---------
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
show more ...
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
facead61 |
| 28-Nov-2023 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU] PromoteAlloca - bail always if load/store is volatile (#73228)
This change is addressing case where alloca size is the same as
load/store size.
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
#
4d42e8b5 |
| 28-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc2
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
show more ...
|
#
a496c8be |
| 26-Jul-2023 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
show more ...
|
Revision tags: llvmorg-18-init |
|
#
3890a3b1 |
| 28-Jun-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Use SSAUpdater in PromoteAlloca
This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly.
Note Promot
[AMDGPU] Use SSAUpdater in PromoteAlloca
This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly.
Note PromoteAlloca is still reliant on SROA running first to canonicalize the IR. For instance, PromoteAlloca will no longer handle aggregate types because those should be simplified by SROA before reaching the pass.
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D152706
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
7a98f084 |
| 17-May-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which is an unproblematic case.
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
#
7007b993 |
| 28-Jun-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
Revert "[AMDGPU] Use SSAUpdater in PromoteAlloca"
This reverts commit 091bfa76db64fbe96d0e53d99b2068cc05f6aa16.
|
#
091bfa76 |
| 13-Jun-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Use SSAUpdater in PromoteAlloca
This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly.
Note Promot
[AMDGPU] Use SSAUpdater in PromoteAlloca
This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly.
Note PromoteAlloca is still reliant on SROA running first to canonicalize the IR. For instance, PromoteAlloca will no longer handle aggregate types because those should be simplified by SROA before reaching the pass.
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D152706
show more ...
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
a3028239 |
| 21-Dec-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
#
40ba0942 |
| 14-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
#
29247824 |
| 30-Sep-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPr
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require the exec bits flipping to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry and when the callee performs a stack realignment, they ended up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate.
To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134949
show more ...
|