#
a496c8be |
| 26-Jul-2023 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
show more ...
|
Revision tags: llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
7a98f084 |
| 17-May-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which is an unproblematic case.
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
#
b4a62b1f |
| 07-Jul-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Enable whole wave register copy
So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patc
[AMDGPU] Enable whole wave register copy
So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass.
Any liverange split of virtual registers involved in whole-wave operations require the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet.
This patch would identify all such copies and manipulate the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D143762
show more ...
|
#
b78b36e1 |
| 07-Jul-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spil
[AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support.
This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually causes the whole wave register spills during allocation.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D143759
show more ...
|
#
853b2a84 |
| 29-Jun-2023 |
Brendon Cahoon <brendon.cahoon@amd.com> |
[AMDGPU] Reserve SGPR pair when long branches are present
Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register sca
[AMDGPU] Reserve SGPR pair when long branches are present
Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register scavanger may not find available registers, which causes a “did not find scavenging index” assert to occur in assignRegToScavengingIndex.
In this patch, we estimate before register allocation whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. The distance threshold is an approximation as the exact code size and branch distance are unknown prior to register allocation.
Patch by Corbin Robeck. Thanks!
Differential Review: https://reviews.llvm.org/D149775
show more ...
|
#
aa144fbe |
| 19-May-2023 |
Kazu Hirata <kazu@google.com> |
[AMDGPU] Fix warnings
This patch fixes warnings like:
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h:711: warning: enumerated and non-enumerated type in conditional expression
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4 |
|
#
7ac3ab34 |
| 05-Mar-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix missing MIR serialization for PSInputAddr/PSInputEnable
Resuming any mir test for a pixel shader would assert in the AsmPrinter.
|
#
7ada7bbe |
| 15-Mar-2023 |
Kazu Hirata <kazu@google.com> |
[Target] Use *{Set,Map}::contains (NFC)
|
#
dcb83484 |
| 23-Feb-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget.
[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget.
Differential Revision: https://reviews.llvm.org/D144650
show more ...
|
Revision tags: llvmorg-16.0.0-rc3 |
|
#
1c9e6238 |
| 10-Feb-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Allow architected SGPRs for workgroup IDs
Some subtargets use architected SGPRs for workgroup IDs instead of the regular SGPRs. This patch enables the support for the same and is guarded un
[AMDGPU] Allow architected SGPRs for workgroup IDs
Some subtargets use architected SGPRs for workgroup IDs instead of the regular SGPRs. This patch enables the support for the same and is guarded under the subtarget feature FeatureArchitectedSGPRs.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D143707
show more ...
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
38818b60 |
| 04-Jan-2023 |
serge-sans-paille <sguelton@mozilla.com> |
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
show more ...
|
#
4463badf |
| 06-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use DenormalMode type in FP mode tracking
This simplies a future patch. The MIR handling should be fixed. We're still printing these in custom MachineFunctionInfo as bools (plus the inverted
AMDGPU: Use DenormalMode type in FP mode tracking
This simplies a future patch. The MIR handling should be fixed. We're still printing these in custom MachineFunctionInfo as bools (plus the inverted meaning is hard to follow).
show more ...
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
#
69e75ae6 |
| 18-Jun-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on w
CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered.
I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself.
A missing feature I still could use is a way to access an custom analysis pass on the IR here.
show more ...
|
#
a3028239 |
| 21-Dec-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
|
#
40ba0942 |
| 14-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
#
29247824 |
| 30-Sep-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPr
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require the exec bits flipping to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry and when the callee performs a stack realignment, they ended up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate.
To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134949
show more ...
|
#
7a72a935 |
| 23-Sep-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Preserve only the inactive lanes of scratch vgprs
In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can
[AMDGPU] Preserve only the inactive lanes of scratch vgprs
In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately.
The current scheme to preserve the inactive lanes of such scratch VGPRs is not done rightly. It preserves all lanes and causes the outgoing values (if any) getting overwritten by the epilog restores. It then corrupts the return value.
To avoid such situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134526
show more ...
|
#
20a940f1 |
| 18-Aug-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores
There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one
[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores
There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one needed for the Frame Pointer. Incorporating any additional SGPR spill currently makes it difficult during PEI. This patch introduces a new spill builder to efficiently handle such spill requirements. Various spill methods are special handled using a separate class.
Reviewed By: sebastian-ne, scott.linder
Differential Revision: https://reviews.llvm.org/D132436
show more ...
|
#
b25b4c0a |
| 13-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a commo
[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a common function used in both places to find the free VGPR lane. This patch eliminates that dependency to find the free VGPR by handling it separately for PEI. It is a prerequisite patch for a future work to allow SGPR spills to virtual VGPR lanes during SILowerSGPRSpills.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124195
show more ...
|
#
af5e5c40 |
| 19-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Add WWM reserved VGPRs to WWMSpills
The custom VGPR spills inserted during frame lowering maintain a separate list for WWM reserved registers. Added them into WWMSpills that already tracks
[AMDGPU] Add WWM reserved VGPRs to WWMSpills
The custom VGPR spills inserted during frame lowering maintain a separate list for WWM reserved registers. Added them into WWMSpills that already tracks such reserved registers. It unifies the spill insertion.
Reviewed By: nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D124193
show more ...
|
#
5692a7e8 |
| 13-Jun-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Callee must always spill writelane VGPRs
Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if
[AMDGPU] Callee must always spill writelane VGPRs
Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved.
Reviewed By: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D124192
show more ...
|
#
67819a72 |
| 13-Dec-2022 |
Fangrui Song <i@maskray.me> |
[CodeGen] llvm::Optional => std::optional
|
#
c589730a |
| 05-Dec-2022 |
Krzysztof Parzyszek <kparzysz@quicinc.com> |
[YAML] Convert Optional to std::optional
|
#
b7f44f7c |
| 29-Nov-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Remove ImagePSV and move images to addrspace 7
Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU: Remove BufferPseudoSourceValue")
It is unclear what exactly the right
AMDGPU: Remove ImagePSV and move images to addrspace 7
Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU: Remove BufferPseudoSourceValue")
It is unclear what exactly the right address space for images should be. They seem morally closest to buffers, so that's what I went with. In practical terms, address space 7 is better than address space 0 because it can't alias with LDS.
Differential Revision: https://reviews.llvm.org/D138949
show more ...
|
#
43b86bf9 |
| 25-Nov-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is c
AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is clearly not true.
Instead, build MachineMemOperands without a pointer value but with an address space, so that address space-based alias analysis can still work.
There is a lot of test churn because previously address space 4 (constant address space) was used as an address space for buffer intrinsics. This doesn't make much sense and seems to have been an accident -- see the change in AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.
Differential Revision: https://reviews.llvm.org/D138711
show more ...
|