Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3 |
|
#
ad4a582f |
| 18-Oct-2024 |
Alex Rønne Petersen <alex@alexrp.com> |
[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that
[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
show more ...
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3 |
|
#
f6610578 |
| 09-Feb-2024 |
Jan Patrick Lehr <JanPatrick.Lehr@amd.com> |
Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)
Reverts llvm/llvm-project#79586
This broke the AMDGPU OpenMP Offload buildbot.
The
Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)
Reverts llvm/llvm-project#79586
This broke the AMDGPU OpenMP Offload buildbot.
The typical error message was that the GPU attempted to read beyong the
largest legal address.
Error message:
AMDGPU fatal error 1: Received error in queue 0x7f8363f22000:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to
access memory beyond the largest legal address.
show more ...
|
#
88e52511 |
| 08-Feb-2024 |
alex-t <alex-t@users.noreply.github.com> |
[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)
This change implements synthesizing the private buffer resource
descriptor in the kernel prolo
[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)
This change implements synthesizing the private buffer resource
descriptor in the kernel prolog instead of using the preloaded kernel
argument.
show more ...
|
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2 |
|
#
6d9b9631 |
| 21-Sep-2023 |
Pranav Taneja <138754701+prtaneja@users.noreply.github.com> |
[AMDGPU] [SIFrameLowering] Use LiveRegUnits instead of LivePhysRegs.
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
b78b36e1 |
| 07-Jul-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spil
[AMDGPU] Implement whole wave register spill
To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support.
This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually causes the whole wave register spills during allocation.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D143759
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
a3028239 |
| 21-Dec-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
#
40ba0942 |
| 14-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
#
29247824 |
| 30-Sep-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPr
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require the exec bits flipping to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry and when the callee performs a stack realignment, they ended up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate.
To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134949
show more ...
|
#
20a940f1 |
| 18-Aug-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores
There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one
[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores
There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one needed for the Frame Pointer. Incorporating any additional SGPR spill currently makes it difficult during PEI. This patch introduces a new spill builder to efficiently handle such spill requirements. Various spill methods are special handled using a separate class.
Reviewed By: sebastian-ne, scott.linder
Differential Revision: https://reviews.llvm.org/D132436
show more ...
|
#
5bd87350 |
| 21-Apr-2022 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve highest available VGPR for AGPR copy before RA. After RA, shift it to lowest unus
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve highest available VGPR for AGPR copy before RA. After RA, shift it to lowest unused VGPR if the one exist.
Fixes SWDEV-330006.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123525
show more ...
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
41bfac6a |
| 02-Jan-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Remove unused forward declarations (NFC)
|
Revision tags: llvmorg-13.0.1-rc1 |
|
#
273a0c8b |
| 04-Nov-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the both stack is indexed in the same direction as stack growth (up). We therefore always need the em
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the both stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt-in rather than opt-out only.
show more ...
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
d57bba7c |
| 04-Nov-2020 |
Sander de Smalen <sander.desmalen@arm.com> |
[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset u
[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets.
The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack.
This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D90018
show more ...
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
#
ec8c172d |
| 28-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correct prolog SP initialization logic
Having callees that will read SP is not the only reason we need to reference the stack pointer.
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
f7060f4f |
| 30-Apr-2020 |
Ram Nalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer
Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers c
For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer
Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers clobber preloaded Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.
show more ...
|
#
2481f26a |
| 07-Apr-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
CodeGen: Use Register in TargetFrameLowering
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
#
60b1967c |
| 21-Jan-2020 |
Scott Linder <Scott.Linder@amd.com> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|
#
30bb113b |
| 24-Feb-2020 |
Scott Linder <Scott.Linder@amd.com> |
[AMDGPU][NFC] Refactor emitEntryFunctionPrologue
Remove dead code and factor repeated conditions out into a single check. Rename and move code to make it more obvious what is running only for entry
[AMDGPU][NFC] Refactor emitEntryFunctionPrologue
Remove dead code and factor repeated conditions out into a single check. Rename and move code to make it more obvious what is running only for entry functions. Simplify function arguments to make it clearer what the relevant inputs are. Make flat scratch init accept an MBB iterator and move it to where it was logically being emitted within the prologue.
These changes will make a future update to the calling convention simpler.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75092
show more ...
|
#
805c157e |
| 21-Jan-2020 |
Guillaume Chatelet <gchatelet@google.com> |
[Alignment][NFC] Deprecate Align::None()
Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of
[Alignment][NFC] Deprecate Align::None()
Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
show more ...
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
3cc4835c |
| 21-Oct-2019 |
Guillaume Chatelet <gchatelet@google.com> |
Use Align for TFL::TransientStackAlignment
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/
Use Align for TFL::TransientStackAlignment
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69216
llvm-svn: 375398
show more ...
|
#
882c43d7 |
| 17-Oct-2019 |
Guillaume Chatelet <gchatelet@google.com> |
[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/l
[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
llvm-svn: 375084
show more ...
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init |
|
#
b3f967d4 |
| 16-Jul-2019 |
Michael Liao <michael.hliao@gmail.com> |
[AMDGPU] Add the adjusted FP as a livein register.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
D
[AMDGPU] Add the adjusted FP as a livein register.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64145
llvm-svn: 366223
show more ...
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
#
71dfb7ec |
| 08-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make s34 the FP register
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame r
AMDGPU: Make s34 the FP register
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort.
This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
show more ...
|
#
5b0922fe |
| 03-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining Li
AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining LiveIntervals).
This simplifies making the frame pointer register callee saved. As it is now, the code to determine callee saves needs to predict all the possible SGPR spills and how many callee saved VGPRs are needed. By handling this before PrologEpilogInserter, it's possible to just check the spill objects that already exist.
Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04 llvm-svn: 365095
show more ...
|