Revision tags: llvmorg-21-init |
|
#
11b04019 |
| 24-Jan-2025 |
Aaditya <115080342+easyonaadit@users.noreply.github.com> |
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same amount in the epilog.
Prolog:
sp += frameSize
Epilog:
sp -= frameSize
If a function has dynamic stack realignment,
Prolog:
sp += frameSize + max_alignment
Epilog:
sp -= frameSize + max_alignment
These calculations are not optimal in case of dynamic
stack realignment, and completely fail in case of
dynamic stack readjustment.
This patch uses the saved Frame Pointer to restore SP.
Prolog:
fp = sp
sp += frameSize
Epilog:
sp = fp
In case of dynamic stack realignment, SP is restored from
the saved Base Pointer.
Prolog:
fp = sp + (max_alignment - 1)
fp = fp & (-max_alignment)
bp = sp
sp += frameSize + max_alignment
Epilog:
sp = bp
(Note: The presence of BP has been enforced in case of any
dynamic stack realignment.)
---------
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
show more ...
|
Revision tags: llvmorg-19.1.7 |
|
#
1a935d7a |
| 14-Jan-2025 |
Guy David <49722543+guy-david@users.noreply.github.com> |
[llvm] Mark scavenging spill-slots as *spilled* stack objects. (#122673)
This seems like an oversight when copying code from other backends.
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
09c41246 |
| 20-Nov-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[AMDGPU] Fix restores in chain functions (#116193)
When spilling a VGPR in `emitPrologue`, chain functions prefer to use
offsets to access the stack instead of the SP.
This patch fixes `emitEpil
[AMDGPU] Fix restores in chain functions (#116193)
When spilling a VGPR in `emitPrologue`, chain functions prefer to use
offsets to access the stack instead of the SP.
This patch fixes `emitEpilogue` to do the same. It also brings back some
test coverage that was lost in #93526, when WWM registers started being
shifted to the lowest available range (which meant that tests that were
originally spilling v8 would shift to spill v0, which is a scratch
register for chain functions and didn't get spilled).
Change-Id: Icb07fccd859b563cd45f74c25ae578ecb38bdeeb
show more ...
|
Revision tags: llvmorg-19.1.4, llvmorg-19.1.3 |
|
#
ad4a582f |
| 18-Oct-2024 |
Alex Rønne Petersen <alex@alexrp.com> |
[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that
[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
show more ...
|
Revision tags: llvmorg-19.1.2 |
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1 |
|
#
ac0f64f0 |
| 30-Sep-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There a
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPRs operations for wwm-operations in a small range
leading to unwantedly clobbering their inactive lanes
causing correctness issues that are hard to trace.
This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.
show more ...
|
#
23487be4 |
| 26-Sep-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Merge the conditions used for deciding CS spills for amdgpu_cs_chain[_preserve] (#109911)
Multiple conditions exist to decide whether callee save spills/restores
are required for amdgpu_cs
[AMDGPU] Merge the conditions used for deciding CS spills for amdgpu_cs_chain[_preserve] (#109911)
Multiple conditions exist to decide whether callee save spills/restores
are required for amdgpu_cs_chain or amdgpu_cs_chain_preserve calling
conventions. This patch consolidates them all and moves to a single
place.
show more ...
|
#
3659aa80 |
| 24-Sep-2024 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Fix handling of DBG_VALUE_LIST while fixing the dead frame indices. (#109685)
Both SGPR->VGPR and VGPR->AGPR spilling code give a fixup to the spill
frame indices referred in debug instruc
[AMDGPU] Fix handling of DBG_VALUE_LIST while fixing the dead frame indices. (#109685)
Both SGPR->VGPR and VGPR->AGPR spilling code give a fixup to the spill
frame indices referred in debug instructions so that they can be
entirely removed. The stack argument is present at 0th index in
DBG_VALUE and at 2nd index for DBG_VALUE_LIST.
Fixes: SWDEV-484156
show more ...
|
#
cee0bf96 |
| 20-Sep-2024 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Use Lo_32 and Hi_32 helpers (NFC) (#109413)
|
Revision tags: llvmorg-19.1.0 |
|
#
33562085 |
| 13-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).
...if only we had a merge queue.
show more ...
|
#
7792b4ae |
| 12-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
bo
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
show more ...
|
#
703ebca8 |
| 12-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The bui
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
show more ...
|
#
c7a7767f |
| 10-Sep-2024 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.
Reverts llvm/llvm-project#105822
|
#
44556e64 |
| 10-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may conta
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.
As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):
```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail
func:
; ... stuff that should run with the actual EXEC mask
br label %tail
tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```
It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).
The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
show more ...
|
#
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
show more ...
|
#
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
dc8da7dd |
| 29-May-2024 |
Pankaj Dwivedi <167427157+PankajDwivedi-25@users.noreply.github.com> |
[AMDGPU] Reserved private memory register during PEI (#93536)
- Reserved newly selected private memory registers in entry Function
Prologue generation.
- Added assertion patch in eliminateFrameInd
[AMDGPU] Reserved private memory register during PEI (#93536)
- Reserved newly selected private memory registers in entry Function
Prologue generation.
- Added assertion patch in eliminateFrameIndex to ensure register is
reserved.
Co-authored-by: PankajDwivedi-25 <pankajkumar.divedi@amd.com>
show more ...
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5 |
|
#
167427f5 |
| 01-May-2024 |
Gang Chen <gangc@amd.com> |
[AMDGPU] change order of fp and sp in kernel prologue (#90626)
change order of fp and sp in kernel prologue also related codegen tests
to make it easier to merge code into our downstream branches
[AMDGPU] change order of fp and sp in kernel prologue (#90626)
change order of fp and sp in kernel prologue also related codegen tests
to make it easier to merge code into our downstream branches
Signed-off-by: gangc <gangc@amd.com>
show more ...
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
dfa1d9b0 |
| 23-Feb-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined consta
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.
show more ...
|
Revision tags: llvmorg-18.1.0-rc3 |
|
#
f6610578 |
| 09-Feb-2024 |
Jan Patrick Lehr <JanPatrick.Lehr@amd.com> |
Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)
Reverts llvm/llvm-project#79586
This broke the AMDGPU OpenMP Offload buildbot.
The
Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)
Reverts llvm/llvm-project#79586
This broke the AMDGPU OpenMP Offload buildbot.
The typical error message was that the GPU attempted to read beyong the
largest legal address.
Error message:
AMDGPU fatal error 1: Received error in queue 0x7f8363f22000:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to
access memory beyond the largest legal address.
show more ...
|
#
88e52511 |
| 08-Feb-2024 |
alex-t <alex-t@users.noreply.github.com> |
[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)
This change implements synthesizing the private buffer resource
descriptor in the kernel prolo
[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)
This change implements synthesizing the private buffer resource
descriptor in the kernel prolog instead of using the preloaded kernel
argument.
show more ...
|
Revision tags: llvmorg-18.1.0-rc2 |
|
#
89ec940b |
| 05-Feb-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Insert spill codes for the SGPRs used for EXEC copy (#79428)
The SGPR registers used for preserving EXEC mask while lowering the
whole-wave register spills and copies should be preserved a
[AMDGPU] Insert spill codes for the SGPRs used for EXEC copy (#79428)
The SGPR registers used for preserving EXEC mask while lowering the
whole-wave register spills and copies should be preserved at the prolog
and epilog if they are in the CSR range. It isn't happening when there
is only wwm-copy lowered and there are no wwm-spills. This patch
addresses that problem.
show more ...
|
Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
230c13d5 |
| 24-Jan-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to a
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.
This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.
show more ...
|
#
9ca36932 |
| 18-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186)
|