Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init
# 11b04019 24-Jan-2025 Aaditya <115080342+easyonaadit@users.noreply.github.com>

[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)

Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same

[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)

Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same amount in the epilog.
Prolog:
sp += frameSize

Epilog:
sp -= frameSize

If a function has dynamic stack realignment,
Prolog:
sp += frameSize + max_alignment

Epilog:
sp -= frameSize + max_alignment

These calculations are not optimal in case of dynamic
stack realignment, and completely fail in case of
dynamic stack readjustment.
This patch uses the saved Frame Pointer to restore SP.
Prolog:
fp = sp
sp += frameSize

Epilog:
sp = fp

In case of dynamic stack realignment, SP is restored from
the saved Base Pointer.
Prolog:
fp = sp + (max_alignment - 1)
fp = fp & (-max_alignment)
bp = sp
sp += frameSize + max_alignment

Epilog:
sp = bp

(Note: The presence of BP has been enforced in case of any
dynamic stack realignment.)

---------

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

show more ...


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1
# 404071b0 24-Jul-2024 Matt Davis <mattd@nvidia.com>

[SelectionDAG] Preserve volatile undef stores. (#99918)

This patch preserves `undef` SDNodes that are `volatile` qualified.
Previously, these nodes would be discarded. The motivation behind this
c

[SelectionDAG] Preserve volatile undef stores. (#99918)

This patch preserves `undef` SDNodes that are `volatile` qualified.
Previously, these nodes would be discarded. The motivation behind this
change is to adhere to the
[LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses),
even though that doc is mostly in terms of LLVM-IR, it seems reasonable
to imply that the volatile constraints also imply to SDNodes.

> Certain memory accesses, such as
[load](https://llvm.org/docs/LangRef.html#i-load)’s,
[store](https://llvm.org/docs/LangRef.html#i-store)’s, and
[llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be
marked volatile. The optimizers must not change the number of volatile
operations or change their order of execution relative to other volatile
operations. The optimizers may change the order of volatile operations
relative to non-volatile operations. This is not Java’s “volatile” and
has no cross-thread synchronization behavior.

Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses

show more ...


Revision tags: llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 4490003a 06-Mar-2024 Emma Pilkington <emma.pilkington95@gmail.com>

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.

show more ...


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# f6610578 09-Feb-2024 Jan Patrick Lehr <JanPatrick.Lehr@amd.com>

Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)

Reverts llvm/llvm-project#79586

This broke the AMDGPU OpenMP Offload buildbot.
The

Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)

Reverts llvm/llvm-project#79586

This broke the AMDGPU OpenMP Offload buildbot.
The typical error message was that the GPU attempted to read beyong the
largest legal address.

Error message:
AMDGPU fatal error 1: Received error in queue 0x7f8363f22000:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to
access memory beyond the largest legal address.

show more ...


# 88e52511 08-Feb-2024 alex-t <alex-t@users.noreply.github.com>

[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)

This change implements synthesizing the private buffer resource
descriptor in the kernel prolo

[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)

This change implements synthesizing the private buffer resource
descriptor in the kernel prolog instead of using the preloaded kernel
argument.

show more ...


Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 777b6de7 12-Dec-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339)

Regenerate a few llc tests to test for COV5 instead of the default ABI
version.


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 466a8149 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)

This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.


# 0a8d17e7 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed B

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818

show more ...


Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1
# 4d42e8b5 28-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc2

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.

show more ...


# a496c8be 26-Jul-2023 Vitaly Buka <vitalybuka@google.com>

Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048

Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381

show more ...


Revision tags: llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5
# 7a98f084 17-May-2023 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196

show more ...


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2
# 54cf69c9 03-Feb-2023 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Use module flag to get code object version at IR level

Summary:
This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command l

AMDGPU: Use module flag to get code object version at IR level

Summary:
This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.

For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
https://reviews.llvm.org/D14313

show more ...


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# a3028239 21-Dec-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"

This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.


# 262c2c0f 19-Dec-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Update some tests to use opaque pointers

vectorize-buffer-fat-pointer.ll required a manual check line fix.
vector-alloca-addrspacecast.ll required a manual fixup of a check
line. partial-reg

AMDGPU: Update some tests to use opaque pointers

vectorize-buffer-fat-pointer.ll required a manual check line fix.
vector-alloca-addrspacecast.ll required a manual fixup of a check
line. partial-regcopy-and-spill-missed-at-regalloc.ll required
re-running update_mir_test_checks. The HSA metadata tests required
avoiding the script touching the type name in the metadata.

annotate-noclobber.ll ran into one update script bug. It deleted a
check line with a 0 offset GEP, moving the following -NEXT check
logically up one line.

show more ...


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# 40ba0942 14-Apr-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which isn an unproblematic case.

This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124196

show more ...


# 29247824 30-Sep-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills

Unlike the callee-saved VGPR spill instructions emitted by
`PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during
emitPr

[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills

Unlike the callee-saved VGPR spill instructions emitted by
`PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during
emitPrologue/emitEpilogue require the exec bits flipping to avoid
clobbering the inactive lanes of VGPRs used for SGPR spilling.
Currently, these spill instructions are referenced from the SP at
function entry and when the callee performs a stack realignment,
they ended up getting incorrect stack offsets. Even if we try to
adjust the offsets, the FP-SP becomes a runtime entity with dynamic
stack realignment and the offsets would still be inaccurate.

To fix it, use FP as the frame base in the spill instructions
whenever the function has FP. The offsets obtained for the CS
objects would always be the right values from FP.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134949

show more ...


# b25b4c0a 13-Apr-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI

SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a commo

[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI

SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124195

show more ...


# 4c4db816 30-Jul-2022 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions

Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742


# 3a205977 19-Jul-2022 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS varia

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060

show more ...


Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 04fff547 07-Mar-2022 Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added a

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652

show more ...


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 09b53296 22-Dec-2021 Ron Lieberman <Ron.Lieberman@amd.com>

Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"

This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.

Failed amdgpu runtime buildbot # 3514


# 9075009d 22-Dec-2021 RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com>

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added a

[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range

Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652

show more ...


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 06b90175 14-Aug-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove fixed function ABI option


# 729bf9b2 14-Aug-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Enable fixed function ABI by default

Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on

AMDGPU: Enable fixed function ABI by default

Code using indirect calls is broken without this, and there isn't
really much value in supporting the old attempt to vary the argument
placement based on uses. This resulted in more argument shuffling code
anyway.

Also have the option stop implying all inputs need to be passed. This
will no rely on the amdgpu-no-* attributes to avoid passing
unnecessary values.

show more ...


# da067ed5 10-Nov-2021 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Set most sched model resource's BufferSize to one

Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memor

[AMDGPU] Set most sched model resource's BufferSize to one

Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.

Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.

This type of change needs to be evaluated experimentally. Preliminary
results are positive.

Fixes: SWDEV-282962

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114777

show more ...


12