History log of /llvm-project/llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (Results 1 – 23 of 23)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 87c21bf0 04-Dec-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[AMDGPU] Preserve `noundef` and `range` during kernel argument loads (#118395)

This commit ensures than noundef (which is frequently a prerequisite for
other annotations) and range() annotations on

[AMDGPU] Preserve `noundef` and `range` during kernel argument loads (#118395)

This commit ensures than noundef (which is frequently a prerequisite for
other annotations) and range() annotations on kernel arguments are
copied onto their corresponding load from the kernel argument structure.

show more ...


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7
# e31bfc04 03-Jun-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801)

The various preloaded registers (workitem IDs, workgroup IDs, and
various implicit pointers) always have a finite, invariant,

[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801)

The various preloaded registers (workitem IDs, workgroup IDs, and
various implicit pointers) always have a finite, invariant, well-defined
value throughout a well-defined program.

In cases where the compiler infers or the user declares that some
implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the
behavior of the entire program is undefined, since that misdeclaration
can cause arbitrary other preloaded-register intrinsics to access the
wrong register. This case is not expected to arise in practice, but
could occur when the no implicit argument attributes were not cleared
correctly in the presence of external functions, indrect calls, or other
means of executing un-analyzable code. Failure to detect that case would
be a bug in the attributor.

This commit updates the documentation to reflect this long-standing
reality.

Then, on the basis that all implicit arguments are defined in all
correct programs, the intrinsics that return those values are
annototated with `noundef``. Some implicit pointer arguments gain a
`nonnull`, but the kernel argument segment pointer or implicit argument
pointers don't necessarily have this property.

This will prevent spurious calls to `freeze` in front-end optimizations
that destroy user-provided ranges on built-in IDs.

(While I'm here, this commit adds a test for `noundef` on kernel
arguments which is currently unimplemented)

show more ...


Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 4490003a 06-Mar-2024 Emma Pilkington <emma.pilkington95@gmail.com>

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.

show more ...


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# e21b7e21 15-Dec-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU][NFC] Check more autogenerated llc tests for COV5 (#75219)

Regenerate a few more llc tests to check for COV5 instead of the default
ABI version.


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 466a8149 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)

This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.


# 0a8d17e7 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed B

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818

show more ...


Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3
# 58e87c96 09-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Port AMDGPULowerKernelArguments to new pass manager

https://reviews.llvm.org/D157498


Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 7a3682f6 19-Dec-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Convert a few more special case tests to opaque pointers

lower-kernargs.ll needed a switch to use update_test_checks metadata
matching.


# ca856fff 29-Nov-2022 Ron Lieberman <ron.lieberman@amd.com>

Revert "enable code-object-version=5"

very sorry wrong repo.

This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.


# d882ba7a 29-Nov-2022 Ron Lieberman <ron.lieberman@amd.com>

enable code-object-version=5


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 2959e082 26-Oct-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default

Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm

AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default

Previously we would require adding an attribute to kernels to enable
the inputs passed in the kernarg segment, accessed by
llvm.amdgcn.implicitarg.ptr. This violates the principle of being
correct by default. Some OpenMP testcases were broken recently since
it wasn't correctly setting this attribute, and no known frontends are
setting this to anything other than the maximum.

Most of the test changes are from load widening of argument loads
since there now more implied dereferenceable bytes.

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1
# bdd55b2f 31-Jul-2021 Eli Friedman <efriedma@quicinc.com>

Fix the default alignment of i1 vectors.

Currently, the default alignment is much larger than the actual size of
the vector in memory. Fix this to use a sane default.

For SVE, temporarily remove l

Fix the default alignment of i1 vectors.

Currently, the default alignment is much larger than the actual size of
the vector in memory. Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway. If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994

show more ...


Revision tags: llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 9b296102 29-Dec-2020 Juneyoung Lee <aqjune@gmail.com>

Use unary CreateShuffleVector if possible

As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleV

Use unary CreateShuffleVector if possible

As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923

show more ...


# 61177943 24-Dec-2020 Praveen Velliengiri <Praveen.Velliengiri@amd.com>

[AMDGPU] Use MUBUF instructions for global address space access

Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This pa

[AMDGPU] Use MUBUF instructions for global address space access

Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.

Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>

Reviewed by: t-tye

Differential revision: https://reviews.llvm.org/D92483

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# 1168119c 07-May-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Start interpreting byref on kernel arguments

These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%st

AMDGPU: Start interpreting byref on kernel arguments

These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.

This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.

show more ...


# 4f04db4b 15-May-2020 Eli Friedman <efriedma@quicinc.com>

AllocaInst should store Align instead of MaybeAlign.

Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
vari

AllocaInst should store Align instead of MaybeAlign.

Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.

Differential Revision: https://reviews.llvm.org/D80044

show more ...


# 074c371a 05-May-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Insert kernarg code after allocas

This produces more normal looking IR by keeping all the allocas
clustered at the start of the block.


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# e0c1f9e7 17-Mar-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Partially fix default device for HSA

There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address

AMDGPU: Partially fix default device for HSA

There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.

Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).

Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.

llvm-svn: 356347

show more ...


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2
# 14359ef1 01-Feb-2019 James Y Knight <jyknight@google.com>

[opaque pointer types] Pass value type to LoadInst creation.

This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

[opaque pointer types] Pass value type to LoadInst creation.

This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911

show more ...


Revision tags: llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 72b0e38b 28-Jul-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Stop trying to extend arguments for clover

This was trying to replace i8/i16 arguments with i32, which
was broken and no longer necessary.

llvm-svn: 338193


# f5be3ad7 29-Jun-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't use struct type for argument layout

This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect

AMDGPU: Don't use struct type for argument layout

This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect of avoiding creating an extra GEP for
the offset from the base kernel argument to the explicit kernel
argument offset.

llvm-svn: 335999

show more ...


# 513e0c0e 28-Jun-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix assert on aggregate type kernel arguments

Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.

AMDGPU: Fix assert on aggregate type kernel arguments

Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.

Also fix a crash when there is only an empty struct argument.

llvm-svn: 335827

show more ...


# 8c4a3523 26-Jun-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add pass to lower kernel arguments to loads

This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful fo

AMDGPU: Add pass to lower kernel arguments to loads

This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed on them as the equivalent !range
metadata is not valid on pointer typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650

show more ...