lower-kernargs.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/AMDGPU/lower-kernargs.ll

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 87c21bf0	04-Dec-2024	Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	[AMDGPU] Preserve `noundef` and `range` during kernel argument loads (#118395) This commit ensures than noundef (which is frequently a prerequisite for other annotations) and range() annotations on [AMDGPU] Preserve `noundef` and `range` during kernel argument loads (#118395) This commit ensures than noundef (which is frequently a prerequisite for other annotations) and range() annotations on kernel arguments are copied onto their corresponding load from the kernel argument structure. show more ...
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7
# e31bfc04	03-Jun-2024	Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801) The various preloaded registers (workitem IDs, workgroup IDs, and various implicit pointers) always have a finite, invariant, [AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801) The various preloaded registers (workitem IDs, workgroup IDs, and various implicit pointers) always have a finite, invariant, well-defined value throughout a well-defined program. In cases where the compiler infers or the user declares that some implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the behavior of the entire program is undefined, since that misdeclaration can cause arbitrary other preloaded-register intrinsics to access the wrong register. This case is not expected to arise in practice, but could occur when the no implicit argument attributes were not cleared correctly in the presence of external functions, indrect calls, or other means of executing un-analyzable code. Failure to detect that case would be a bug in the attributor. This commit updates the documentation to reflect this long-standing reality. Then, on the basis that all implicit arguments are defined in all correct programs, the intrinsics that return those values are annototated with `noundef``. Some implicit pointer arguments gain a `nonnull`, but the kernel argument segment pointer or implicit argument pointers don't necessarily have this property. This will prevent spurious calls to `freeze` in front-end optimizations that destroy user-provided ranges on built-in IDs. (While I'm here, this commit adds a test for `noundef` on kernel arguments which is currently unimplemented) show more ...
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 4490003a	06-Mar-2024	Emma Pilkington <emma.pilkington95@gmail.com>	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling [AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb. show more ...
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# e21b7e21	15-Dec-2023	Saiyedul Islam <Saiyedul.Islam@amd.com>	[AMDGPU][NFC] Check more autogenerated llc tests for COV5 (#75219) Regenerate a few more llc tests to check for COV5 instead of the default ABI version.
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 466a8149	12-Sep-2023	Saiyedul Islam <Saiyedul.Islam@amd.com>	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
# 0a8d17e7	12-Sep-2023	Saiyedul Islam <Saiyedul.Islam@amd.com>	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed B [AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818 show more ...
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3
# 58e87c96	09-Aug-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Port AMDGPULowerKernelArguments to new pass manager https://reviews.llvm.org/D157498
Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 7a3682f6	19-Dec-2022	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Convert a few more special case tests to opaque pointers lower-kernargs.ll needed a switch to use update_test_checks metadata matching.
# ca856fff	29-Nov-2022	Ron Lieberman <ron.lieberman@amd.com>	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.
# d882ba7a	29-Nov-2022	Ron Lieberman <ron.lieberman@amd.com>	enable code-object-version=5
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 2959e082	26-Oct-2021	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm.amdgcn.implicitarg.ptr. This violates the principle of being correct by default. Some OpenMP testcases were broken recently since it wasn't correctly setting this attribute, and no known frontends are setting this to anything other than the maximum. Most of the test changes are from load widening of argument loads since there now more implied dereferenceable bytes. show more ...
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1
# bdd55b2f	31-Jul-2021	Eli Friedman <efriedma@quicinc.com>	Fix the default alignment of i1 vectors. Currently, the default alignment is much larger than the actual size of the vector in memory. Fix this to use a sane default. For SVE, temporarily remove l Fix the default alignment of i1 vectors. Currently, the default alignment is much larger than the actual size of the vector in memory. Fix this to use a sane default. For SVE, temporarily remove lowering of load/store operations for predicates with less than 16 elements. The layout the backend was assuming for SVE predicates with less than 16 elements doesn't agree with the frontend. More work probably needs to be done here. This change is, strictly speaking, not backwards-compatible at the bitcode level. But probably nobody is actually depending on that; i1 vectors in memory are rare, and the code that does use them probably ends up forcing the alignment to something sane anyway. If we think this is a concern, I can restrict this to scalable vectors for now (where it's actually causing issues for me at the moment). Differential Revision: https://reviews.llvm.org/D88994 show more ...
Revision tags: llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 9b296102	29-Dec-2020	Juneyoung Lee <aqjune@gmail.com>	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleV Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923 show more ...
# 61177943	24-Dec-2020	Praveen Velliengiri <Praveen.Velliengiri@amd.com>	[AMDGPU] Use MUBUF instructions for global address space access Currently, the compiler crashes in instruction selection of global load/stores in gfx600 due to the lack of FLAT instructions. This pa [AMDGPU] Use MUBUF instructions for global address space access Currently, the compiler crashes in instruction selection of global load/stores in gfx600 due to the lack of FLAT instructions. This patch fix the crash by selecting MUBUF instructions for global load/stores in gfx600. Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com> Reviewed by: t-tye Differential revision: https://reviews.llvm.org/D92483 show more ...
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# 1168119c	07-May-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Start interpreting byref on kernel arguments These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%st AMDGPU: Start interpreting byref on kernel arguments These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should produce the same offsets and argument metadata. This handles all 3 kernel ABI implementations, and the two HSA metadata emission paths. show more ...
# 4f04db4b	15-May-2020	Eli Friedman <efriedma@quicinc.com>	AllocaInst should store Align instead of MaybeAlign. Along the lines of D77454 and D79968. Unlike loads and stores, the default alignment is getPrefTypeAlign, to match the existing handling in vari AllocaInst should store Align instead of MaybeAlign. Along the lines of D77454 and D79968. Unlike loads and stores, the default alignment is getPrefTypeAlign, to match the existing handling in various places, including SelectionDAG and InstCombine. Differential Revision: https://reviews.llvm.org/D80044 show more ...
# 074c371a	05-May-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Insert kernarg code after allocas This produces more normal looking IR by keeping all the allocas clustered at the start of the block.
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# e0c1f9e7	17-Mar-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347 show more ...
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2
# 14359ef1	01-Feb-2019	James Y Knight <jyknight@google.com>	[opaque pointer types] Pass value type to LoadInst creation. This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. [opaque pointer types] Pass value type to LoadInst creation. This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911 show more ...
Revision tags: llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 72b0e38b	28-Jul-2018	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193
# f5be3ad7	29-Jun-2018	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't use struct type for argument layout This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect AMDGPU: Don't use struct type for argument layout This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999 show more ...
# 513e0c0e	28-Jun-2018	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix assert on aggregate type kernel arguments Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. AMDGPU: Fix assert on aggregate type kernel arguments Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827 show more ...
# 8c4a3523	26-Jun-2018	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful fo AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650 show more ...