Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
2e5c2982 |
| 10-Jan-2025 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equi
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed
for kernarg preloading is executed on hardware equipped with
incompatible firmware. If hardware has compatible firmware the 256 bytes
at the start of the kernel entry will be skipped. This skipping is done
automatically by hardware that supports the feature.
A pass is added which is intended to be run at the very end of the
pipeline to avoid any optimizations that would assume the prologue is a
real predecessor block to the actual code start. In reality we have two
possible entry points for the function. 1. The optimized path that
supports kernarg preloading which begins at an offset of 256 bytes. 2.
The backwards compatible entry point which starts at offset 0.
show more ...
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
6cfe6a6b |
| 24-Sep-2024 |
Georgi Mirazchiyski <georgi.mirazchiyski@codeplay.com> |
[NFC][AMDGPU] Assert no bad shift operations will happen (#108416)
The assumption in the asserts is based on the fact that no SGPR/VGPR
register Arg mask in the ISelLowering and Legalizer can equal
[NFC][AMDGPU] Assert no bad shift operations will happen (#108416)
The assumption in the asserts is based on the fact that no SGPR/VGPR
register Arg mask in the ISelLowering and Legalizer can equal zero. They
are implicitly set to ~0 by default (meaning non-masked) or explicitly
to a non-zero value.
The `optimizeCompareInstr` case is different from the above described.
It requires the mask to be a power-of-two because it's a special-case
optimization, hence in this case we still cannot have an invalid shift.
This commit also silences static analysis tools wrt potential bad shifts
that could result from the output of `countr_zero(Mask)`.
show more ...
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
7e9b49f6 |
| 25-Jun-2024 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent cha
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent change.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
1c6b3363 |
| 12-Dec-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Miscellaneous clang-format changes (#75186)
Reformat one table and protect a couple of tables that we don't want to
reformat.
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
#
0455596e |
| 02-Aug-2023 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patc
[AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch.
Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count.
Aggregates and arguments passed by-ref are not supported.
Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D158579
show more ...
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
#
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1 |
|
#
ce76d15a |
| 20-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use MCRegister for preloaded arguments
Attempt to fix build error with ancient GCC
|
Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
#
f25d020c |
| 05-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/GlobalISel: Add types to special inputs
When passing special ABI inputs, we have no existing context for the type to use.
|
Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
091f7f01 |
| 24-Apr-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
AMDGPUArgumentUsageInfo.h - cleanup includes and forward declarations. NFC. Reduce Function.h include to (already existing) forward declaration. Remove unused GCNSubtarget/TargetMachine forward decla
AMDGPUArgumentUsageInfo.h - cleanup includes and forward declarations. NFC. Reduce Function.h include to (already existing) forward declaration. Remove unused GCNSubtarget/TargetMachine forward declarations.
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5 |
|
#
4ea1baf6 |
| 17-Mar-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Initial, crude support for indirect calls
This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work.
Assumes a uniform call target, and will hit a verifier er
AMDGPU: Initial, crude support for indirect calls
This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work.
Assumes a uniform call target, and will hit a verifier error if the call target ends up in a VGPR. Also doesn't attempt to do anything sensible for the reported register/stack usage.
show more ...
|
Revision tags: llvmorg-10.0.0-rc4 |
|
#
015b640b |
| 11-Mar-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add flag to used fixed function ABI
Pass all arguments to every function, rather than only passing the minimum set of inputs needed for the call graph.
|
Revision tags: llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4 |
|
#
2a7304c8 |
| 05-Sep-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix Register copypaste error
llvm-svn: 371141
|
Revision tags: llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
#
1b317685 |
| 01-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Convert some places to Register
llvm-svn: 364769
|
#
07fd88d7 |
| 28-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Packed thread ids in function call ABI
Differential Revision: https://reviews.llvm.org/D63851
llvm-svn: 364619
|
Revision tags: llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
#
5bfbae5c |
| 11-Jul-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Me
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
show more ...
|
#
766c77ef |
| 21-Jun-2018 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z and everything that comes with it from implementation and v3 header files.
Leave definition in v2 header files for backwards compatibility.
Differentia
AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z and everything that comes with it from implementation and v3 header files.
Leave definition in v2 header files for backwards compatibility.
Differential Revision: https://reviews.llvm.org/D48191
llvm-svn: 335267
show more ...
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2 |
|
#
2feb1058 |
| 04-Aug-2017 |
Florian Gross <fgross@noexcept.net> |
[AMDGPU] Fixed MSVC build break
Error was:
field of type 'llvm::ArgDescriptor' has private default constructor const AMDGPUFunctionArgInfo AMDGPUArgumentUsageInfo::ExternFunctionInfo{};
[AMDGPU] Fixed MSVC build break
Error was:
field of type 'llvm::ArgDescriptor' has private default constructor const AMDGPUFunctionArgInfo AMDGPUArgumentUsageInfo::ExternFunctionInfo{}; ^
llvm-svn: 310048
show more ...
|
#
817c253e |
| 03-Aug-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix implicitarg.ptr handling special inputs
llvm-svn: 310002
|
#
7016f134 |
| 03-Aug-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add analysis pass for function argument info
This will allow only adding necessary inputs to callee functions that need special inputs forwarded from the kernel.
llvm-svn: 309996
|