Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
be187369 |
| 14-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
|
#
4a6d13bf |
| 06-Nov-2024 |
Thurston Dang <thurston@google.com> |
Remove unused variable to fix '[AMDGPU] modify named barrier builtins and intrinsics (#114550)'
https://github.com/llvm/llvm-project/pull/114550 caused a buildbot breakage (https://lab.llvm.org/buil
Remove unused variable to fix '[AMDGPU] modify named barrier builtins and intrinsics (#114550)'
https://github.com/llvm/llvm-project/pull/114550 caused a buildbot breakage (https://lab.llvm.org/buildbot/#/builders/66/builds/5853) because of an unused variable. This patch attempts to fix forward:
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp:106:24: error: variable 'TTy' set but not used [-Werror,-Wunused-but-set-variable] 106 | if (TargetExtType *TTy = AMDGPU::isNamedBarrier(GV)) { | ^
show more ...
|
#
8c752900 |
| 06-Nov-2024 |
Gang Chen <gangc@amd.com> |
[AMDGPU] modify named barrier builtins and intrinsics (#114550)
Use a local pointer type to represent the named barrier in builtin and
intrinsic. This makes the definitions more user friendly
baca
[AMDGPU] modify named barrier builtins and intrinsics (#114550)
Use a local pointer type to represent the named barrier in builtin and
intrinsic. This makes the definitions more user friendly
bacause they do not need to worry about the hardware ID assignment. Also
this approach is more like the other popular GPU programming language.
Named barriers should be represented as global variables of addrspace(3)
in LLVM-IR. Compiler assigns the special LDS offsets for those variables
during AMDGPULowerModuleLDS pass. Those addresses are converted to hw
barrier ID during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
6bba44e8 |
| 16-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Use member initializers. NFC.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
9803de0e |
| 04-Jan-2024 |
Chaitanya <Krishna.Sankisa@amd.com> |
[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add
[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.
hidden argument will be added in below cases:
- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
#
5272ae66 |
| 27-Jul-2023 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[AMDGPU] Add IsChainFunction to the MachineFunctionInfo
This will represent functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions.
Differential Revision: https://review
[AMDGPU] Add IsChainFunction to the MachineFunctionInfo
This will represent functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions.
Differential Revision: https://reviews.llvm.org/D156410
show more ...
|
Revision tags: llvmorg-18-init |
|
#
6043d4df |
| 15-Jul-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca
|
#
74e928a0 |
| 13-Jul-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch: The IR lowering pa
[amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation.
After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size.
Deleted the now dead allocation code from the backend. Left for a later cleanup: - enabling lowering for anonymous functions - removing the elide-module-lds attribute (test churn, it's not used by llc any more) - adjusting the dynamic alignment check to not use symbol names
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155190
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
0507448d |
| 04-Apr-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Implement dynamic LDS accesses from non-kernel functions
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.
[amdgpu] Implement dynamic LDS accesses from non-kernel functions
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.
1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use
Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144233
show more ...
|
#
75c7019b |
| 30-Mar-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Fix broken error detection in LDS lowering
std::optional<uint32_t> can be compared to uint32_t without warning, but does not compare to the value within the optional. It needs to be prefixe
[amdgpu] Fix broken error detection in LDS lowering
std::optional<uint32_t> can be compared to uint32_t without warning, but does not compare to the value within the optional. It needs to be prefixed *. Wconversion does not warn about this. ``` bool bug(uint32_t Offset, std::optional<uint32_t> Expect) { return (Offset != Expect); } bool deref(uint32_t Offset, std::optional<uint32_t> Expect) { return (Offset != *Expect); } ``` Both compile without warnings. Wrote the former, intended the latter.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D146775
show more ...
|
Revision tags: llvmorg-16.0.0 |
|
#
d3dda422 |
| 12-Mar-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently
[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses.
This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144221
show more ...
|
Revision tags: llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
#
69e75ae6 |
| 18-Jun-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on w
CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered.
I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself.
A missing feature I still could use is a way to access an custom analysis pass on the IR here.
show more ...
|
#
6443c0ee |
| 12-Dec-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple.
Differential Revision: https://reviews.l
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
show more ...
|
#
67819a72 |
| 13-Dec-2022 |
Fangrui Song <i@maskray.me> |
[CodeGen] llvm::Optional => std::optional
|
#
d77ae7f2 |
| 07-Dec-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variabl
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
show more ...
|
#
a862d09a |
| 06-Dec-2022 |
Nico Weber <thakis@chromium.org> |
Revert "[amdgpu] Reimplement LDS lowering"
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
|
#
98201724 |
| 06-Dec-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variabl
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
show more ...
|
#
5a3fe9a0 |
| 27-Sep-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.
Differential Revision: https://reviews.llvm.org/D134666
|
#
80ba4328 |
| 28-Sep-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically
A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic
[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically
A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance.
This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127052
show more ...
|
#
20a80d60 |
| 27-Sep-2022 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"
Break msan bots. Details in D134666.
This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.
|
#
0ce96e06 |
| 26-Sep-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Move SIModeRegisterDefaults to SI MFI
It does not belong to a general AMDGPU MFI.
Differential Revision: https://reviews.llvm.org/D134666
|
#
3a205977 |
| 19-Jul-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS varia
[amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
show more ...
|
#
bc78c099 |
| 04-May-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Elide module lds allocation in kernels with no callees
Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block
[amdgpu] Elide module lds allocation in kernels with no callees
Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true.
Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS.
Tests updated to match, coverage was already good. Interesting cases is in lower-module-lds-offsets where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122091
show more ...
|
#
70306542 |
| 03-May-2022 |
serge-sans-paille <sguelton@redhat.com> |
[iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few regressions, fixing them.
Differential Revision: https://reviews.llvm.
[iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few regressions, fixing them.
Differential Revision: https://reviews.llvm.org/D124847
show more ...
|