Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
be187369 |
| 14-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3 |
|
#
1cc5290a |
| 17-Oct-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Factor out getNumUsedPhysRegs(). NFC. (#112624)
I will need it from one more place.
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
c897c13d |
| 30-Sep-2024 |
Janek van Oirschot <janek.vanoirschot@amd.com> |
[AMDGPU] Convert AMDGPUResourceUsageAnalysis pass from Module to MF pass (#102913)
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to the MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.
Fixes https://github.com/llvm/llvm-project/issues/64863
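
The idea of emitting an expression per function resource, rather than a resolved number, can be modeled with a minimal sketch. This is not the MCExpr API: the symbol naming scheme, the `define_num_vgpr` helper, and the choice of `max` as the combining rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Symbol table: symbol name -> deferred expression (a thunk returning an int).
symbols: Dict[str, Callable[[], int]] = {}


def define_num_vgpr(fn: str, own: int, callees: List[str]) -> None:
    """Record fn's VGPR count as an expression over its callees' symbols."""
    def expr() -> int:
        # Assumption: a caller needs at least as many registers as any callee.
        return max([own] + [symbols[f"{c}.num_vgpr"]() for c in callees])
    symbols[f"{fn}.num_vgpr"] = expr


# The caller is defined before its callee; evaluation is deferred,
# so definition order does not matter.
define_num_vgpr("kernel", own=8, callees=["helper"])
define_num_vgpr("helper", own=24, callees=[])

kernel_vgprs = symbols["kernel.num_vgpr"]()
```

Because each expression is only evaluated once every referenced symbol exists, functions can be processed one at a time in any order. The sketch does not handle recursive call graphs.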
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
#
51f72837 |
| 30-Jul-2024 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Remove a pointless SIFunctionResourceInfo::getTotalNumVgprs overload (#101158)
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
351a4b27 |
| 17-Jul-2024 |
Joseph Huber <huberjn@outlook.com> |
[AMDGPU] Simplify alias stripping to use utility function
|
#
6dc8c2da |
| 16-Jul-2024 |
Joseph Huber <huberjn@outlook.com> |
[AMDGPU] Fix resource analysis crash on alias-to-alias function (#99034)
Summary: Previously this code only looked through a single level of aliases to find the underlying function. This patch changes it to continue until it finds the end. Aliases that form a cycle are illegal IR, so we shouldn't need to worry about infinite loops.
Fixes https://github.com/llvm/llvm-project/issues/96812
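
The difference between looking through one alias level and following the chain to its end can be sketched as follows. The `Function` and `Alias` classes and the `strip_aliases` helper are hypothetical stand-ins, not LLVM types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Function:
    name: str


@dataclass
class Alias:
    name: str
    aliasee: Union["Alias", Function]  # an alias may point at another alias


def strip_aliases(value):
    """Follow aliasee links until a non-alias value is reached."""
    # Cyclic aliases are illegal IR, so this loop is assumed to terminate.
    while isinstance(value, Alias):
        value = value.aliasee
    return value


callee = Function("kernel_impl")
inner = Alias("inner_alias", callee)
outer = Alias("outer_alias", inner)  # alias-to-alias
```

A single-level lookup on `outer` would stop at `inner`, which is still an alias; the loop keeps going until it reaches `callee`.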
|
#
520e0454 |
| 12-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Handle llvm.amdgcn.pops.exiting.wave.id with calls (#98614)
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
df5e431e |
| 26-Jan-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[Target][AMDGPU] Fix TSan error on AMDGPU Target. (#79529)
Updating the value of the global flag within the code was flagged as a
TSAN error. Fixing that.
|
Revision tags: llvmorg-19-init |
|
#
bc82cfb3 |
| 21-Jan-2024 |
Emma Pilkington <emma.pilkington95@gmail.com> |
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
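
The new precedence order can be summarized in a short sketch. The function name, the parameter names, and the `DEFAULT_COV` value are assumptions for illustration; the fallback to a compiler default when neither source is present is also assumed here.

```python
from typing import Optional

DEFAULT_COV = 5  # hypothetical compiler default, for illustration only


def effective_cov(ir_or_asm_cov: Optional[int],
                  cl_flag_cov: Optional[int]) -> int:
    """Pick the code object version: IR flag / asm directive, then CL flag."""
    if ir_or_asm_cov is not None:
        return ir_or_asm_cov  # IR flag or asm directive now wins
    if cl_flag_cov is not None:
        return cl_flag_cov    # CL flag only applies when nothing else is set
    return DEFAULT_COV
```

With this ordering, `effective_cov(4, 6)` yields 4, whereas under the previous behavior described above the CL flag value would have been used.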
|
#
8c6172b0 |
| 28-Dec-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][True16] Don't use the VGPR_LO/HI16 register classes. (#76440)
Removing the classes requires updating tests and so is planned to be
done with a separate change.
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3 |
|
#
343be513 |
| 19-Aug-2023 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add utilities to track number of user SGPRs. NFC.
Factor out and unify some common code that calculates and tracks the number of user SGPRs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159439
|
Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
3604fdf1 |
| 12-Jun-2023 |
Baptiste <baptiste.saleil@amd.com> |
[AMDGPU] Do not assume stack size for PAL code object indirect calls
There is no need to set a big default stack size for PAL code object indirect calls. The driver knows the max recursion depth, so it can compute a more accurate value from the minimum scratch size.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D150609
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2 |
|
#
ec821884 |
| 17-Apr-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Do not crash on agpr_hi16 in AMDGPUResourceUsageAnalysis
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D148438
|
#
2d39f5b5 |
| 13-Apr-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Allow use of TTMP registers in AMDGPUResourceUsageAnalysis
With architected SGPRs, workgroup IDs are passed into a compute shader in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead of failing an assertion.
Differential Revision: https://reviews.llvm.org/D148239
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
7ca3444f |
| 10-Feb-2023 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Use module flag to get code object version at IR level follow-up
Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass the code object version as an argument to initialize the target ID and use it for the targetID dump.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D143293
|
Revision tags: llvmorg-16.0.0-rc2 |
|
#
54cf69c9 |
| 03-Feb-2023 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Use module flag to get code object version at IR level
Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from the command line. In case the module flag is missing, we use the current default code object version supported in the compiler.
For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code object version. That will be in a separate patch later.
For the LIT test updates, we directly add the module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal.
Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D14313
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
6443c0ee |
| 12-Dec-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
|
Revision tags: llvmorg-15.0.6 |
|
#
595a0884 |
| 17-Nov-2022 |
Mateja Marjanovic <mateja.marjanovic@amd.com> |
[AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.
Differential Revision: https://reviews.llvm.org/D138205
|
#
220147d5 |
| 22-Nov-2022 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Make aperture registers 64 bit
Makes the SRC_(SHARED|PRIVATE)_(BASE|LIMIT) registers 64 bit instead of 32. They're still usable as 32 bit operands by using the _LO suffix.
Preparation for D137542
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D137767
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1 |
|
#
3759398b |
| 15-Sep-2022 |
Abinav Puthan Purayil <abinavpp@gmail.com> |
[AMDGPU] Report minimum scratch size in code object v5 and later by default
This change sets -amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size} options to zero by default for code object v5 and later. The runtime is expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit in the kernel descriptor is set.
Differential Revision: https://reviews.llvm.org/D128346
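
How the default for the assumed stack size now depends on the code object version can be sketched like this. The helper name and the `LEGACY_ASSUMED_SIZE` placeholder are assumptions, and the rule that an explicitly passed option overrides the default is inferred from the phrase "by default".

```python
from typing import Optional

LEGACY_ASSUMED_SIZE = 16384  # placeholder for the pre-v5 default, illustrative


def assumed_stack_size(cov: int, explicit: Optional[int] = None) -> int:
    """Return the stack size to assume for an external call or dynamic object."""
    if explicit is not None:
        return explicit  # an explicitly set option is assumed to win
    # For code object v5 and later, report only the minimum scratch size
    # and leave adjustment to the runtime.
    return 0 if cov >= 5 else LEGACY_ASSUMED_SIZE
```

With a zero default, the reported scratch size is a lower bound; the runtime is expected to grow it when the dynamic stack bit is set in the kernel descriptor.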
|
Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
ef906f28 |
| 24-Jul-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix assertion when printing unreachable functions
Since 814a0abccefdd2e52b1b507f21ce842b689dbedd, this would break if we had a function in the module that becomes dead in any codegen IR pass. The function wasn't deleted since it was initially used in dead code, but is detached from the call graph and doesn't appear in the PO traversal. Do a second walk over the module to populate the resources of any functions which weren't already processed.
|
Revision tags: llvmorg-14.0.6 |
|
#
cb9ae937 |
| 10-Jun-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+, the null register can be used as both a 32-bit and a 64-bit operand. Define a 64-bit version of the register to use during codegen.
Differential Revision: https://reviews.llvm.org/D127527
|
Revision tags: llvmorg-14.0.5 |
|
#
814a0abc |
| 03-Jun-2022 |
Jacob Weightman <jacobdweightman@gmail.com> |
AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis
The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed that a function's callees were always analyzed prior to their callers. When it was refactored into a module pass, this assumption no longer always held. As a result, calls were erroneously identified as indirect and private segment space was reserved for them, which significantly increased kernel launch latency.
This patch changes the order in which the module's functions are analyzed from the order in which they occur in the module to a post-order traversal of the call graph. Perhaps Clang always generates the module's functions in such an order, but this is not the case for the Cray Fortran compiler.
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D126025
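
A post-order traversal of the call graph, which visits every callee before its callers, can be sketched as below. The dictionary-based `call_graph` and the function names are hypothetical; this is a generic depth-first post-order, not the LLVM call graph iterator.

```python
from typing import Dict, List


def post_order(call_graph: Dict[str, List[str]], roots: List[str]) -> List[str]:
    """Return functions so that each callee precedes its callers (cycles aside)."""
    visited = set()
    order: List[str] = []

    def visit(fn: str) -> None:
        if fn in visited:
            return
        visited.add(fn)
        for callee in call_graph.get(fn, ()):
            visit(callee)
        order.append(fn)  # emitted only after all of its callees

    for root in roots:
        visit(root)
    return order


# The module lists the kernel first, but the analysis order puts the leaf first.
call_graph = {
    "kernel": ["helper_a", "helper_b"],
    "helper_a": ["leaf"],
    "helper_b": ["leaf"],
    "leaf": [],
}
order = post_order(call_graph, ["kernel"])
```

Analyzing functions in this order means a callee's resource information is already available when its caller is processed, regardless of the order in which the front end emitted the functions.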
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
a278250b |
| 10-Mar-2022 |
Nico Weber <thakis@chromium.org> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
|