Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
be187369 |
| 14-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
d31e3141 |
| 20-Sep-2024 |
Youngsuk Kim <youngsuk.kim@hpe.com> |
[llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b1361
[llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
show more ...
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
#
7a2a36f9 |
| 29-Jul-2024 |
Sergei Barannikov <barannikov88@gmail.com> |
[AsmPrinter] Don't EmitToStreamer instructions lowered by tblgenned code (#100803)
This allows lowering individual instructions in a bundle before a single
call to EmitToStreamer for VLIW targets.
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
7f017f0a |
| 21-Jul-2024 |
Fangrui Song <i@maskray.me> |
[MC] Drop unnecessary MCSymbol::setExternal calls for ELF
Similar to e4c360a897fe062914519d331e8f1e28b2b1fbfd (2020).
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
29d571fa |
| 05-Jul-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU] Fix expensive-checks build.
Completes <https://reviews.llvm.org/D154337>.
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
1ab8b9ae |
| 27-Apr-2023 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Define sub-class of SGPR_64 for tail call return
Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved regist
AMDGPU: Define sub-class of SGPR_64 for tail call return
Summary: Registers for tail call return should not be clobbered by callee. So we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold the tail call return address.
Because GFX and C calling conventions have different CSR, we need to define the sub-class separately. This work is an extension of D147096 with the consideration of GFX calling convention.
Based on the calling conventions, different instructions will be selected with different sub-class of SGPR_64 as the input.
Reviewers: arsenm, cdevadas and sebastian-ne
Differential Revision: https://reviews.llvm.org/D148824
show more ...
|
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4 |
|
#
0c049ea6 |
| 10-Mar-2023 |
Alexis Engelke <engelke@in.tum.de> |
[MC] Always encode instruction into SmallVector
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs s
[MC] Always encode instruction into SmallVector
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream to encode the instruction into a SmallVector. The raw_ostream however incurs some overhead for the actual encoding.
This change allows an MCCodeEmitter to directly emit an instruction into a SmallVector without using a raw_ostream and therefore allow for performance improvments in encoding. A default path that uses existing raw_ostream implementations is provided.
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145791
show more ...
|
#
d3dda422 |
| 12-Mar-2023 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently
[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses.
This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D144221
show more ...
|
Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
d77ae7f2 |
| 07-Dec-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variabl
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
show more ...
|
#
a862d09a |
| 06-Dec-2022 |
Nico Weber <thakis@chromium.org> |
Revert "[amdgpu] Reimplement LDS lowering"
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
|
#
98201724 |
| 06-Dec-2022 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variabl
[amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis.
Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139433
show more ...
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
#
b0f4678b |
| 17-Aug-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other
[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D132079
show more ...
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
#
f5b21680 |
| 13-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used eith
[AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions... // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 1 VALU __builtin_amdgcn_sched_group_barrier(2, 1, 0) // 5 MFMA __builtin_amdgcn_sched_group_barrier(8, 5, 0) // 1 VMEM_READ __builtin_amdgcn_sched_group_barrier(32, 1, 0) // 3 VALU __builtin_amdgcn_sched_group_barrier(2, 3, 0) // 2 VMEM_WRITE __builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
show more ...
|
#
3e0bf1c7 |
| 14-Jul-2022 |
David Green <david.green@arm.com> |
[CodeGen] Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added
[CodeGen] Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments.
Recommitted with some fixes for the leftover MCII variables in release builds.
Differential Revision: https://reviews.llvm.org/D129506
show more ...
|
#
95252133 |
| 13-Jul-2022 |
David Green <david.green@arm.com> |
Revert "Move instruction predicate verification to emitInstruction"
This reverts commit e2fb8c0f4b940e0285ee36c112469fa75d4b60ff as it does not build for Release builds, and some buildbots are givin
Revert "Move instruction predicate verification to emitInstruction"
This reverts commit e2fb8c0f4b940e0285ee36c112469fa75d4b60ff as it does not build for Release builds, and some buildbots are giving more warning than I saw locally. Reverting to fix those issues.
show more ...
|
#
e2fb8c0f |
| 13-Jul-2022 |
David Green <david.green@arm.com> |
Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Targ
Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments.
Differential Revision: https://reviews.llvm.org/D129506
show more ...
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
2db70021 |
| 25-Mar-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers
[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers have knowledge of inter-wave runtime behavior which is unknown to the compiler this builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
04fff547 |
| 07-Mar-2022 |
Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
Revision tags: llvmorg-14.0.0-rc2 |
|
#
2aed07e9 |
| 16-Feb-2022 |
Shao-Ce SUN <shaoce@nj.iscas.ac.cn> |
[NFC][MC] remove unused argument `MCRegisterInfo` in `MCCodeEmitter`
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D119846
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
09b53296 |
| 22-Dec-2021 |
Ron Lieberman <Ron.Lieberman@amd.com> |
Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1fd5f2bf9aa6c2f362d2993691a316b3.
Failed amdgpu runtime buildbot # 3514
|
#
9075009d |
| 22-Dec-2021 |
RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added a
[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls.
But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D114652
show more ...
|
Revision tags: llvmorg-13.0.1-rc1 |
|
#
5b8bbbec |
| 18-Nov-2021 |
Zarko Todorovski <zarko@ca.ibm.com> |
[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded removed code comments that contain `sanity check` and `sanity test`.
|
#
76cbe622 |
| 25-Oct-2021 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not goin
[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D111637
show more ...
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
#
47d6274d |
| 07-Sep-2021 |
Daniil Fukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Reduce includes dependencies, part 2
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Minor forward declarations, red
[NFC][AMDGPU] Reduce includes dependencies, part 2
1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Minor forward declarations, redundant includes and flags in GCNSubtarget cleanup.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D109351
show more ...
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
#
bf980930 |
| 16-Jul-2021 |
Sebastian Neubauer <sebastian.neubauer@amd.com> |
[AMDGPU] Ignore KILLs when forming clauses
KILL instructions are sometimes present and prevented hard clauses from being formed.
Fix this by ignoring all meta instructions in clauses.
Differential
[AMDGPU] Ignore KILLs when forming clauses
KILL instructions are sometimes present and prevented hard clauses from being formed.
Fix this by ignoring all meta instructions in clauses.
Differential Revision: https://reviews.llvm.org/D106042
show more ...
|