Revision tags: llvmorg-21-init

5e26ff35 | 14-Jan-2025 | Brox Chen <guochen2@amd.com>
[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499)
True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudos.

Revision tags: llvmorg-19.1.7

b2adeae8 | 03-Jan-2025 | Jun Wang <jwang86@yahoo.com>
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.

Revision tags: llvmorg-19.1.6, llvmorg-19.1.5

5dd48c49 | 26-Nov-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>

01c9a14c | 21-Nov-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.

Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init

16238669 | 18-Jun-2024 | Ivan Kosarev <ivan.kosarev@amd.com>
[AMDGPU][MC] Support UC_VERSION_* constants. (#95618)
Our other tools support them, so we want them in LLVM
assembler/disassembler too.

Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5

68e814d9 | 18-Apr-2024 | Emma Pilkington <emma.pilkington95@gmail.com>
[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400)
These mostly are checking for various reserved bits being set. The diagnostics
for gpu-dependent reserved bits have a bit more context since they seem like the
most likely ones to be observed in practice.
This commit also improves the error handling mechanism for
MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter
that llvm-objdump simply ignored; now it returns errors using
Expected<T>.
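As a rough illustration of the reserved-bit checks, the sketch below tests a descriptor word against a mask and returns an error message rather than writing to a comment stream. It uses std::optional<std::string> in place of llvm::Expected<T> so that it stays self-contained; the function name, mask, and message format are hypothetical and do not reflect real kernel-descriptor layouts.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical reserved-bit check for one kernel-descriptor word.
// Returns std::nullopt when no reserved bit is set, otherwise an error
// message naming the field and the offending bits. The mask passed in is a
// placeholder, not a real kernel-descriptor layout.
std::optional<std::string> checkReservedBits(uint32_t FieldValue,
                                             uint32_t ReservedMask,
                                             const std::string &FieldName) {
  uint32_t Set = FieldValue & ReservedMask;
  if (Set == 0)
    return std::nullopt;
  char Buf[16];
  std::snprintf(Buf, sizeof(Buf), "0x%08x", static_cast<unsigned>(Set));
  return "reserved bits set in " + FieldName + ": " + Buf;
}
```

A caller can then propagate the message as an error instead of silently dropping it.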

Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4

60e7ae3f | 26-Feb-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992)
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.
This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my
Release+Asserts build.
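A minimal sketch of the idea, assuming a simplified model in which each decoder table lists the subtargets it applies to. The Gen enum, DecoderTable struct and decodeForSubtarget function are hypothetical and are not LLVM's actual types. Irrelevant tables are skipped before any decode attempt, and failure is reported only once, after every relevant table has been tried.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical subtarget generations and decoder-table model.
enum class Gen { GFX9, GFX10, GFX11, GFX12 };

struct DecoderTable {
  std::string Name;
  std::vector<Gen> Targets;  // subtargets for which this table makes sense
  std::function<std::optional<std::string>(uint32_t)> TryDecode;
};

// Try only the tables relevant to the current subtarget. TablesTried (if
// non-null) receives the number of tables actually consulted.
std::optional<std::string> decodeForSubtarget(
    const std::vector<DecoderTable> &Tables, Gen Current, uint32_t Word,
    unsigned *TablesTried = nullptr) {
  unsigned Tried = 0;
  for (const DecoderTable &T : Tables) {
    bool Relevant = false;
    for (Gen G : T.Targets)
      Relevant |= (G == Current);
    if (!Relevant)
      continue;  // never call TryDecode for another subtarget's table
    ++Tried;
    if (auto Inst = T.TryDecode(Word)) {
      if (TablesTried)
        *TablesTried = Tried;
      return Inst;
    }
  }
  if (TablesTried)
    *TablesTried = Tried;
  return std::nullopt;  // single point of failure: all relevant tables failed
}
```

The speed-up comes from the `continue`: the cost of a failed lookup now scales with the number of tables for the current subtarget rather than with all tables.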

42f6f95e | 23-Feb-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775)
Remove all the code that set and tested Res. Change all convert*
functions to return void since none of them can fail. getInstruction
only has one main point of failure, after all calls to tryDecodeInst
have failed.

bcbffd99 | 22-Feb-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379)
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
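The difference between the two FI encodings can be sketched as two small decoders. The src0 values 0xE9 (fi:0) and 0xEA (fi:1) match what I understand LLVM calls DPP8_FI_0 and DPP8_FI_1, but treat them as assumptions; the function names are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Assumed special src0 values marking a DPP8 instruction and its FI setting.
constexpr uint32_t DPP8_FI_0 = 0xE9;
constexpr uint32_t DPP8_FI_1 = 0xEA;

// Hypothetical dedicated DPP8 FI decoder: any other src0 value is rejected
// here, so the decoded instruction needs no post hoc validation.
std::optional<unsigned> decodeDpp8FI(uint32_t Src0Field) {
  switch (Src0Field) {
  case DPP8_FI_0:
    return 0u;
  case DPP8_FI_1:
    return 1u;
  default:
    return std::nullopt;
  }
}

// DPP16, by contrast, stores FI as a single bit.
unsigned decodeDpp16FI(uint32_t FIField) { return FIField & 1u; }
```

Both operands still print as fi:0 or fi:1; only the decoding path differs, which is why a shared AsmOperandClass suffices.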

Revision tags: llvmorg-18.1.0-rc3

13e64958 | 19-Feb-2024 | Stanislav Mekhanoshin <rampitec@users.noreply.github.com>
[AMDGPU] Fix decoder for BF16 inline constants (#82276)
Fix #82039.

7d19dc50 | 08-Feb-2024 | Ivan Kosarev <ivan.kosarev@amd.com>
[AMDGPU][True16] Support VOP3 source DPP operands. (#80892)

Revision tags: llvmorg-18.1.0-rc2

4eb08109 | 01-Feb-2024 | Emma Pilkington <emma.pilkington95@gmail.com>
[llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907)
Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart,
but I'm not sure what a better place for it would be.

Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init

7f55d7de | 13-Dec-2023 | Mariusz Sikora <mariusz.sikora@amd.com>
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>

f5868cb6 | 04-Dec-2023 | Mirko Brkušanin <Mirko.Brkusanin@amd.com>
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062)

Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3

ab6c3d50 | 12-Oct-2023 | Stanislav Mekhanoshin <rampitec@users.noreply.github.com>
[AMDGPU] Change the representation of double literals in operands (#68740)
A 64-bit literal can be used as a 32-bit zero- or sign-extended operand.
In the case of a double, zeroes are added to the low 32 bits. Currently
the asm parser stores only the high 32 bits of a double in an operand. To
support codegen as requested by
https://github.com/llvm/llvm-project/issues/67781, we need to change the
representation to store the full 64-bit value so that codegen can simply
add immediates to an instruction.
There is some code to support compatibility with existing tests and asm
kernels: short hex strings that represent only the high 32 bits of a
double value are still accepted as valid literals.
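A small sketch of the encoding rules described above, assuming IEEE-754 binary64 doubles. All function names are hypothetical, and "short" is taken here to mean at most 8 hex digits after a 0x prefix, which is an assumption about how the parser classifies literals.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Raw bit pattern of a double.
uint64_t doubleBits(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// A 32-bit literal for an f64 operand supplies the high 32 bits; the low
// 32 bits are filled with zeroes.
uint64_t expandFP64Literal(uint32_t Lit32) {
  return static_cast<uint64_t>(Lit32) << 32;
}

// With the full 64-bit value stored in the operand, a double fits the
// 32-bit literal slot only if its low 32 bits are zero.
bool fitsIn32BitFP64Literal(uint64_t Bits) {
  return (Bits & 0xFFFFFFFFull) == 0;
}

// Hypothetical compatibility rule: a short hex string (assumed to start
// with "0x") written for an f64 operand denotes only the high 32 bits.
uint64_t parseFP64HexLiteral(const std::string &S) {
  uint64_t V = std::stoull(S, nullptr, 16);  // base 16 accepts a 0x prefix
  bool Short = S.size() - 2 <= 8;
  return Short ? expandFP64Literal(static_cast<uint32_t>(V)) : V;
}
```

Under this model, "0x3ff00000" and "0x3ff0000000000000" both denote 1.0, while a value such as 0.1 cannot be encoded as a 32-bit f64 literal because its low word is non-zero.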

Revision tags: llvmorg-17.0.2

9310baa5 | 25-Sep-2023 | Ivan Kosarev <ivan.kosarev@amd.com>
[AMDGPU][NFC] Add True16 operand definitions.
Reviewed By: Joe_Nash
Differential Revision: https://reviews.llvm.org/D156103

fab28e0e | 22-Sep-2023 | Ivan Kosarev <ivan.kosarev@amd.com>
Reapply "[AMDGPU] Introduce real and keep fake True16 instructions."
Reverts 6cb3866b1ce9d835402e414049478cea82427cf1.
Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.

6cb3866b | 22-Sep-2023 | Ivan Kosarev <ivan.kosarev@amd.com>
Revert "[AMDGPU] Introduce real and keep fake True16 instructions."
This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.

0f864c7b | 22-Sep-2023 | Ivan Kosarev <ivan.kosarev@amd.com>
[AMDGPU] Introduce real and keep fake True16 instructions.
The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated.
Reviewed By: arsenm, Joe_Nash
Differential Revision: https://reviews.llvm.org/D156101

Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2

69447d6a | 02-Aug-2023 | Austin Kerbow <Austin.Kerbow@amd.com>
[AMDGPU] Add ASM and MC updates for preloading kernargs
Add assembler directives for preloading kernel arguments that correspond to new fields in the kernel descriptor for the length and offset of arguments that will be placed in SGPRs prior to kernel launch. Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated directly after other user SGPRs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159459
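A sketch of the SGPR mapping described above, under stated assumptions: the preloaded window starts at a byte offset into the kernarg segment, each SGPR holds 4 bytes, and arguments keep their kernarg-segment layout, so a sub-dword argument may share an SGPR with a neighbour. The helper name and parameters are hypothetical, not the kernel descriptor's actual fields.

```cpp
// First SGPR and number of SGPRs occupied by one preloaded argument.
struct SgprRange {
  unsigned First;
  unsigned Count;
};

// Hypothetical mapping: kernarg SGPRs follow the other user SGPRs, and the
// argument's position within the preloaded window (in bytes) is preserved.
SgprRange preloadedArgSgprs(unsigned NumOtherUserSgprs,
                            unsigned PreloadOffsetBytes,
                            unsigned ArgOffsetBytes, unsigned ArgSizeBytes) {
  unsigned Rel = ArgOffsetBytes - PreloadOffsetBytes;
  unsigned FirstDword = Rel / 4;
  unsigned LastDword = (Rel + ArgSizeBytes - 1) / 4;
  return {NumOtherUserSgprs + FirstDword, LastDword - FirstDword + 1};
}
```

For example, with 4 other user SGPRs, an 8-byte pointer at kernarg offset 0 lands in SGPRs 4 and 5, and a following 4-byte value at offset 8 lands in SGPR 6.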

Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init

986001c8 | 29-Jun-2023 | Scott Linder <Scott.Linder@amd.com>
[AMDGPU] Improve assembler + disassembler handling of kernel descriptors
* Relax the AsmParser to accept `.amdhsa_wavefront_size32 0` when the `.amdhsa_shared_vgpr_count` directive is present.
* Teach the KD disassembler to respect the setting of KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32 when calculating the value of `.amdhsa_next_free_vgpr`.
* Teach the KD disassembler to disassemble COMPUTE_PGM_RSRC3 for gfx90a and gfx10+.
* Include "pseudo directive" comments for gfx10 fields which are not controlled by any assembler directive.
* Fix disassembleObject failure diagnostic in llvm-objdump to not hard-code a comment string, and to follow the convention of not capitalizing the first sentence.
Reviewed By: rochauha
Differential Revision: https://reviews.llvm.org/D128014
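To see why the wavefront size matters for `.amdhsa_next_free_vgpr`, here is a sketch assuming GRANULATED_WORKITEM_VGPR_COUNT stores (VGPRs / granule) - 1, with an encoding granule of 8 VGPRs in wave32 and 4 in wave64; gfx90a, which uses a granule of 8 regardless, is not modelled. The function names are hypothetical.

```cpp
// Assumed VGPR encoding granule (gfx90a not modelled).
unsigned vgprEncodingGranule(bool Wave32) { return Wave32 ? 8u : 4u; }

// Hypothetical reconstruction of .amdhsa_next_free_vgpr from the
// granulated count stored in the kernel descriptor.
unsigned nextFreeVgprFromKD(unsigned GranulatedVgprCount, bool Wave32) {
  return (GranulatedVgprCount + 1) * vgprEncodingGranule(Wave32);
}
```

With a field value of 3, a wave32 kernel decodes to 32 VGPRs and a wave64 kernel to 16, so ignoring ENABLE_WAVEFRONT_SIZE32 would halve the reported value for wave32 code.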

ede070a2 | 27-Jun-2023 | Scott Linder <Scott.Linder@amd.com>
[NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.
Use split-file to simplify and extend existing kernel-descriptor disassembly tests.
Add a comment to AMDHSAKernelDescriptor.h, as at least one small step towards keeping all kernel-descriptor-sensitive code in sync.
Reviewed By: MaskRay, kzhuravl, arsenm
Differential Revision: https://reviews.llvm.org/D130105

b0abd489 | 17-Jun-2023 | Elliot Goodrich <elliotgoodrich@gmail.com>
[llvm] Add missing StringExtras.h includes
In preparation for moving the `#include "llvm/ADT/StringExtras.h"` from the header to the source file of `llvm/Support/Error.h`, first add all the missing includes that were previously pulled in transitively through this header.

Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3

e23891a3 | 26-Apr-2023 | Ivan Kosarev <ivan.kosarev@amd.com>
[AMDGPU][Disassembler] Fix a spurious error message in an instruction comment.
The patch prevents pollution of instruction comments with error messages generated during unsuccessful decoding attempts.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D149049
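One way to avoid this kind of pollution, sketched here with hypothetical names: give each decoding attempt its own scratch stream and copy out only the text written by the attempt that succeeds. This illustrates the pattern; it is not the patch's actual code.

```cpp
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical decode attempt: writes diagnostics to Diag, returns success.
using Attempt = std::function<bool(std::ostream &Diag)>;

// Each attempt gets a fresh buffer; failed attempts' buffers are discarded,
// so their error text never reaches the instruction comment.
std::optional<std::string> decodeWithComments(
    const std::vector<Attempt> &Attempts) {
  for (const Attempt &TryDecode : Attempts) {
    std::ostringstream Scratch;
    if (TryDecode(Scratch))
      return Scratch.str();  // keep only what the winning attempt wrote
  }
  return std::nullopt;  // every attempt failed
}
```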

Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2

b0c1a45b | 01-Feb-2023 | Petar Avramovic <Petar.Avramovic@amd.com>
AMDGPU/MC: Refactor decoders. Rework decoders for float immediates
decodeFPImmed creates an immediate operand using the register operand width, but the size of the created immediate should correspond to the OperandType of the RegisterOperand. For example, OPW128 could be used for RegisterOperands that use v2f64, v4f32 and v8f16. Each of these RegisterOperands would have a different OperandType and require the immediate to be decoded as a 64-, 32- or 16-bit immediate, respectively. decodeOperand_<RegClass> only provides the width for register decoding; introduce decodeOperand_<RegClass>_Imm<ImmWidth>, which also provides the width for immediate decoding.
Refactor RegisterOperands:
- decoders get the _Imm<ImmWidth> suffix in some cases
- remove unused RegisterOperands defined via multiclass
- use a different RegisterOperand in a few places, so that the new RegisterOperand's decoder corresponds to the number of bits used for the operand's encoding
Refactor decoder functions:
- add asserts for the size of the encoding that will be decoded
- regroup them according to the method of decoding, e.g. decodeOperand_<RegClass> (register only, no immediate)
Decoders can now create immediates of consistent size; use this for better diagnostics of 'invalid immediate'.
Differential Revision: https://reviews.llvm.org/D142636
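To illustrate why the immediate width must come from the operand type rather than the register width, the sketch below expands the inline FP constant encodings 240..248 (0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0 and 1/(2*pi)) into 16-, 32- or 64-bit patterns. The function name is hypothetical, and 1/(2*pi) is only an inline constant on subtargets that support it.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical decoder: the same source encoding yields a different bit
// pattern depending on the FP width implied by the operand type.
std::optional<uint64_t> decodeInlineFPImm(unsigned Enc, unsigned ImmWidth) {
  // Order: 0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0, 1/(2*pi)
  static const uint16_t F16[] = {0x3800, 0xB800, 0x3C00, 0xBC00, 0x4000,
                                 0xC000, 0x4400, 0xC400, 0x3118};
  static const uint32_t F32[] = {0x3F000000, 0xBF000000, 0x3F800000,
                                 0xBF800000, 0x40000000, 0xC0000000,
                                 0x40800000, 0xC0800000, 0x3E22F983};
  static const uint64_t F64[] = {
      0x3FE0000000000000, 0xBFE0000000000000, 0x3FF0000000000000,
      0xBFF0000000000000, 0x4000000000000000, 0xC000000000000000,
      0x4010000000000000, 0xC010000000000000, 0x3FC45F306DC9C882};
  if (Enc < 240 || Enc > 248)
    return std::nullopt;  // not an inline FP constant
  unsigned I = Enc - 240;
  switch (ImmWidth) {
  case 16:
    return F16[I];
  case 32:
    return F32[I];
  case 64:
    return F64[I];
  default:
    return std::nullopt;  // unsupported immediate width
  }
}
```

For example, encoding 242 (1.0) yields 0x3C00, 0x3F800000 or 0x3FF0000000000000 depending on whether the operand is f16, f32 or f64, even if all three live in a 128-bit register operand.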