#
42f6f95e |
| 23-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Simplify AMDGPUDisassembler::getInstruction by removing Res. (#82775)
Remove all the code that set and tested Res. Change all convert*
functions to return void since none of them can fail. getInstruction
only has one main point of failure, after all calls to tryDecodeInst
have failed.
|
#
3b7d4330 |
| 22-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove DPP DecoderNamespaces. NFC. (#82491)
Now that there is no special checking for valid DPP encodings, these
instructions can use the same DecoderNamespace as other 64- or 96-bit
instructions.
Also clean up setting DecoderNamespace: in most cases it should be set
as a pair with AssemblerPredicate.
|
#
b9ce2379 |
| 22-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Clean up conversion of DPP instructions in AMDGPUDisassembler (#82480)
Convert DPP instructions after all calls to tryDecodeInst, just like we
do for all other instruction types. NFCI.
|
#
bcbffd99 |
| 22-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Split Dpp8FI and Dpp16FI operands (#82379)
Split Dpp8FI and Dpp16FI into two different operands sharing an
AsmOperandClass. They are parsed and rendered identically as fi:1 but
the encoding is different: for DPP16 FI is a single bit, but for DPP8 it
uses two different special values in the src0 field. Having a dedicated
decoder for Dpp8FI allows it to reject other (non-special) src0 values
so that AMDGPUDisassembler::getInstruction no longer needs to call
isValidDPP8 to do post hoc validation of decoded DPP8 instructions.
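As a rough illustration of the idea in this message, a dedicated operand decoder can fail for every src0 value except the two special ones, so no later validation pass is needed. This is a minimal sketch, not the code from the patch: the function signature is hypothetical, and the constants `Dpp8Fi0` and `Dpp8Fi1` are assumed values for the two special src0 encodings.

```cpp
#include <cstdint>

// Assumed special src0 values that encode fi:0 and fi:1 for DPP8.
// Both the names and the numeric values are assumptions for this sketch.
constexpr uint32_t Dpp8Fi0 = 0xE9;
constexpr uint32_t Dpp8Fi1 = 0xEA;

// Hypothetical decoder: returns the decoded fi bit (0 or 1), or -1 to
// signal that src0 holds a non-special value and decoding must fail.
constexpr int decodeDpp8FI(uint32_t Src0) {
  return Src0 == Dpp8Fi0 ? 0 : Src0 == Dpp8Fi1 ? 1 : -1;
}
```

Because the rejection happens while the operand is decoded, a caller only has to check the decoder's result instead of re-inspecting the finished instruction.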
|
Revision tags: llvmorg-18.1.0-rc3 |
|
#
ddba6b27 |
| 20-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using SDWA DecoderNamespaces. NFCI. (#82233)
64-bit SDWA encodings have to be checked first because their first 32
bits are a special case of the corresponding 32-bit non-SDWA encoding of
the same instruction. But all 64-bit encodings are checked first, so we
don't need special handling for SDWA.
|
#
a4d46157 |
| 20-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Try decoding instructions longest first. NFCI. (#82014)
AMDGPUDisassembler::getInstruction tries decoding instructions using
different DecoderTables in a confusing order: first 96-bit instructions,
then some 64-bit, then 32-bit, then some more 64-bit.
This patch changes it to always try longer encodings first. The
motivation is to make getInstruction easier to understand, and to pave
the way for combining some 64-bit tables that do not need to be
separate.
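The ordering described in this message can be sketched as a loop that tries wider encodings before narrower ones. This is only an illustration under stated assumptions: `DecoderTable`, its `Matches` predicate, and `decodeLongestFirst` are hypothetical stand-ins, not the generated DecoderTables or the real getInstruction.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for one generated decoder table.
struct DecoderTable {
  unsigned Bits;                         // width of the encodings it covers
  bool (*Matches)(const uint8_t *Bytes); // hypothetical match predicate
};

// Returns the width in bits of the first table that matches, trying wider
// encodings before narrower ones, or 0 if no table matches.
unsigned decodeLongestFirst(const std::vector<DecoderTable> &Tables,
                            const std::vector<uint8_t> &Bytes) {
  for (unsigned Width : {96u, 64u, 32u}) {
    if (Bytes.size() * 8 < Width)
      continue; // not enough bytes for an encoding of this width
    for (const DecoderTable &T : Tables) {
      assert(T.Matches && "every table needs a predicate");
      if (T.Bits == Width && T.Matches(Bytes.data()))
        return Width;
    }
  }
  return 0; // reached only after every applicable table has failed
}
```

With this order, a 64-bit encoding whose first 32 bits also look like a valid 32-bit instruction is still decoded as the longer form, because the 64-bit tables are consulted first.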
|
#
13e64958 |
| 19-Feb-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Fix decoder for BF16 inline constants (#82276)
Fix #82039.
|
#
ded3ca22 |
| 17-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Set predicates more consistently for BUF instructions (#81865)
Set DecoderNamespace and AssemblerPredicate in the base class for Real
instructions for each subtarget. This avoids some ad hoc "let" around
groups of instruction definitions, and fixes some missed cases like
BUFFER_GL0_INV_gfx10 which was missing DecoderNamespace.
|
#
d3b825f8 |
| 15-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Use consistent DecoderNamespace for wave64 instructions. NFC. (#81863)
For wave64 WMMA instructions, putting W64 in the DecoderNamespace is
more descriptive than WMMA, and matches other uses for GFX12
GLOBAL_LOAD_TR instructions.
|
#
4c931091 |
| 13-Feb-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Get rid of some operand decoders defined using macros. (#81482)
Use templates instead.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
|
#
7d19dc50 |
| 08-Feb-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][True16] Support VOP3 source DPP operands. (#80892)
|
Revision tags: llvmorg-18.1.0-rc2 |
|
#
4eb08109 |
| 01-Feb-2024 |
Emma Pilkington <emma.pilkington95@gmail.com> |
[llvm-objdump][AMDGPU] Pass ELF ABIVersion through disassembler (#78907)
Admittedly, it's a bit ugly to pass the ABIVersion through onSymbolStart
but I'm not sure what a better place for it would be.
|
Revision tags: llvmorg-18.1.0-rc1 |
|
#
70fbcdb4 |
| 26-Jan-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
Fix MSVC "signed/unsigned mismatch" warning. NFC.
|
#
2aa8945d |
| 25-Jan-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Use templates to decode AV operands. (#79313)
Eliminates the need to define them manually.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
|
#
2e81ac25 |
| 24-Jan-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Simplify AGPR/VGPR load/store operand definitions. (#79289)
Part of <https://github.com/llvm/llvm-project/issues/62629>.
|
#
7fdf608c |
| 24-Jan-2024 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
|
#
cfddb59b |
| 24-Jan-2024 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions
Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
instructions that were supported on GFX940 (MI300):
- V_CVT_F32_FP8
- V_CVT_F32_BF8
- V_CVT_PK_F32_FP8
- V_CVT_PK_F32_BF8
- V_CVT_PK_FP8_F32
- V_CVT_PK_BF8_F32
- V_CVT_SR_FP8_F32
- V_CVT_SR_BF8_F32
---------
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
|
Revision tags: llvmorg-19-init |
|
#
bc82cfb3 |
| 21-Jan-2024 |
Emma Pilkington <emma.pilkington95@gmail.com> |
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
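The new precedence can be pictured as a small resolution function. This is a sketch only: `resolveCodeObjectVersion` is a hypothetical name, and using 0 for "not specified" and passing an explicit fallback are assumptions made for the example, not how AMDGPUBaseInfo.h represents these values.

```cpp
// Hypothetical helper: 0 means "not specified" in this sketch.
// The IR flag / asm directive wins over the command-line flag, which in
// turn wins over the fallback default.
constexpr unsigned resolveCodeObjectVersion(unsigned FromIROrDirective,
                                            unsigned FromCommandLine,
                                            unsigned Fallback) {
  return FromIROrDirective ? FromIROrDirective
         : FromCommandLine ? FromCommandLine
                           : Fallback;
}
```

For example, a file that carries the directive with version 4 keeps version 4 even when the command line asks for 5; the command-line value only applies when the file says nothing.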
|
#
57f6a3f7 |
| 18-Jan-2024 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
|
#
49b49204 |
| 03-Jan-2024 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Fix packed 16-bit inline constants (#76522)
Consistently treat packed 16-bit operands as 32-bit values, because
that's really what they are. The attempt to treat them differently was
ultimately incorrect and led to miscompiles, e.g. when using non-splat
constants such as (1, 0) as operands.
Recognize 32-bit float constants for i/u16 instructions. This is a bit
odd conceptually, but it matches HW behavior and SP3.
Remove isFoldableLiteralV216; there was too much magic in the dependency
between it and its use in SIFoldOperands. Instead, we now simply rely on
checking whether a constant is an inline constant, and trying a bunch of
permutations of the low and high halves. This is more obviously correct
and leads to some new cases where inline constants are used as shown by
tests.
Move the logic for switching packed add vs. sub into SIFoldOperands.
This has two benefits: all logic that optimizes for inline constants in
packed math is now in one place; and it applies to both SelectionDAG and
GISel paths.
Disable the use of opsel with v_dot* instructions on gfx11. They are
documented to ignore opsel on src0 and src1. It may be interesting to
re-enable the use of opsel on src2 as a future optimization.
A similar "proper" fix of what inline constants mean could potentially
be applied to unpacked 16-bit ops. However, it's less clear what the
benefit would be, and there are surely places where we'd have to
carefully audit whether values are properly sign- or zero-extended. It
is best to keep such a change separate.
Fixes: Corruption in FSR 2.0 (latent bug exposed by an LLPC change)
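The approach of treating a packed 16-bit pair as one 32-bit value and then trying permutations of its halves can be sketched roughly as follows. This is an illustration under assumptions, not the logic in SIFoldOperands: all names are hypothetical, only the integer inline range of -16 to 64 is checked (float inline constants are ignored), and only one permutation (swapped halves) is tried, whereas the message mentions several.

```cpp
#include <cstdint>

// View a packed pair of 16-bit halves as a single 32-bit value.
constexpr uint32_t packHalves(uint16_t Lo, uint16_t Hi) {
  return static_cast<uint32_t>(Lo) | (static_cast<uint32_t>(Hi) << 16);
}

// Integer inline constants cover -16..64; floats are left out of this sketch.
constexpr bool isInlineInt(uint32_t V) {
  return static_cast<int32_t>(V) >= -16 && static_cast<int32_t>(V) <= 64;
}

// Hypothetical classification of how a packed constant could be encoded.
enum class Fold { None, AsIs, SwappedHalves };

constexpr Fold classifyPacked(uint16_t Lo, uint16_t Hi) {
  return isInlineInt(packHalves(Lo, Hi))   ? Fold::AsIs
         : isInlineInt(packHalves(Hi, Lo)) ? Fold::SwappedHalves
                                           : Fold::None;
}
```

Under these assumptions the non-splat constant (1, 0) is simply the 32-bit value 1 and is inline as it stands, while (0, 1) only becomes inline once the halves are swapped.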
|
#
c01e844a |
| 02-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Update compute program resource registers for GFX12 (#75911)
Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>
|
#
8c6172b0 |
| 28-Dec-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][True16] Don't use the VGPR_LO/HI16 register classes. (#76440)
Removing the classes requires updating tests and so is planned to be
done with a separate change.
|
#
8fdfd34c |
| 21-Dec-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove GDS and GWS for GFX12 (#76148)
|
#
569ef8dd |
| 15-Dec-2023 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204)
|
#
c1a6974d |
| 15-Dec-2023 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU][MC] Add GFX12 SMEM encoding (#75215)
|