Revision tags: llvmorg-21-init |
|
#
5e26ff35 |
| 14-Jan-2025 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][True16][MC] true16 for v_cmp_lt_f16 (#122499)
True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
|
Revision tags: llvmorg-19.1.7 |
|
#
b2adeae8 |
| 03-Jan-2025 |
Jun Wang <jwang86@yahoo.com> |
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s
[AMDGPU][MC] Allow null where 128b or larger dst reg is expected (#115200)
For GFX10+, currently null cannot be used as dst reg in instructions
that expect the dst reg to be 128b or larger (e.g., s_load_dwordx4).
This patch fixes this problem while ensuring null cannot be used as S#,
T#, or V#.
show more ...
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
716364eb |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNa
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
show more ...
|
#
22503a9d |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
|
#
5dd48c49 |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (#117590)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
|
#
cd20fc07 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410)
This is a refinement for the existing hack. With this, the default target will have neither wavefrontsize feature present, unless i
AMDGPU: Remove wavefrontsize64 feature from dummy target (#117410)
This is a refinement for the existing hack. With this, the default target will have neither wavefrontsize feature present, unless it was explicitly specified. That is, getWavefrontSize() == 64 no longer implies +wavefrontsize64. getWavefrontSize() == 32 does imply +wavefrontsize32.
Continue to assume the value is 64 with no wavesize feature. This maintains the codegenable property without any code that directly cares about the wavesize needing to worry about it.
Introduce an isWaveSizeKnown helper to check if we know the wavesize is accurate based on having one of the features explicitly set, or a known target-cpu.
I'm not sure what's going on in wave_any.s. It's testing what happens when both wavesizes are enabled, but this is treated as an error in codegen. We now treat wave32 as the winning case, so some cases that were previously printed as vcc are now vcc_lo.
show more ...
|
#
8b087d64 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move default wavesize hack for disassembler (#117422)
You cannot adjust the disassembler's subtarget. llvm-mc passes the originally constructed MCSubtargetInfo around, rather than querying t
AMDGPU: Move default wavesize hack for disassembler (#117422)
You cannot adjust the disassembler's subtarget. llvm-mc passes the originally constructed MCSubtargetInfo around, rather than querying the pointer in the disassembler instance.
show more ...
|
#
01c9a14c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. Non
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
show more ...
|
#
9fb01fcd |
| 20-Nov-2024 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233)
Support true16 format for VOP2 instructions in MC
This patch updates the true16 and fake16 vop_profile for the followin
[AMDGPU][MC][True16] Support VOP2 instructions with true16 format (#115233)
Support true16 format for VOP2 instructions in MC
This patch updates the true16 and fake16 vop_profile for the following
instructions and update the asm/dasm tests:
v_fmac_f16
v_fmamk_f16
v_fmaak_f16
It seems vop2_t16_promote.s files are not yet updated with true16 flag
in the previous batch update. It will be updated seperately
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
abff8fe2 |
| 14-Nov-2024 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634)
Update VInterp instructions with true16 and fake16 formats.
This patch includes instructions:
v_interp_p10_f16_f32
v
[AMDGPU][True16][MC] VINTERP instructions supporting true16/fake16 (#113634)
Update VInterp instructions with true16 and fake16 formats.
This patch includes instructions:
v_interp_p10_f16_f32
v_interp_p2_f16_f32
v_interp_p10_rtz_f16_f32
v_interp_p2_rtz_f16_f32
dasm test vinterp-fake16.txt is removed and the testline are merged into
vinterp.txt which handles both true16/fake16 cases
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1 |
|
#
fd50cdfb |
| 28-Sep-2024 |
Craig Topper <craig.topper@sifive.com> |
[AMDGPU] Use MCRegister. NFC
|
#
f6a8eb98 |
| 24-Sep-2024 |
Jun Wang <jwang86@yahoo.com> |
[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730)
Some flat instructions have an saddr operand. When 'null' is provided as
saddr, it may have the same encoding as another instructi
[AMDGPU][MC] Disallow null as saddr in flat instructions (#101730)
Some flat instructions have an saddr operand. When 'null' is provided as
saddr, it may have the same encoding as another instruction. For
example, the instructions 'global_atomic_add v1, v2, null' and
'global_atomic_add v[1:2], v2, off' have the same encoding. This patch
disallows having null as saddr.
show more ...
|
#
73b8074e |
| 20-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Do not use APInt for simple 64-bit arithmetic. NFC. (#109414)
|
Revision tags: llvmorg-19.1.0 |
|
#
35e27c0e |
| 11-Sep-2024 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510)
This is a large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.
This patch
[AMDGPU][True16][MC] 16bit vsrc and vdst support in MC (#104510)
This is a large patch includes the MC level support for V_CVT_F16_F32,
V_CVT_F32_F16 and V_LDEXP_F16 in true16 format.
This patch includes the asm/disasm changes to encode/decode the 16bit
vsrc, vdst and src modifieres for vop and dpp format. This patch is a
dependency for many 16 bit instructions while only three instructions
are updated to make it easier to review.
There will be another patch to support these three instructions in the
codeGen level, this patch just replaces these two instructions with its
fake16 format.
show more ...
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
c1b3ebba |
| 26-Aug-2024 |
Craig Topper <craig.topper@sifive.com> |
[MC] Update MCOperand::getReg/setReg/createReg and MCInstBuilder::addReg to use MCRegister. (#106015)
Replace unsigned with MCRegister.
Update some ternary operators that started giving errors.
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
63fae3ed |
| 17-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)
|
#
b132dd41 |
| 16-Jul-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Remove wavefrontsize feature from GFX10+ (#98400)
Processor definition shall not include a default feature which may be
switched off by a different wave size. This allows not to write
-ma
[AMDGPU] Remove wavefrontsize feature from GFX10+ (#98400)
Processor definition shall not include a default feature which may be
switched off by a different wave size. This allows not to write
-mattr=-wavefrontsize32,+wavefrontsize64 in tests.
show more ...
|
#
e83e53b7 |
| 07-Jul-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU][MC] Allow UC_VERSION_* constant reuse (#96461)
If more than one disassembler is created for a context then allow reuse
of existing constants.
Warn if constants values do not match.
|
#
bb973785 |
| 27-Jun-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Only reinitialize disassembler Bytes array when needed. NFC. (#96666)
|
#
16238669 |
| 18-Jun-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][MC] Support UC_VERSION_* constants. (#95618)
Our other tools support them, so we want them in LLVM
assembler/disassembler too.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
a98a6e95 |
| 04-May-2024 |
luolent <56246516+luolent@users.noreply.github.com> |
Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391)
Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868)
Parenthesis are added as requested on t
Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391)
Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868)
Parenthesis are added as requested on ternary operators with non trivial conditions.
I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change.
show more ...
|
Revision tags: llvmorg-18.1.5 |
|
#
6e722bbe |
| 26-Apr-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Support byte_sel modifier on v_cvt_sr_fp8_f32 and v_cvt_sr_bf8_f32 (#90244)
|
#
68e814d9 |
| 18-Apr-2024 |
Emma Pilkington <emma.pilkington95@gmail.com> |
[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400)
These mostly are checking for various reserved bits being set. The diagnostics
for gpu-dependent reserved bits have a
[AMDGPU] Add disassembler diagnostics for invalid kernel descriptors (#87400)
These mostly are checking for various reserved bits being set. The diagnostics
for gpu-dependent reserved bits have a bit more context since they seem like the
most likely ones to be observed in practice.
This commit also improves the error handling mechanism for
MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter
that was just being ignored by llvm-objdump, now it returns errors using
Expected<T>.
show more ...
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
60e7ae3f |
| 26-Feb-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992)
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.
This giv
[AMDGPU] Only try DecoderTables for the current subtarget. NFCI. (#82992)
Speed up disassembly by only calling tryDecodeInst for DecoderTables
that make sense for the current subtarget.
This gives a 1.3x speed-up on check-llvm-mc-disassembler-amdgpu in my
Release+Asserts build.
show more ...
|