Revision tags: llvmorg-21-init |
|
#
b08b5638 |
| 20-Jan-2025 |
Alex Voicu <alexandru.voicu@amd.com> |
[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519)
When we did the initial AMDGCNSPIRV commits we left the initialisation of the feature map in a relatively disorderly state. This cha
[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519)
When we did the initial AMDGCNSPIRV commits we left the initialisation of the feature map in a relatively disorderly state. This change corrects that oversight:
- We make sure that AMDGCNSPIRV actually advertises the union of all AMDGCN features, as some were not included; - We keep feature initialisation in sorted order to make it easy to pick an insertion point when features are added in the future.
show more ...
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
56156572 |
| 27-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824)
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
|
#
62dc8f30 |
| 27-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
|
#
0f4fcca5 |
| 27-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
|
#
2b9e947d |
| 27-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte
AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743)
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read.
OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
show more ...
|
#
815069c7 |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739)
OPSEL[1:0] collectively decide which byte to read from src input.
Builtin takes additional imm argument which re
AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739)
OPSEL[1:0] collectively decide which byte to read from src input.
Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out of bounds checks will added in next patch.
OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3}
Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp|bf}8
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
show more ...
|
#
7fc71f79 |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
#
716364eb |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNa
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
show more ...
|
#
aa7eb572 |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597)
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 i
AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597)
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler.
This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16).
All necessary changes to gfx11 and gfx12 are updated to reflect this change.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
show more ...
|
#
5d650a62 |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596)
This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions.
Co-authored-by: Sirish Pande <Sirish
AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596)
This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
show more ...
|
#
22503a9d |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592)
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
|
#
d1cca313 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset o
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1.
The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
ca1b35a6 |
| 18-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Rand num instruction for stochastic rounding.
|
#
a6fc489b |
| 18-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and tests for removed instructions.
|
#
de0fd64b |
| 13-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch
[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
show more ...
|
Revision tags: llvmorg-19.1.3 |
|
#
076aac59 |
| 23-Oct-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Add a new target for gfx1153 (#113138)
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
f363e30f |
| 09-Jul-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633)
|
Revision tags: llvmorg-18.1.8 |
|
#
88e2bb40 |
| 07-Jun-2024 |
Alex Voicu <alexandru.voicu@amd.com> |
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
show more ...
|
#
1ca0055f |
| 06-Jun-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Add a new target gfx1152 (#94534)
|
Revision tags: llvmorg-18.1.7 |
|
#
775f1cd3 |
| 31-May-2024 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU: Add gfx12-generic target (#93875)
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1 |
|
#
96813de5 |
| 06-Mar-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248)
FeatureDot11Insts (dot11-insts) for:
v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8,
v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3 |
|
#
43c7eb5d |
| 14-Feb-2024 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Replace '.' with '-' in generic target names (#81718)
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs
[AMDGPU] Replace '.' with '-' in generic target names (#81718)
The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.
After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
show more ...
|
#
f93aa515 |
| 12-Feb-2024 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost o
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.
Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.
This contains the documentation changes for both this change and #76954
as well.
show more ...
|
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
cfddb59b |
| 24-Jan-2024 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions
Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
instructions that were supp
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions
Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
instructions that were supported on GFX940 (MI300):
- V_CVT_F32_FP8
- V_CVT_F32_BF8
- V_CVT_PK_F32_FP8
- V_CVT_PK_F32_BF8
- V_CVT_PK_FP8_F32
- V_CVT_PK_BF8_F32
- V_CVT_SR_FP8_F32
- V_CVT_SR_BF8_F32
---------
Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
show more ...
|
Revision tags: llvmorg-19-init |
|
#
e21b0b08 |
| 19-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove gws feature from GFX12 (#78711)
This was already done for LLVM. This patch just updates the Clang
builtin handling to match.
|