|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
| #
a796f597 |
| 02-Dec-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
| #
e97fb220 |
| 25-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for load transpose instructions for gfx950 (#117378)
This patch adds support for the intrinsics in clang, as well as the assembly instructions in the backend.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
| #
d1cca313 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special-case encoding usage: op_sel is repurposed as a subset of the DPP controls and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1.
The two registers are swapped, so this has two defs. Ideally the builtin would return a pair, but that's difficult, so it returns a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.
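A hedged sketch of the vector-return point above. The builtin spelling __builtin_amdgcn_permlane16_swap and its argument order are assumptions inferred from the instruction name; the authoritative prototype lives in clang's BuiltinsAMDGPU.def.

```c
// Sketch only: assumes the builtin packs the two swapped registers into a
// 2-element vector, as described in the commit message above.
typedef unsigned int uint2 __attribute__((ext_vector_type(2)));

static inline void permlane16_swap_demo(unsigned vdst_old, unsigned src0,
                                        unsigned *out_a, unsigned *out_b) {
  // fi and bound_ctrl are plain booleans here; per the message, the raw
  // boolean for fi gets converted to the enum value 1 rather than -1.
  uint2 swapped = __builtin_amdgcn_permlane16_swap(vdst_old, src0,
                                                   /*fi=*/0,
                                                   /*bound_ctrl=*/0);
  *out_a = swapped.x;  // first swapped register
  *out_b = swapped.y;  // second swapped register
}
```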
|
| #
7d544c64 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (#117259)
|
| #
90dc644d |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_fp8_bf8 for gfx950 (#117258)
|
| #
8d3435f8 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (#117257)
|
| #
8a5c2414 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_bf8_bf8 for gfx950 (#117256)
|
| #
836d2dcf |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (#117235)
|
| #
33124910 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (#117234)
|
| #
3678f8a8 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (#117233)
|
| #
7baadb2a |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (#117232)
|
| #
3e6f3508 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (#117214)
|
| #
95ddc1a6 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (#117211)
|
| #
e50eaa2c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (#117205)
|
| #
2ab17882 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (#117202)
|
| #
1c47d67a |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (#117053)
|
| #
f4ed79b1 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_mfma_i32_32x32x32_i8 for gfx950 (#117052)
|
| #
76b24640 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_mfma_i32_16x16x64_i8 for gfx950 (#116728)
|
| #
01c9a14c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
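To make the "auto-match op_sel to byte extract patterns" idea concrete, here is a purely illustrative sketch; select_scale_byte is a hypothetical helper, not anything from the patch, and only shows the kind of pattern a matcher would recognize.

```c
// Hypothetical helper: extract byte 'byte_sel' (0-3) of a 32-bit scale
// operand. op_sel would encode which byte the hardware actually consumes,
// so a shift-and-mask like this is the pattern that could be matched.
static inline unsigned char select_scale_byte(unsigned scale, unsigned byte_sel) {
  return (unsigned char)((scale >> (8u * (byte_sel & 3u))) & 0xffu);
}
```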
|
|
Revision tags: llvmorg-19.1.4 |
|
| #
130a3150 |
| 19-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 (#116679)
Unlike the existing gfx940 intrinsics using short/i16 in place of bfloat, this uses the natural bfloat type.
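A hedged sketch of what "the natural bfloat type" means for callers. The vector widths and the trailing cbsz/abid/blgp immediates below are assumptions modeled on the older mfma builtins; the real prototype is in clang's BuiltinsAMDGPU.def.

```c
// Sketch only: unlike the gfx940-era builtins that took i16/short vectors,
// the gfx950 builtin is assumed here to take genuine __bf16 vectors.
typedef __bf16 v8bf16 __attribute__((ext_vector_type(8)));
typedef float  v16f32 __attribute__((ext_vector_type(16)));

static inline v16f32 mfma_bf16_acc(v8bf16 a, v8bf16 b, v16f32 acc) {
  // cbsz/abid/blgp left at 0 for the plain (unbroadcast) form.
  return __builtin_amdgcn_mfma_f32_32x32x16_bf16(a, b, acc, 0, 0, 0);
}
```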
|
| #
0c421687 |
| 18-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add first gfx950 mfma instructions (#116312)
Scheduling info and hazards are wrong and TBD.
|
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
280d90d0 |
| 18-Jul-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Add back half and bfloat support for global_load_tr16 pats (#99540)
half and bfloat are common types for 16-bit elements. Support for them was originally there but was dropped for various reasons. This work adds support for these float types back.
|
| #
35f7b60a |
| 26-Jun-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217, with the core logic being the same. This patch, along with #89217 and #91190, should get us ready to enable 64-bit optimizations in the atomic optimizer.
|
| #
5feb32ba |
| 25-Jun-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series whose end goal is to adapt the atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). It legalizes 64-bit readlane, writelane and readfirstlane ops pre-ISel.
Co-authored-by: vikramRH <vikhegde@amd.com>
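A conceptual sketch of the legalization described above, not the actual pass code: a 64-bit readfirstlane can be expressed as two 32-bit readfirstlanes on the split halves, which is the shape the pre-ISel lowering produces. __builtin_amdgcn_readfirstlane is the long-standing 32-bit clang builtin.

```c
// Split a 64-bit value, read the first lane of each half, and reassemble.
static inline unsigned long long readfirstlane_u64(unsigned long long v) {
  unsigned lo = (unsigned)(v & 0xffffffffu);
  unsigned hi = (unsigned)(v >> 32);
  lo = __builtin_amdgcn_readfirstlane(lo);
  hi = __builtin_amdgcn_readfirstlane(hi);
  return ((unsigned long long)hi << 32) | lo;
}
```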
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3 |
|
| #
350bda44 |
| 25-Mar-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313)
Rename the intrinsics to be closer to the instruction mnemonic names: use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr.
This patch also removes the f16/bf16 versions of the builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types when implementing builtins; we can always use a bitcast.
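A hedged sketch of the "we can always use a bitcast" approach: with only integer-typed load-transpose builtins, callers who want half data reinterpret the loaded bits themselves. The v2i32 value below stands in for whatever the b64 builtin returns; the exact builtin prototypes are in clang's BuiltinsAMDGPU.def.

```c
// Reinterpret a 64-bit load-transpose result as four half-precision values.
typedef int      v2i32 __attribute__((ext_vector_type(2)));
typedef _Float16 v4f16 __attribute__((ext_vector_type(4)));

static inline v4f16 as_half4(v2i32 raw) {
  v4f16 out;
  __builtin_memcpy(&out, &raw, sizeof out);  // same 64 bits, different element type
  return out;
}
```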
|