History log of /llvm-project/llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll (Results 1 – 25 of 30)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# a796f597 02-Dec-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297)

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>


# e97fb220 25-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add support for load transpose instructions for gfx950 (#117378)

This patch adds support for the intrinsics in clang, as well as the
assembly instructions in the backend.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>



# d1cca313 23-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)

This was a bit annoying because these introduce a new special case
encoding usage. op_sel is repurposed as a subset of dpp controls,
and is eligible for VOP3->VOP1 shrinking. For some reason fi also
uses an enum value, so we need to convert the raw boolean to 1 instead
of -1.

The two registers are swapped, so this has two defs. Ideally the builtin
would return a pair, but that's difficult, so it returns a vector instead.
This would make a hypothetical builtin that supports v2f16 directly
uglier.



# 7d544c64 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (#117259)


# 90dc644d 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_32x32x64_fp8_bf8 for gfx950 (#117258)


# 8d3435f8 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (#117257)


# 8a5c2414 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_32x32x64_bf8_bf8 for gfx950 (#117256)


# 836d2dcf 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (#117235)


# 33124910 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (#117234)


# 3678f8a8 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (#117233)


# 7baadb2a 22-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (#117232)


# 3e6f3508 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (#117214)


# 95ddc1a6 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (#117211)


# e50eaa2c 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (#117205)


# 2ab17882 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (#117202)


# 1c47d67a 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (#117053)


# f4ed79b1 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_mfma_i32_32x32x32_i8 for gfx950 (#117052)


# 76b24640 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_mfma_i32_16x16x64_i8 for gfx950 (#116728)


# 01c9a14c 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)

These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.



Revision tags: llvmorg-19.1.4
# 130a3150 19-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 (#116679)

Unlike the existing gfx940 intrinsics using short/i16 in place of
bfloat, this uses the natural bfloat type.


# 0c421687 18-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add first gfx950 mfma instructions (#116312)

Scheduling info and hazards are wrong and TBD.


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 280d90d0 18-Jul-2024 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Add back half and bfloat support for global_load_tr16 pats (#99540)

half and bfloat are common types for 16-bit elements. Support for them
was originally there but was dropped for various reasons. This work adds
support for these float types back.



# 35f7b60a 26-Jun-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)

These are incremental changes over #89217, with the core logic being the
same. This patch, along with #89217 and #91190, should get us ready to
enable 64-bit optimizations in the atomic optimizer.



# 5feb32ba 25-Jun-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)

This patch is intended to be the first of a series whose end goal is to
adapt the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64-bit readlane,
writelane, and readfirstlane ops pre-ISel.

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
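
As a purely illustrative aside, the kind of expansion being automated here can be written by hand in LLVM IR using only the long-standing i32 form of the intrinsic: split the 64-bit value into two 32-bit halves, read each half, and reassemble. The following is a hand-written sketch of that shape, not the literal output of the pass.

; Hand-written sketch (not actual pass output): an i64 readfirstlane
; expressed via the pre-existing i32 intrinsic plus bitcasts.
define i64 @readfirstlane_i64_expanded(i64 %src) {
  %vec    = bitcast i64 %src to <2 x i32>
  %lo     = extractelement <2 x i32> %vec, i32 0
  %hi     = extractelement <2 x i32> %vec, i32 1
  %lo.rf  = call i32 @llvm.amdgcn.readfirstlane(i32 %lo)
  %hi.rf  = call i32 @llvm.amdgcn.readfirstlane(i32 %hi)
  %tmp    = insertelement <2 x i32> poison, i32 %lo.rf, i32 0
  %vec.rf = insertelement <2 x i32> %tmp, i32 %hi.rf, i32 1
  %res    = bitcast <2 x i32> %vec.rf to i64
  ret i64 %res
}

declare i32 @llvm.amdgcn.readfirstlane(i32)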



Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3
# 350bda44 25-Mar-2024 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313)

Rename the intrinsics to be closer to the instruction mnemonic names:
use global_load_tr_b64 and global_load_tr_b128 instead of
global_load_tr.

This patch also removes f16/bf16 versions of builtins/intrinsics. To
simplify the design, we should avoid enumerating all possible types in
implementing builtins. We can always use bitcast.
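
As a minimal sketch of that bitcast approach (assuming, purely for illustration, an <8 x i16> result from the integer-typed load-transpose intrinsic; the actual width depends on the instruction and wave size and is not stated in this log), the element type can be reinterpreted in IR without a dedicated f16/bf16 intrinsic variant:

; Sketch only: reinterpret an integer-typed load-transpose result as half or
; bfloat. The <8 x i16> width is an assumed example, not taken from this log.
define <8 x half> @tr_result_as_half(<8 x i16> %loaded) {
  %as.half = bitcast <8 x i16> %loaded to <8 x half>
  ret <8 x half> %as.half
}

define <8 x bfloat> @tr_result_as_bfloat(<8 x i16> %loaded) {
  %as.bf16 = bitcast <8 x i16> %loaded to <8 x bfloat>
  ret <8 x bfloat> %as.bf16
}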


