|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
| #
a796f597 |
| 02-Dec-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Allow f16/bf16 for DS_READ_TR16_B64 gfx950 builtins (#118297)
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
| #
e97fb220 |
| 25-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for load transpose instructions for gfx950 (#117378)
This patch adds support for the intrinsics in clang, as well as the assembly instructions in the backend.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
| #
d1cca313 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260)
This was a bit annoying because these introduce a new special-case encoding usage: op_sel is repurposed as a subset of the DPP controls and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1.
The two registers are swapped, so this has two defs. Ideally the builtin would return a pair, but that's difficult, so it returns a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.
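A hedged sketch of the vector-return point above. The builtin spelling __builtin_amdgcn_permlane16_swap and its argument order are assumptions inferred from the instruction name; the authoritative prototype lives in clang's BuiltinsAMDGPU.def.

```c
// Sketch only: assumes the builtin packs the two swapped registers into a
// 2-element vector, as described in the commit message above.
typedef unsigned int uint2 __attribute__((ext_vector_type(2)));

static inline void permlane16_swap_demo(unsigned vdst_old, unsigned src0,
                                        unsigned *out_a, unsigned *out_b) {
  // fi and bound_ctrl are plain booleans here; per the message, the raw
  // boolean for fi gets converted to the enum value 1 rather than -1.
  uint2 swapped = __builtin_amdgcn_permlane16_swap(vdst_old, src0,
                                                   /*fi=*/0,
                                                   /*bound_ctrl=*/0);
  *out_a = swapped.x;  // first swapped register
  *out_b = swapped.y;  // second swapped register
}
```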
|
| #
7d544c64 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_fp8_fp8 for gfx950 (#117259)
|
| #
90dc644d |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_fp8_bf8 for gfx950 (#117258)
|
| #
8d3435f8 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_bf8_fp8 for gfx950 (#117257)
|
| #
8a5c2414 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x64_bf8_bf8 for gfx950 (#117256)
|
| #
836d2dcf |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_fp8_fp8 for gfx950 (#117235)
|
| #
33124910 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_fp8_bf8 for gfx950 (#117234)
|
| #
3678f8a8 |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_bf8_fp8 for gfx950 (#117233)
|
| #
7baadb2a |
| 22-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x128_bf8_bf8 for gfx950 (#117232)
|
| #
3e6f3508 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_i32_32x32x64_i8 for gfx950 (#117214)
|
| #
95ddc1a6 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x64_bf16 for gfx950 (#117211)
|
| #
e50eaa2c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_32x32x32_f16 for gfx950 (#117205)
|
| #
2ab17882 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_smfmac_f32_16x16x64_f16 for gfx950 (#117202)
|
| #
1c47d67a |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_mfma_f32_16x16x32_bf16 for gfx950 (#117053)
|
| #
f4ed79b1 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_mfma_i32_32x32x32_i8 for gfx950 (#117052)
|
| #
76b24640 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_mfma_i32_16x16x64_i8 for gfx950 (#116728)
|
| #
01c9a14c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
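To make the "auto-match op_sel to byte extract patterns" idea concrete, here is a purely illustrative sketch; select_scale_byte is a hypothetical helper, not anything from the patch, and only shows the kind of pattern a matcher would recognize.

```c
// Hypothetical helper: extract byte 'byte_sel' (0-3) of a 32-bit scale
// operand. op_sel would encode which byte the hardware actually consumes,
// so a shift-and-mask like this is the pattern that could be matched.
static inline unsigned char select_scale_byte(unsigned scale, unsigned byte_sel) {
  return (unsigned char)((scale >> (8u * (byte_sel & 3u))) & 0xffu);
}
```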
|
|
Revision tags: llvmorg-19.1.4 |
|
| #
130a3150 |
| 19-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 (#116679)
Unlike the existing gfx940 intrinsics using short/i16 in place of bfloat, this uses the natural bfloat type.
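A hedged sketch of what "the natural bfloat type" means for callers. The vector widths and the trailing cbsz/abid/blgp immediates below are assumptions modeled on the older mfma builtins; the real prototype is in clang's BuiltinsAMDGPU.def.

```c
// Sketch only: unlike the gfx940-era builtins that took i16/short vectors,
// the gfx950 builtin is assumed here to take genuine __bf16 vectors.
typedef __bf16 v8bf16 __attribute__((ext_vector_type(8)));
typedef float  v16f32 __attribute__((ext_vector_type(16)));

static inline v16f32 mfma_bf16_acc(v8bf16 a, v8bf16 b, v16f32 acc) {
  // cbsz/abid/blgp left at 0 for the plain (unbroadcast) form.
  return __builtin_amdgcn_mfma_f32_32x32x16_bf16(a, b, acc, 0, 0, 0);
}
```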
|
| #
0c421687 |
| 18-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add first gfx950 mfma instructions (#116312)
Scheduling info and hazards are wrong and TBD.
|
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
280d90d0 |
| 18-Jul-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Add back half and bfloat support for global_load_tr16 pats (#99540)
half and bfloat are common types for 16-bit elements. Support for them was originally there but was dropped for various reasons. This work adds support for these float types back.
|
| #
35f7b60a |
| 26-Jun-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217, with the core logic being the same. This patch, along with #89217 and #91190, should get us ready to enable 64-bit optimizations in the atomic optimizer.
|
| #
5feb32ba |
| 25-Jun-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series whose end goal is to adapt the atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). It legalizes 64-bit readlane, writelane and readfirstlane ops pre-ISel.
Co-authored-by: vikramRH <vikhegde@amd.com>
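A conceptual sketch of the legalization described above, not the actual pass code: a 64-bit readfirstlane can be expressed as two 32-bit readfirstlanes on the split halves, which is the shape the pre-ISel lowering produces. __builtin_amdgcn_readfirstlane is the long-standing 32-bit clang builtin.

```c
// Split a 64-bit value, read the first lane of each half, and reassemble.
static inline unsigned long long readfirstlane_u64(unsigned long long v) {
  unsigned lo = (unsigned)(v & 0xffffffffu);
  unsigned hi = (unsigned)(v >> 32);
  lo = __builtin_amdgcn_readfirstlane(lo);
  hi = __builtin_amdgcn_readfirstlane(hi);
  return ((unsigned long long)hi << 32) | lo;
}
```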
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3 |
|
| #
350bda44 |
| 25-Mar-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313)
Rename the intrinsics to be closer to the instruction mnemonic names: use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr.
This patch also removes the f16/bf16 versions of the builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types when implementing builtins; we can always use a bitcast.
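A hedged sketch of the "we can always use a bitcast" approach: with only integer-typed load-transpose builtins, callers who want half data reinterpret the loaded bits themselves. The v2i32 value below stands in for whatever the b64 builtin returns; the exact builtin prototypes are in clang's BuiltinsAMDGPU.def.

```c
// Reinterpret a 64-bit load-transpose result as four half-precision values.
typedef int      v2i32 __attribute__((ext_vector_type(2)));
typedef _Float16 v4f16 __attribute__((ext_vector_type(4)));

static inline v4f16 as_half4(v2i32 raw) {
  v4f16 out;
  __builtin_memcpy(&out, &raw, sizeof out);  // same 64 bits, different element type
  return out;
}
```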
|