AMDGPUInstCombineIntrinsic.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init
# c82a6a02	15-Jan-2025	Fraser Cormack <fraser@codeplay.com>	[AMDGPU] Use correct vector elt type when shrinking mfma scale (#123043) This might be a copy/paste error. I don't think this an issue in practice as the builtins/intrinsics are only legal with ide [AMDGPU] Use correct vector elt type when shrinking mfma scale (#123043) This might be a copy/paste error. I don't think this an issue in practice as the builtins/intrinsics are only legal with identical vector element types. show more ...
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 4a0d53a0	13-Dec-2024	Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>	PatternMatch: migrate to CmpPredicate (#118534) With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key p PatternMatch: migrate to CmpPredicate (#118534) With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key pieces of infrastructure that must be updated to match a CmpInst respecting samesign information. Implement this change to Cmp-matchers. This is a preparatory step in migrating the codebase over to CmpPredicate. Since we no functional changes are desired at this stage, we have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls to use CmpPredicate::getMatching(), as that would have visible impact on tests that are not yet written: instead, we call CmpPredicate::operator==(Predicate), preserving the old behavior, while also inserting a few FIXME comments for follow-ups. show more ...
# c74e2232	06-Dec-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Simplify demanded bits on readlane/writeline index arguments (#117963) The main goal is to fold away wave64 code when compiled for wave32. If we have out of bounds indexing, these will now AMDGPU: Simplify demanded bits on readlane/writeline index arguments (#117963) The main goal is to fold away wave64 code when compiled for wave32. If we have out of bounds indexing, these will now clamp down to a low bit which may CSE with the operations on the low half of the wave. show more ...
Revision tags: llvmorg-19.1.5
# 48ec59c2	25-Nov-2024	Alex Voicu <alexandru.voicu@amd.com>	[llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481) Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that it's concrete value is used throughout subsequent optimisation pa [llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481) Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that it's concrete value is used throughout subsequent optimisation passes. show more ...
# 0a6e8741	21-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Shrink used number of registers for mfma scale based on format (#117047) Currently the builtins assume you are using an 8-bit format that requires an 8 element vector. We can shrink the numb AMDGPU: Shrink used number of registers for mfma scale based on format (#117047) Currently the builtins assume you are using an 8-bit format that requires an 8 element vector. We can shrink the number of registers if the format requires 4 or 6. show more ...
# 01c9a14c	21-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. Non AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior. show more ...
Revision tags: llvmorg-19.1.4
# ca1b35a6	18-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310) Rand num instruction for stochastic rounding.
Revision tags: llvmorg-19.1.3
# 85c17e40	17-Oct-2024	Jay Foad <jay.foad@amd.com>	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsi [LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call. show more ...
Revision tags: llvmorg-19.1.2
# fa789dff	11-Oct-2024	Rahul Joshi <rjoshi@nvidia.com>	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is a [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation). show more ...
# 8d13e7b8	03-Oct-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Qualify auto. NFC. (#110878) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
Revision tags: llvmorg-19.1.1
# d2d947b7	18-Sep-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Fold llvm.amdgcn.cvt.pkrtz when either operand is fpext (#108237) This also generalizes the Undef handling and adds Poison handling.
Revision tags: llvmorg-19.1.0
# ff7eb1d0	11-Sep-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Simplify API of matchFPExtFromF16. NFC. (#108223)
Revision tags: llvmorg-19.1.0-rc4
# f142f8af	23-Aug-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812) Common up handling of intrinsics that are a no-op on uniform arguments. This catches a couple of new cases: readlane [AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812) Common up handling of intrinsics that are a no-op on uniform arguments. This catches a couple of new cases: readlane (readlane x, y), z -> readlane x, y (for any z, does not have to equal y). permlane64 (readfirstlane x) -> readfirstlane x (and likewise for any other uniform argument to permlane64). show more ...
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 06ab30b5	18-Jul-2024	Changpeng Fang <changpeng.fang@amd.com>	[AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562) If the parameters(the input and segment select) coming in to amdgcn.trig.preop intrinsic are compile time constants, we pre-compute the [AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562) If the parameters(the input and segment select) coming in to amdgcn.trig.preop intrinsic are compile time constants, we pre-compute the output of amdgcn.trig.preop on the CPU and replaces the uses with the computed constant. This work extends the patch https://reviews.llvm.org/D120150 to make it a complete coverage. For the segment select, only src1[4:0] are used. A segment select is invalid if we are selecting the 53-bit segment beyond the [1200:0] range of the 2/PI table. 0 is returned when a segment select is not valid. show more ...
Revision tags: llvmorg-18.1.8
# 18ec885a	10-Jun-2024	Jay Foad <jay.foad@amd.com>	[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801) They have been superseded by llvm.amdgcn.raw.buffer.* and llvm.amdgcn.struct.buffer.*.
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6
# f893dccb	09-May-2024	Eli Friedman <efriedma@quicinc.com>	Replace uses of ConstantExpr::getCompare. (#91558) Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands in other places. This only changes places where the either the fold is gu Replace uses of ConstantExpr::getCompare. (#91558) Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands in other places. This only changes places where the either the fold is guaranteed to succeed, or the code doesn't use the resulting compare if we fail to fold. show more ...
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# 14114523	15-Mar-2024	Artem Tyurin <artem.tyurin@gmail.com>	[IRBuilder] Fold binary intrinsics (#80743) Fixes https://github.com/llvm/llvm-project/issues/61240.
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 930996e9	05-Feb-2024	Yingwei Zheng <dtcxzyw2333@gmail.com>	[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The moti [ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated. show more ...
# b8025d14	02-Feb-2024	Valery Pykhtin <valery.pykhtin@gmail.com>	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312. show more ...
Revision tags: llvmorg-18.1.0-rc1
# 65f486c4	29-Jan-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Simplify else if to just else in AMDGPUInstCombineIntrinsic Fixes #79738
Revision tags: llvmorg-19-init
# 9791e541	17-Jan-2024	Valery Pykhtin <valery.pykhtin@gmail.com>	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289 show more ...
# 57b50ef0	17-Jan-2024	Valery Pykhtin <valery.pykhtin@gmail.com>	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556) Substitute with zero-extended to i64 ballot.i32 intrinsic.
# 2b83ceee	12-Jan-2024	Mariusz Sikora <mariusz.sikora@amd.com>	[AMDGPU][GFX12] Default component broadcast store (#76212) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pa [AMDGPU][GFX12] Default component broadcast store (#76212) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com> show more ...
# 9d60e95b	20-Dec-2023	Nikita Popov <npopov@redhat.com>	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out of [AMDGPU] Use poison instead of undef for non-demanded elements (#75914) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out of dmask is 0, as this case has special semantics: > If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by > LWE status if exists. TFE status is not generated since the fetch is dropped. show more ...
# 966416b9	15-Dec-2023	Mariusz Sikora <mariusz.sikora@amd.com>	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)
12 3 4