History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (Results 1 – 25 of 88)
Revision Date Author Comments
Revision tags: llvmorg-21-init
# c82a6a02 15-Jan-2025 Fraser Cormack <fraser@codeplay.com>

[AMDGPU] Use correct vector elt type when shrinking mfma scale (#123043)

This might be a copy/paste error. I don't think this is an issue in
practice as the builtins/intrinsics are only legal with identical vector
element types.



Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 4a0d53a0 13-Dec-2024 Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>

PatternMatch: migrate to CmpPredicate (#118534)

With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.



# c74e2232 06-Dec-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Simplify demanded bits on readlane/writelane index arguments (#117963)

The main goal is to fold away wave64 code when compiled for wave32.
If we have out of bounds indexing, these will now clamp down to
a low bit which may CSE with the operations on the low half of the
wave.

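A minimal sketch of the kind of pattern this targets (the function below is hypothetical and assumes a wave32 subtarget, e.g. -mcpu=gfx1030): code written for wave64 often reads the matching lane in the upper half of the wave, and on wave32 only the low five bits of the lane index are demanded, so the "+ 32" is expected to fold away and the two reads can CSE.

```
; Hypothetical sketch, assuming a wave32 subtarget (e.g. -mcpu=gfx1030).
declare i32 @llvm.amdgcn.readlane.i32(i32, i32)

define i32 @sum_both_halves(i32 %v, i32 %lane) {
  %lo = call i32 @llvm.amdgcn.readlane.i32(i32 %v, i32 %lane)
  ; Written for wave64: read the matching lane in the upper half.
  %hi.idx = add i32 %lane, 32
  %hi = call i32 @llvm.amdgcn.readlane.i32(i32 %v, i32 %hi.idx)
  ; On wave32 only bits [4:0] of the index are demanded, so %hi.idx is
  ; expected to simplify to %lane and %hi to become identical to %lo.
  %sum = add i32 %lo, %hi
  ret i32 %sum
}
```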


Revision tags: llvmorg-19.1.5
# 48ec59c2 25-Nov-2024 Alex Voicu <alexandru.voicu@amd.com>

[llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481)

Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that its
concrete value is used throughout subsequent optimisation passes.

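A minimal sketch of the effect (the function below is hypothetical and assumes compilation for a wave32 target, e.g. -mcpu=gfx1030): once the call folds to a constant, comparisons and branches on the wave size become trivially foldable by later passes.

```
; Hypothetical sketch, assuming a wave32 subtarget (e.g. -mcpu=gfx1030).
declare i32 @llvm.amdgcn.wavefrontsize()

define i1 @is_wave64() {
  %ws = call i32 @llvm.amdgcn.wavefrontsize()   ; folds to i32 32
  %cmp = icmp eq i32 %ws, 64                    ; then folds to false
  ret i1 %cmp
}
```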


# 0a6e8741 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Shrink used number of registers for mfma scale based on format (#117047)

Currently the builtins assume you are using an 8-bit format that requires
an 8-element vector. We can shrink the number of registers if the format
requires 4 or 6.



# 01c9a14c 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)

These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.



Revision tags: llvmorg-19.1.4
# ca1b35a6 18-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)

Random number instruction for stochastic rounding.


Revision tags: llvmorg-19.1.3
# 85c17e40 17-Oct-2024 Jay Foad <jay.foad@amd.com>

[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)

Convert many instances of:
Fn = Intrinsic::getOrInsertDeclaration(...);
CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.



Revision tags: llvmorg-19.1.2
# fa789dff 11-Oct-2024 Rahul Joshi <rjoshi@nvidia.com>

[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)

Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).



# 8d13e7b8 03-Oct-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Qualify auto. NFC. (#110878)

Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)


Revision tags: llvmorg-19.1.1
# d2d947b7 18-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Fold llvm.amdgcn.cvt.pkrtz when either operand is fpext (#108237)

This also generalizes the Undef handling and adds Poison handling.
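
A rough sketch of the kind of input this fold targets (the function below is hypothetical, and the expected replacement is my reading of the fold rather than something quoted from this log): when an operand is an fpext from half, the round-toward-zero conversion back to half is exact, so the original half value can be packed directly.

```
; Hypothetical sketch: both operands come from half, so the pack is exact.
declare <2 x half> @llvm.amdgcn.cvt.pkrtz(float, float)

define <2 x half> @pack_halves(half %a, half %b) {
  %fa = fpext half %a to float
  %fb = fpext half %b to float
  %packed = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %fa, float %fb)
  ; Expected to fold to a <2 x half> built directly from %a and %b.
  ret <2 x half> %packed
}
```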


Revision tags: llvmorg-19.1.0
# ff7eb1d0 11-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Simplify API of matchFPExtFromF16. NFC. (#108223)


Revision tags: llvmorg-19.1.0-rc4
# f142f8af 23-Aug-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812)

Common up handling of intrinsics that are a no-op on uniform arguments.
This catches a couple of new cases:

readlane (readlane x, y), z -> readlane x, y
(for any z, does not have to equal y).

permlane64 (readfirstlane x) -> readfirstlane x
(and likewise for any other uniform argument to permlane64).

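A minimal sketch of the first case (the function below is hypothetical; the overloaded intrinsic mangling is assumed): the outer read's source is already uniform, so it forwards its operand for any %z.

```
; Hypothetical sketch of readlane (readlane x, y), z -> readlane x, y.
declare i32 @llvm.amdgcn.readlane.i32(i32, i32)

define i32 @readlane_of_readlane(i32 %x, i32 %y, i32 %z) {
  %inner = call i32 @llvm.amdgcn.readlane.i32(i32 %x, i32 %y)
  %outer = call i32 @llvm.amdgcn.readlane.i32(i32 %inner, i32 %z)
  ; %outer is expected to fold to %inner, regardless of %z.
  ret i32 %outer
}
```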


Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 06ab30b5 18-Jul-2024 Changpeng Fang <changpeng.fang@amd.com>

[AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562)

If the parameters (the input and segment select) coming into the
amdgcn.trig.preop intrinsic are compile-time constants, we pre-compute
the output of amdgcn.trig.preop on the CPU and replace the uses with
the computed constant.

This work extends the patch https://reviews.llvm.org/D120150 to make the
coverage complete.

For the segment select, only src1[4:0] are used. A segment select is
invalid if we are selecting the 53-bit segment beyond the [1200:0] range
of the 2/PI table. 0 is returned when a segment select is not valid.



Revision tags: llvmorg-18.1.8
# 18ec885a 10-Jun-2024 Jay Foad <jay.foad@amd.com>

[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801)

They have been superseded by llvm.amdgcn.raw.buffer.* and
llvm.amdgcn.struct.buffer.*.


Revision tags: llvmorg-18.1.7, llvmorg-18.1.6
# f893dccb 09-May-2024 Eli Friedman <efriedma@quicinc.com>

Replace uses of ConstantExpr::getCompare. (#91558)

Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.



Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# 14114523 15-Mar-2024 Artem Tyurin <artem.tyurin@gmail.com>

[IRBuilder] Fold binary intrinsics (#80743)

Fixes https://github.com/llvm/llvm-project/issues/61240.


Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 930996e9 05-Feb-2024 Yingwei Zheng <dtcxzyw2333@gmail.com>

[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657)

This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can perform more optimizations using context-sensitive
information.

Example (extracted from
[fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else

if.then1:
%fneg = fneg float %x
br label %if.end

if.else:
br i1 %cond, label %if.then2, label %if.end

if.then2:
br label %if.end

if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.



# b8025d14 02-Feb-2024 Valery Pykhtin <valery.pykhtin@gmail.com>

Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303)

Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.

This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.



Revision tags: llvmorg-18.1.0-rc1
# 65f486c4 29-Jan-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Simplify else if to just else in AMDGPUInstCombineIntrinsic

Fixes #79738


Revision tags: llvmorg-19-init
# 9791e541 17-Jan-2024 Valery Pykhtin <valery.pykhtin@gmail.com>

Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429)

Reverts llvm/llvm-project#71556

Fixes failures:
https://lab.llvm.org/buildbot/#/builders/188/builds/40541
https://lab.llvm.org/buildbot/#/builders/91/builds/21847
https://lab.llvm.org/buildbot/#/builders/98/builds/31671
https://lab.llvm.org/buildbot/#/builders/139/builds/57289



# 57b50ef0 17-Jan-2024 Valery Pykhtin <valery.pykhtin@gmail.com>

[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556)

Substitute with zero-extended to i64 ballot.i32 intrinsic.
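
A minimal sketch of the substitution (the function below is hypothetical and assumes a wave32 subtarget, e.g. -mcpu=gfx1030):

```
; Hypothetical sketch, assuming a wave32 subtarget (e.g. -mcpu=gfx1030).
declare i64 @llvm.amdgcn.ballot.i64(i1)
declare i32 @llvm.amdgcn.ballot.i32(i1)

define i64 @ballot_wave32(i1 %cond) {
  %b = call i64 @llvm.amdgcn.ballot.i64(i1 %cond)
  ; Expected rewrite:
  ;   %b32 = call i32 @llvm.amdgcn.ballot.i32(i1 %cond)
  ;   %b   = zext i32 %b32 to i64
  ret i64 %b
}
```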


# 2b83ceee 12-Jan-2024 Mariusz Sikora <mariusz.sikora@amd.com>

[AMDGPU][GFX12] Default component broadcast store (#76212)

For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So passing only the
X component is the same as XXXX, and XY the same as XYXX.

This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.

For image stores it also trims DMask if necessary.

---------

Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>

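A rough sketch of the buffer-store case (the function below is hypothetical, the raw.buffer.store signature is an assumption, and the fold applies only to GFX12-class targets): trailing components equal to the first can be dropped because the hardware broadcasts the first component into unset ones.

```
; Hypothetical sketch, assuming a GFX12 subtarget.
declare void @llvm.amdgcn.raw.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32 immarg)

define void @store_xyxx(float %x, float %y, <4 x i32> %rsrc) {
  %v0 = insertelement <4 x float> poison, float %x, i32 0
  %v1 = insertelement <4 x float> %v0, float %y, i32 1
  %v2 = insertelement <4 x float> %v1, float %x, i32 2
  %v3 = insertelement <4 x float> %v2, float %x, i32 3
  ; Components 2 and 3 equal component 0, so this is expected to shrink
  ; to a <2 x float> store of <x, y>.
  call void @llvm.amdgcn.raw.buffer.store.v4f32(<4 x float> %v3, <4 x i32> %rsrc, i32 0, i32 0, i32 0)
  ret void
}
```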


# 9d60e95b 20-Dec-2023 Nikita Popov <npopov@redhat.com>

[AMDGPU] Use poison instead of undef for non-demanded elements (#75914)

Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.

Also bail out if dmask is 0, as this case has special semantics:

> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.



# 966416b9 15-Dec-2023 Mariusz Sikora <mariusz.sikora@amd.com>

[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)

