Revision tags: llvmorg-21-init |
|
#
c82a6a02 |
| 15-Jan-2025 |
Fraser Cormack <fraser@codeplay.com> |
[AMDGPU] Use correct vector elt type when shrinking mfma scale (#123043)
This might be a copy/paste error. I don't think this an issue in
practice as the builtins/intrinsics are only legal with ide
[AMDGPU] Use correct vector elt type when shrinking mfma scale (#123043)
This might be a copy/paste error. I don't think this an issue in
practice as the builtins/intrinsics are only legal with identical vector
element types.
show more ...
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
4a0d53a0 |
| 13-Dec-2024 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key p
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
show more ...
|
#
c74e2232 |
| 06-Dec-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Simplify demanded bits on readlane/writeline index arguments (#117963)
The main goal is to fold away wave64 code when compiled for wave32.
If we have out of bounds indexing, these will now
AMDGPU: Simplify demanded bits on readlane/writeline index arguments (#117963)
The main goal is to fold away wave64 code when compiled for wave32.
If we have out of bounds indexing, these will now clamp down to
a low bit which may CSE with the operations on the low half of the
wave.
show more ...
|
Revision tags: llvmorg-19.1.5 |
|
#
48ec59c2 |
| 25-Nov-2024 |
Alex Voicu <alexandru.voicu@amd.com> |
[llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481)
Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that it's
concrete value is used throughout subsequent optimisation pa
[llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481)
Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that it's
concrete value is used throughout subsequent optimisation passes.
show more ...
|
#
0a6e8741 |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Shrink used number of registers for mfma scale based on format (#117047)
Currently the builtins assume you are using an 8-bit format that requires an 8 element vector. We can shrink the numb
AMDGPU: Shrink used number of registers for mfma scale based on format (#117047)
Currently the builtins assume you are using an 8-bit format that requires an 8 element vector. We can shrink the number of registers if the format requires 4 or 6.
show more ...
|
#
01c9a14c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. Non
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
ca1b35a6 |
| 18-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310)
Rand num instruction for stochastic rounding.
|
Revision tags: llvmorg-19.1.3 |
|
#
85c17e40 |
| 17-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
Fn = Intrinsic::getOrInsertDeclaration(...);
CreateCall(Fn, ...)
to the equivalent CreateIntrinsi
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
Fn = Intrinsic::getOrInsertDeclaration(...);
CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
show more ...
|
Revision tags: llvmorg-19.1.2 |
|
#
fa789dff |
| 11-Oct-2024 |
Rahul Joshi <rjoshi@nvidia.com> |
[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is a
[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
show more ...
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1 |
|
#
d2d947b7 |
| 18-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fold llvm.amdgcn.cvt.pkrtz when either operand is fpext (#108237)
This also generalizes the Undef handling and adds Poison handling.
|
Revision tags: llvmorg-19.1.0 |
|
#
ff7eb1d0 |
| 11-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Simplify API of matchFPExtFromF16. NFC. (#108223)
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
f142f8af |
| 23-Aug-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812)
Common up handling of intrinsics that are a no-op on uniform arguments.
This catches a couple of new cases:
readlane
[AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812)
Common up handling of intrinsics that are a no-op on uniform arguments.
This catches a couple of new cases:
readlane (readlane x, y), z -> readlane x, y
(for any z, does not have to equal y).
permlane64 (readfirstlane x) -> readfirstlane x
(and likewise for any other uniform argument to permlane64).
show more ...
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
06ab30b5 |
| 18-Jul-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
[AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562)
If the parameters(the input and segment select) coming in to
amdgcn.trig.preop intrinsic are compile time constants, we pre-compute
the
[AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562)
If the parameters(the input and segment select) coming in to
amdgcn.trig.preop intrinsic are compile time constants, we pre-compute
the output of amdgcn.trig.preop on the CPU and replaces the uses with
the computed constant.
This work extends the patch https://reviews.llvm.org/D120150 to make it
a complete coverage.
For the segment select, only src1[4:0] are used. A segment select is
invalid if we are selecting the 53-bit segment beyond the [1200:0] range
of the 2/PI table. 0 is returned when a segment select is not valid.
show more ...
|
Revision tags: llvmorg-18.1.8 |
|
#
18ec885a |
| 10-Jun-2024 |
Jay Foad <jay.foad@amd.com> |
[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801)
They have been superseded by llvm.amdgcn.raw.buffer.* and
llvm.amdgcn.struct.buffer.*.
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
f893dccb |
| 09-May-2024 |
Eli Friedman <efriedma@quicinc.com> |
Replace uses of ConstantExpr::getCompare. (#91558)
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where the either the fold is
gu
Replace uses of ConstantExpr::getCompare. (#91558)
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where the either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.
show more ...
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
14114523 |
| 15-Mar-2024 |
Artem Tyurin <artem.tyurin@gmail.com> |
[IRBuilder] Fold binary intrinsics (#80743)
Fixes https://github.com/llvm/llvm-project/issues/61240.
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2 |
|
#
930996e9 |
| 05-Feb-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657)
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The moti
[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657)
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.
Example (extracted from
[fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
show more ...
|
#
b8025d14 |
| 02-Feb-2024 |
Valery Pykhtin <valery.pykhtin@gmail.com> |
Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303)
Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.
This reverts commit
Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303)
Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`.
This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.
show more ...
|
Revision tags: llvmorg-18.1.0-rc1 |
|
#
65f486c4 |
| 29-Jan-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Simplify else if to just else in AMDGPUInstCombineIntrinsic
Fixes #79738
|
Revision tags: llvmorg-19-init |
|
#
9791e541 |
| 17-Jan-2024 |
Valery Pykhtin <valery.pykhtin@gmail.com> |
Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429)
Reverts llvm/llvm-project#71556
Fixes failures:
https://lab.llvm.org/buildbot/#/builders/188/builds/40541
Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429)
Reverts llvm/llvm-project#71556
Fixes failures:
https://lab.llvm.org/buildbot/#/builders/188/builds/40541
https://lab.llvm.org/buildbot/#/builders/91/builds/21847
https://lab.llvm.org/buildbot/#/builders/98/builds/31671
https://lab.llvm.org/buildbot/#/builders/139/builds/57289
show more ...
|
#
57b50ef0 |
| 17-Jan-2024 |
Valery Pykhtin <valery.pykhtin@gmail.com> |
[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556)
Substitute with zero-extended to i64 ballot.i32 intrinsic.
|
#
2b83ceee |
| 12-Jan-2024 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][GFX12] Default component broadcast store (#76212)
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pa
[AMDGPU][GFX12] Default component broadcast store (#76212)
For image and buffer stores the default behaviour on GFX12 is to set all
unset components to the value of the first component. So if we pass only
X component, it will be the same as XXXX, or XY same as XYXX.
This patch simplifies the passed vector of components in InstCombine by
removing components from the end that are equal to the first component.
For image stores it also trims DMask if necessary.
---------
Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>
show more ...
|
#
9d60e95b |
| 20-Dec-2023 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Use poison instead of undef for non-demanded elements (#75914)
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.
Also bail out of
[AMDGPU] Use poison instead of undef for non-demanded elements (#75914)
Return poison instead of undef for non-demanded lanes in the AMDGPU
demanded element simplification hook.
Also bail out of dmask is 0, as this case has special semantics:
> If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by
> LWE status if exists. TFE status is not generated since the fetch is dropped.
show more ...
|
#
966416b9 |
| 15-Dec-2023 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)
|