Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2 |
|
#
bc7ca917 |
| 02-Oct-2023 |
Nikita Popov <npopov@redhat.com> |
[AMDGPUInstCombine] Avoid use of ConstantExpr::getSExt() (NFC)
Let the IRBuilder handle the constant folding instead.
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3 |
|
#
edecb604 |
| 11-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.
|
#
61c8af67 |
| 16-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16
There's nothing special about f16 sqrt handling.
https://reviews.llvm.org/D158090
|
#
7c4aa3b3 |
| 15-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq
We currently have some wrong combines in the backend that approximately do this.
https://reviews.llvm.org/D158002
|
Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
5ccfc454 |
| 29-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fold away mbcnt.hi in wave32 mode
This will allow libraries to drop some of the special casing based on wave size.
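The reasoning behind this fold can be illustrated with a small scalar model. This is only a sketch, assuming the usual description of the instructions: mbcnt.lo counts mask bits for lanes 0..31 below the current lane, mbcnt.hi does the same for lanes 32..63, and both add a base value. The function names and the explicit `lane` parameter are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Count the set bits of a 32-bit word (portable, no C++20 <bit> required).
static unsigned popcount32(std::uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

// Mask with one bit set for every lane strictly below `lane`.
static std::uint64_t lanesBelow(unsigned lane) {
  assert(lane < 64 && "lane id out of range");
  return (std::uint64_t{1} << lane) - 1;
}

// Assumed model of mbcnt.lo: considers only lanes 0..31.
std::uint32_t mbcntLo(std::uint32_t mask, std::uint32_t base, unsigned lane) {
  std::uint32_t lowLanes = static_cast<std::uint32_t>(lanesBelow(lane));
  return popcount32(mask & lowLanes) + base;
}

// Assumed model of mbcnt.hi: considers only lanes 32..63.
std::uint32_t mbcntHi(std::uint32_t mask, std::uint32_t base, unsigned lane) {
  std::uint32_t highLanes = static_cast<std::uint32_t>(lanesBelow(lane) >> 32);
  return popcount32(mask & highLanes) + base;
}
```

In wave32 mode every lane id is below 32, so under this model the high half of the lane mask is always zero and `mbcntHi(mask, base, lane)` reduces to `base`, which is why the call can be folded away.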
|
#
d9333e36 |
| 16-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp"
This reverts commit 1159c670d40e3ef302264c681fe7e0268a550874.
Accidentally pushed wrong patch
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
1159c670 |
| 28-Apr-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp
|
#
84313162 |
| 15-Jun-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop replacing amdgcn.ballot(1) with amdgcn.s.getreg(exec)
Rationale:
- It does not enable any further IR simplifications.
- It does not improve the generated code since the isel lowering of ballot also has special cases for 0 and 1.
- getreg is "too powerful" since it can read from many different registers, so its intrinsic properties have to be set very conservatively.
There is also a correctness problem that getreg can read from exec but it is currently not marked as convergent.
Differential Revision: https://reviews.llvm.org/D153047
|
#
7047cb52 |
| 08-Jun-2023 |
Mateja Marjanovic <mmarjano@amd.com> |
[AMDGPU] Trim trailing undefs from the end of image and buffer store
Remove undef values from the end of the vector operand in image and buffer store instructions. Also, instead of calling computeKnownFPClass, use only findScalarElement.
Continuation of: 88421ea973916e Trim zero components from buffer and image stores
Differential Revision: https://reviews.llvm.org/D152440
|
#
c6aaa0b1 |
| 14-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Perform basic folds on llvm.amdgcn.exp2
|
#
10717f92 |
| 11-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add basic folds for llvm.amdgcn.log
|
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
faa2c678 |
| 04-Apr-2023 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
[AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer.
In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions.
Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
|
#
c91246b7 |
| 05-Jun-2023 |
Mateja Marjanovic <mmarjano@amd.com> |
fix failures caused by https://reviews.llvm.org/D146737
buildbot: https://lab.llvm.org/buildbot/#/builders/77/builds/27340
|
#
88421ea9 |
| 02-Jun-2023 |
Mateja Marjanovic <mmarjano@amd.com> |
[AMDGPU] Trim zero components from buffer and image stores
For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So passing only the X component is the same as passing X000, and passing XY is the same as passing XY00.
This patch simplifies the passed vector of components in InstCombine by removing zero components from the end.
For image stores it also trims DMask if necessary.
Reviewed by: arsenm, foad, nhaehnle, piotr
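The trimming described above can be sketched on plain data. This is not the patch's code: `componentsToKeep` and `trimDMask` are hypothetical helpers, the stored vector is modeled as a `std::vector<float>`, and the DMask handling assumes that the n-th data element corresponds to the n-th set bit of a 4-bit mask.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: how many leading components remain after dropping
// trailing +0.0 components. At least one component is kept so the store
// still has a data operand. -0.0 is not dropped, since the hardware
// default for unset components is assumed to be +0.0.
std::size_t componentsToKeep(const std::vector<float> &components) {
  assert(!components.empty() && "a store needs at least one component");
  std::size_t keep = components.size();
  while (keep > 1 && components[keep - 1] == 0.0f &&
         !std::signbit(components[keep - 1]))
    --keep;
  return keep;
}

// Hypothetical helper: keep only the lowest `keep` enabled channels of a
// 4-bit image DMask, clearing the bits of the channels that were trimmed.
unsigned trimDMask(unsigned dmask, std::size_t keep) {
  unsigned trimmed = 0;
  std::size_t enabled = 0;
  for (unsigned bit = 0; bit < 4; ++bit) {
    if ((dmask & (1u << bit)) != 0 && enabled < keep) {
      trimmed |= 1u << bit;
      ++enabled;
    }
  }
  return trimmed;
}
```

For a store of `{x, y, 0, 0}` with DMask `0xF`, this model keeps two components and narrows the mask to `0x3`. Only zeros at the end are removed, so a zero in the middle of the vector stays.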
|
#
55635433 |
| 02-Jun-2023 |
Haojian Wu <hokein.wu@gmail.com> |
Fix isKnownNeverInfOrNaN() call in AMDGPU after ORE removal 97b5cc214aee48e30391bfcd2cde4252163d7406
|
#
8609df7c |
| 24-May-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic
This barely matters since 99% are converted to the generic intrinsic now, and the only real difference is the target intrinsic supports a variable test mask. Start propagating poison. Prefer folding to a defined result (false) for an undef test mask. Propagate undef for the first operand.
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
9ef1333b |
| 10-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace certain llvm.amdgcn.class uses with llvm.is.fpclass
Most transforms should now be performed on llvm.is.fpclass. Unlike the generic intrinsic, llvm.amdgcn.class supports variable test masks.
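As a rough illustration of what a class test with a mask computes, the sketch below classifies a 32-bit float into exactly one category and tests it against a bit mask. The bit assignment follows the layout documented for llvm.is.fpclass; that the target intrinsic uses the same layout is an assumption here, and the names `FPClassBit`, `classify` and `testClass` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// One bit per floating-point category, in the order documented for the
// llvm.is.fpclass test mask.
enum FPClassBit : unsigned {
  SNaN = 1u << 0,
  QNaN = 1u << 1,
  NegInf = 1u << 2,
  NegNormal = 1u << 3,
  NegSubnormal = 1u << 4,
  NegZero = 1u << 5,
  PosZero = 1u << 6,
  PosSubnormal = 1u << 7,
  PosNormal = 1u << 8,
  PosInf = 1u << 9
};

// Return the single category bit that describes `x` (IEEE-754 binary32).
unsigned classify(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects binary32");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const bool negative = (bits >> 31) != 0;
  switch (std::fpclassify(x)) {
  case FP_NAN:
    // The top mantissa bit distinguishes quiet from signaling NaNs.
    return (bits & 0x00400000u) != 0 ? QNaN : SNaN;
  case FP_INFINITE:
    return negative ? NegInf : PosInf;
  case FP_ZERO:
    return negative ? NegZero : PosZero;
  case FP_SUBNORMAL:
    return negative ? NegSubnormal : PosSubnormal;
  default:
    return negative ? NegNormal : PosNormal;
  }
}

// True if the category of `x` is selected by `mask`.
bool testClass(float x, unsigned mask) { return (classify(x) & mask) != 0; }
```

In this model `mask` is an ordinary runtime value, which corresponds to the variable test mask of the target intrinsic. The generic intrinsic requires the mask to be a constant, so only calls whose mask is known at compile time are candidates for replacement.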
|
#
9c8c31ee |
| 18-May-2023 |
Mateja Marjanovic <mmarjano@amd.com> |
Revert "[AMDGPU] Trim zero components from buffer and image stores"
This reverts commit 3181a6e3e7dae9292782216a55c5e1f0583c1668.
|
#
8f3e6462 |
| 05-May-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fold fmed3 of fpext sources to f16 fmed3
InstCombine already does this for minnum/maxnum. If we also apply this to fmed3, we don't need to explicitly use 16-bit fmed3 if we're not sure the target supports 16-bit instructions yet.
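The idea behind this fold is that a median of three only selects one of its inputs, so widening the inputs, taking the median, and narrowing the result gives the same value as taking the median in the narrow type. The sketch below shows this with `float` and `double` standing in for half and float, since a portable half type is assumed to be unavailable. `median3` and `medianViaWiderType` are hypothetical names, and NaN inputs are ignored.

```cpp
#include <algorithm>
#include <cassert>

// Median of three values via min/max; NaN handling is deliberately ignored.
template <typename T> T median3(T a, T b, T c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Extend the narrow inputs, take the median in the wider type, then
// truncate. The result is always one of the inputs, so the truncation
// is exact.
float medianViaWiderType(float a, float b, float c) {
  double wide = median3<double>(a, b, c);
  return static_cast<float>(wide);
}
```

For non-NaN inputs `medianViaWiderType(a, b, c)` equals `median3(a, b, c)`, which is the property that lets the operation be performed directly in the narrower type.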
|
#
86d0b524 |
| 16-May-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
ValueTracking: Expand signature of isKnownNeverInfinity/NaN
This is in preparation for replacing the implementation with a wrapper around computeKnownFPClass.
|
#
3181a6e3 |
| 15-May-2023 |
Mateja Marjanovic <mmarjano@amd.com> |
[AMDGPU] Trim zero components from buffer and image stores
For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So passing only the X component is the same as passing X000, and passing XY is the same as passing XY00.
This patch simplifies the passed vector of components in InstCombine by removing zero components from the end.
For image stores it also trims DMask if necessary.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D146737
|
#
1bce1bea |
| 11-Apr-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Reduce number of calls to computeKnownFPClass and pass all arguments
Makes assumes work for this case.
|
#
1f60c8d0 |
| 04-Apr-2023 |
Craig Topper <craig.topper@sifive.com> |
[IR] Replace calls to ConstantFP::getNullValue with ConstantFP::getZero. NFC
There is no getNullValue in ConstantFP. Due to inheritance, we're calling Constant::getNullValue which handles any type including FP. Since we already know we want an FP constant we can use ConstantFP::getZero which might be faster and is a more readable name for an FP zero.
|
#
f8f3db27 |
| 20-Feb-2023 |
Kazu Hirata <kazu@google.com> |
Use APInt::count{l,r}_{zero,one} (NFC)
|
#
cbde2124 |
| 19-Feb-2023 |
Kazu Hirata <kazu@google.com> |
Use APInt::popcount instead of APInt::countPopulation (NFC)
This is for consistency with the C++20-style bit manipulation functions in <bit>.
|