|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
| #
38fffa63 |
| 06-Nov-2024 |
Paul Walker <paul.walker@arm.com> |
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
|
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7 |
|
| #
0cd2bf35 |
| 19-May-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
ValueTracking: Correct undef handling for constant FP vectors (#92557)
Treat undef as unknown, and poison as ignorable.
|
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4 |
|
| #
acb2a475 |
| 04-Apr-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Regenerate test checks
|
|
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0 |
|
| #
231aa0f2 |
| 13-Sep-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Avoid creating vector extracts if we aren't going to do anything
Try to avoid expensive checks failures from reporting no changes when some dead instructions were introduced.
|
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3 |
|
| #
72a7024a |
| 16-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.
Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case
AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.
Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner.
https://reviews.llvm.org/D158129
show more ...
|
| #
6012fed6 |
| 30-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix sqrt fast math flags spreading to fdiv fast math flags
This was working around the lack of operator| on FastMathFlags. We have that now which revealed the bug.
|
| #
a738bdf3 |
| 16-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Permit more rsq formation in AMDGPUCodeGenPrepare
We were basing the defer the fast case to codegen based on the fdiv itself, and not looking for a foldable sqrt input.
https://reviews.llvm
AMDGPU: Permit more rsq formation in AMDGPUCodeGenPrepare
We were basing the defer the fast case to codegen based on the fdiv itself, and not looking for a foldable sqrt input.
https://reviews.llvm.org/D158127
show more ...
|
|
Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
8406c356 |
| 16-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Implement new 2ulp fdiv lowering
Extends the new frexp scaled reciprocal to the general case. The reciprocal case is just the same thing when frexp of 1 is constant folded. Could probably cl
AMDGPU: Implement new 2ulp fdiv lowering
Extends the new frexp scaled reciprocal to the general case. The reciprocal case is just the same thing when frexp of 1 is constant folded. Could probably clean up the code to rely on that constant folding.
Improves results for the IEEE path for the default OpenCL division. We used to only emit the fdiv.fast intrinsic with a 2.5 ulp accuracy threshold with DAZ, which uses explicit range checks. This gives us a better fast option with the default IEEE behavior.
show more ...
|
| #
6699c370 |
| 19-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refactor AMDGPUCodeGenPrepare fdiv handling
NFC-ish. Does trigger some reordering of the fdiv scalarization. Also skips scalarizing in more cases where nothing was going to happen. We can st
AMDGPU: Refactor AMDGPUCodeGenPrepare fdiv handling
NFC-ish. Does trigger some reordering of the fdiv scalarization. Also skips scalarizing in more cases where nothing was going to happen. We can still scalarize in some no-op edge cases.
https://reviews.llvm.org/D155740
show more ...
|
| #
8287f3af |
| 03-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Overhaul and improve rcp and rsq f32 formation
The highlight change is a new denormal safe 1ulp lowering which uses rcp after using frexp to perform input scaling. This saves 2 instructions
AMDGPU: Overhaul and improve rcp and rsq f32 formation
The highlight change is a new denormal safe 1ulp lowering which uses rcp after using frexp to perform input scaling. This saves 2 instructions compared to other implementations which performed an explicit denormal range change. This improves the OpenCL default, and requires a flag for HIP. I don't believe there's any flag wired up for OpenMP to emit the necessary fpmath metadata.
This provides several improvements and changes that were hard to separate without regressing one case or another. Disturbingly the OpenCL conformance test seems to have the reciprocal test commented out. I locally hacked it back in to test this.
Starts introducing f32 rsq intrinsics in AMDGPUCodeGenPrepare. Like the rcp case, we could do this in codegen if !fpmath were preserved (although we would lose some computeKnownFPClass tricks). Start requiring contract flags to form rsq. The rsq fusion actually improves the result from ~2ulp to ~1ulp. We have some older fusion in codegen which only keys off unsafe math which should be refined.
Expand rsq patterns by checking for denormal inputs and pre/post multiplying like the current library code does. We also take advantage of computeKnownFPClass to avoid the scaling when we can statically prove the input cannot be a denormal. We could do the same for the rcp case, but unlike rsq a large input can underflow to denormal. We need additional upper bound exponent checks on the input in order to do the same for rcp.
This rsq handling also now starts handling the negated case. We introduce rsq with an fneg. In the case the fneg doesn't fold into its user, it's a neutral change but provides improvement if it is foldable as a source modifier.
Also starts respecting the arcp attribute properly, and more strictly interprets afn. We were previously interpreting afn as implying you could do the reciprocal expansion of an fdiv. The codegen handling of these also needs to be revisited.
This also effectively introduces the optimization combineRepeatedFPDivisors enables, just done in the IR instead (and only for f32).
This is almost across the board better. The one minor regression is for gfx6/buggy frexp case where for multiple reciprocals, we could previously reuse rematerialized constants per instance (it's neutral for a single rcp).
The fdiv.fast and sqrt handling need to be revisited next.
https://reviews.llvm.org/D155593
show more ...
|
| #
b2d58b59 |
| 17-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Expand rsq testing to cover contract flag
The 1.0/sqrt(x) -> rsq(x) fold increases precision and probably needs a contract flag.
|
| #
881e9f29 |
| 20-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Regenerate test checks
Mostly a workaround for recent reverts in update_test_checks
|
| #
4a81283b |
| 16-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Generate and add fdiv tests
Prepare for new lowering strategies because we somehow didn't have enough of them already.
|
| #
cdfdfe7c |
| 15-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add some additional rcp/rsq tests
|
| #
ef4a2b60 |
| 13-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Expand testing of AMDGPUCodeGenPrepare fdiv handling
- Switch to generated checks - Use a different run line per denormal mode to reduce test duplication - Add test coverage for rsqrt cases
AMDGPU: Expand testing of AMDGPUCodeGenPrepare fdiv handling
- Switch to generated checks - Use a different run line per denormal mode to reduce test duplication - Add test coverage for rsqrt cases - Add test coverage for repeated arcp denominator - Fix the optnone test
show more ...
|
| #
fbe4ff81 |
| 14-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still s
AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.
show more ...
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
bdf2fbba |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6 |
|
| #
3830e4e5 |
| 16-Nov-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Create poison values instead of undef
These placeholders don't care about the finer points on the difference between the two.
|
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
5660bb6b |
| 18-Nov-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
|
| #
75cf3091 |
| 01-Nov-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on older subtargets, but should be correct. A follow up change will remove the old
AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on older subtargets, but should be correct. A follow up change will remove the old fp32-denormals subtarget features, and switch to using the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends should be making sure to add the denormal-fp-math-f32 attribute when appropriate to avoid performance regressions.
show more ...
|
| #
884acbb9 |
| 07-Feb-2020 |
Changpeng Fang <changpeng.fang@gmail.com> |
AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare
Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp t
AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare
Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D73588
show more ...
|
| #
25315359 |
| 24-Jan-2020 |
Changpeng Fang <changpeng.fang@gmail.com> |
AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare
Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG
AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare
Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv.
In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could exactly know !fpmath.
FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request.
RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal.
Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed.
1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D71293
show more ...
|
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1 |
|
| #
9d7b1c9d |
| 06-Jul-2017 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic
[AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
llvm-svn: 307308
show more ...
|
|
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1 |
|
| #
3dbeefa9 |
| 21-Mar-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default ca
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel).
llvm-svn: 298444
show more ...
|
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
| #
e14df4b2 |
| 28-Sep-2016 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125
llvm-svn: 282624
|