Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
ee795fd1 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral
https://reviews.llvm.org/D158999
|
#
def22855 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use pown instead of pow if known integral
https://reviews.llvm.org/D158998
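A minimal OpenCL C sketch of the case the two entries above target (the function and variable names are illustrative assumptions, not taken from the patches): an exponent produced by a rounding call is integral-valued even though its type is float, so isKnownIntegral can accept it and the pow call becomes eligible for the cheaper pown lowering.

    // Illustrative only: floor() always yields an integral value, so even though
    // 'e' has type float, pow(x, e) behaves as x raised to an integer exponent.
    float shade(float x, float y) {
        float e = floor(y);    // integral-valued result
        return pow(x, e);      // candidate for pown(x, (int)e)
    }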
|
#
deefda70 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic.
https://reviews.llvm.org/D158997
|
#
dac8f974 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion
https://reviews.llvm.org/D158996
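A similar illustrative sketch for this entry (names are assumptions, not from the patch): an exponent formed by converting an integer to float appears as sitofp/uitofp in the IR and is trivially integral, so the fast pow expansion can take the integer-exponent path.

    // Illustrative only: (float)n is a sitofp at the IR level, so the exponent
    // is known integral without any further value analysis.
    float gamma_adjust(float x, int n) {
        return pow(x, (float)n);   // fast pow expansion can treat this like pown
    }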
|
#
699685b7 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Enable assumptions in AMDGPULibCalls
https://reviews.llvm.org/D159006
|
#
a45b787c |
| 25-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Turn pow libcalls into powr
powr is just pow with the assumption that x >= 0 (otherwise the result is NaN). This fires at least 6 times in LuxMark.
https://reviews.llvm.org/D158908
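A hedged OpenCL C sketch of the fold (the function name is an illustrative assumption): when the base is provably non-negative, pow can be rewritten as powr, which the OpenCL spec defines only for x >= 0 (NaN otherwise) and which is simpler to expand because the negative-base handling drops out.

    // Illustrative only: fabs() guarantees a non-negative base, so a pass with
    // that knowledge may rewrite the pow below as powr without changing results.
    float bright(float v, float e) {
        float x = fabs(v);     // x >= 0 is known here
        return pow(x, e);      // may become powr(x, e)
    }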
|
#
f5d8a9b1 |
| 25-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Simplify handling of constant vectors in libcalls
Also fixes not handling the partially undef case.
https://reviews.llvm.org/D158905
|
#
afb24cbb |
| 25-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't require all flags to expand fast powr
This was requiring all fast math flags, which is practically useless. This wouldn't fire using all the standard OpenCL fast math flags. This only needs afn nnan and ninf.
https://reviews.llvm.org/D158904
|
#
bfe6bc05 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup check for integral exponents in pow folds
Also improves undef handling.
https://reviews.llvm.org/D159006
|
#
80e5b46e |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix assertion on half typed pow with constant exponents
https://reviews.llvm.org/D158993
|
#
35c2a754 |
| 25-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix asserting on fast f16 pown
https://reviews.llvm.org/D158903
|
#
b24dab0e |
| 25-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Trim dead includes
|
Revision tags: llvmorg-17.0.0-rc3 |
|
#
66ee7940 |
| 16-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls
Apparently the spec has overloads for fmin/fmax and ldexp with one of the operands as scalar. We need to broadcast the scalars to the vector type.
https://reviews.llvm.org/D158077
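An illustrative OpenCL C sketch of the overloads in question (the function is an assumption, not from the patch): fmin/fmax accept a scalar second operand and ldexp accepts a scalar exponent on vector inputs, so when these calls are rewritten to the vector intrinsics the scalar operand must first be splatted to the vector type, or the resulting IR trips the verifier.

    // Illustrative only: both calls mix a vector operand with a scalar one, which
    // the spec allows; lowering to intrinsics needs the scalar broadcast per lane.
    float4 clamp_and_scale(float4 v, float lo, int k) {
        float4 m = fmax(v, lo);   // scalar 'lo' is splatted across the lanes
        return ldexp(m, k);       // scalar exponent 'k' applies to every lane
    }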
|
#
d2517616 |
| 14-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace log libcalls with log intrinsics
|
#
d45022b0 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove special case constant folding of divide
We should probably just swap this out for the fdiv, but that's what the implementation is anyway.
|
#
483cc218 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove special case folding of sqrt
|
#
416f6af9 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove special case folding of fma/mad
These just get replaced with an intrinsic now. This was also introducing host dependence on the result since it relied on the compiler choice to contract or not.
|
#
0eabe65b |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace ldexp libcalls with intrinsic
|
#
f337a77c |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace rounding libcalls with intrinsics
|
#
c7876c55 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace fabs and copysign libcalls with intrinsics
Preserves flags and metadata like the other cases.
|
#
a70006c4 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace some libcalls with intrinsics
OpenCL loses fast math information by going through libcall wrappers around intrinsics.
Do this to preserve call site flags which are lost when inlining. It's not safe in general to propagate flags during inlining, so avoid dealing with this by just special casing some of the useful calls.
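A hedged sketch of the problem shape (lib_exp2_wrapper is a hypothetical stand-in, not the real device-library entry point): the fast math flags live on the user's call to the wrapper, and once the wrapper is inlined the inner operation no longer carries them, which is the loss avoided by emitting the intrinsic directly at the call site.

    // Illustrative only: the wrapper below models a device-library libcall.
    static float lib_exp2_wrapper(float x) {
        return exp2(x);            // inner call carries no fast math flags of its own
    }

    float fast_caller(float x) {
        // Under -cl-fast-relaxed-math this call is flagged, but the flags are
        // dropped once lib_exp2_wrapper is inlined.
        return lib_exp2_wrapper(x);
    }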
|
Revision tags: llvmorg-17.0.0-rc2 |
|
#
f44beecb |
| 31-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Try to use private version of sincos if available
The comment was out of date; the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec would suggest only the flat version exists, but the overloads do exist in the implementation.
https://reviews.llvm.org/D156720
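A small illustrative OpenCL C sketch (the function is an assumption): the cosine result is stored through the address of an automatic variable, i.e. private memory, so the private-pointer overload of sincos can be used instead of the flat one.

    // Illustrative only: '&c' points into private (stack) memory.
    float2 phase(float x) {
        float c;
        float s = sincos(x, &c);   // eligible for the private-pointer overload
        return (float2)(s, c);
    }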
|
#
42c6e420 |
| 30-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle multiple uses when matching sincos
Match how the generic implementation handles this. We will now leave behind the dead other user for later passes to deal with.
https://reviews.llvm.org/D156707
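An illustrative sketch of the pattern being matched (names are assumptions): sin and cos of the same operand are combined into one sincos call even when one of the original calls has additional users; the now-dead call is simply left behind for later passes.

    // Illustrative only: both calls share the operand 'x', so they can be merged
    // into a single sincos; any leftover dead sin/cos call is cleaned up later.
    float2 rotate_parts(float x) {
        float s = sin(x);
        float c = cos(x);
        return (float2)(s, c);
    }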
|
#
6dbd4581 |
| 30-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove pointless libcall optimization of fma/mad
After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag handling. This was requiring all fast math flags when we really just need nsz for the fma(a, b, 0) case.
https://reviews.llvm.org/D156677
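A short worked illustration of why only nsz is needed (the function is an assumption, not from the patch): folding fma(a, b, 0.0f) to a * b can flip the sign of a zero result. If a * b is -0.0f, then fma(a, b, +0.0f) evaluates to -0.0f + 0.0f == +0.0f, while the folded form yields -0.0f, so the fold is valid under nsz alone rather than the full set of fast math flags.

    // Illustrative only: with nsz this is foldable to a * b; without it, the
    // signed-zero case described above makes the fold unsound.
    float fold_candidate(float a, float b) {
        return fma(a, b, 0.0f);
    }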
|
#
6448d5ba |
| 30-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove pointless libcall recognition of native_{divide|recip}
This was trying to constant fold these calls, and also turn some of them into a regular fmul/fdiv. There's no point to doing that; the underlying library implementation should be using those in the first place. Even when the library does use the rcp intrinsics, the backend handles constant folding of those. This was also only performing the folds under overly strict fast-everything-is-required conditions.
The one possible plus this gained over linking in the library is that if you were using all fast math flags, it would propagate them to the new instructions. We could address this in the library by adding more fast math flags to the native implementations.
The constant fold case also had no test coverage.
https://reviews.llvm.org/D156676
|