|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
| #
7dbd6cd2 |
| 11-Dec-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwi
[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, it uses the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" to the assumed range such that the
assumed range can't get widened properly in `updateImpl` if the known range is
not set properly for whatever reasons. Another benefit of not touching the known
range is, if we indicate pessimistic state, it also invalidates the AA such that
`manifest` will not be called. Since we honor the attribute, we don't want and
will not add any half-baked attribute added to a function.
show more ...
|
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4 |
|
| #
38fffa63 |
| 06-Nov-2024 |
Paul Walker <paul.walker@arm.com> |
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
|
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
| #
492484e6 |
| 09-Aug-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (#102086)"
This reverts commit 2fe61a5acf272d6826352ef72f47196b01003fc5.
|
| #
2fe61a5a |
| 09-Aug-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (#102086)
Currently `AMDGPUAttributorPass` is registered in default optimizer
pipeline.
This will allow the pass to run in default
[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (#102086)
Currently `AMDGPUAttributorPass` is registered in default optimizer
pipeline.
This will allow the pass to run in default pipeline as well as at
thinLTO post
link stage. However, it will not run in full LTO post link stage. This
patch
moves it to full LTO.
show more ...
|
|
Revision tags: llvmorg-19.1.0-rc2 |
|
| #
b455edbc |
| 31-Jul-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[InstCombine] Recognize copysign idioms (#101324)
This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg
Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern
exi
[InstCombine] Recognize copysign idioms (#101324)
This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg
Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern
exists in some graphics applications/math libraries.
Alive2: https://alive2.llvm.org/ce/z/ggQZV2
show more ...
|
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
show more ...
|
| #
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
show more ...
|
| #
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
show more ...
|
| #
bff619f9 |
| 01-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "AMDGPU: Use real copysign in fast pow (#97152)"
This reverts commit d3e7c4ce7a3d7f08cea02cba8f34c590a349688b.
|
| #
d3e7c4ce |
| 01-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use real copysign in fast pow (#97152)
Previously this would introduce some codegen regressions, but those have been avoided by simplifying demanded bits on copysign operations.
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
| #
dab1f7c8 |
| 21-May-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Emit 1/llvm.sqrt(x) instead of rsqrt calls in libcall handling (#92863)
With the contract flag we should end up codegening to the rsqrt
instruction, or denormal corrected rsqrt sequence pre
AMDGPU: Emit 1/llvm.sqrt(x) instead of rsqrt calls in libcall handling (#92863)
With the contract flag we should end up codegening to the rsqrt
instruction, or denormal corrected rsqrt sequence present in the
library.
show more ...
|
| #
66b76faf |
| 21-May-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Directly emit sqrt intrinsic when folding rootn(x, 2) (#92598)
This avoids depending on pre/post link runs.
Depends #92595
|
|
Revision tags: llvmorg-18.1.6 |
|
| #
3aae916f |
| 14-May-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
Reland "[ValueTracking] Compute knownbits from known fp classes" (#92084)
This patch relands https://github.com/llvm/llvm-project/pull/86409.
I mistakenly thought that `Known.makeNegative()` clea
Reland "[ValueTracking] Compute knownbits from known fp classes" (#92084)
This patch relands https://github.com/llvm/llvm-project/pull/86409.
I mistakenly thought that `Known.makeNegative()` clears the sign bit of
`Known.Zero`. This patch fixes the assertion failure by explicitly
clearing the sign bit.
show more ...
|
| #
2e165a2c |
| 14-May-2024 |
Martin Storsjö <martin@martin.st> |
Revert "[ValueTracking] Compute knownbits from known fp classes (#86409)"
This reverts commit d03a1a6e5838c7c2c0836d71507dfdf7840ade49.
This change caused failed assertions, see https://github.com/
Revert "[ValueTracking] Compute knownbits from known fp classes (#86409)"
This reverts commit d03a1a6e5838c7c2c0836d71507dfdf7840ade49.
This change caused failed assertions, see https://github.com/llvm/llvm-project/pull/86409#issuecomment-2109469845 for details.
show more ...
|
| #
d03a1a6e |
| 13-May-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[ValueTracking] Compute knownbits from known fp classes (#86409)
This patch calculates knownbits from fp instructions/dominating fcmp
conditions. It will enable more optimizations with signbit idio
[ValueTracking] Compute knownbits from known fp classes (#86409)
This patch calculates knownbits from fp instructions/dominating fcmp
conditions. It will enable more optimizations with signbit idioms.
show more ...
|
|
Revision tags: llvmorg-18.1.5 |
|
| #
bc3620d3 |
| 17-Apr-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move libcall simplify into PeepholeEP (#88853)
We were running this immediately on the incoming IR, which
is still littered with temporary allocas obscuring trivial values.
This needs to r
AMDGPU: Move libcall simplify into PeepholeEP (#88853)
We were running this immediately on the incoming IR, which
is still littered with temporary allocas obscuring trivial values.
This needs to run after initial SROA to handle sincos insertion.
show more ...
|
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
daecc303 |
| 09-Jan-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197)
The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197)
The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.
Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore, it's just another wrapper around llvm.sqrt.
show more ...
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
| #
deefda70 |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic.
https://reviews.llvm.
AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32
These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic.
https://reviews.llvm.org/D158997
show more ...
|
| #
80e5b46e |
| 26-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix assertion on half typed pow with constant exponents
https://reviews.llvm.org/D158993
|
| #
35c2a754 |
| 25-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix asserting on fast f16 pown
https://reviews.llvm.org/D158903
|
|
Revision tags: llvmorg-17.0.0-rc3 |
|
| #
d2517616 |
| 14-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace log libcalls with log intrinsics
|
| #
d45022b0 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove special case constant folding of divide
We should probably just swap this out for the fdiv, but that's what the implementation is anyway.
|
| #
a70006c4 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace some libcalls with intrinsics
OpenCL loses fast math information by going through libcall wrappers around intrinsics.
Do this to preserve call site flags which are lost when inlinin
AMDGPU: Replace some libcalls with intrinsics
OpenCL loses fast math information by going through libcall wrappers around intrinsics.
Do this to preserve call site flags which are lost when inlining. It's not safe in general to propagate flags during inline, so avoid dealing with this by just special casing some of the useful calls.
show more ...
|
|
Revision tags: llvmorg-17.0.0-rc2 |
|
| #
f44beecb |
| 31-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Try to use private version of sincos if available
The comment was out of date, the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec
AMDGPU: Try to use private version of sincos if available
The comment was out of date, the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec would suggest only the flat version exists, but the overloads do exist in the implementation.
https://reviews.llvm.org/D156720
show more ...
|
| #
6dbd4581 |
| 30-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove pointless libcall optimization of fma/mad
After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag h
AMDGPU: Remove pointless libcall optimization of fma/mad
After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag handling. This was requiring all fast math flags when we really just need nsz for the fma(a, b, 0) case.
https://reviews.llvm.org/D156677
show more ...
|