|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
| #
6548b635 |
| 09-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
|
| #
ca33649a |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
|
| #
e215a1e2 |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
|
|
Revision tags: llvmorg-19.1.3 |
|
| #
3277c7cd |
| 21-Oct-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)
MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
| #
4c4908cd |
| 27-Aug-2024 |
Alex MacLean <amaclean@nvidia.com> |
[AMDGPU] adjust tests to prevent fpclass bitcast folding (#106268)
Make some minor tweaks to AMDGPU tests to ensure they still work as
intended after https://github.com/llvm/llvm-project/pull/97762. These
tests can be radically simplified after bitcast aware fpclass deduction.
|
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
|
| #
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
|
| #
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
| #
cc13f3ba |
| 21-Feb-2024 |
David Majnemer <david.majnemer@gmail.com> |
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- taking the top 16 bits of a FP32 which will turn some NaNs into
infinities
Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd which will negate
double rounding issues.
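The NaN problem described above can be illustrated with a small sketch. This is not the SelectionDAG expansion from the patch; it is a hypothetical Python model that works on raw FP32 bit patterns, and the helper names (`f32_bits`, `truncate_to_bf16`, `round_to_bf16`) are assumptions made for illustration. It contrasts taking the top 16 bits with a round-to-nearest-even conversion that keeps NaNs as NaNs.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def truncate_to_bf16(bits: int) -> int:
    """Naive conversion: keep only the top 16 bits of the FP32 pattern."""
    return bits >> 16


def round_to_bf16(bits: int) -> int:
    """Round an FP32 bit pattern to BF16 using round-to-nearest-even."""
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Set the quiet bit so a payload living only in the low 16 bits
        # does not collapse into an infinity.
        return (bits >> 16) | 0x0040
    lsb = (bits >> 16) & 1  # lowest bit that survives; used for tie-breaking
    return (bits + 0x7FFF + lsb) >> 16


nan_low_payload = 0x7F800001  # a NaN whose payload is only in the low bits
print(hex(truncate_to_bf16(nan_low_payload)))  # 0x7f80, which is +infinity
print(hex(round_to_bf16(nan_low_payload)))     # 0x7fc0, still a NaN
print(hex(round_to_bf16(f32_bits(1.0))))       # 0x3f80
```

The sketch covers only the FP32 to BF16 step. The FP64 case described in the commit message additionally needs the intermediate FP32 value to be produced with round-to-odd, which is not modelled here.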
|
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
460ffcdd |
| 04-Jan-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make bf16/v2bf16 legal types (#76215)
Some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.
Depends #76213 #76214
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
7fa7a08f |
| 19-Jul-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
Differential Revision: https://reviews.llvm.org/D155681
|
| #
f7684d85 |
| 11-Jul-2023 |
Jay Foad <jay.foad@amd.com> |
[DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes should generally be true during type legalization, so this patch does that.
On AMDGPU the effect is that we use i32 (a sane type) instead of i64 (pointer sized type) for more shift amounts, which in turn allows more formation of rotates and funnel shifts pre-legalization.
Differential Revision: https://reviews.llvm.org/D154960
|
| #
f2c164c8 |
| 21-Jun-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns.
Differential Revision: https://reviews.llvm.org/D153537
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
| #
ac2d6df2 |
| 08-May-2023 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU] Add basic support for extended i8 perm matching
Differential Revision: https://reviews.llvm.org/D142782
Change-Id: Ibb95224f7885839e8b77a705f487f10b47a258a6
|
|
Revision tags: llvmorg-16.0.3 |
|
| #
2fce50e8 |
| 20-Apr-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix assertion with multiple uses of f64 fneg of select
A bitcast needs to be inserted back to the original type. Just skip the multiple use case for a safer quick fix. Handling the multiple use case seems to be beneficial in some but not all cases.
|
|
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1 |
|
| #
f608ac62 |
| 26-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Push fneg into bitcast of integer select
Avoids some regressions in the math libraries in a future patch.
|
| #
0f59720e |
| 26-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fold fneg into bitcast of build_vector
The math libraries have a lot of code that performs manual sign bit operations by bitcasting doubles to int2 and doing bithacking on them. This is a bad canonical form we should rewrite to use high level sign operations directly on double. To avoid codegen regressions, we need to do a better job moving fnegs to operate only on the high 32-bits.
This is only halfway to fixing the real case.
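As a rough illustration of why an fneg on a double only needs to touch the high 32 bits, the following Python sketch flips the sign bit of the upper word of a little-endian double, which is the kind of manual sign manipulation the message attributes to the math libraries. The function name `fneg_via_high_word` is hypothetical, and the sketch models the bit-level identity only, not the DAG combine itself.

```python
import math
import struct


def fneg_via_high_word(x: float) -> float:
    """Negate a double by XOR-ing the sign bit in its high 32-bit word."""
    # Split the 64-bit pattern into two 32-bit words (low word first).
    lo, hi = struct.unpack("<II", struct.pack("<d", x))
    hi ^= 0x80000000  # the sign bit of a double is the top bit of the high word
    return struct.unpack("<d", struct.pack("<II", lo, hi))[0]


print(fneg_via_high_word(1.5))                        # -1.5
print(math.copysign(1.0, fneg_via_high_word(-0.0)))   # 1.0
```

The low word is never modified, which is why a negation expressed on the `int2` form can be limited to the high element.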
|
|
Revision tags: llvmorg-17-init |
|
| #
bd1f7c41 |
| 23-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Try to push fneg as integer into select
I initially attempted to select the source modifier from xor of a sign mask. This proved to be more difficult since foldBinOpIntoSelect does not consider free fneg of integers and undoes the combine.
|
| #
f3c008ca |
| 23-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
DAG: Relax foldBitcastedFPLogic conditions
Requiring a bitcast to exist was unhelpful. The most basic cases are always going to be a CopyFromReg or load, so they would need a new cast inserted. Don't require a bitcast if it's a free operation. I don't think this logic makes particularly much sense (it seems to be imparting special interpretation of bitcast), but this needs to be in sync with foldSignChangeInBitcast.
We should also get rid of this hasBitPreservingFPLogic hook. fabs/fneg are bitpreserving or incorrectly implemented, so this should just be a regular legality check.
|
| #
a7fad92b |
| 27-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add more tests to fneg modifier with casting tests
|
|
Revision tags: llvmorg-15.0.7 |
|
| #
8e6406c2 |
| 13-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add fneg and select test
|