|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
9e9907f1 |
| 17-Jan-2024 |
Fangrui Song <i@maskray.me> |
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
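The difference between the two flags can be sketched with plain string handling. This is only an illustration of the behaviour described in the commit message, not LLVM's actual triple logic: the helper names `apply_march` and `apply_mtriple` and the host triple `x86_64-apple-darwin` are assumptions made up for the example.

```python
def apply_march(default_triple: str, arch: str) -> str:
    """Model -march=: swap only the architecture part of the default triple.

    Hypothetical helper; it mimics the described effect, not llc's code.
    """
    parts = default_triple.split("-")
    parts[0] = arch  # vendor/OS/environment are inherited from the default
    return "-".join(parts)


def apply_mtriple(default_triple: str, triple: str) -> str:
    """Model -mtriple=: the given triple replaces the default entirely."""
    return triple


# Assumed default (host) triple, chosen only for illustration.
host = "x86_64-apple-darwin"

print(apply_march(host, "amdgcn"))    # architecture swapped, OS kept
print(apply_mtriple(host, "amdgcn"))  # no inherited OS/environment
```

With the assumed Darwin host, the first call yields a triple that keeps the `apple-darwin` components, which is the kind of nonsensical combination the commit message warns about.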
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2 |
|
| #
a3dfa4e0 |
| 18-Apr-2023 |
Kriti Gupta <kriti.gupta@research.iiit.ac.in> |
[test] Remove occurences of br undef in CodeGen/AMDGPU tests
Differential Revision: https://reviews.llvm.org/D148041
|
|
Revision tags: llvmorg-16.0.1 |
|
| #
047efda6 |
| 04-Apr-2023 |
Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt> |
Revert "[test] Remove occurences of br undef in CodeGen/AMDGPU tests"
This reverts commit 18c594c176036b7ffcd8439ed9c4b08d2085a244. Build bots broke
|
| #
18c594c1 |
| 04-Apr-2023 |
Kriti Gupta <kriti.gupta@research.iiit.ac.in> |
[test] Remove occurences of br undef in CodeGen/AMDGPU tests
Differential Revision: https://reviews.llvm.org/D145622
|
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
| #
d8925210 |
| 13-Feb-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Break-up large PHIs for DAGISel
DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen. This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases.
This scalarization/phi "break-up" can be easily tuned/disabled through CL options in case it's not beneficial for some users. It's also only enabled for DAGISel, as GlobalISel handles PHIs much better (as it works on the whole function).
This can both scalarize (break a vector into its elements) and simplify (break a vector into smaller, more manageable subvectors) PHIs.
Fixes SWDEV-321581
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D143731
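The two modes mentioned above, scalarizing and splitting into subvectors, can be pictured as choosing a slice width for the vector PHI. The sketch below is a hypothetical model only: `break_up` and its parameters are not names from the patch, and the real pass applies heuristics and CL options that are not modelled here.

```python
from typing import List, Tuple


def break_up(num_elts: int, sub_width: int) -> List[Tuple[int, int]]:
    """Return (start, length) slices covering a vector of num_elts elements.

    sub_width == 1 models scalarization; a larger width models splitting
    into subvectors. Hypothetical helper, not the pass's algorithm.
    """
    if num_elts <= 0 or sub_width <= 0:
        raise ValueError("num_elts and sub_width must be positive")
    return [
        (start, min(sub_width, num_elts - start))
        for start in range(0, num_elts, sub_width)
    ]


print(break_up(4, 1))   # one slice per element (scalarize)
print(break_up(32, 8))  # four 8-element subvectors (simplify)
```

Each slice would correspond to one smaller PHI, so no single wide build_vector is needed before the copy.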
|
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
96d3c826 |
| 16-Dec-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)"
While the PPC little-endian miscompile did get addressed by https://reviews.llvm.org/D140046, the PPC big-endian bots are still unhappy: https://lab.llvm.org/buildbot/#/builders/93/builds/12560
This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.
|
| #
cfd594f8 |
| 09-Dec-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)
* This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f,
* which was reverted in 25f01d593ce296078f57e872778b77d074ae5888, because it exposed a miscompile in the PPC backend, which was resolved in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5.
* which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a,
* which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases.
Let's try with something really *REALLY* conservative first, just to get somewhere, and try to bump it later.
FIXME: should this respect TTI reg width * num vec regs?
Original commit message:
Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint.
But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem.
Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here.
Fixes #59116.
|
| #
d85e849f |
| 02-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Convert some assorted tests to opaque pointers
|
|
Revision tags: llvmorg-15.0.6 |
|
| #
25f01d59 |
| 26-Nov-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)"
TableGen is still getting miscompiled on PPC buildbots. Sent a mail with request for help.
This reverts commit 3c4d2a03968ccf5889bacffe02d6fa2443b0260f.
|
| #
3c4d2a03 |
| 26-Nov-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)
This is a recommit of cf624b23bc5d5a6161706d1663def49380ff816a, which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases.
Let's try with something really *REALLY* conservative first, just to get somewhere, and try to bump it (to 64/128) later.
FIXME: should this respect TTI reg width * num vec regs?
Original commit message:
Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint.
But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem.
Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here.
Fixes #59116.
|
| #
5cfc22ca |
| 23-Nov-2022 |
Benjamin Kramer <benny.kra@googlemail.com> |
Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes"
This reverts commit cf624b23bc5d5a6161706d1663def49380ff816a. It triggers crashes in clang, see the comments on github on the original change.
|
| #
cf624b23 |
| 22-Nov-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes
Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint.
But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem.
Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here.
Fixes #59116.
|
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2 |
|
| #
7a846240 |
| 27-Sep-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make various vector undefs legal
Surprisingly these were getting legalized to something zero initialized.
This fixes an infinite loop when combining some vector types. Also fixes zero initializing some undef values.
SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking for the legality of the output undefs they are replacing unused operations with. This resulted in turning vectors into undefs that were later re-legalized back into zero vectors.
|
|
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init |
|
| #
6e0fa292 |
| 16-Jul-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Change register type for v32 vectors
When it is AReg_1024, this results in unnecessary copying of 32-element vectors into AGPRs even though they are not intended for an mfma instruction.
Differential Revision: https://reviews.llvm.org/D64815
llvm-svn: 366252
|