|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
| #
7b3bbd83 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.
Reverted due to various buildbot failures.
|
| #
2501ae58 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
show more ...
|
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6 |
|
| #
03de1cb7 |
| 09-Jun-2023 |
Nikita Popov <npopov@redhat.com> |
[InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold
InstCombine tries to swap compare operands to match sub instructions in order to expose "CSE opportunities". However, it doesn't really m
[InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold
InstCombine tries to swap compare operands to match sub instructions in order to expose "CSE opportunities". However, it doesn't really make sense to perform this transform in the middle-end, as we cannot actually CSE the instructions there.
The backend already performs this fold in https://github.com/llvm/llvm-project/blob/18f5446a45da5a61dbfb1b7667d27fb441ac62db/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L4236 on the SDAG level, however this only works within a single basic block.
To handle cross-BB cases, we do need to handle this in the IR layer. This patch moves the fold from InstCombine to CGP in the backend, while keeping the same (somewhat dubious) heuristic.
Differential Revision: https://reviews.llvm.org/D152541
show more ...
|
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
| #
eb64450a |
| 29-Mar-2023 |
David Green <david.green@arm.com> |
[ARM] Convert active.lane.masks to vctp with non-zero starts
This attempts to expand the logic in the MVETailPredication pass to convert active lane masks that the vectorizer produces to vctp instru
[ARM] Convert active.lane.masks to vctp with non-zero starts
This attempts to expand the logic in the MVETailPredication pass to convert active lane masks that the vectorizer produces to vctp instructions that the backend can later turn into tail predicated loops. Especially for addrecs with non-zero starts that can be created from epilog vectorization. There is some adjustment to the logic to handle this, moving some of the code to check the addrec earlier so that we can get the start value. This start value is then incorporated into the logic of checkin the new vctp is valid, and there is a newly added check that it is known to be a multiple of the VF as we expect.
Differential Revision: https://reviews.llvm.org/D146517
show more ...
|
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
b5b663aa |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[Thumb2] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
09124402 |
| 04-Nov-2021 |
David Green <david.green@arm.com> |
[ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form.
[ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered.
This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops.
Differential Revision: https://reviews.llvm.org/D113094
show more ...
|
| #
08d7eec0 |
| 24-Sep-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Revert "Allow rematerialization of virtual reg uses"
Reverted due to two distcint performance regression reports.
This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
92c1fd19 |
| 19-Aug-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a tr
Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable.
The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists.
Differential Revision: https://reviews.llvm.org/D106408
show more ...
|
| #
2d4470ab |
| 18-Aug-2021 |
Petr Hosek <phosek@google.com> |
Revert "Allow rematerialization of virtual reg uses"
This reverts commit 877572cc193a470f310eec46a7ce793a6cc97c2f which introduced PR51516.
|
| #
877572cc |
| 09-Aug-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a tr
Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable.
The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists.
Differential Revision: https://reviews.llvm.org/D106408
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
211ce51f |
| 22-May-2021 |
David Green <david.green@arm.com> |
[ARM] Clean up some tests, removing dead instructions. NFC
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
e7dc083a |
| 18-Jan-2021 |
David Green <david.green@arm.com> |
[ARM] Don't handle low overhead branches in AnalyzeBranch
It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the uncondition
[ARM] Don't handle low overhead branches in AnalyzeBranch
It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues.
This effectively reverts 372eb2bbb6fb903ce76266e659dfefbaee67722b, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.
show more ...
|
| #
372eb2bb |
| 16-Jan-2021 |
David Green <david.green@arm.com> |
[ARM] Add low overhead loops terminators to AnalyzeBranch
This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the dire
[ARM] Add low overhead loops terminators to AnalyzeBranch
This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases).
Differential Revision: https://reviews.llvm.org/D94392
show more ...
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
b2ac9681 |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more t
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6 |
|
| #
322d0afd |
| 03-Oct-2020 |
Amara Emerson <amara@apple.com> |
[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.
The autoupgrader will handle lega
[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.
The autoupgrader will handle legacy intrinsics.
Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
Differential Revision: https://reviews.llvm.org/D88787
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
b30adfb5 |
| 28-Aug-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the
[ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero.
Differential Revision: https://reviews.llvm.org/D86613
show more ...
|
| #
c352e7fb |
| 25-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the B
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1.
Differential Revision: https://reviews.llvm.org/D86303
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2 |
|
| #
0f14b2e6 |
| 17-Aug-2020 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa713002fe4e0c76d23043e6c1f9e9fe6f. Patch will be split to smaller ones.
|
| #
50c743fa |
| 13-Aug-2020 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
[BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revi
[BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
show more ...
|
| #
f9264995 |
| 13-Aug-2020 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e732604cd6340061d40ac21e7e188e5. Sanitizer tests need to be updated.
|
| #
44587e2f |
| 13-Aug-2020 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
[BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revi
[BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
show more ...
|
| #
a0485421 |
| 13-Aug-2020 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f217e176b18e7bf6fe055154ac589c6.
|
| #
385c9d67 |
| 13-Aug-2020 |
Dávid Bolvanský <david.bolvansky@gmail.com> |
[BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revi
[BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.
GCC also uses this heuristic.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
show more ...
|
| #
6716e786 |
| 11-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] tail-predication: overflow checks for backedge taken count.
This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrini
[ARM][MVE] tail-predication: overflow checks for backedge taken count.
This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinisc that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j:
M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i];
For this inner loop, the SCEV backedge taken count (BTC) expression is:
(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>
and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case.
This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases.
Differential Revision: https://reviews.llvm.org/D85737
show more ...
|
| #
a680c797 |
| 11-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Added extra tail-predication runs for auto-correlation test case. NFC
|