#
d9bf6245 |
| 07-Dec-2020 |
David Green <david.green@arm.com> |
[ARM] Revert low overhead loops with calls before register allocation.
This adds code to revert low overhead loops with calls in them before register allocation. Ideally we would not create low overhead loops with calls in them to begin with, but that can be difficult to always get correct. If we want to try and glue together t2LoopDec and t2LoopEnd into a single instruction, we need to ensure that no instructions use LR in the loop. (Technically the final code can be better too, as it doesn't need to use the same registers but that has not been optimized for here, as reverting loops with calls is expected to be very rare).
It also adds a MVETailPredUtils.h header to share the revert code between different passes, and provides a place to expand upon, with RevertLoopWithCall becoming a place to perform other low overhead loop alterations like removing copies or combining LoopDec and End into a single instruction.
Differential Revision: https://reviews.llvm.org/D91273
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
8ecb015e |
| 19-Nov-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Convert intermediate vpr use assertion to condition
This converts the intermediate VPR use assertion to a condition in the if-statement to protect against assertion failures in case behaviour is changed.
This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91790
|
#
f45c052c |
| 18-Nov-2020 |
Mikhail Goncharov <goncharov.mikhail@gmail.com> |
Fix unused variables in release build
Differential Revision: https://reviews.llvm.org/D91705
|
#
da2e4728 |
| 06-Nov-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks
This patch adds support for combining a VPST with a dangling VCMP from a previous VPT block.
Differential Revision: https://reviews.llvm.org/D90935
|
#
898a81df |
| 11-Nov-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] Replace lambda with any_of
|
#
08d1c2d4 |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a t2DoLoopStart but taking 2 parameters - the original loop iteration count needed for a low overhead loop, plus the VCTP element count needed for a DLSTP instruction setting up a tail predicated loop. The idea is that the instruction holds both values and the backend ARMLowOverheadLoops pass can pick between the two, depending on whether it creates a tail predicated loop or falls back to a low overhead loop.
To do that there needs to be something that converts a t2DoLoopStart to a t2DoLoopStartTP, for which this patch repurposes the MVEVPTOptimisationsPass as a "tail predication and vpt optimisation" pass. The extra operand for the t2DoLoopStartTP is chosen based on the operands of VCTPs in the loop, and the instruction is moved as late in the block as possible to increase the likelihood of making tail predicated loops.
Differential Revision: https://reviews.llvm.org/D90591
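The idea of one pseudo instruction holding both counts can be sketched outside LLVM as follows. This is only an illustrative model, not the backend's code: `LoopStartTP`, `iterationsFor`, `makeLoopStartTP` and `selectStartOperand` are hypothetical names, and the lane count is assumed to be known up front.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a loop-start pseudo that carries both values.
struct LoopStartTP {
  uint32_t IterationCount; // trips of the vector loop (low overhead loop form)
  uint32_t ElementCount;   // scalar elements to process (tail predicated form)
};

// Number of vector iterations needed to cover ElementCount elements when each
// iteration handles Lanes elements (rounds up for a partial final vector).
inline uint32_t iterationsFor(uint32_t ElementCount, uint32_t Lanes) {
  assert(Lanes != 0 && "a vector must have at least one lane");
  return ElementCount / Lanes + (ElementCount % Lanes != 0 ? 1 : 0);
}

// Build the pseudo with both counts recorded.
inline LoopStartTP makeLoopStartTP(uint32_t ElementCount, uint32_t Lanes) {
  return {iterationsFor(ElementCount, Lanes), ElementCount};
}

// A later pass picks one of the two values: the element count if the loop can
// be tail predicated, otherwise the plain iteration count.
inline uint32_t selectStartOperand(const LoopStartTP &Start,
                                   bool CanTailPredicate) {
  return CanTailPredicate ? Start.ElementCount : Start.IterationCount;
}
```

With 10 elements and 4 lanes, the model records 3 iterations and 10 elements, so falling back from tail predication needs no recomputation.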
|
#
dbe1bf63 |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Cleanup for ARMLowOverheadLoops. NFC
|
#
b2ac9681 |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the form:
    %start = llvm.start.loop.iterations(%N)
  loop:
    %p = phi [%start], [%dec]
    %dec = llvm.loop.decrement.reg(%p, 1)
    %c = icmp ne %dec, 0
    br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains.
- This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
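The control flow of the phi-based loop form described in this commit can be modelled in plain C++ to see how the start value, the phi and the decrement chain together. This is a rough sketch under the assumption that the start value is non-zero; `runCountedLoop` is a hypothetical name, and the comments map each statement to the IR line it mirrors.

```cpp
#include <cassert>
#include <cstdint>

// Returns how many times the loop body runs for a start value N.
inline uint32_t runCountedLoop(uint32_t N) {
  assert(N > 0 && "this form always executes the body at least once");
  uint32_t BodyRuns = 0;
  uint32_t Start = N;     // %start = llvm.start.loop.iterations(%N)
  uint32_t P = Start;     // %p = phi [%start], ... (value on loop entry)
  while (true) {
    ++BodyRuns;           // the loop body would go here
    uint32_t Dec = P - 1; // %dec = llvm.loop.decrement.reg(%p, 1)
    bool C = Dec != 0;    // %c = icmp ne %dec, 0
    if (!C)               // br %c, loop, exit
      break;
    P = Dec;              // %p = phi ..., [%dec] (value on the back edge)
  }
  return BodyRuns;
}
```

Because the start value feeds the phi directly, every value in the loop is reachable from the start through def-use chains, which is the "gluing" the commit message refers to.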
|
#
40a3f7e4 |
| 30-Oct-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT
There were cases where a VCMP and a VPST were merged even if the VCMP didn't have the same defs of its operands as the VPST. This is fixed by adding RDA checks for the defs. This however gave rise to cases where the new VPST created would precede the un-merged VCMP and so would fail a predicate mask assertion since the VCMP wasn't predicated. This was solved by converting the VCMP to a VPT instead of inserting the new VPST.
Differential Revision: https://reviews.llvm.org/D90461
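The kind of reaching-definition check described above can be illustrated with a toy model. This is not LLVM's ReachingDefAnalysis API: `ToyInstr`, `reachingDef` and `canMergeCompare` are hypothetical names, registers are plain integers, and a block is a simple vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified instruction.
struct ToyInstr {
  int Def;               // register written, or -1 if none
  std::vector<int> Uses; // registers read
};

// Index of the closest instruction before Pos that defines Reg,
// or -1 if the value is live into the block.
inline int reachingDef(const std::vector<ToyInstr> &Block, std::size_t Pos,
                       int Reg) {
  assert(Pos <= Block.size() && "position out of range");
  for (std::size_t I = Pos; I-- > 0;)
    if (Block[I].Def == Reg)
      return static_cast<int>(I);
  return -1;
}

// The compare at CmpIdx may only be merged at InsertIdx if each of its
// operands has the same reaching definition at both positions.
inline bool canMergeCompare(const std::vector<ToyInstr> &Block,
                            std::size_t CmpIdx, std::size_t InsertIdx) {
  assert(CmpIdx < Block.size() && "compare index out of range");
  for (int Reg : Block[CmpIdx].Uses)
    if (reachingDef(Block, CmpIdx, Reg) != reachingDef(Block, InsertIdx, Reg))
      return false;
  return true;
}
```

If an operand is redefined between the compare and the insertion point, the two positions see different definitions and the merge is rejected.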
|
#
e24537d4 |
| 21-Oct-2020 |
Mircea Trofin <mtrofin@google.com> |
[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs
Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp
Differential Revision: https://reviews.llvm.org/D89912
|
#
6dcbc323 |
| 20-Oct-2020 |
David Green <david.green@arm.com> |
Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1360b035271422bab922d22ed04d79a.
This commit contains some holes in its logic and has been causing issues since it was committed. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.
|
#
cb27006a |
| 10-Oct-2020 |
David Green <david.green@arm.com> |
[ARM] Attempt to make Tail predication / RDA more resilient to empty blocks
There are a number of places in RDA where we assume the block will not be empty. This isn't necessarily true for tail predicated loops where we have removed instructions. This attempts to make the pass more resilient to empty blocks, not casting pointers to machine instructions where they would be invalid.
The test contains a case that was previously failing, but has recently been hidden on trunk. It contains an empty block to begin with to show a similar error.
Differential Revision: https://reviews.llvm.org/D88926
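The general pattern of guarding against an empty block before touching its last instruction might look like the sketch below. It is a generic illustration rather than the patch itself: `SketchInstr`, `SketchBlock` and `lastInstrOrNull` are hypothetical names, with a vector standing in for a machine basic block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a machine instruction and a basic block.
struct SketchInstr {
  int Opcode;
};
using SketchBlock = std::vector<SketchInstr>;

// Returns a pointer to the last instruction, or nullptr for an empty block.
// Calling back() on an empty vector is undefined behaviour, so the emptiness
// check has to come first.
inline const SketchInstr *lastInstrOrNull(const SketchBlock &Block) {
  if (Block.empty())
    return nullptr;
  return &Block.back();
}

// Example query that stays well defined for empty blocks.
inline bool endsWithOpcode(const SketchBlock &Block, int Opcode) {
  const SketchInstr *Last = lastInstrOrNull(Block);
  return Last != nullptr && Last->Opcode == Opcode;
}
```

Callers then handle the null case explicitly instead of assuming that every block has at least one instruction.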
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6 |
|
#
7e02bc81 |
| 01-Oct-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] LowOverheadLoop DEBUG statements
|
#
38f625d0 |
| 01-Oct-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Adjust Start insertion.
Try to move the insertion point to become the terminator of the block, usually the preheader.
Differential Revision: https://reviews.llvm.org/D88638
|
Revision tags: llvmorg-11.0.0-rc5 |
|
#
6ec5f324 |
| 30-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Iteration count liveness
Before deciding to insert a [W|D]LSTP, check that defining LR with the element count won't affect any other instructions that should be taking the iteration count.
Differential Revision: https://reviews.llvm.org/D88549
|
#
7b90516d |
| 30-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Start insertion point
If possible, try not to move the start position earlier than it already is.
Differential Revision: https://reviews.llvm.org/D88542
|
#
dfa2c14b |
| 30-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Use iterator for InsertPt.
Use a MachineBasicBlock::iterator instead of a MachineInstr* for the position of our LoopStart instruction. NFCish, as it changes debug info.
|
#
779a8a02 |
| 30-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] TryRemove helper.
Make a helper function that wraps around RDA::isSafeToRemove and utilises the existing DCE IT block checks.
|
Revision tags: llvmorg-11.0.0-rc4 |
|
#
195c22f2 |
| 24-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Change VPT state assertion
Just because we haven't encountered an instruction setting the VPR, it doesn't mean we can't create a VPT block - the VPR may be a live-in.
Differential Revision: https://reviews.llvm.org/D88224
|
#
4c19b89b |
| 29-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] Comments and lambdas
Add some comments in LowOverheadLoops and make some lambda variables explicit arguments instead of capturing.
|
#
e82a0084 |
| 25-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Cleanup and re-arrange
Rename and reorganise how we decide where to put the LoopStart instruction.
|
#
3d1d0891 |
| 28-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] Factor out some logic for LoLoops.
Create a DCE function that accepts an instruction.
|
#
e4b9867c |
| 28-Sep-2020 |
David Green <david.green@arm.com> |
[ARM] Expand cannotInsertWDLSTPBetween to the last instruction
9d9a11c7be037 added this check for predicatable instructions between the D/WLSTP and the loop's start, but it was missing the last instruction in the block. Change it to use some iterators instead.
Differential Revision: https://reviews.llvm.org/D88354
|
Revision tags: llvmorg-11.0.0-rc3 |
|
#
a399d188 |
| 22-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Find VPT implicitly predicated by VCTP
On failing to find a VCTP in the list of instructions that explicitly predicate the entry of a VPT block, inspect whether the block is controlled via a VPT which is implicitly predicated due to its predicated operand(s).
Differential Revision: https://reviews.llvm.org/D87819
|
#
00ee52ae |
| 24-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] Remove dead loop.
Remove a loop that just calculated a couple of values that were no longer needed.
|