|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
b5b663aa |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[Thumb2] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
09124402 |
| 04-Nov-2021 |
David Green <david.green@arm.com> |
[ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form.
[ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered.
This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops.
Differential Revision: https://reviews.llvm.org/D113094
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
211ce51f |
| 22-May-2021 |
David Green <david.green@arm.com> |
[ARM] Clean up some tests, removing dead instructions. NFC
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
48230355 |
| 02-Feb-2021 |
David Green <david.green@arm.com> |
[ARM] Remove DLS lr, lr
A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader.
Differential Revision: https://reviews.llvm
[ARM] Remove DLS lr, lr
A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader.
Differential Revision: https://reviews.llvm.org/D78916
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
73a6cd4b |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it eas
[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely.
The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter.
Differential Revision: https://reviews.llvm.org/D89883
show more ...
|
| #
b2ac9681 |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more t
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6 |
|
| #
322d0afd |
| 03-Oct-2020 |
Amara Emerson <amara@apple.com> |
[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.
The autoupgrader will handle lega
[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.
The autoupgrader will handle legacy intrinsics.
Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
Differential Revision: https://reviews.llvm.org/D88787
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
b4fa884a |
| 22-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Improve VPT predicate tracking
The VPTBlock has been modified to track the 'global' state of the VPR, as well as the state for each block. Each object now just holds a list of instructions tha
[ARM] Improve VPT predicate tracking
The VPTBlock has been modified to track the 'global' state of the VPR, as well as the state for each block. Each object now just holds a list of instructions that makeup the block, while static structures hold the predicate information. This enables global access for querying how both a VPT block and individual instructions are predicated. These changes now allow us, again, to handle more complicated cases where multiple instructions build a predicate and/or where the same predicate in used in multiple blocks.
It doesn't, however, get us back to before the tracking was 'fixed' as some extra logic will be required to properly handle VPT instructions. Currently a VPT could be effectively predicated because of it's inputs, but the existing logic will not detect that and so will refuse to perform the transformation. This can be seen in remat-vctp.ll test where we still don't perform the transform.
Differential Revision: https://reviews.llvm.org/D87681
show more ...
|
| #
a63b2a46 |
| 16-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Fix tail predication predicate tracking
Clear the CurrentPredicate when we find an instruction which would completely overwrite the VPR. This fix essentially means we're back to not really bei
[ARM] Fix tail predication predicate tracking
Clear the CurrentPredicate when we find an instruction which would completely overwrite the VPR. This fix essentially means we're back to not really being able to handle VPT instructions when tail predicating.
Differential Revision: https://reviews.llvm.org/D87610
show more ...
|
| #
ac2717bf |
| 16-Sep-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Fix tests after ef0b9f3
ef0b9f3 didn't update the tests that it affected.
|
| #
3ebc7552 |
| 09-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Try to rematerialize VCTP instructions
We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions.
Diff
[ARM] Try to rematerialize VCTP instructions
We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions.
Differential Revision: https://reviews.llvm.org/D87280
show more ...
|
| #
a3e41d45 |
| 27-Aug-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict.
Diff
[ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict.
Differential Revision: https://reviews.llvm.org/D40061
show more ...
|
| #
c352e7fb |
| 25-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the B
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1.
Differential Revision: https://reviews.llvm.org/D86303
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2 |
|
| #
22916481 |
| 03-Aug-2020 |
David Green <david.green@arm.com> |
[ARM] Convert VPSEL to VMOV in tail predicated loops
VPSEL has slightly different semantics under tail predication (it can end up selecting from Qn, Qm and Qd). We do not model that at the moment so
[ARM] Convert VPSEL to VMOV in tail predicated loops
VPSEL has slightly different semantics under tail predication (it can end up selecting from Qn, Qm and Qd). We do not model that at the moment so they block tail predicated loops from being formed.
This just converts them into a predicated VMOV instead (via a VORR), allowing tail predication to happen whilst still modelling the original behaviour of the input.
Differential Revision: https://reviews.llvm.org/D85110
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc1 |
|
| #
3533e0a0 |
| 22-Jul-2020 |
David Green <david.green@arm.com> |
[ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z)
Most MVE instructions can be predicated to fold a select into the instruction, using the predicate and the selects else as a pass
[ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z)
Most MVE instructions can be predicated to fold a select into the instruction, using the predicate and the selects else as a passthough. This adds tablegen patterns for most two operand instructions using the newly added TwoOpPattern from 1030e82598da.
Differential Revision: https://reviews.llvm.org/D83222
show more ...
|
|
Revision tags: llvmorg-12-init |
|
| #
595270ae |
| 13-Jul-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather t
[ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones.
This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option.
Differential Revision: https://reviews.llvm.org/D83133
show more ...
|
|
Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
| #
d1522513 |
| 17-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use in
[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before.
Many thanks to Eli Friedman and Sam Parker for all their help with this work.
This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but is not entirely correct yet, it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or we may want revise the definition of llvm.get.active.lane.mask.
Differential Revision: https://reviews.llvm.org/D79175
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc1 |
|
| #
835251f7 |
| 08-Apr-2020 |
Pierre-vh <pierre.vanhoutryve@arm.com> |
[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks with conditions, or with multiple VCTPs. This patch improves the LowOv
[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks with conditions, or with multiple VCTPs. This patch improves the LowOverheadLoops pass so it can handle those cases.
It also adds support for VCMPs before the VCTP.
Differential Revision: https://reviews.llvm.org/D78206
show more ...
|
| #
d5eb7ffa |
| 01-May-2020 |
Pierre-vh <pierre.vanhoutryve@arm.com> |
[Target][ARM] Fold or(A, B) more aggressively for I1 vectors
This patch makes the folding of or(A, B) into not(and(not(A), not(B))) more agressive for I1 vector. This only affects Thumb2 MVE and imp
[Target][ARM] Fold or(A, B) more aggressively for I1 vectors
This patch makes the folding of or(A, B) into not(and(not(A), not(B))) more agressive for I1 vector. This only affects Thumb2 MVE and improves codegen, because it removes a lot of msr/mrs instructions on VPR.P0.
This patch also adds a xor(vcmp) -> !vcmp fold for MVE.
Differential Revision: https://reviews.llvm.org/D77202
show more ...
|
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6 |
|
| #
62fdb1f5 |
| 23-Mar-2020 |
Sam Parker <sam.parker@arm.com> |
[DAGCombine] Skip PostInc combine with later users
When decided whether to generate a post-inc load/store, look at the other memory nodes that use the same base address and, if any proceed the curre
[DAGCombine] Skip PostInc combine with later users
When decided whether to generate a post-inc load/store, look at the other memory nodes that use the same base address and, if any proceed the current node, then don't do the combine. The change only seems to be affecting the Arm backend, which I was surprised at, but it appears to fix a lot of our issues around MVE masked load/stores having to store a temporary address after an early post-increment on a shared base address.
Differential Revision: https://reviews.llvm.org/D75847
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
| #
de3e65e6 |
| 18-Feb-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Check loop liveouts
Check that no Q-regs are live out of the loop, unless the instruction within the loop is predicated on the vctp.
Differential Revision: https://reviews.l
[ARM][LowOverheadLoops] Check loop liveouts
Check that no Q-regs are live out of the loop, unless the instruction within the loop is predicated on the vctp.
Differential Revision: https://reviews.llvm.org/D72713
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
8cba99e2 |
| 20-Jan-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in e
[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later in ARMLowOverheadLoops, which is necessary to get actual performance gains for tail-predicated loops.
Differential Revision: https://reviews.llvm.org/D72714
show more ...
|
|
Revision tags: llvmorg-11-init |
|
| #
b4abe7af |
| 30-Dec-2019 |
David Green <david.green@arm.com> |
[ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet
[ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in an float reg and having to move it back to a GPR.
Differential Revision: https://reviews.llvm.org/D70997
show more ...
|
| #
40425183 |
| 20-Dec-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing m
[ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing more than one instruction to define vpr, but each block must somehow be predicated using the vctp. This leaves us with several scenarios which need fixing up: 1) A VPT block with is only predicated by the vctp and has no internal vpr defs. 2) A VPT block which is only predicated by the vctp but has an internal vpr def. 3) A VPT block which is predicated upon the vctp as well as another vpr def. 4) A VPT block which is not predicated upon a vctp, but contains it and all instructions within the block are predicated upon in.
The changes needed are, for: 1) The easy one, just remove the vpst and unpredicate the instructions in the block. 2) Remove the vpst and unpredicate the instructions up to the internal vpr def. Need insert a new vpst to predicate the remaining instructions. 3) No nothing. 4) The vctp will be inside a vpt and the instruction will be removed, so adjust the size of the mask on the vpst.
Differential Revision: https://reviews.llvm.org/D71107
show more ...
|
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3 |
|
| #
b1aba037 |
| 08-Dec-2019 |
David Green <david.green@arm.com> |
[ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does.
Differential Revision: https://revi
[ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does.
Differential Revision: https://reviews.llvm.org/D70968
show more ...
|