#
2fc690ac |
| 24-Sep-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM] LowoverheadLoops: add an option to disable tail-predication
This might be useful for testing. We already have an option -tail-predication, but that controls the MVETailPredication pass. The new -arm-loloops-disable-tail-pred option only disables it in the LowoverheadLoops pass.
Differential Revision: https://reviews.llvm.org/D88212
|
#
9d9a11c7 |
| 24-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Check for LSTP side-effects.
If the LSTP instruction is inserted with an element count low enough to immediately predicate some lanes as false, this can have some unintended effects on any following MVE instructions in the preheader.
Differential Revision: https://reviews.llvm.org/D88209
|
#
89c1e35f |
| 22-Sep-2020 |
Stefanos Baziotis <sdi1600105@di.uoa.gr> |
[LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
|
#
94c799fe |
| 22-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Trying to fix asan buildbot
|
#
b4fa884a |
| 22-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Improve VPT predicate tracking
The VPTBlock has been modified to track the 'global' state of the VPR, as well as the state for each block. Each object now just holds a list of instructions that make up the block, while static structures hold the predicate information. This enables global access for querying how both a VPT block and individual instructions are predicated. These changes now allow us, again, to handle more complicated cases where multiple instructions build a predicate and/or where the same predicate is used in multiple blocks.
It doesn't, however, get us back to before the tracking was 'fixed', as some extra logic will be required to properly handle VPT instructions. Currently a VPT could be effectively predicated because of its inputs, but the existing logic will not detect that and so will refuse to perform the transformation. This can be seen in the remat-vctp.ll test, where we still don't perform the transform.
Differential Revision: https://reviews.llvm.org/D87681
|
#
a0c1dcc3 |
| 21-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Remove MVEDomain from VLDR/STR of P0
Remove the domain from the instructions and create a shouldInspect helper for LowOverheadLoops which queries it or a vpr operand.
Differential Revision: https://reviews.llvm.org/D87900
|
#
3ce9ec0c |
| 16-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Reorder some logic
Re-order some checks in ValidateMVEInst.
|
#
a63b2a46 |
| 16-Sep-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Fix tail predication predicate tracking
Clear the CurrentPredicate when we find an instruction which would completely overwrite the VPR. This fix essentially means we're back to not really being able to handle VPT instructions when tail predicating.
Differential Revision: https://reviews.llvm.org/D87610
show more ...
|
#
ef0b9f33 |
| 14-Sep-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT
This patch combines a VCMP followed by a VPST into a VPT, which has the same semantics as the combination of the former two.
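As a rough illustration of the combination described in this commit, here is a minimal sketch over a toy instruction list. The `(opcode, operands)` tuples and the `combine_vcmp_vpst` helper are hypothetical and are not LLVM's MachineInstr API; the sketch only handles a VCMP immediately followed by a VPST and ignores the extra legality checks a real pass would need.

```python
from typing import List, Tuple

# Toy model (an assumption for illustration): an instruction is (opcode, operands).
Inst = Tuple[str, Tuple[str, ...]]


def combine_vcmp_vpst(insts: List[Inst]) -> List[Inst]:
    """Fold each VCMP that is immediately followed by a VPST into one VPT."""
    out: List[Inst] = []
    i = 0
    while i < len(insts):
        opcode, operands = insts[i]
        next_is_vpst = i + 1 < len(insts) and insts[i + 1][0] == "VPST"
        if opcode == "VCMP" and next_is_vpst:
            # The VPT carries the comparison operands and opens the predicated block.
            out.append(("VPT", operands))
            i += 2
        else:
            out.append(insts[i])
            i += 1
    return out


block = [
    ("VCMP", ("eq", "q0", "q1")),
    ("VPST", ()),
    ("VADD", ("q2", "q2", "q3")),
]
combined = combine_vcmp_vpst(block)
```

In this model the predicated VADD is left untouched; only the two instructions that set up the predicate are merged.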
|
#
b81c57d6 |
| 09-Sep-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane values
The effects of unpredicated vector instructions with unknown lanes cannot be predicted, and therefore such instructions cannot be tail predicated. This does not apply to predicated vector instructions, and so this patch allows tail predication on them.
Differential Revision: https://reviews.llvm.org/D87376
show more ...
|
#
7aabb6ad |
| 07-Sep-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM][LowOverheadLoops] Remove modifications to the correct element count register
After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as it should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.
|
#
b30adfb5 |
| 28-Aug-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructions within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero.
Differential Revision: https://reviews.llvm.org/D86613
show more ...
|
Revision tags: llvmorg-11.0.0-rc2 |
|
#
c466c5fa |
| 18-Aug-2020 |
Fangrui Song <i@maskray.me> |
[ARM] Fix build after D86087
|
#
3471520b |
| 18-Aug-2020 |
David Green <david.green@arm.com> |
[ARM] Allow tail predication of VLDn
VLD2/4 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics though, they should be valid as they will just end up loading extra values into off vector lanes, not affecting the on lanes. The same is true for loads in general: so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one.
This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in a tail predication loop, which seems to be valid given the other checks we have.
Differential Revision: https://reviews.llvm.org/D86022
|
#
31f02ac6 |
| 17-Aug-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM] Use mov operand if the mov cannot be moved while tail predicating
There are some cases where the instruction that sets up the iteration count for a tail predicated loop cannot be moved before the dlstp, stopping tail predication entirely. This patch checks if the mov operand can be used and if so, uses that instead.
Differential Revision: https://reviews.llvm.org/D86087
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
#
3ee580d0 |
| 01-Jul-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Handle reductions
While validating live-out values, record instructions that look like a reduction. This will comprise a vector op (for now only vadd), a vorr (vmov) which stores the previous value of vadd, and then a vpsel in the exit block which is predicated upon a vctp. This vctp will combine the last two iterations using the vmov and vadd into a vector which can then be consumed by a vaddv.
Once we have determined that it's safe to perform tail-predication, we need to change this sequence of instructions so that the predication doesn't produce incorrect code. This involves changing the register allocation of the vadd so it updates itself and the predication on the final iteration will not update the falsely predicated lanes. This mimics what the vmov, vctp and vpsel do and so we then don't need any of those instructions.
Differential Revision: https://reviews.llvm.org/D75533
|
Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
835251f7 |
| 08-Apr-2020 |
Pierre-vh <pierre.vanhoutryve@arm.com> |
[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks with conditions, or with multiple VCTPs. This patch improves the LowOverheadLoops pass so it can handle those cases.
It also adds support for VCMPs before the VCTP.
Differential Revision: https://reviews.llvm.org/D78206
|
#
24bf8063 |
| 08-Apr-2020 |
Pierre-vh <pierre.vanhoutryve@arm.com> |
[Target][ARM] Replace outdated getARMVPTBlockMask function
getARMVPTBlockMask was an outdated function that only handled basic block masks: T, TT, TTT and TTTT. This worked fine before the MVE VPT Block Insertion Pass improvements, as those were the only kinds of mask that it could generate, but now it can generate more complex masks that use E predicates, so it's dangerous to use that function to calculate VPT/VPST block masks.
I replaced it with 2 different functions:
- expandPredBlockMask, in ARMBaseInfo. This adds an "E" or "T" at the end of an existing PredBlockMask.
- recomputeVPTBlockMask, in Thumb2InstrInfo. This takes an iterator to a VPT/VPST instruction and recomputes its block mask by looking at the predicated instructions that follow it. This should be used to recompute a block mask after removing/adding a predicated instruction to the block.
The expandPredBlockMask function is pretty much imported from the MVE VPT Blocks pass.
I had to change the ARMLowOverheadLoops and MVEVPTBlocks passes as well so they could use these new functions.
Differential Revision: https://reviews.llvm.org/D78201
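The two functions described in this commit can be pictured with a small string-based model. This is only a sketch: masks are written as strings such as "T" or "TET" rather than in LLVM's numeric PredBlockMask encoding, and the Python names and signatures below are hypothetical stand-ins, not the real expandPredBlockMask or recomputeVPTBlockMask.

```python
from typing import Iterable, Optional

MAX_BLOCK_LEN = 4  # a VPT/VPST block predicates at most four instructions


def expand_pred_block_mask(mask: str, kind: str) -> str:
    """Append a 'T' (then) or 'E' (else) to an existing block mask."""
    if kind not in ("T", "E"):
        raise ValueError("kind must be 'T' or 'E'")
    if not mask and kind != "T":
        raise ValueError("the first instruction in a block is always 'T'")
    if len(mask) >= MAX_BLOCK_LEN:
        raise ValueError("block mask is already full")
    return mask + kind


def recompute_vpt_block_mask(following: Iterable[Optional[str]]) -> str:
    """Rebuild a mask from the predication of the instructions after a VPT/VPST.

    Each item is 'T', 'E', or None for an unpredicated instruction, which
    ends the block in this model.
    """
    mask = ""
    for kind in following:
        if kind is None or len(mask) == MAX_BLOCK_LEN:
            break
        mask = expand_pred_block_mask(mask, kind)
    return mask


mask_after_edit = recompute_vpt_block_mask(["T", "E", "T", None, "T"])
```

Recomputing from the instructions themselves, rather than patching the old mask, is what makes the second function safe to call after a predicated instruction has been added to or removed from the block.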
|
#
892af45c |
| 22-Apr-2020 |
David Green <david.green@arm.com> |
[ARM] Distribute MVE post-increments
This adds some extra processing into the Pre-RA ARM load/store optimizer to detect and merge MVE loads/stores and adds of the same base. We don't always turn these into a post-inc during ISel: because ISel works on a graph, we don't always know an order to use for the nodes, and so don't know which node to make post-inc and which nodes should use the new post-inc value. After ISel, we have an order that we can use to post-inc the following instructions.
So this looks for a load/store with a starting offset of 0, and an add/sub from the same base, plus a number of other loads/stores. We then do some checks and convert the zero-offset load/store into a post-inc variant. Any loads/stores after it have the offset subtracted from their immediates. For example:
  LDR #4           LDR #4
  LDR #0           LDR_POSTINC #16
  LDR #8           LDR #-8
  LDR #12          LDR #-4
  ADD #16
It only handles MVE loads/stores at the moment. Normal loads/stores will be added in a follow-up patch; they just need some extra details to ensure that we keep generating LDRD/LDM successfully.
Differential Revision: https://reviews.llvm.org/D77813
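The offset arithmetic in the example from this commit can be reproduced with a short sketch. The `distribute_post_inc` helper and the `(opcode, immediate)` tuples are hypothetical, chosen for illustration only; the checks that the real optimizer performs before transforming are not modelled.

```python
from typing import List, Optional, Tuple


def distribute_post_inc(offsets: List[int], increment: int) -> Optional[List[Tuple[str, int]]]:
    """Fold a trailing 'ADD #increment' on the base into the zero-offset load.

    offsets are the immediates of loads from the same base, in program order.
    Returns None when there is no zero-offset load to convert.
    """
    if 0 not in offsets:
        return None
    zero_idx = offsets.index(0)
    result: List[Tuple[str, int]] = []
    for idx, off in enumerate(offsets):
        if idx < zero_idx:
            # Loads before the post-inc still see the old base.
            result.append(("LDR", off))
        elif idx == zero_idx:
            # The zero-offset load now also performs the increment.
            result.append(("LDR_POSTINC", increment))
        else:
            # Later loads see the incremented base, so compensate.
            result.append(("LDR", off - increment))
    return result


rewritten = distribute_post_inc([4, 0, 8, 12], 16)
```

With the inputs above, the result matches the right-hand column of the example in the commit message, and the separate ADD is no longer needed.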
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5 |
|
#
dad84828 |
| 13-Mar-2020 |
Pierre-vh <pierre.vanhoutryve@arm.com> |
[Target][ARM] Change VPTMaskValues to the correct encoding
VPTMaskValue was using the "instruction" encoding to represent the masks (= the same encoding as the one used by the instructions in an object file), but it is only used to build MCOperands, so it should use the MCOperand encoding of the masks, which is slightly different.
Differential Revision: https://reviews.llvm.org/D76139
|
#
94b195ff |
| 30-Mar-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Add horizontal reduction support
Add a bit more logic into the 'FalseLaneZeros' tracking to enable horizontal reductions and also make the VADDV variants validForTailPredication.
Differential Revision: https://reviews.llvm.org/D76708
|
#
d7084fa3 |
| 27-Mar-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros
Given that some instructions generate wider result elements than their inputs, flag them as being able to generate non zeros in the false lanes.
Differential Revision: https://reviews.llvm.org/D76766
|
#
94cacebc |
| 24-Mar-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Add checks for narrowing
Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely, including checks on specific operations that can generate non-zeros from zero values, e.g. VMVN. We can then check that any instructions that retain some information in their output register (all narrowing instructions) only use and def registers that always have zeros in their falsely predicated bytes, whether or not tail predication happens.
Most of the logic remains the same; only the data structures and helpers have been renamed to reflect the change in logic. The key change, apart from the opcode checkers, is that the FalseZeros set now strictly contains only instructions which will always generate zeros, and not instructions that could also have their false bytes masked away later.
Differential Revision: https://reviews.llvm.org/D76235
|
Revision tags: llvmorg-10.0.0-rc4 |
|
#
d941df36 |
| 11-Mar-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] Reorder some logic
Move some logic around in LowOverheadLoop::ValidateLiveOut
|
#
ff9ac33e |
| 10-Mar-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Validate tail predication values
Iterate through the loop and check that the observable values produced are the same whether tail predication happens or not.
We want to find out if the tail-predicated version of this loop will produce the same values as the loop in its original form. For this to be true, the newly inserted implicit predication must not change the (observable) results.
We're doing this because many instructions in the loop will not be predicated and so the conversion from VPT predication to tail predication can result in different values being produced, because of falsely predicated lanes not being updated in the converted form.
A masked load, whether through VPT or tail predication, will write zeros to any of the falsely predicated bytes. So, from the loads, we know that the false lanes are zeroed and here we're trying to track that those false lanes remain zero, or where they change, the differences are masked away by their user(s).
All MVE loads and stores have to be predicated, so we know that any load operands, or stored results are equivalent already. Other explicitly predicated instructions will perform the same operation in the original loop and the tail-predicated form too. Because of this, we can insert loads, stores and other predicated instructions into our KnownFalseZeros set and build from there.
Differential Revision: https://reviews.llvm.org/D75452
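The idea of seeding a KnownFalseZeros set from predicated instructions and building from there can be sketched as follows. This is a simplified model under stated assumptions: the `Inst` record and `known_false_zeros` helper are hypothetical, the rule for unpredicated instructions (all inputs known, and an opcode that cannot turn zeros into non-zeros) is an illustrative simplification, and VMVN is used as the example of such an opcode because a later commit in this log names it.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Inst:
    name: str                      # name of the value this instruction defines
    opcode: str
    inputs: Tuple[str, ...] = ()   # names of the values it reads
    predicated: bool = False       # explicitly predicated in the original loop


# Assumption for this sketch: opcodes that can produce non-zeros from zero inputs.
CAN_GENERATE_NON_ZEROS = {"VMVN"}


def known_false_zeros(insts: Iterable[Inst]) -> Set[str]:
    """Collect values whose falsely predicated lanes are known to be zero."""
    known: Set[str] = set()
    for inst in insts:
        if inst.predicated:
            # Loads, stores and other explicitly predicated instructions seed the set.
            known.add(inst.name)
        elif (
            inst.opcode not in CAN_GENERATE_NON_ZEROS
            and inst.inputs
            and all(src in known for src in inst.inputs)
        ):
            # Zeros in, zeros out: the false lanes stay zero.
            known.add(inst.name)
    return known


loop_body = [
    Inst("ld", "VLDR", predicated=True),
    Inst("add", "VADD", ("ld", "ld")),
    Inst("not", "VMVN", ("add",)),
    Inst("mul", "VMUL", ("not", "ld")),
]
tracked = known_false_zeros(loop_body)
```

Here "not" and "mul" fall out of the set, so in this model any live-out that depends on them would need its false lanes masked away by a user before the loop could be treated as safe to tail predicate.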
|