#
e93e0d41 |
| 09-Jan-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Update liveness info
After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and prehea
[ARM][LowOverheadLoops] Update liveness info
After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s).
Differential Revision: https://reviews.llvm.org/D72131
show more ...
|
#
0efc9e5a |
| 06-Jan-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] More MVETailPredication debug messages. NFC.
I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I
[ARM][MVE] More MVETailPredication debug messages. NFC.
I've added a few more debug messages to MVETailPredication because I wanted to trace better which instructions are added/removed. And while I was at it, I factored out one function which I thought was clearer, and have added some comments to describe better the flow between MVETailPredication and ARMLowOverheadLoops.
Differential Revision: https://reviews.llvm.org/D71549
show more ...
|
#
8f6a6763 |
| 03-Jan-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][NFC] Move tail predication checks
Extract the tail predication validation checks out into their own LowOverHeadLoop method.
|
#
acbc9aed |
| 20-Dec-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of elements being passed to [d|w]lstp. We were trying to check that the value was avai
[ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of elements being passed to [d|w]lstp. We were trying to check that the value was available at LoopStart, but this doesn't consider that the last instruction in the block could also define the register. Two helpers have been added to RDA for this. 2) Insert some code to now try to move the element count def or the insertion point so that we can perform more tail predication. 3) Related to (1), the same off-by-one could prevent us from generating a low-overhead loop when a mov lr could have been the last instruction in the block. 4) Fix up some instruction attributes so that not all the low-overhead loop instructions are labelled as branches and terminators - as this is not true for dls/dlstp.
Differential Revision: https://reviews.llvm.org/D71609
show more ...
|
#
40425183 |
| 20-Dec-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing m
[ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing more than one instruction to define vpr, but each block must somehow be predicated using the vctp. This leaves us with several scenarios which need fixing up: 1) A VPT block with is only predicated by the vctp and has no internal vpr defs. 2) A VPT block which is only predicated by the vctp but has an internal vpr def. 3) A VPT block which is predicated upon the vctp as well as another vpr def. 4) A VPT block which is not predicated upon a vctp, but contains it and all instructions within the block are predicated upon in.
The changes needed are, for: 1) The easy one, just remove the vpst and unpredicate the instructions in the block. 2) Remove the vpst and unpredicate the instructions up to the internal vpr def. Need insert a new vpst to predicate the remaining instructions. 3) No nothing. 4) The vctp will be inside a vpt and the instruction will be removed, so adjust the size of the mask on the vpst.
Differential Revision: https://reviews.llvm.org/D71107
show more ...
|
#
049f9672 |
| 16-Dec-2019 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC.
In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have quite a few helper functions all looking at the opcod
[ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC.
In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have quite a few helper functions all looking at the opcodes of MVE instructions. This moves all these utility functions to ARMBaseInstrInfo.
Diferential Revision: https://reviews.llvm.org/D71426
show more ...
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3 |
|
#
d97cf1f8 |
| 11-Dec-2019 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop u
[ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet:
.. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 ..
which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this.
Differential Revision: https://reviews.llvm.org/D71007
show more ...
|
Revision tags: llvmorg-9.0.1-rc2 |
|
#
28166816 |
| 26-Nov-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def
[ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def of a register.
For Arm, while tail-predicating, these helpers are used in the low-overhead loops to remove the dead code that calculates the number of loop iterations.
Differential Revision: https://reviews.llvm.org/D70240
show more ...
|
#
cced971f |
| 26-Nov-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][ReachingDefs] RDA in LoLoops
Add several new methods to ReachingDefAnalysis: - getReachingMIDef, instead of returning an integer, return the MachineInstr that produces the def. - getInstFrom
[ARM][ReachingDefs] RDA in LoLoops
Add several new methods to ReachingDefAnalysis: - getReachingMIDef, instead of returning an integer, return the MachineInstr that produces the def. - getInstFromId, return a MachineInstr for which the given integer corresponds to. - hasSameReachingDef, return whether two MachineInstr use the same def of a register. - isRegUsedAfter, return whether a register is used after a given MachineInstr.
These methods have been used in ARMLowOverhead to replace searching for uses/defs.
Differential Revision: https://reviews.llvm.org/D70009
show more ...
|
Revision tags: llvmorg-9.0.1-rc1 |
|
#
8978c12b |
| 18-Nov-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Tail predication conversion
This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicatd one. This is currently a very basic conversion,
[ARM][MVE] Tail predication conversion
This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicatd one. This is currently a very basic conversion, with the following restrictions: - Operates only on single block loops. - The loop can only contain a single vctp instruction. - No other instructions can write to the vpr. - We only allow a subset of the mve instructions in the loop.
TODO: Pass the number of elements, not the number of iterations to dlstp/wlstp.
Differential Revision: https://reviews.llvm.org/D69945
show more ...
|
#
4ba6d0de |
| 23-Sep-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Use subs during revert.
Check whether there are any uses or defs between the LoopDec and LoopEnd. If there's not, then we can use a subs to set the cpsr and skip generating a
[ARM][LowOverheadLoops] Use subs during revert.
Check whether there are any uses or defs between the LoopDec and LoopEnd. If there's not, then we can use a subs to set the cpsr and skip generating a cmp.
Differential Revision: https://reviews.llvm.org/D67801
llvm-svn: 372560
show more ...
|
#
566127e3 |
| 23-Sep-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Use tBcc when reverting
Check the branch target ranges and use a tBcc instead of t2Bcc when we can.
Differential Revision: https://reviews.llvm.org/D67796
llvm-svn: 372557
|
Revision tags: llvmorg-9.0.0 |
|
#
36c92227 |
| 17-Sep-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in LR being defined. These instructions were inserted on the assumption that LR wou
[ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in LR being defined. These instructions were inserted on the assumption that LR would already contain the loop counter because a mov is introduced during ISel as the the consumers in the loop can only use LR. That assumption proved wrong!
So perform a safety check, finding an appropriate place to insert the DLS/WLS instructions or revert if this isn't possible.
Differential Revision: https://reviews.llvm.org/D67539
llvm-svn: 372111
show more ...
|
Revision tags: llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
9b9a3084 |
| 15-Aug-2019 |
Eli Friedman <efriedma@quicinc.com> |
[ARM][LowOverheadLoops] Fix generated code for "revert".
Two issues:
1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't really have any visible effect at the moment, but it might ma
[ARM][LowOverheadLoops] Fix generated code for "revert".
Two issues:
1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't really have any visible effect at the moment, but it might matter in the future. 2. The t2CMPri generated for t2WhileLoopStart might need to use a register that isn't LR.
My team found this because we have a patch to track register liveness late in the pass pipeline. I'll look into upstreaming it to help catch issues like this earlier.
Differential Revision: https://reviews.llvm.org/D66243
llvm-svn: 369069
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
173de037 |
| 07-Aug-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Revert after read/write Currently we check whether LR is stored/loaded to/from inbetween the loop decrement and loop end pseudo instructions. There's two problems here: -
[ARM][LowOverheadLoops] Revert after read/write Currently we check whether LR is stored/loaded to/from inbetween the loop decrement and loop end pseudo instructions. There's two problems here: - It relies on all load/store instructions being labelled as such in tablegen. - Actually any use of loop decrement is troublesome because the value doesn't exist! So we need to check for any read/write of LR that occurs between the two instructions and revert if we find anything.
Differential Revision: https://reviews.llvm.org/D65792
llvm-svn: 368130
show more ...
|
#
ed2ea3e4 |
| 30-Jul-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Revert non-header LE target Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure.
Differential Revision: h
[ARM][LowOverheadLoops] Revert non-header LE target Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure.
Differential Revision: https://reviews.llvm.org/D65268
llvm-svn: 367296
show more ...
|
Revision tags: llvmorg-9.0.0-rc1 |
|
#
a19f5a76 |
| 24-Jul-2019 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
Test commit. NFC.
Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow.
llvm-svn: 366904
|
#
4379a400 |
| 22-Jul-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Revert remaining pseudos
ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, ite
[ARM][LowOverheadLoops] Revert remaining pseudos
ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted.
Differential Revision: https://reviews.llvm.org/D65080
llvm-svn: 366691
show more ...
|
Revision tags: llvmorg-10-init |
|
#
08b4a8da |
| 11-Jul-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Correct offset checking This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this use
[ARM][LowOverheadLoops] Correct offset checking This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this uses a maximum positive offset of 4094. The use of BasicBlockUtils has been changed because the block offsets weren't being initialised, but the isBBInRange checks both positive and negative offsets. ARMISelLowering has been tweaked because the test case presented another pattern that we weren't supporting.
llvm-svn: 365749
show more ...
|
#
775b2f59 |
| 10-Jul-2019 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM] Convert lambdas to static helpers
Break up and convert some of the lambdas in ARMLowOverheadLoops into static functions.
llvm-svn: 365623
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
#
98722691 |
| 01-Jul-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM] WLS/LE Code Generation Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics th
[ARM] WLS/LE Code Generation Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction.
Differential Revision: https://reviews.llvm.org/D63816
llvm-svn: 364733
show more ...
|
Revision tags: llvmorg-8.0.1-rc3 |
|
#
bcf0eb7a |
| 25-Jun-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM] Fix for DLS/LE CodeGen
The expensive buildbots highlighted the mir tests were broken, which I've now updated and added --verify-machineinstrs to them. This also uncovered a couple of bugs in t
[ARM] Fix for DLS/LE CodeGen
The expensive buildbots highlighted the mir tests were broken, which I've now updated and added --verify-machineinstrs to them. This also uncovered a couple of bugs in the backend pass, so these have also been fixed.
llvm-svn: 364323
show more ...
|
#
a6fd919c |
| 25-Jun-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM] DLS/LE low-overhead loop code generation
Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decre
[ARM] DLS/LE low-overhead loop code generation
Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decrement_reg is lowered to two, so that we can separate the decrement and branching operations. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop, in a new pass: ARMLowOverheadLoops. The pass currently bails, reverting to an sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions, or if the loop is too large.
Differential Revision: https://reviews.llvm.org/D63476
llvm-svn: 364288
show more ...
|