ARMLowOverheadLoops.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# c5cf7d91	21-Dec-2021	Kazu Hirata <kazu@google.com>	[ARM] Use range-based for loops (NFC)
# de904900	20-Dec-2021	Kazu Hirata <kazu@google.com>	Revert "[ARM] Use range-based for loops (NFC)" This reverts commit 93d79cac2ede436e1e3e91b5aff702914cdfbca7. This patch seems to break llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.
# 93d79cac	20-Dec-2021	Kazu Hirata <kazu@google.com>	[ARM] Use range-based for loops (NFC)
# 26bd534a	17-Dec-2021	Kazu Hirata <kazu@google.com>	[llvm] Use none_of instead of \!any_of (NFC)
# c2bb9637	11-Dec-2021	Kazu Hirata <kazu@google.com>	Use llvm::any_of and llvm::all_of (NFC)
Revision tags: llvmorg-13.0.1-rc1
# 53801a59	07-Oct-2021	Michael Forster <forster@google.com>	Fix two unused-variable warnings.
# 73346f58	07-Oct-2021	David Green <david.green@arm.com>	[ARM] Introduce a MQPRCopy Currently when creating tail predicated loops, we need to validate that all the live-outs of a loop will be equivalent with and without tail predication, and if they are n [ARM] Introduce a MQPRCopy Currently when creating tail predicated loops, we need to validate that all the live-outs of a loop will be equivalent with and without tail predication, and if they are not we cannot legally create a tail-predicated loop, leaving expensive vctp and vpst instructions in the loop. These notably can include register-allocation instructions like stack loads and stores, and copys lowered from COPYs to MVE_VORRs. Instead of trying to prove this is valid late in the pipeline, this patch introduces a MQPRCopy pseudo instruction that COPY is lowered to. This can then either be converted to a MVE_VORR where possible, or to a couple of VMOVD instructions if not. This way they do not behave differently within and outside of tail-predications regions, and we can know by construction that they are always valid. The idea is that we can do the same with stack load and stores, converting them to VLDR/VSTR or VLDM/VSTM where required to prove tail predication is always valid. This does unfortunately mean inserting multiple VMOVD instructions, instead of a single MVE_VORR, but my experiments show it to be an improvement in general. Differential Revision: https://reviews.llvm.org/D111048 show more ...
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# 02cd8a6b	22-Sep-2021	David Green <david.green@arm.com>	[ARM] Allow smaller VMOVL in tail predicated loops This allows VMOVL in tail predicated loops so long as the the vector size the VMOVL is extending into is less than or equal to the size of the VCTP [ARM] Allow smaller VMOVL in tail predicated loops This allows VMOVL in tail predicated loops so long as the the vector size the VMOVL is extending into is less than or equal to the size of the VCTP in the tail predicated loop. These cases represent a sign-extend-inreg (or zero-extend-inreg), which needn't block tail predication as in https://godbolt.org/z/hdTsEbx8Y. For this a vecsize has been added to the TSFlag bits of MVE instructions, which stores the size of the elements that the MVE instruction operates on. In the case of multiple size (such as a MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 = i64, which often (but not always) comes from the instruction encoding directly. A unit test was added, and although only a subset of the vecsizes are currently used, the rest should be useful for other cases. Differential Revision: https://reviews.llvm.org/D109706 show more ...
Revision tags: llvmorg-13.0.0-rc3
# 9cb8f4d1	02-Sep-2021	David Green <david.green@arm.com>	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r [ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638 show more ...
Revision tags: llvmorg-13.0.0-rc2
# 8c50b5fb	11-Aug-2021	David Green <david.green@arm.com>	[ARM] Add extra debug messages for validating live outs. NFC We are running into more and more cases where the liveouts of low overhead loops do not validate. Add some extra debug messages to make i [ARM] Add extra debug messages for validating live outs. NFC We are running into more and more cases where the liveouts of low overhead loops do not validate. Add some extra debug messages to make it clearer why. show more ...
# 4fee756c	06-Aug-2021	Amara Emerson <amara@apple.com>	Delete copy-ctor of MachineFrameInfo. I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo() without declaring the variable as a reference. Deleting the copy-constructor a Delete copy-ctor of MachineFrameInfo. I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo() without declaring the variable as a reference. Deleting the copy-constructor also showed a place in the ARM backend which was doing the same thing, albeit it didn't impact correctness there from the looks of it. show more ...
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init
# ff0ef6a5	05-Jul-2021	Sam Tebbs <samuel.tebbs@arm.com>	[ARM][LowOverheadLoops] Make some stack spills valid for tail predication This patch makes vector spills valid for tail predication when all loads from the same stack slot are within the loop Diffe [ARM][LowOverheadLoops] Make some stack spills valid for tail predication This patch makes vector spills valid for tail predication when all loads from the same stack slot are within the loop Differential Revision: https://reviews.llvm.org/D105443 show more ...
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# bee2f618	13-Jun-2021	David Green <david.green@arm.com>	[ARM] Introduce t2WhileLoopStartTP This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, s [ARM] Introduce t2WhileLoopStartTP This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, so that the ARMLowOverheadLoops pass in the backend can pick the correct one without having to search for it from the operand of a VCTP. Differential Revision: https://reviews.llvm.org/D103236 show more ...
# 00d19c67	01-Jun-2021	Michael Benfield <mbenfield@google.com>	[various] Remove or use variables which are unused but set. This is in preparation for the -Wunused-but-set-variable warning. Differential Revision: https://reviews.llvm.org/D102942
Revision tags: llvmorg-12.0.1-rc1
# 543406a6	24-May-2021	David Green <david.green@arm.com>	[ARM] Allow findLoopPreheader to return headers with multiple loop successors The findLoopPreheader function will currently not find a preheader if it branches to multiple different loop headers. Th [ARM] Allow findLoopPreheader to return headers with multiple loop successors The findLoopPreheader function will currently not find a preheader if it branches to multiple different loop headers. This patch adds an option to relax that, allowing ARMLowOverheadLoops to process more loops successfully. This helps with WhileLoopStart setup instructions that can branch/fallthrough to the low overhead loop and to branch to a separate loop from the same preheader (but I don't believe it is possible for both loops to be low overhead loops). Differential Revision: https://reviews.llvm.org/D102747 show more ...
# e7a6df68	21-May-2021	David Green <david.green@arm.com>	[ARM] Fix the operand used for WLS in ARMLowOverheadLoops The Loop start instruction handled by the ARMLowOverheadLoops are: $lr = t2DoLoopStart $r0 $lr = t2DoLoopStartTP $r1, $r0 $lr = t2WhileLoopS [ARM] Fix the operand used for WLS in ARMLowOverheadLoops The Loop start instruction handled by the ARMLowOverheadLoops are: $lr = t2DoLoopStart $r0 $lr = t2DoLoopStartTP $r1, $r0 $lr = t2WhileLoopStartLR $r0, %bb, implicit-def dead $cpsr All three of these will have LR as the 0 argument, the trip count as the 1 argument. This patch updated a few places in ARMLowOverheadLoops where the 0th arg was being used for t2WhileLoopStartLR instructions as the trip count. One place was entirely removed as it does not seem valid any more, the case the code is trying to protect against should not be able to occur with our correct-by-construction low overhead loops. Differential Revision: https://reviews.llvm.org/D102620 show more ...
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# f22b4c71	19-Mar-2021	Victor Campos <victor.campos@arm.com>	[ARM] Handle debug instrs in ARM Low Overhead Loop pass In function ConvertVPTBlocks(), it is assumed that every instruction within a vector-predicated block is predicated. This is false for debug i [ARM] Handle debug instrs in ARM Low Overhead Loop pass In function ConvertVPTBlocks(), it is assumed that every instruction within a vector-predicated block is predicated. This is false for debug instructions, used by LLVM. Because of this, an assertion failure is reached when an input contains debug instructions inside VPT blocks. In non-assert builds, an out of bounds memory access took place. The present patch properly covers the case of debug instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99075 show more ...
# fad70c30	11-Mar-2021	David Green <david.green@arm.com>	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over t [ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations need to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729 show more ...
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 7786ac83	11-Feb-2021	David Green <david.green@arm.com>	[ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we [ARM] Remove dead mov's in preheader of tail predicated loops With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables. Differential Revision: https://reviews.llvm.org/D91857 show more ...
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3
# 48230355	02-Feb-2021	David Green <david.green@arm.com>	[ARM] Remove DLS lr, lr A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader. Differential Revision: https://reviews.llvm [ARM] Remove DLS lr, lr A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader. Differential Revision: https://reviews.llvm.org/D78916 show more ...
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init
# 05444417	24-Jan-2021	Kazu Hirata <kazu@google.com>	[Target] Use llvm::append_range (NFC)
# e4847a7f	23-Jan-2021	Kazu Hirata <kazu@google.com>	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit cc7a23828657f35f706343982cf96bb6583d4d73. The X86WinEHState.cpp hunk seems to break certain builds.
# cc7a2382	23-Jan-2021	Kazu Hirata <kazu@google.com>	[Target] Use llvm::append_range (NFC)
Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 1de3e7fd	14-Dec-2020	David Green <david.green@arm.com>	[ARM] Improve handling of empty VPT blocks in tail predicated loops A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the VCTP is removed will become invalid. This fixed the first by [ARM] Improve handling of empty VPT blocks in tail predicated loops A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the VCTP is removed will become invalid. This fixed the first by removing the now empty block and bails out for the second, as we have no simple way of converting a VPT to a VCMP. Differential Revision: https://reviews.llvm.org/D92369 show more ...
# 0447f350	10-Dec-2020	David Green <david.green@arm.com>	[ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop [ARM][RegAlloc] Add t2LoopEndDec We currently have problems with the way that low overhead loops are specified, with LR being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail. Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions. This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them. Differential Revision: https://reviews.llvm.org/D91358 show more ...
123 4 5 6