History log of /llvm-project/llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp (Results 26 – 50 of 148)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# c5cf7d91 21-Dec-2021 Kazu Hirata <kazu@google.com>

[ARM] Use range-based for loops (NFC)


# de904900 20-Dec-2021 Kazu Hirata <kazu@google.com>

Revert "[ARM] Use range-based for loops (NFC)"

This reverts commit 93d79cac2ede436e1e3e91b5aff702914cdfbca7.

This patch seems to break
llvm/test/CodeGen/ARM/constant-islands-cfg.mir under asan.


# 93d79cac 20-Dec-2021 Kazu Hirata <kazu@google.com>

[ARM] Use range-based for loops (NFC)


# 26bd534a 17-Dec-2021 Kazu Hirata <kazu@google.com>

[llvm] Use none_of instead of \!any_of (NFC)


# c2bb9637 11-Dec-2021 Kazu Hirata <kazu@google.com>

Use llvm::any_of and llvm::all_of (NFC)


Revision tags: llvmorg-13.0.1-rc1
# 53801a59 07-Oct-2021 Michael Forster <forster@google.com>

Fix two unused-variable warnings.


# 73346f58 07-Oct-2021 David Green <david.green@arm.com>

[ARM] Introduce a MQPRCopy

Currently when creating tail predicated loops, we need to validate that
all the live-outs of a loop will be equivalent with and without tail
predication, and if they are n

[ARM] Introduce a MQPRCopy

Currently when creating tail predicated loops, we need to validate that
all the live-outs of a loop will be equivalent with and without tail
predication, and if they are not we cannot legally create a
tail-predicated loop, leaving expensive vctp and vpst instructions in
the loop. These notably can include register-allocation instructions
like stack loads and stores, and copys lowered from COPYs to MVE_VORRs.

Instead of trying to prove this is valid late in the pipeline, this
patch introduces a MQPRCopy pseudo instruction that COPY is lowered to.
This can then either be converted to a MVE_VORR where possible, or to a
couple of VMOVD instructions if not. This way they do not behave
differently within and outside of tail-predications regions, and we can
know by construction that they are always valid. The idea is that we can
do the same with stack load and stores, converting them to VLDR/VSTR or
VLDM/VSTM where required to prove tail predication is always valid.

This does unfortunately mean inserting multiple VMOVD instructions,
instead of a single MVE_VORR, but my experiments show it to be an
improvement in general.

Differential Revision: https://reviews.llvm.org/D111048

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# 02cd8a6b 22-Sep-2021 David Green <david.green@arm.com>

[ARM] Allow smaller VMOVL in tail predicated loops

This allows VMOVL in tail predicated loops so long as the the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP

[ARM] Allow smaller VMOVL in tail predicated loops

This allows VMOVL in tail predicated loops so long as the the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP in the tail predicated loop. These cases represent a
sign-extend-inreg (or zero-extend-inreg), which needn't block tail
predication as in https://godbolt.org/z/hdTsEbx8Y.

For this a vecsize has been added to the TSFlag bits of MVE
instructions, which stores the size of the elements that the MVE
instruction operates on. In the case of multiple size (such as a
MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be
chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 =
i64, which often (but not always) comes from the instruction encoding
directly. A unit test was added, and although only a subset of the
vecsizes are currently used, the rest should be useful for other cases.

Differential Revision: https://reviews.llvm.org/D109706

show more ...


Revision tags: llvmorg-13.0.0-rc3
# 9cb8f4d1 02-Sep-2021 David Green <david.green@arm.com>

[ARM] Add a tail-predication loop predicate register

The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r

[ARM] Add a tail-predication loop predicate register

The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:

mov r3, #3
DLSTP lr, r3 // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.

This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.

A lot of tests needed updating.

Differential Revision: https://reviews.llvm.org/D107638

show more ...


Revision tags: llvmorg-13.0.0-rc2
# 8c50b5fb 11-Aug-2021 David Green <david.green@arm.com>

[ARM] Add extra debug messages for validating live outs. NFC

We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make i

[ARM] Add extra debug messages for validating live outs. NFC

We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make it
clearer why.

show more ...


# 4fee756c 06-Aug-2021 Amara Emerson <amara@apple.com>

Delete copy-ctor of MachineFrameInfo.

I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.

Deleting the copy-constructor a

Delete copy-ctor of MachineFrameInfo.

I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.

Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, albeit it didn't impact correctness there from the looks of it.

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init
# ff0ef6a5 05-Jul-2021 Sam Tebbs <samuel.tebbs@arm.com>

[ARM][LowOverheadLoops] Make some stack spills valid for tail predication

This patch makes vector spills valid for tail predication when all loads
from the same stack slot are within the loop

Diffe

[ARM][LowOverheadLoops] Make some stack spills valid for tail predication

This patch makes vector spills valid for tail predication when all loads
from the same stack slot are within the loop

Differential Revision: https://reviews.llvm.org/D105443

show more ...


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# bee2f618 13-Jun-2021 David Green <david.green@arm.com>

[ARM] Introduce t2WhileLoopStartTP

This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in
D90591. It keeps a reference to both the tripcount register and the
element count register, s

[ARM] Introduce t2WhileLoopStartTP

This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in
D90591. It keeps a reference to both the tripcount register and the
element count register, so that the ARMLowOverheadLoops pass in the
backend can pick the correct one without having to search for it from
the operand of a VCTP.

Differential Revision: https://reviews.llvm.org/D103236

show more ...


# 00d19c67 01-Jun-2021 Michael Benfield <mbenfield@google.com>

[various] Remove or use variables which are unused but set.

This is in preparation for the -Wunused-but-set-variable warning.

Differential Revision: https://reviews.llvm.org/D102942


Revision tags: llvmorg-12.0.1-rc1
# 543406a6 24-May-2021 David Green <david.green@arm.com>

[ARM] Allow findLoopPreheader to return headers with multiple loop successors

The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. Th

[ARM] Allow findLoopPreheader to return headers with multiple loop successors

The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).

Differential Revision: https://reviews.llvm.org/D102747

show more ...


# e7a6df68 21-May-2021 David Green <david.green@arm.com>

[ARM] Fix the operand used for WLS in ARMLowOverheadLoops

The Loop start instruction handled by the ARMLowOverheadLoops are:
$lr = t2DoLoopStart $r0
$lr = t2DoLoopStartTP $r1, $r0
$lr = t2WhileLoopS

[ARM] Fix the operand used for WLS in ARMLowOverheadLoops

The Loop start instruction handled by the ARMLowOverheadLoops are:
$lr = t2DoLoopStart $r0
$lr = t2DoLoopStartTP $r1, $r0
$lr = t2WhileLoopStartLR $r0, %bb, implicit-def dead $cpsr
All three of these will have LR as the 0 argument, the trip count as the
1 argument.

This patch updated a few places in ARMLowOverheadLoops where the 0th arg
was being used for t2WhileLoopStartLR instructions as the trip count.
One place was entirely removed as it does not seem valid any more, the
case the code is trying to protect against should not be able to occur
with our correct-by-construction low overhead loops.

Differential Revision: https://reviews.llvm.org/D102620

show more ...


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# f22b4c71 19-Mar-2021 Victor Campos <victor.campos@arm.com>

[ARM] Handle debug instrs in ARM Low Overhead Loop pass

In function ConvertVPTBlocks(), it is assumed that every instruction
within a vector-predicated block is predicated. This is false for debug
i

[ARM] Handle debug instrs in ARM Low Overhead Loop pass

In function ConvertVPTBlocks(), it is assumed that every instruction
within a vector-predicated block is predicated. This is false for debug
instructions, used by LLVM.

Because of this, an assertion failure is reached when an input contains
debug instructions inside VPT blocks. In non-assert builds, an out of
bounds memory access took place.

The present patch properly covers the case of debug instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99075

show more ...


# fad70c30 11-Mar-2021 David Green <david.green@arm.com>

[ARM] Improve WLS lowering

Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over t

[ARM] Improve WLS lowering

Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

%wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
%wls0 = extractvalue { i32, i1 } %wls, 0
%wls1 = extractvalue { i32, i1 } %wls, 1
br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
%lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
..
%iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
%cmp = icmp ne i32 %iv.next, 0
br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

%1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
t2B %bb.1
...
bb.2.loop:
%2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
...
%3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729

show more ...


Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 7786ac83 11-Feb-2021 David Green <david.green@arm.com>

[ARM] Remove dead mov's in preheader of tail predicated loops

With t2DoLoopDec we can be left with some extra MOV's in the preheaders
of tail predicated loops. This removes them, in the same way we

[ARM] Remove dead mov's in preheader of tail predicated loops

With t2DoLoopDec we can be left with some extra MOV's in the preheaders
of tail predicated loops. This removes them, in the same way we remove
other dead variables.

Differential Revision: https://reviews.llvm.org/D91857

show more ...


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3
# 48230355 02-Feb-2021 David Green <david.green@arm.com>

[ARM] Remove DLS lr, lr

A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm

[ARM] Remove DLS lr, lr

A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm.org/D78916

show more ...


Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init
# 05444417 24-Jan-2021 Kazu Hirata <kazu@google.com>

[Target] Use llvm::append_range (NFC)


# e4847a7f 23-Jan-2021 Kazu Hirata <kazu@google.com>

Revert "[Target] Use llvm::append_range (NFC)"

This reverts commit cc7a23828657f35f706343982cf96bb6583d4d73.

The X86WinEHState.cpp hunk seems to break certain builds.


# cc7a2382 23-Jan-2021 Kazu Hirata <kazu@google.com>

[Target] Use llvm::append_range (NFC)


Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 1de3e7fd 14-Dec-2020 David Green <david.green@arm.com>

[ARM] Improve handling of empty VPT blocks in tail predicated loops

A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the
VCTP is removed will become invalid. This fixed the first by

[ARM] Improve handling of empty VPT blocks in tail predicated loops

A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the
VCTP is removed will become invalid. This fixed the first by removing
the now empty block and bails out for the second, as we have no simple
way of converting a VPT to a VCMP.

Differential Revision: https://reviews.llvm.org/D92369

show more ...


# 0447f350 10-Dec-2020 David Green <david.green@arm.com>

[ARM][RegAlloc] Add t2LoopEndDec

We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop

[ARM][RegAlloc] Add t2LoopEndDec

We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.

There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.

Differential Revision: https://reviews.llvm.org/D91358

show more ...


123456