Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# b5b663aa 19-Dec-2022 Nikita Popov <npopov@redhat.com>

[Thumb2] Convert some tests to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 09124402 04-Nov-2021 David Green <david.green@arm.com>

[ARM] Move VPTBlock pass after post-ra scheduling

Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form.

[ARM] Move VPTBlock pass after post-ra scheduling

Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.

This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.

Differential Revision: https://reviews.llvm.org/D113094

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 211ce51f 22-May-2021 David Green <david.green@arm.com>

[ARM] Clean up some tests, removing dead instructions. NFC


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3
# 48230355 02-Feb-2021 David Green <david.green@arm.com>

[ARM] Remove DLS lr, lr

A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm

[ARM] Remove DLS lr, lr

A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm.org/D78916

show more ...


Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# 73a6cd4b 10-Nov-2020 David Green <david.green@arm.com>

[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR

This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it eas

[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR

This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.

The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883

show more ...


# b2ac9681 10-Nov-2020 David Green <david.green@arm.com>

[ARM] Alter t2DoLoopStart to define lr

This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more t

[ARM] Alter t2DoLoopStart to define lr

This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new instrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!

This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881

show more ...


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6
# 322d0afd 03-Oct-2020 Amara Emerson <amara@apple.com>

[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.

This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle lega

[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.

This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787

show more ...


Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# b4fa884a 22-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Improve VPT predicate tracking

The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions tha

[ARM] Improve VPT predicate tracking

The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions that makeup the block, while static structures
hold the predicate information. This enables global access for
querying how both a VPT block and individual instructions are
predicated. These changes now allow us, again, to handle more
complicated cases where multiple instructions build a predicate
and/or where the same predicate in used in multiple blocks.

It doesn't, however, get us back to before the tracking was 'fixed'
as some extra logic will be required to properly handle VPT
instructions. Currently a VPT could be effectively predicated because
of it's inputs, but the existing logic will not detect that and so
will refuse to perform the transformation. This can be seen in
remat-vctp.ll test where we still don't perform the transform.

Differential Revision: https://reviews.llvm.org/D87681

show more ...


# a63b2a46 16-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Fix tail predication predicate tracking

Clear the CurrentPredicate when we find an instruction which would
completely overwrite the VPR. This fix essentially means we're back
to not really bei

[ARM] Fix tail predication predicate tracking

Clear the CurrentPredicate when we find an instruction which would
completely overwrite the VPR. This fix essentially means we're back
to not really being able to handle VPT instructions when tail
predicating.

Differential Revision: https://reviews.llvm.org/D87610

show more ...


# ac2717bf 16-Sep-2020 Sam Tebbs <samuel.tebbs@arm.com>

[ARM][LowOverheadLoops] Fix tests after ef0b9f3

ef0b9f3 didn't update the tests that it affected.


# 3ebc7552 09-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Try to rematerialize VCTP instructions

We really want to try and avoid spilling P0, which can be difficult
since there's only one register, so try to rematerialize any VCTP
instructions.

Diff

[ARM] Try to rematerialize VCTP instructions

We really want to try and avoid spilling P0, which can be difficult
since there's only one register, so try to rematerialize any VCTP
instructions.

Differential Revision: https://reviews.llvm.org/D87280

show more ...


# a3e41d45 27-Aug-2020 Sam Parker <sam.parker@arm.com>

[ARM] Make MachineVerifier more strict about terminators

Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.

Diff

[ARM] Make MachineVerifier more strict about terminators

Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.

Differential Revision: https://reviews.llvm.org/D40061

show more ...


# c352e7fb 25-Aug-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks

This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the B

[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks

This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because now the loop tripcount is
passed in to the intrinsic,
- we can immediately use that value to setup a counter for the number of
elements processed by the loop and don't need to materialize BTC + 1.

Differential Revision: https://reviews.llvm.org/D86303

show more ...


Revision tags: llvmorg-11.0.0-rc2
# 22916481 03-Aug-2020 David Green <david.green@arm.com>

[ARM] Convert VPSEL to VMOV in tail predicated loops

VPSEL has slightly different semantics under tail predication (it can
end up selecting from Qn, Qm and Qd). We do not model that at the moment
so

[ARM] Convert VPSEL to VMOV in tail predicated loops

VPSEL has slightly different semantics under tail predication (it can
end up selecting from Qn, Qm and Qd). We do not model that at the moment
so they block tail predicated loops from being formed.

This just converts them into a predicated VMOV instead (via a VORR),
allowing tail predication to happen whilst still modelling the original
behaviour of the input.

Differential Revision: https://reviews.llvm.org/D85110

show more ...


Revision tags: llvmorg-11.0.0-rc1
# 3533e0a0 22-Jul-2020 David Green <david.green@arm.com>

[ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z)

Most MVE instructions can be predicated to fold a select into the
instruction, using the predicate and the selects else as a pass

[ARM] Add patterns for select(p, BinOp(x, y), z) -> BinOpT(x, y,p z)

Most MVE instructions can be predicated to fold a select into the
instruction, using the predicate and the selects else as a passthough.
This adds tablegen patterns for most two operand instructions using the
newly added TwoOpPattern from 1030e82598da.

Differential Revision: https://reviews.llvm.org/D83222

show more ...


Revision tags: llvmorg-12-init
# 595270ae 13-Jul-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Refactor option -disable-mve-tail-predication

This refactors option -disable-mve-tail-predication to take different arguments
so that we have 1 option to control tail-predication rather t

[ARM][MVE] Refactor option -disable-mve-tail-predication

This refactors option -disable-mve-tail-predication to take different arguments
so that we have 1 option to control tail-predication rather than several
different ones.

This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.

Differential Revision: https://reviews.llvm.org/D83133

show more ...


Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# d1522513 17-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask

To set up a tail-predicated loop, we need to to calculate the number of
elements processed by the loop. We can now use in

[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask

To set up a tail-predicated loop, we need to to calculate the number of
elements processed by the loop. We can now use intrinsic
@llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in
D79100. This intrinsic generates a predicate for the masked loads/stores, and
consumes the Backedge Taken Count (BTC) as its second argument. We can now use
that to reconstruct the loop tripcount, instead of the IR pattern match
approach we were using before.

Many thanks to Eli Friedman and Sam Parker for all their help with this work.

This also adds overflow checks for the different, new expressions that we
create: the loop tripcount, and the sub expression that calculates the
remaining elements to be processed. For the latter, SCEV is not able to
calculate precise enough bounds, so we work around that at the moment, but is
not entirely correct yet, it's conservative. The overflow checks can be
overruled with a force flag, which is thus potentially unsafe (but not really
because the vectoriser is the only place where this intrinsic is emitted at the
moment). It's also good to mention that the tail-predication pass is not yet
enabled by default. We will follow up to see if we can implement these
overflow checks better, either by a change in SCEV or we may want revise the
definition of llvm.get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D79175

show more ...


Revision tags: llvmorg-10.0.1-rc1
# 835251f7 08-Apr-2020 Pierre-vh <pierre.vanhoutryve@arm.com>

[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.

Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOv

[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.

Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206

show more ...


# d5eb7ffa 01-May-2020 Pierre-vh <pierre.vanhoutryve@arm.com>

[Target][ARM] Fold or(A, B) more aggressively for I1 vectors

This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
more agressive for I1 vector. This only affects Thumb2 MVE and imp

[Target][ARM] Fold or(A, B) more aggressively for I1 vectors

This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
more agressive for I1 vector. This only affects Thumb2 MVE and improves
codegen, because it removes a lot of msr/mrs instructions on VPR.P0.

This patch also adds a xor(vcmp) -> !vcmp fold for MVE.

Differential Revision: https://reviews.llvm.org/D77202

show more ...


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6
# 62fdb1f5 23-Mar-2020 Sam Parker <sam.parker@arm.com>

[DAGCombine] Skip PostInc combine with later users

When decided whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any proceed
the curre

[DAGCombine] Skip PostInc combine with later users

When decided whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any proceed
the current node, then don't do the combine.
The change only seems to be affecting the Arm backend, which I was
surprised at, but it appears to fix a lot of our issues around MVE
masked load/stores having to store a temporary address after an early
post-increment on a shared base address.

Differential Revision: https://reviews.llvm.org/D75847

show more ...


Revision tags: llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# de3e65e6 18-Feb-2020 Sam Parker <sam.parker@arm.com>

[ARM][LowOverheadLoops] Check loop liveouts

Check that no Q-regs are live out of the loop, unless the instruction
within the loop is predicated on the vctp.

Differential Revision: https://reviews.l

[ARM][LowOverheadLoops] Check loop liveouts

Check that no Q-regs are live out of the loop, unless the instruction
within the loop is predicated on the vctp.

Differential Revision: https://reviews.llvm.org/D72713

show more ...


Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 8cba99e2 20-Jan-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks

This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in e

[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks

This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.

Differential Revision: https://reviews.llvm.org/D72714

show more ...


Revision tags: llvmorg-11-init
# b4abe7af 30-Dec-2019 David Green <david.green@arm.com>

[ARM] Sink splat to ICmp

This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet

[ARM] Sink splat to ICmp

This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997

show more ...


# 40425183 20-Dec-2019 Sam Parker <sam.parker@arm.com>

[ARM][MVE] Tail predicate in the presence of vcmp

Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing m

[ARM][MVE] Tail predicate in the presence of vcmp

Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
and all instructions within the block are predicated upon in.

The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
internal vpr def. Need insert a new vpst to predicate the
remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
so adjust the size of the mask on the vpst.

Differential Revision: https://reviews.llvm.org/D71107

show more ...


Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3
# b1aba037 08-Dec-2019 David Green <david.green@arm.com>

[ARM] Enable MVE masked loads and stores

With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://revi

[ARM] Enable MVE masked loads and stores

With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968

show more ...


12