History log of /llvm-project/llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp (Results 76 – 100 of 148)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 2fc690ac 24-Sep-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM] LowoverheadLoops: add an option to disable tail-predication

This might be useful for testing. We already have an option -tail-predication
but that controls the MVETailPredication pass. This
-

[ARM] LowoverheadLoops: add an option to disable tail-predication

This might be useful for testing. We already have an option -tail-predication
but that controls the MVETailPredication pass. This
-arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops
pass.

Differential Revision: https://reviews.llvm.org/D88212

show more ...


# 9d9a11c7 24-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Check for LSTP side-effects.

If the LSTP instruction is inserted with an element count low enough
to immediately predicate some lanes as false, this can have some
unintended effects on any pro

[ARM] Check for LSTP side-effects.

If the LSTP instruction is inserted with an element count low enough
to immediately predicate some lanes as false, this can have some
unintended effects on any proceeding MVE instructions in the
preheader.

Differential Revision: https://reviews.llvm.org/D88209

show more ...


# 89c1e35f 22-Sep-2020 Stefanos Baziotis <sdi1600105@di.uoa.gr>

[LoopInfo] empty() -> isInnermost(), add isOutermost()

Differential Revision: https://reviews.llvm.org/D82895


# 94c799fe 22-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Trying to fix asan buildbot


# b4fa884a 22-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Improve VPT predicate tracking

The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions tha

[ARM] Improve VPT predicate tracking

The VPTBlock has been modified to track the 'global' state of the
VPR, as well as the state for each block. Each object now just holds
a list of instructions that makeup the block, while static structures
hold the predicate information. This enables global access for
querying how both a VPT block and individual instructions are
predicated. These changes now allow us, again, to handle more
complicated cases where multiple instructions build a predicate
and/or where the same predicate in used in multiple blocks.

It doesn't, however, get us back to before the tracking was 'fixed'
as some extra logic will be required to properly handle VPT
instructions. Currently a VPT could be effectively predicated because
of it's inputs, but the existing logic will not detect that and so
will refuse to perform the transformation. This can be seen in
remat-vctp.ll test where we still don't perform the transform.

Differential Revision: https://reviews.llvm.org/D87681

show more ...


# a0c1dcc3 21-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Remove MVEDomain from VLDR/STR of P0

Remove the domain from the instructions and create a shouldInspect
helper for LowOverheadLoops which queries it or a vpr operand.

Differential Revision: h

[ARM] Remove MVEDomain from VLDR/STR of P0

Remove the domain from the instructions and create a shouldInspect
helper for LowOverheadLoops which queries it or a vpr operand.

Differential Revision: https://reviews.llvm.org/D87900

show more ...


# 3ce9ec0c 16-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Reorder some logic

Re-order some checks in ValidateMVEInst.


# a63b2a46 16-Sep-2020 Sam Parker <sam.parker@arm.com>

[ARM] Fix tail predication predicate tracking

Clear the CurrentPredicate when we find an instruction which would
completely overwrite the VPR. This fix essentially means we're back
to not really bei

[ARM] Fix tail predication predicate tracking

Clear the CurrentPredicate when we find an instruction which would
completely overwrite the VPR. This fix essentially means we're back
to not really being able to handle VPT instructions when tail
predicating.

Differential Revision: https://reviews.llvm.org/D87610

show more ...


# ef0b9f33 14-Sep-2020 Sam Tebbs <samuel.tebbs@arm.com>

[ARM][LowOverheadLoops] Combine a VCMP and VPST into a VPT

This patch combines a VCMP followed by a VPST into a VPT, which has the
same semantics as the combination of the former two.


# b81c57d6 09-Sep-2020 Sam Tebbs <samuel.tebbs@arm.com>

[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane
values

The effects of unpredicated vector instruction with unknown
lanes cannot be predicted and therefore

[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane
values

The effects of unpredicated vector instruction with unknown
lanes cannot be predicted and therefore cannot be tail predicated. This
does not apply to predicated vector instructions and so this patch
allows tail predication on them.

Differential Revision: https://reviews.llvm.org/D87376

show more ...


# 7aabb6ad 07-Sep-2020 Sam Tebbs <samuel.tebbs@arm.com>

[ARM][LowOverheadLoops] Remove modifications to the correct element
count register

After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove mod

[ARM][LowOverheadLoops] Remove modifications to the correct element
count register

After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove modifications to the vctp operand
as they should. This patch fixes that by explicitly removing
modifications to the vctp operand rather than the register used as the
element count.

show more ...


# b30adfb5 28-Aug-2020 Sam Parker <sam.parker@arm.com>

[ARM][LowOverheadLoops] Liveouts and reductions

Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the

[ARM][LowOverheadLoops] Liveouts and reductions

Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.

Differential Revision: https://reviews.llvm.org/D86613

show more ...


Revision tags: llvmorg-11.0.0-rc2
# c466c5fa 18-Aug-2020 Fangrui Song <i@maskray.me>

[ARM] Fix build after D86087


# 3471520b 18-Aug-2020 David Green <david.green@arm.com>

[ARM] Allow tail predication of VLDn

VLD2/4 instructions cannot be predicated, so we cannot tail predicate
them from autovec. From intrinsics though, they should be valid as they
will just end up lo

[ARM] Allow tail predication of VLDn

VLD2/4 instructions cannot be predicated, so we cannot tail predicate
them from autovec. From intrinsics though, they should be valid as they
will just end up loading extra values into off vector lanes, not
effecting the on lanes. The same is true for loads in general where so
long as we are not using the other vector lanes, an unpredicated load
can be converted to a predicated one.

This marks VLD2 and VLD4 instructions as validForTailPredication and
allows any unpredicated load in tail predication loop, which seems to be
valid given the other checks we have.

Differential Revision: https://reviews.llvm.org/D86022

show more ...


# 31f02ac6 17-Aug-2020 Sam Tebbs <samuel.tebbs@arm.com>

[ARM] Use mov operand if the mov cannot be moved while tail predicating

There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before

[ARM] Use mov operand if the mov cannot be moved while tail predicating

There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before the dlstp,
stopping tail predication entirely. This patch checks if the mov operand
can be used and if so, uses that instead.

Differential Revision: https://reviews.llvm.org/D86087

show more ...


Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3
# 3ee580d0 01-Jul-2020 Sam Parker <sam.parker@arm.com>

[ARM][LowOverheadLoops] Handle reductions

While validating live-out values, record instructions that look like
a reduction. This will comprise of a vector op (for now only vadd),
a vorr (vmov) which

[ARM][LowOverheadLoops] Handle reductions

While validating live-out values, record instructions that look like
a reduction. This will comprise of a vector op (for now only vadd),
a vorr (vmov) which store the previous value of vadd and then a vpsel
in the exit block which is predicated upon a vctp. This vctp will
combine the last two iterations using the vmov and vadd into a vector
which can then be consumed by a vaddv.

Once we have determined that it's safe to perform tail-predication,
we need to change this sequence of instructions so that the
predication doesn't produce incorrect code. This involves changing
the register allocation of the vadd so it updates itself and the
predication on the final iteration will not update the falsely
predicated lanes. This mimics what the vmov, vctp and vpsel do and
so we then don't need any of those instructions.

Differential Revision: https://reviews.llvm.org/D75533

show more ...


Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# 835251f7 08-Apr-2020 Pierre-vh <pierre.vanhoutryve@arm.com>

[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.

Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOv

[Target][ARM] Make Low Overhead Loops coexist with VPT blocks.

Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206

show more ...


# 24bf8063 08-Apr-2020 Pierre-vh <pierre.vanhoutryve@arm.com>

[Target][ARM] Replace outdated getARMVPTBlockMask function

getARMVPTBlockMask was an outdated function that only handled basic
block masks: T, TT, TTT and TTTT. This worked fine before the MVE
VPT B

[Target][ARM] Replace outdated getARMVPTBlockMask function

getARMVPTBlockMask was an outdated function that only handled basic
block masks: T, TT, TTT and TTTT. This worked fine before the MVE
VPT Block Insertion Pass improvements as it was the only kind of
masks that it could generate, but now it can generate more complex
masks that uses E predicates, so it's dangerous to use that function
to calculate VPT/VPST block masks.

I replaced it with 2 different functions:
- expandPredBlockMask, in ARMBaseInfo. This adds an "E" or "T" at
the end of an existing PredBlockMask.
- recomputeVPTBlockMask, in Thumb2InstrInfo. This takes an iterator
to a VPT/VPST instruction and recomputes its block mask by looking
at the predicated instructions that follows it. This should be
used to recompute a block mask after removing/adding a predicated
instruction to the block.

The expandPredBlockMask function is pretty much imported from the MVE
VPT Blocks pass.

I had to change the ARMLowOverheadLoops and MVEVPTBlocks passes as well
so they could use these new functions.

Differential Revision: https://reviews.llvm.org/D78201

show more ...


# 892af45c 22-Apr-2020 David Green <david.green@arm.com>

[ARM] Distribute MVE post-increments

This adds some extra processing into the Pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores and adds of the same base. This we
don't always tur

[ARM] Distribute MVE post-increments

This adds some extra processing into the Pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores and adds of the same base. This we
don't always turn into a post-inc during ISel, and due to the nature of
it being a graph we don't always know an order to use for the nodes, not
knowing which nodes to make post-inc and which to use the new post-inc
of. After ISel, we have an order that we can use to post-inc the
following instructions.

So this looks for a loads/store with a starting offset of 0, and an
add/sub from the same base, plus a number of other loads/stores. We then
do some checks and convert the zero offset load/store into a postinc
variant. Any loads/stores after it have the offset subtracted from their
immediates. For example:
LDR #4 LDR #4
LDR #0 LDR_POSTINC #16
LDR #8 LDR #-8
LDR #12 LDR #-4
ADD #16
It only handles MVE loads/stores at the moment. Normal loads/store will
be added in a followup patch, they just have some extra details to
ensure that we keep generating LDRD/LDM successfully.

Differential Revision: https://reviews.llvm.org/D77813

show more ...


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5
# dad84828 13-Mar-2020 Pierre-vh <pierre.vanhoutryve@arm.com>

[Target][ARM] Change VPTMaskValues to the correct encoding

VPTMaskValue was using the "instruction" encoding to represent the masks
(= the same encoding as the one used by the instructions in an obj

[Target][ARM] Change VPTMaskValues to the correct encoding

VPTMaskValue was using the "instruction" encoding to represent the masks
(= the same encoding as the one used by the instructions in an object file),
but it is only used to build MCOperands, so it should use the MCOperand
encoding of the masks, which is slightly different.

Differential Revision: https://reviews.llvm.org/D76139

show more ...


# 94b195ff 30-Mar-2020 Sam Parker <sam.parker@arm.com>

[ARM][LowOverheadLoops] Add horizontal reduction support

Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredicat

[ARM][LowOverheadLoops] Add horizontal reduction support

Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D76708

show more ...


# d7084fa3 27-Mar-2020 Sam Parker <sam.parker@arm.com>

[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros

Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non zeros i

[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros

Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non zeros in the
false lanes.

Differential Revision: https://reviews.llvm.org/D76766

show more ...


# 94cacebc 24-Mar-2020 Sam Parker <sam.parker@arm.com>

[ARM][LowOverheadLoops] Add checks for narrowing

Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely,
including checks on specific operations that can generate non-zeros
from zero value

[ARM][LowOverheadLoops] Add checks for narrowing

Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely,
including checks on specific operations that can generate non-zeros
from zero values, e.g VMVN. We can then check that any instructions
that retain some information in their output register (all narrowing
instructions) that they only use and def registers that always have
zeros in their falsely predicated bytes, whether or not tail
predication happens.

Most of the logic remains the same, just the names of the data
structures and helpers have been renamed to reflect the change in
logic. The key change, apart from the opcode checkers, is that the
FalseZeros set now strictly contains only instructions which will
always generate zeros, and not instructions that could also have
their false bytes masked away later.

Differential Revision: https://reviews.llvm.org/D76235

show more ...


Revision tags: llvmorg-10.0.0-rc4
# d941df36 11-Mar-2020 Sam Parker <sam.parker@arm.com>

[NFC][ARM] Reorder some logic

Move some logic around in LowOverheadLoop::ValidateLiveOut


# ff9ac33e 10-Mar-2020 Sam Parker <sam.parker@arm.com>

[ARM][MVE] Validate tail predication values

Iterate through the loop and check that the observable values
produced are the same whether tail predication happens or not.

We want to find out if the t

[ARM][MVE] Validate tail predication values

Iterate through the loop and check that the observable values
produced are the same whether tail predication happens or not.

We want to find out if the tail-predicated version of this loop will
produce the same values as the loop in its original form. For this to
be true, the newly inserted implicit predication must not change the
the (observable) results.

We're doing this because many instructions in the loop will not be
predicated and so the conversion from VPT predication to tail
predication can result in different values being produced, because of
falsely predicated lanes not being updated in the converted form.

A masked load, whether through VPT or tail predication, will write
zeros to any of the falsely predicated bytes. So, from the loads, we
know that the false lanes are zeroed and here we're trying to track
that those false lanes remain zero, or where they change, the
differences are masked away by their user(s).

All MVE loads and stores have to be predicated, so we know that any
load operands, or stored results are equivalent already. Other
explicitly predicated instructions will perform the same operation in
the original loop and the tail-predicated form too. Because of this,
we can insert loads, stores and other predicated instructions into
our KnownFalseZeros set and build from there.

Differential Revision: https://reviews.llvm.org/D75452

show more ...


123456