History log of /llvm-project/llvm/lib/Target/ARM/MVETailPredication.cpp (Results 26 – 50 of 65)
Revision Date Author Comments
# b4b1b841 15-Sep-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[MVE] fix typo in llvm debug message. NFC.


# 676febc0 09-Sep-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value

This adds checks for the original scalar loop tripcount value, i.e. the
second argument of get.active.lane.mask, and performs several sanity checks to
see whether it is of the form that we expect, as we already do for the IV,
which is the first argument of get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D86074


# 4ca60915 28-Aug-2020 David Green <david.green@arm.com>

[ARM] Correct predicate operand for offset gather/scatter

The arm_mve_vldr_gather_offset_predicated and
arm_mve_vstr_scatter_offset_predicated intrinsics have some extra parameters,
meaning the predicate is at a later operand. If a loop contains _only_
those masked instructions, we would miss transforming the active lane
mask.

Differential Revision: https://reviews.llvm.org/D86791


# c352e7fb 25-Aug-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks

This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because now the loop tripcount is
passed in to the intrinsic,
- we can immediately use that value to set up a counter for the number of
elements processed by the loop and don't need to materialize BTC + 1.

Differential Revision: https://reviews.llvm.org/D86303
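
The difference between the two forms of the intrinsic can be sketched with a scalar model. This is only an illustration, not code from the pass: the 4-lane width and all helper names are assumptions, and the comparisons are done in 64 bits so that base + i cannot wrap. It assumes the older form compared lanes against the BTC with "less than or equal", while the newer form compares against the tripcount with "less than".

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kLanes = 4; // assumed vector width for this sketch
using LaneMask = std::array<bool, kLanes>;

// Tripcount form (hypothetical helper): lane i is active iff
// base + i < tripCount.
LaneMask laneMaskFromTripCount(uint32_t base, uint32_t tripCount) {
  LaneMask mask{};
  for (unsigned i = 0; i < kLanes; ++i)
    mask[i] = static_cast<uint64_t>(base) + i < tripCount;
  return mask;
}

// Older BTC form (hypothetical helper): lane i is active iff
// base + i <= btc.
LaneMask laneMaskFromBTC(uint32_t base, uint32_t btc) {
  LaneMask mask{};
  for (unsigned i = 0; i < kLanes; ++i)
    mask[i] = static_cast<uint64_t>(base) + i <= btc;
  return mask;
}

// With the tripcount form the element count is available directly.
uint32_t elementsFromTripCount(uint32_t tripCount) { return tripCount; }

// With the BTC form the element count is BTC + 1, which wraps to 0 when
// BTC is the maximum 32-bit value; hence the overflow checks.
uint32_t elementsFromBTC(uint32_t btc) {
  assert(btc != UINT32_MAX && "BTC + 1 would wrap to 0");
  return btc + 1;
}
```

For a loop over 6 elements, both forms yield the same mask in the final vector iteration (base 4), but only the BTC form needs the extra addition.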


Revision tags: llvmorg-11.0.0-rc2
# 9eb9ba07 13-Aug-2020 Anna Welker <anna.welker@arm.com>

[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters

Fix to include the non-predicated version of the write-back gather in the
special case treatment for deducing the instruction type.
(This fixes https://reviews.llvm.org/D85138 for corner cases.)

Differential Revision: https://reviews.llvm.org/D85889


# 4fe5615e 12-Aug-2020 Anna Welker <anna.welker@arm.com>

[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters

Widen the scope of memory operations that are allowed to be tail predicated
to include gathers and scatters, such that loops that are auto-vectorized
with the option -enable-arm-maskedgatscat (and actually end up containing
an MVE gather or scatter) can be tail predicated.

Differential Revision: https://reviews.llvm.org/D85138


# 6716e786 11-Aug-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] tail-predication: overflow checks for backedge taken count.

This picks up the work on the overflow checks for get.active.lane.mask,
which ensure that it is safe to insert the VCTP intrinsic that enables
tail-predication. For a 2d auto-correlation kernel and its inner loop j:

  M = Size - i;
  for (j = 0; j < M; j++)
    Sum += Input[j] * Input[j+i];

For this inner loop, the SCEV backedge taken count (BTC) expression is:

  {(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>

and the LoopUtils helper cannotBeMaxInLoop couldn't calculate a bound on this,
so "BTC cannot be max" could not be determined. Overflow behaviour therefore
had to be assumed in the loop tripcount expression that uses the BTC. As a
result, tail-predication had to be forced (with an option) for this case.

This change solves that by using ScalarEvolution's helper
getConstantMaxBackedgeTakenCount, which is able to determine the range of BTC
and thus that it is safe, so that we no longer need to force tail-predication,
as reflected in the changed test cases.

Differential Revision: https://reviews.llvm.org/D85737
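
The reasoning behind the check can be sketched as follows. This is a simplified illustration with hypothetical helper names, not the code in the pass: it assumes 32-bit unsigned arithmetic, and it assumes that Size is a sign-extended 16-bit value and that the inner loop only runs when M > 0, so that the BTC is at most 32767 - 1.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: "btc + 1" cannot wrap for any btc <= maxBTC as long
// as maxBTC is strictly below the largest 32-bit value.
bool tripCountCannotOverflow(uint32_t maxBTC) {
  return maxBTC < std::numeric_limits<uint32_t>::max();
}

// Assumed upper bound on the BTC for the kernel above: the start value of
// the recurrence is -1 + sext(Size), and Size is at most 32767.
uint32_t maxBTCForSextI16() {
  return static_cast<uint32_t>(std::numeric_limits<int16_t>::max()) - 1;
}

// Materialise the tripcount only when the bound proves it is safe.
uint32_t tripCountFromBTC(uint32_t btc, uint32_t maxBTC) {
  assert(btc <= maxBTC && "BTC must respect the known bound");
  assert(tripCountCannotOverflow(maxBTC) && "BTC + 1 could wrap");
  return btc + 1;
}
```

Without any bound, the only safe assumption is that the BTC may be the maximum value, in which case the check fails and tail-predication would have to be forced.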


# 8590e5ab 09-Aug-2020 David Green <david.green@arm.com>

[ARM] Allow vecreduce_add in tail predicated loops

This allows vecreduce_add in loops so that we can tail-predicate them.

Differential Revision: https://reviews.llvm.org/D85454


Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init
# 595270ae 13-Jul-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Refactor option -disable-mve-tail-predication

This refactors option -disable-mve-tail-predication to take different arguments
so that we have one option to control tail-predication rather than several
different ones.

This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.

Differential Revision: https://reviews.llvm.org/D83133


Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3
# 3324e3a6 30-Jun-2020 Samuel Tebbs <samteb02@e124480.cambridge.arm.com>

[ARM] Allow the fabs intrinsic to be tail predicated

This patch stops the fabs intrinsic from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82570


# 66fa3139 30-Jun-2020 Samuel Tebbs <samteb02@e124480.cambridge.arm.com>

[ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated

This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82571


# af459076 29-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Tail-predication: clean-up of unused code

After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP
intrinsic can be cloned to exit blocks if there are instructions present in it
that perform the same operation, but this wasn't triggering anymore. However,
it turns out that for handling reductions, see D75533, it's actually easier
not to have the VCTP in exit blocks, so this removes that code.

This was possible because it turned out that some other code that depended on
this, rematerialization of the trip count enabling more dead code removal
later, wasn't doing much anymore due to more aggressive dead code removal that
was added to the low-overhead loops pass.

Differential Revision: https://reviews.llvm.org/D82773


# d9cb811c 30-Jun-2020 Samuel Tebbs <samteb02@e124480.cambridge.arm.com>

[ARM] Allow rounding intrinsics to be tail predicated

This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82553


Revision tags: llvmorg-10.0.1-rc2
# 1319d9bb 22-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass

Don't revert the get.active.lane.mask intrinsic here; this is moved to isel
legalization in D82292.

Differential Revision: https://reviews.llvm.org/D82105


# 187f627a 25-Jun-2020 Sam Tebbs <samuel.tebbs@arm.com>

[ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics

This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication.

Differential revision: https://reviews.llvm.org/D82377


# c18b7536 24-Jun-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC.

Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.


# 4aa893b8 19-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] tail-predication: renamed internal option.

Renamed -force-tail-predication to -force-mve-tail-predication because
that's more descriptive and consistent.


# d1522513 17-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask

To set up a tail-predicated loop, we need to calculate the number of
elements processed by the loop. We can now use intrinsic
@llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in
D79100. This intrinsic generates a predicate for the masked loads/stores, and
consumes the Backedge Taken Count (BTC) as its second argument. We can now use
that to reconstruct the loop tripcount, instead of the IR pattern match
approach we were using before.

Many thanks to Eli Friedman and Sam Parker for all their help with this work.

This also adds overflow checks for the new expressions that we create: the
loop tripcount, and the sub expression that calculates the remaining elements
to be processed. For the latter, SCEV is not able to calculate precise enough
bounds, so we work around that at the moment; the workaround is conservative
and not entirely correct yet. The overflow checks can be overruled with a
force flag, which is thus potentially unsafe (but not really, because the
vectoriser is the only place where this intrinsic is emitted at the moment).
It's also good to mention that the tail-predication pass is not yet enabled by
default. We will follow up to see if we can implement these overflow checks
better, either by a change in SCEV or by revising the definition of
llvm.get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D79175
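
What "the number of elements processed by the loop" buys us can be sketched with a scalar emulation of a tail-predicated loop. This is a simplified model under stated assumptions, not the pass's output: the 4-lane width and the helper names are hypothetical, and activeLanes stands in for a VCTP-style predicate that enables the first min(remaining, lanes) lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr uint32_t kLanes = 4; // assumed vector width for this sketch

// Hypothetical stand-in for a VCTP-style predicate: how many leading lanes
// are enabled when `remaining` elements are still to be processed.
uint32_t activeLanes(uint32_t remaining) {
  return std::min(remaining, kLanes);
}

// Sum an array one "vector" at a time without a scalar epilogue: the last
// iteration simply runs with fewer active lanes.
int64_t predicatedSum(const std::vector<int32_t> &input) {
  assert(input.size() <= std::numeric_limits<uint32_t>::max());
  uint32_t remaining = static_cast<uint32_t>(input.size());
  int64_t sum = 0;
  std::size_t base = 0;
  while (remaining > 0) {
    uint32_t lanes = activeLanes(remaining);
    for (uint32_t lane = 0; lane < lanes; ++lane)
      sum += input[base + lane]; // inactive lanes are never touched
    base += kLanes;
    remaining -= lanes; // the counter of elements still to be processed
  }
  return sum;
}
```

In this model the counter can never go below zero because it is decremented by the number of active lanes; the sub expression mentioned above is the part that the overflow checks have to reason about.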


# 7eed772a 23-May-2020 Sanjay Patel <spatel@rotateright.com>

[PatternMatch] abbreviate vector inst matchers; NFC

Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.


# bcbd26bf 20-May-2020 Florian Hahn <flo@fhahn.com>

[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).

SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34eee06, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537


Revision tags: llvmorg-10.0.1-rc1
# 245679b6 15-May-2020 Christopher Tetreault <ctetreau@quicinc.com>

[SVE] Remove usages of VectorType::getNumElements() from ARM

Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen

Reviewed By: dmgreen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79816


# 61b8af03 27-Apr-2020 David Green <david.green@arm.com>

[ARM] Allow fma in tail predicated loops

There are some intrinsics like this that currently block tail
predication, but should be fine. This allows fma through, as the one
that I ran into. There may be others that need the same treatment but
I've only done this one here.

Differential Revision: https://reviews.llvm.org/D78385


# 0736d1cc 22-Apr-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[ARM][MVE] Tail-predication: some more comments and debug messages. NFC.

Finding the loop tripcount is the first crucial step in preparing a loop for
tail-predication, and this adds a debug message if a tripcount cannot be found.

And while I was at it, I added some more comments here and there.

Differential Revision: https://reviews.llvm.org/D78485


# 1ee6ec2b 31-Mar-2020 Eli Friedman <efriedma@quicinc.com>

Remove "mask" operand from shufflevector.

Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# 0789f280 25-Feb-2020 Roman Lebedev <lebedev.ri@gmail.com>

[NFC][SCEV] Piping to pass TTI into SCEVExpander::isHighCostExpansionHelper()

Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`.
This is a fully NFC patch to make things reviewable.

Reviewers: reames, mkazantsev, wmi, sanjoy

Reviewed By: mkazantsev

Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73704
