#
b4b1b841 |
| 15-Sep-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[MVE] fix typo in llvm debug message. NFC.
|
#
676febc0 |
| 09-Sep-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value
This adds checks for the original scalar loop tripcount value, i.e. the second argument of get.active.lane.mask, and performs several sanity checks to see whether it is of the form that we expect, as we already do for the IV, which is the first argument of get.active.lane.mask.
Differential Revision: https://reviews.llvm.org/D86074
|
#
4ca60915 |
| 28-Aug-2020 |
David Green <david.green@arm.com> |
[ARM] Correct predicate operand for offset gather/scatter
The arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated intrinsics have some extra parameters, meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask.
Differential Revision: https://reviews.llvm.org/D86791
|
#
c352e7fb |
| 25-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because the loop tripcount is now passed in to the intrinsic,
- we can immediately use that value to set up a counter for the number of elements processed by the loop and don't need to materialize BTC + 1.
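As a rough illustration of the two points above (not code from the pass): the sketch below models why rebuilding the element count as BTC + 1 needed an overflow check, and how a counter seeded directly from the tripcount avoids it. The names `VF`, `elements_from_btc` and `count_processed` are hypothetical, and the 4-lane width is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t VF = 4; // assumed number of lanes per vector iteration

// Old scheme: the element count is rebuilt as BTC + 1. In 32 bits this
// wraps to 0 when BTC is UINT32_MAX, which is why overflow checks were needed.
constexpr uint32_t elements_from_btc(uint32_t btc) {
    return static_cast<uint32_t>(btc + 1u);
}

// New scheme (hypothetical model): the tripcount itself seeds the counter of
// elements still to be processed; each vector iteration handles up to VF.
uint64_t count_processed(uint32_t trip_count) {
    uint64_t processed = 0;
    uint32_t remaining = trip_count;
    while (remaining > 0) {
        uint32_t active = std::min(remaining, VF); // lanes active this iteration
        processed += active;
        remaining -= active;
    }
    return processed;
}
```

With the tripcount passed in, no BTC + 1 has to be materialized, so there is no addition that could wrap.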
Differential Revision: https://reviews.llvm.org/D86303
|
Revision tags: llvmorg-11.0.0-rc2 |
|
#
9eb9ba07 |
| 13-Aug-2020 |
Anna Welker <anna.welker@arm.com> |
[ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters
Fix to include the non-predicated version of the write-back gather in the special case treatment for deducing the instruction type. (This fixes corner cases of https://reviews.llvm.org/D85138.)
Differential Revision: https://reviews.llvm.org/D85889
|
#
4fe5615e |
| 12-Aug-2020 |
Anna Welker <anna.welker@arm.com> |
[ARM][MVE] Enable tail predication for loops containing MVE gather/scatters
Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated.
Differential Revision: https://reviews.llvm.org/D85138
|
#
6716e786 |
| 11-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] tail-predication: overflow checks for backedge taken count.
This picks up the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinsic that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j:
M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i];
For this inner loop, the SCEV backedge taken count (BTC) expression is:
{(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>
and cannotBeMaxInLoop from LoopUtils couldn't calculate a bound on this, so "BTC cannot be max" could not be established. Overflow behaviour therefore had to be assumed in the loop tripcount expression that uses the BTC. As a result, tail-predication had to be forced (with an option) for this case.
This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount, which is able to determine the range of the BTC and can thus show that it is safe, so that we no longer need to force tail-predication, as reflected in the changed test cases.
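To make the range argument concrete, here is a small stand-alone sketch (not the pass's code, and not using any LLVM API). It writes the kernel's inner loop out in full and models, under the assumption that Size and i are non-negative 16-bit values, why the BTC is bounded well away from the 32-bit maximum. The names `autocorr_lag`, `max_inner_btc` and `btc_plus_one_cannot_wrap` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Inner loop j of the auto-correlation kernel for one lag i.
// Input must hold at least Size elements.
int32_t autocorr_lag(const std::vector<int16_t>& Input, int16_t Size, int16_t i) {
    assert(Size >= 0 && i >= 0 && i <= Size);
    assert(Input.size() >= static_cast<std::size_t>(Size));
    int32_t Sum = 0;
    int32_t M = int32_t(Size) - int32_t(i);
    for (int32_t j = 0; j < M; j++)
        Sum += int32_t(Input[j]) * int32_t(Input[j + i]);
    return Sum;
}

// BTC of the inner loop is M - 1 = Size - i - 1. With Size sign-extended
// from i16 and i >= 0, the largest value is reached for Size = 32767, i = 0.
uint32_t max_inner_btc() {
    return uint32_t(std::numeric_limits<int16_t>::max()) - 1u;
}

// BTC + 1 is only unsafe in 32 bits if BTC can reach the maximum value.
bool btc_plus_one_cannot_wrap(uint32_t max_btc) {
    return max_btc != std::numeric_limits<uint32_t>::max();
}
```

For example, `btc_plus_one_cannot_wrap(max_inner_btc())` holds because the upper bound is 32766, which is the kind of constant bound that lets the forced option be dropped.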
Differential Revision: https://reviews.llvm.org/D85737
|
#
8590e5ab |
| 09-Aug-2020 |
David Green <david.green@arm.com> |
[ARM] Allow vecreduce_add in tail predicated loops
This allows vecreduce_add in loops so that we can tailpredicate them.
Differential Revision: https://reviews.llvm.org/D85454
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init |
|
#
595270ae |
| 13-Jul-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones.
This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option.
Differential Revision: https://reviews.llvm.org/D83133
|
Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
#
3324e3a6 |
| 30-Jun-2020 |
Samuel Tebbs <samteb02@e124480.cambridge.arm.com> |
[ARM] Allow the fabs intrinsic to be tail predicated
This patch stops the fabs intrinsic from blocking tail predication.
Differential Revision: https://reviews.llvm.org/D82570
|
#
66fa3139 |
| 30-Jun-2020 |
Samuel Tebbs <samteb02@e124480.cambridge.arm.com> |
[ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated
This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication.
Differential Revision: https://reviews.llvm.org/D82571
|
#
af459076 |
| 29-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: clean-up of unused code
After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP intrinsic can be cloned to exit blocks if there are instructions present in it that perform the same operation, but this wasn't triggering anymore. However, it turns out that for handling reductions, see D75533, it's actually easier not to have the VCTP in exit blocks, so this removes that code.
This was possible because it turned out that some other code that depended on this, rematerialization of the trip count enabling more dead code removal later, wasn't doing much anymore due to more aggressive dead code removal that was added to the low-overhead loops pass.
Differential Revision: https://reviews.llvm.org/D82773
|
#
d9cb811c |
| 30-Jun-2020 |
Samuel Tebbs <samteb02@e124480.cambridge.arm.com> |
[ARM] Allow rounding intrinsics to be tail predicated
This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication.
Differential Revision: https://reviews.llvm.org/D82553
|
Revision tags: llvmorg-10.0.1-rc2 |
|
#
1319d9bb |
| 22-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass
Don't revert intrinsic get.active.lane.mask here, this is moved to isel legalization in D82292.
Differential Revision: https://reviews.llvm.org/D82105
|
#
187f627a |
| 25-Jun-2020 |
Sam Tebbs <samuel.tebbs@arm.com> |
[ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics
This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication.
Differential revision: https://reviews.llvm.org/D82377
|
#
c18b7536 |
| 24-Jun-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
|
#
4aa893b8 |
| 19-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] tail-predication: renamed internal option.
Renamed -force-tail-predication to -force-mve-tail-predication because that's more descriptive and consistent.
|
#
d1522513 |
| 17-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before.
Many thanks to Eli Friedman and Sam Parker for all their help with this work.
This also adds overflow checks for the new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment; this is not entirely correct yet, but it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really, because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or by revising the definition of llvm.get.active.lane.mask.
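The relationship between the lane mask, the BTC and the reconstructed tripcount can be sketched as a small scalar model. This is an illustration only: the lane comparison (a lane is active while its element index is at most the BTC), the 4-lane width, and the names `lane_mask_from_btc`, `vctp_model` and `masks_agree` are all assumptions made for the example, not the intrinsic's or the pass's definition.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned VF = 4; // assumed number of vector lanes
using Mask = std::array<bool, VF>;

// Assumed model of the lane mask: lane l is active while the element
// index iv + l does not exceed the backedge taken count.
Mask lane_mask_from_btc(uint32_t iv, uint32_t btc) {
    Mask m{};
    for (unsigned l = 0; l < VF; ++l)
        m[l] = uint64_t(iv) + l <= btc;
    return m;
}

// Assumed model of a VCTP-style predicate: the first min(remaining, VF)
// lanes are active.
Mask vctp_model(uint32_t remaining) {
    Mask m{};
    for (unsigned l = 0; l < VF; ++l)
        m[l] = l < remaining;
    return m;
}

// Walks the vector loop and checks that both predicates agree when the
// tripcount is reconstructed as BTC + 1. Refuses the case where BTC + 1
// would wrap in 32 bits, which is what the overflow checks guard against.
bool masks_agree(uint32_t btc) {
    if (btc == UINT32_MAX)
        return false;
    uint32_t remaining = btc + 1; // reconstructed tripcount
    for (uint64_t iv = 0; iv <= btc; iv += VF) {
        if (lane_mask_from_btc(uint32_t(iv), btc) != vctp_model(remaining))
            return false;
        remaining = remaining > VF ? remaining - VF : 0;
    }
    return true;
}
```

In this model, a BTC of 5 gives a tripcount of 6: the first vector iteration has all four lanes active and the second has only the first two.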
Differential Revision: https://reviews.llvm.org/D79175
|
#
7eed772a |
| 23-May-2020 |
Sanjay Patel <spatel@rotateright.com> |
[PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.
|
#
bcbd26bf |
| 20-May-2020 |
Florian Hahn <flo@fhahn.com> |
[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander.
This patch was originally committed as b8a3c34eee06, but broke the modules build, as LoopAccessAnalysis was using the Expander.
The code-gen part of LAA was moved to lib/Transforms recently, so this patch can be landed again.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
|
Revision tags: llvmorg-10.0.1-rc1 |
|
#
245679b6 |
| 15-May-2020 |
Christopher Tetreault <ctetreau@quicinc.com> |
[SVE] Remove usages of VectorType::getNumElements() from ARM
Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen
Reviewed By: dmgreen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79816
|
#
61b8af03 |
| 27-Apr-2020 |
David Green <david.green@arm.com> |
[ARM] Allow fma in tail predicated loops
There are some intrinsics like this that currently block tail predication, but should be fine. This allows fma through, as the one that I ran into. There may be others that need the same treatment but I've only done this one here.
Differential Revision: https://reviews.llvm.org/D78385
|
#
0736d1cc |
| 22-Apr-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: some more comments and debug messages. NFC.
Finding the loop tripcount is the first crucial step in preparing a loop for tail-predication, and this adds a debug message if a tripcount cannot be found.
And while I was at it, I added some more comments here and there.
Differential Revision: https://reviews.llvm.org/D78485
|
#
1ee6ec2b |
| 31-Mar-2020 |
Eli Friedman <efriedma@quicinc.com> |
Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that.
Differential Revision: https://reviews.llvm.org/D72467
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
#
0789f280 |
| 25-Feb-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFC][SCEV] Piping to pass TTI into SCEVExpander::isHighCostExpansionHelper()
Summary: Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`. This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
|