History log of /llvm-project/llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-float-loops.ll (Results 1 – 25 of 45)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1
# 2f7ccaf4 28-Sep-2024 Florian Hahn <flo@fhahn.com>

[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777)

This can help in cases where pointer alignment info is missing, e.g.
https://github.com/llvm/llvm-project/pull/108210

[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777)

This can help in cases where pointer alignment info is missing, e.g.
https://github.com/llvm/llvm-project/pull/108210

The predicate is formed for the complex expression that's passed to
SolveLinEquationWithOverflow and the checks could probably be pushed
closer to the root nodes, which in some cases may be cheaper to check.


PR: https://github.com/llvm/llvm-project/pull/108777

show more ...


Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# eecb99c5 05-Dec-2023 Nikita Popov <npopov@redhat.com>

[Tests] Add disjoint flag to some tests (NFC)

These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation f

[Tests] Add disjoint flag to some tests (NFC)

These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 7b3bbd83 09-Oct-2023 Jay Foad <jay.foad@amd.com>

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.


# 2501ae58 09-Oct-2023 Jay Foad <jay.foad@amd.com>

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.

show more ...


Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 1bcb6a3d 21-Jun-2023 Guozhi Wei <carrot@google.com>

[MBP] Enable duplicating return block to remove jump to return

Sometimes LLVM generates branch to return instruction, like PR63227.

It is because in function MachineBlockPlacement::canTailDuplicate

[MBP] Enable duplicating return block to remove jump to return

Sometimes LLVM generates branch to return instruction, like PR63227.

It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds
we avoid duplicating a BB into another already placed BB to prevent destroying
computed layout. But if the successor BB is a return block, duplicating it will
only reduce taken branches without hurt to any other branches.

Differential Revision: https://reviews.llvm.org/D153093

show more ...


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# c4a60c9d 25-May-2023 sgokhale <sgokhale@nvidia.com>

[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default

This is an attempt to reland D42600 and enabling this optimisation by default.

This also resolves the issue pointed out in the context of PGO

[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default

This is an attempt to reland D42600 and enabling this optimisation by default.

This also resolves the issue pointed out in the context of PGO build.

Differential Revision: https://reviews.llvm.org/D42600

show more ...


Revision tags: llvmorg-16.0.4
# f4999d35 08-May-2023 Alan Zhao <ayzhao@google.com>

Revert "[CodeGen][ShrinkWrap] Split restore point"

This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.

The original commit causes a Chrome build assertion failure with
ThinLTO: https://cr

Revert "[CodeGen][ShrinkWrap] Split restore point"

This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.

The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635

show more ...


# 1ddfd1c8 08-May-2023 sgokhale <sgokhale@nvidia.com>

[CodeGen][ShrinkWrap] Split restore point

Try to reland D42600

Differential Revision: https://reviews.llvm.org/D42600


Revision tags: llvmorg-16.0.3, llvmorg-16.0.2
# bb5befef 13-Apr-2023 sgokhale <sgokhale@nvidia.com>

Revert "[CodeGen][ShrinkWrap] Split restore point"

This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83.

An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833


# 5f0bccc3 11-Apr-2023 sgokhale <sgokhale@nvidia.com>

[CodeGen][ShrinkWrap] Split restore point

This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).

Benchma

[CodeGen][ShrinkWrap] Split restore point

This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs(Callee Saved Registers)/FI(Frame Index).

Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.

Co-authored-by: junbuml

Differential Revision: https://reviews.llvm.org/D42600

show more ...


# 3d7242f0 06-Apr-2023 Dmitry Makogon <d.makogon@g.nsu.ru>

Reapply "[LSR] Preserve LCSSA when rewriting instruction with PHI user"

This reverts commit efd34ba60f3839b0a68b2e32ff9011b6823bc16f.

Reapplies 8ff4832679e1. Missed a failing test. Needed to just
u

Reapply "[LSR] Preserve LCSSA when rewriting instruction with PHI user"

This reverts commit efd34ba60f3839b0a68b2e32ff9011b6823bc16f.

Reapplies 8ff4832679e1. Missed a failing test. Needed to just
update test checks.

show more ...


Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# f4f8f9f1 16-Jan-2023 Simon Pilgrim <llvm-dev@redking.me.uk>

[Thumb2][MVE] Recognise shuffle truncation patterns suitable for ARMISD::MVETRUNC

I'm helping with the remaining regressions on D127115, and one of my candidate fixes caused some regressions with MV

[Thumb2][MVE] Recognise shuffle truncation patterns suitable for ARMISD::MVETRUNC

I'm helping with the remaining regressions on D127115, and one of my candidate fixes caused some regressions with MVE interleaved shuffles due to poor handling of 'truncation' style shuffle masks (0,2,4,6,...).

This patch attempts to use the ARMISD::MVETRUNC node to handle these cases, based off existing code in LowerTruncate.

It handles both (0,2,4,6,...) and (1,3,5,7,....) 'top' style patterns (assuming no endian problems). I shift down the 'top' patterns - a basic search of ARM docs suggests MVE has some top/bottom truncation/narrowing instructions but I don't seem to be able to get them to be used.

Differential Revision: https://reviews.llvm.org/D141791

show more ...


Revision tags: llvmorg-15.0.7
# 90f24bef 09-Jan-2023 David Green <david.green@arm.com>

[ARM] Fold And/Or into CSel if possible

This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)`
to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)`
emi

[ARM] Fold And/Or into CSel if possible

This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)`
to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)`
emit `csinc x, ZR, cc`. The or pattern gets recognized from a cmov under Arm.

Differential Revision: https://reviews.llvm.org/D141137

show more ...


# b5b663aa 19-Dec-2022 Nikita Popov <npopov@redhat.com>

[Thumb2] Convert some tests to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 8d85e945 16-Nov-2021 Philip Reames <listmail@philipreames.com>

[SCEV] Canonicalize X - urem X, Y patterns

There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could

[SCEV] Canonicalize X - urem X, Y patterns

There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.

The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency.

The ARM loop hardware interaction in the test diff is opague to me, but the comments in the review from others knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions.

Differential Revision: https://reviews.llvm.org/D114018

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 52e0cf9d 17-Aug-2021 David Green <david.green@arm.com>

[ARM] Enable subreg liveness

This enables subreg liveness in the arm backend when MVE is present,
which allows the register allocator to detect when subregister are
alive/dead, compared to only acti

[ARM] Enable subreg liveness

This enables subreg liveness in the arm backend when MVE is present,
which allows the register allocator to detect when subregister are
alive/dead, compared to only acting on full registers. This can helps
produce better code on MVE with the way MQPR registers are made up of
SPR registers, but is especially helpful for MQQPR and MQQQQPR
registers, where there are very few "registers" available and being able
to split them up into subregs can help produce much better code.

Differential Revision: https://reviews.llvm.org/D107642

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 48cef1fa 20-Apr-2021 David Green <david.green@arm.com>

[ARM] Create VMOVRRD from adjacent vector extracts

This adds a combine for extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a

[ARM] Create VMOVRRD from adjacent vector extracts

This adds a combine for extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.

This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly. Otherwise the machine verifier is unhappy.

Differential Revision: https://reviews.llvm.org/D100244

show more ...


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# fad70c30 11-Mar-2021 David Green <david.green@arm.com>

[ARM] Improve WLS lowering

Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over t

[ARM] Improve WLS lowering

Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

%wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
%wls0 = extractvalue { i32, i1 } %wls, 0
%wls1 = extractvalue { i32, i1 } %wls, 1
br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
%lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
..
%iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
%cmp = icmp ne i32 %iv.next, 0
br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations need to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

%1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
t2B %bb.1
...
bb.2.loop:
%2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
...
%3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729

show more ...


Revision tags: llvmorg-12.0.0-rc3
# a968e7b8 04-Mar-2021 David Green <david.green@arm.com>

[ARM] KnownBits for CSINC/CSNEG/CSINV

This adds some simple known bits handling for the three CSINC/NEG/INV
instructions. From the operands known bits we can compute the common
bits of the first ope

[ARM] KnownBits for CSINC/CSNEG/CSINV

This adds some simple known bits handling for the three CSINC/NEG/INV
instructions. From the operands known bits we can compute the common
bits of the first operand and incremented/negated/inverted second
operand. The first, especially CSINC ZR, ZR, comes up fair amount in the
tests. The others are more rare so a unit test for them is added.

Differential Revision: https://reviews.llvm.org/D97788

show more ...


Revision tags: llvmorg-12.0.0-rc2
# a838a4f6 15-Feb-2021 David Green <david.green@arm.com>

[ARM] Extend search for increment in load/store optimizer

Currently the findIncDecAfter will only look at the next instruction for
post-inc candidates in the load/store optimizer. This extends that

[ARM] Extend search for increment in load/store optimizer

Currently the findIncDecAfter will only look at the next instruction for
post-inc candidates in the load/store optimizer. This extends that to a
search through the current BB, until an instruction that modifies or
uses the increment reg is found. This allows more post-inc load/stores
and ldm/stm's to be created, especially in cases where a schedule might
move instructions further apart.

We make sure not to look any further for an SP, as that might invalidate
stack slots that are still in use.

Differential Revision: https://reviews.llvm.org/D95881

show more ...


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3
# 48230355 02-Feb-2021 David Green <david.green@arm.com>

[ARM] Remove DLS lr, lr

A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm

[ARM] Remove DLS lr, lr

A DLS lr, lr instruction only moves lr to itself. It need not be emitted
on it's own to save a instruction in the loop preheader.

Differential Revision: https://reviews.llvm.org/D78916

show more ...


Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init
# 835104a1 23-Jan-2021 Nikita Popov <nikita.ppv@gmail.com>

[LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943)

When LSR converts a branch on the pre-inc IV into a branch on the
post-inc IV, the nowrap flags on the addition ma

[LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943)

When LSR converts a branch on the pre-inc IV into a branch on the
post-inc IV, the nowrap flags on the addition may no longer be valid.
Previously, a poison result of the addition might have been ignored,
in which case the program was well defined. After branching on the
post-inc IV, we might be branching on poison, which is undefined behavior.

Fix this by discarding nowrap flags which are not present on the SCEV
expression. Nowrap flags on the SCEV expression are proven by SCEV
to always hold, independently of how the expression will be used.
This is essentially the same fix we applied to IndVars LFTR, which
also performs this kind of pre-inc to post-inc conversion.

I believe a similar problem can also exist for getelementptr inbounds,
but I was not able to come up with a problematic test case. The
inbounds case would have to be addressed in a differently anyway
(as SCEV does not track this property).

Fixes https://bugs.llvm.org/show_bug.cgi?id=46943.

Differential Revision: https://reviews.llvm.org/D95286

show more ...


Revision tags: llvmorg-11.1.0-rc2
# e7dc083a 18-Jan-2021 David Green <david.green@arm.com>

[ARM] Don't handle low overhead branches in AnalyzeBranch

It turns our that the BranchFolder and IfCvt does not like unanalyzable
branches that fall-through. This means that removing the uncondition

[ARM] Don't handle low overhead branches in AnalyzeBranch

It turns our that the BranchFolder and IfCvt does not like unanalyzable
branches that fall-through. This means that removing the unconditional
branches from the end of tail predicated instruction can run into
asserts and verifier issues.

This effectively reverts 372eb2bbb6fb903ce76266e659dfefbaee67722b, but
adds handling to t2DoLoopEndDec which are not branches, so can be safely
skipped.

show more ...


# 372eb2bb 16-Jan-2021 David Green <david.green@arm.com>

[ARM] Add low overhead loops terminators to AnalyzeBranch

This treats low overhead loop branches the same as jump tables and
indirect branches in analyzeBranch - they cannot be analyzed but the
dire

[ARM] Add low overhead loops terminators to AnalyzeBranch

This treats low overhead loop branches the same as jump tables and
indirect branches in analyzeBranch - they cannot be analyzed but the
direct branches on the end of the block may be removed. This helps
remove the unnecessary branches earlier, which can help produce better
codegen (and change block layout in a number of cases).

Differential Revision: https://reviews.llvm.org/D94392

show more ...


Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 0447f350 10-Dec-2020 David Green <david.green@arm.com>

[ARM][RegAlloc] Add t2LoopEndDec

We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop

[ARM][RegAlloc] Add t2LoopEndDec

We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.

There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.

Differential Revision: https://reviews.llvm.org/D91358

show more ...


12