History log of /llvm-project/llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (Results 1 – 22 of 22)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# 56c091ea 21-Nov-2024 Paul Walker <paul.walker@arm.com>

[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856)

This brings the printing of scalable vector constant splats in line with
their fixed-length counterparts.
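
For illustration (not part of the commit message), a minimal sketch of what the new printing looks like; the function and value names are hypothetical:

  define <vscale x 4 x i32> @add3(<vscale x 4 x i32> %v) {
    ; The constant operand now prints with the compact splat syntax rather than
    ; as an expanded insertelement/shufflevector constant expression.
    %r = add <vscale x 4 x i32> %v, splat (i32 3)
    ret <vscale x 4 x i32> %r
  }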


Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 99d6c6d9 05-Jul-2024 Florian Hahn <flo@fhahn.com>

[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)

This patch moves creation of the branch condition for entering the scalar
epilogue loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
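
For context (not part of the commit message), the branch being modeled is the terminator of the middle block in the standard vectorizer skeleton; the fragment below uses the usual autogenerated block and value names and is only illustrative:

  middle.block:
    ; %n is the original trip count, %n.vec the iterations covered by the vector loop
    %cmp.n = icmp eq i64 %n, %n.vec
    ; if iterations remain, enter the scalar epilogue loop via scalar.ph
    br i1 %cmp.n, label %for.exit, label %scalar.ph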


# 3808ba78 20-Jun-2024 Florian Hahn <flo@fhahn.com>

[VPlan] Model middle block via VPIRBasicBlock. (#95816)

Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (in between the compare used by
the terminator and the terminator itself). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# a5891fa4 08-Dec-2023 Florian Hahn <flo@fhahn.com>

[VPlan] Initial modeling of VF * UF as VPValue. (#74761)

This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop).

This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.

The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.

Moved from Phabricator (https://reviews.llvm.org/D157322)
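
For illustration (not from the commit), a hand-written sketch of the kind of IR this enables: the runtime VF * UF step of the canonical induction computed once, rather than via repeated vscale calls; all names are hypothetical:

  define void @step_by_vf_x_uf(i64 %n) {
  entry:
    ; VF = vscale x 2, UF = 2  =>  VF * UF = vscale * 4, computed a single time
    %vscale = call i64 @llvm.vscale.i64()
    %vf.x.uf = mul i64 %vscale, 4
    br label %vector.body

  vector.body:
    %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
    ; ... vector work ...
    %index.next = add i64 %index, %vf.x.uf   ; canonical IV stepped by VF * UF
    %done = icmp uge i64 %index.next, %n
    br i1 %done, label %exit, label %vector.body

  exit:
    ret void
  }

  declare i64 @llvm.vscale.i64()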


# 5ea6a3fc 08-Dec-2023 Florian Hahn <flo@fhahn.com>

[VPlan] Compute scalable VF in preheader for induction increment. (#74762)

UF * VF is loop invariant and can be computed directly in the preheader.
This prepares the code for #74761 and reduces the test changes.


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3
# 8d16c680 10-Feb-2023 Luke Lau <luke@igalia.com>

[RISCV] Increase default vectorizer LMUL to 2

After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot.
Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that
a) doesn't affect register pressure, and
b) is suitable for the microarchitecture,
we would need to teach its heuristics about RISC-V register grouping specifics.
Instead this just does the easy, pragmatic thing: change the default to a safe value that doesn't affect register pressure significantly[1], but should increase throughput and unlock more interleaving.

[1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`:

LMUL=1 2573 spills
LMUL=2 2583 spills
LMUL=4 2819 spills
LMUL=8 3256 spills

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D143723
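
To give a rough sense of the effect (illustrative, not from the commit, and assuming the usual 64-bits-per-LMUL register model): doubling the assumed register width doubles the number of elements per vector part, e.g. for an i32 loop:

  ; old default (LMUL=1): the vectorizer would typically pick
  store <vscale x 2 x i32> zeroinitializer, ptr %p, align 4
  ; new default (LMUL=2): twice as many elements per part
  store <vscale x 4 x i32> zeroinitializer, ptr %p, align 4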


# 15f9cf16 21-Feb-2023 Luke Lau <luke@igalia.com>

[LV][RISCV] Don't interleave scalable vector loops

It's less clear with scalable vectors than with fixed-length vectors that
interleaving exposes more ILP, as scalable vectors can be thought of as a
sort of hardware form of interleaving, especially with larger LMULs.
This also addresses the unexpected additional unrolling that occurs when
using larger LMULs in the loop vectorizer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144485


# 5a115452 08-Feb-2023 Sander de Smalen <sander.desmalen@arm.com>

Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.

Fixed an issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.

Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.

This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
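
To make the symptom concrete (illustrative, not from the commit): negating a 32-bit unsigned Part and only then widening it to a 64-bit index type yields a huge positive multiplier instead of a negative one, e.g. for Part = 4 (2^32 - 4 = 4294967292); the value names are hypothetical:

  ; what the buggy code effectively emitted
  %offset.bad  = mul i64 %runtime.vf, 4294967292
  ; what is intended once the negation happens in the index type
  %offset.good = mul i64 %runtime.vf, -4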


# 1f01cdda 08-Feb-2023 Sander de Smalen <sander.desmalen@arm.com>

Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."

This patch causes a regression, so reverting it while I investigate the issue.

This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.


Revision tags: llvmorg-16.0.0-rc2
# e6eb84a1 06-Feb-2023 Sander de Smalen <sander.desmalen@arm.com>

[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.

This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:

  %vscale = call i32 @llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr ..., i32 %vf.part1

Which InstCombine then changes into:

  %vscale = call i32 @llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr ..., i64 %vf.part1.zext

D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.

It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.

I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D143267
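
For reference (not part of the commit message), the shape of the IR after the change, with the index created directly in the pointer's index type (i64 for a typical 64-bit datalayout); the element type and value names are made up:

  %vscale = call i64 @llvm.vscale.i64()
  %vf.part1 = mul i64 %vscale, 4
  %gep = getelementptr float, ptr %base, i64 %vf.part1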


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# eae26b66 04-Jan-2023 Paul Walker <paul.walker@arm.com>

[IRBuilder] Use canonical i64 type for insertelement index used by vector splats.

InstCombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer, so we may as
well use the preferred form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
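
Concretely (illustrative, not from the commit), the splat sequence now uses an i64 zero for the insertelement index rather than i32 0:

  define <vscale x 4 x i32> @splat_i32(i32 %x) {
    %ins = insertelement <vscale x 4 x i32> poison, i32 %x, i64 0
    %sp  = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
    ret <vscale x 4 x i32> %sp
  }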


# be51fa45 05-Dec-2022 Roman Lebedev <lebedev.ri@gmail.com>

[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax
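
A typical conversion looks like this (illustrative; not the exact RUN lines of this test):

  ; Before (legacy pass manager syntax):
  ; RUN: opt -loop-vectorize -S %s
  ; After (new pass manager syntax):
  ; RUN: opt -passes=loop-vectorize -S %s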


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1
# bbef90ac 15-Sep-2022 Vitaly Buka <vitalybuka@google.com>

[IRBuilder] Use PoisonValue in CreateMasked*

Followup to 72b776168c7c80d2035c7226488462dcffc97e75

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D133967
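
For example (illustrative, not from the commit), a masked load built by IRBuilder now uses poison rather than undef for the passthrough operand:

  %v = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr %p, i32 4, <vscale x 4 x i1> %m, <vscale x 4 x i32> poison)

  declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)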


Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3
# 4d875910 18-Aug-2022 Philip Reames <preames@rivosinc.com>

[RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL

On known hardware, reductions, gather, and scatter operations have execution latencies which correlate with the vector length (VL) of the operation. Most other operations (e.g. simple arithmetic) don't correlate in this way, and instead have an essentially fixed cost as VL varies.

When I implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs de facto prevent vectorization using these constructs at all.

This patch reverses course, and ties the returned cost not to the maximum possible VL, but to the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed-length variants.

This does introduce the possibility of the cost for these operations being a significant underestimate on platforms where the actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist.

Previously, overestimated costs caused the analogous problem on machines much closer to the default value of vscale for tuning. With this patch, we can still hit that problem if vscale for tuning is set high (manually) and the code is then run on a narrow VLEN machine.

Differential Revision: https://reviews.llvm.org/D131519


Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 0ae46693 21-Jul-2022 Philip Reames <preames@rivosinc.com>

[RISCV][LV] Split out and expand tests for uniform loads and stores


# 20dd3297 27-Jun-2022 Philip Reames <preames@rivosinc.com>

[LV] Allow scalable vectorization with vscale = 1

This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers, e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
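
To make the distinction concrete (illustrative, not from the commit): <1 x i64> holds exactly one element and would likely just be rescalarized, whereas <vscale x 1 x i64> holds vscale elements, known only at run time:

  define <vscale x 1 x i64> @add_nxv1i64(<vscale x 1 x i64> %a, <vscale x 1 x i64> %b) {
    ; one "register's worth" of i64 elements; the element count is vscale,
    ; so converting the loop to this type is not simple unrolling
    %r = add <vscale x 1 x i64> %a, %b
    ret <vscale x 1 x i64> %r
  }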


# 9803b0d1 25-Jun-2022 Philip Reames <preames@rivosinc.com>

[RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled

LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN).

The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide-impact change, as it changes the practical vectorization result to scalable whenever both fixed and scalable vectorization are enabled.

As an aside, I think the vectorizer is a little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended.

Differential Revision: https://reviews.llvm.org/D128547


# ae8fac6f 24-Jun-2022 Philip Reames <preames@rivosinc.com>

[LV][RISCV] Add coverage showing scalable codegen when etype != ELEN

We currently have a costing bug around the etype == ELEN case, so add otherwise duplicate tests to show test diffs as I work on other parts of costing.


# 056d6393 24-Jun-2022 Philip Reames <preames@rivosinc.com>

[RISCV] Split a vectorizer test runline so that upcoming changes in defaults are visible


# 46ea4b5e 23-Jun-2022 Philip Reames <preames@rivosinc.com>

[LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter

If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it. (Well, we could, but we haven't implemented that case.) This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
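
A hypothetical loop of the problematic shape (illustrative; not the test case from the patch): the store address is loop-invariant (uniform) and under-aligned, so with a scalable VF it can neither be scalarized (not implemented) nor turned into a legal scatter:

  define void @uniform_unaligned_store(ptr %p, i64 %n) {
  entry:
    br label %loop

  loop:
    %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
    store i32 0, ptr %p, align 1        ; same address every iteration, align 1
    %iv.next = add i64 %iv, 1
    %done = icmp eq i64 %iv.next, %n
    br i1 %done, label %exit, label %loop

  exit:
    ret void
  }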


Revision tags: llvmorg-14.0.6
# 8ae06642 21-Jun-2022 Philip Reames <preames@rivosinc.com>

[LoopVect, tests] Add some basic coverage for scalable costing of scatter/gather patterns on RISCV

This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled.


# 2cf320d4 21-Jun-2022 Philip Reames <preames@rivosinc.com>

[LoopVect, tests] Add some basic coverage for scalable costing on RISCV

This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled. For context, I just yesterday fixed a crash in costing of the splat_ptr example - see bbf3fd.
