History log of /llvm-project/llvm/test/Transforms/SLPVectorizer/RISCV/vec3-base.ll (Results 1 – 12 of 12)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 3133acf1 13-Dec-2024 Han-Kuan Chen <hankuan.chen@sifive.com>

Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)"

This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.


# 82204154 13-Dec-2024 Han-Kuan Chen <hankuan.chen@sifive.com>

[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# 38fffa63 06-Nov-2024 Paul Walker <paul.walker@arm.com>

[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1
# 7f6bbb3c 20-Sep-2024 Philip Reames <preames@rivosinc.com>

[RISCV][TTI] Reduce cost of a build_vector pattern (#108419)

This change is actually two related changes, but they're very hard to
meaningfully separate as the second balances the first, and yet doesn't
do much good on its own.

First, we can reduce the cost of a build_vector pattern. Our current
costing for this defers to generic insertelement costing which isn't
unreasonable, but also isn't correct. While inserting N elements
requires N-1 slides and N vmv.s.x, doing the full build_vector only
requires N vslide1down. (Note there are other cases that our build
vector lowering can do more cheaply, this is simply the easiest upper
bound which appears to be "good enough" for SLP costing purposes.)
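
The instruction counts in the paragraph above can be compared directly. The following is a minimal sketch, assuming every instruction costs one unit; that unit cost and the function names are illustrative assumptions, not the actual RISC-V TTI implementation.

```python
def insertelement_chain_cost(n: int) -> int:
    """Instructions needed to build an n-element vector one insert at a time.

    Per the commit message: N-1 slides plus N vmv.s.x.
    Assumption of this sketch: each instruction costs one unit.
    """
    if n < 1:
        raise ValueError("vector must have at least one element")
    return (n - 1) + n


def build_vector_cost(n: int) -> int:
    """Instructions needed for a full build_vector: N vslide1down."""
    if n < 1:
        raise ValueError("vector must have at least one element")
    return n


if __name__ == "__main__":
    # Print both counts side by side for a few vector lengths.
    for n in (2, 3, 4, 8):
        print(n, insertelement_chain_cost(n), build_vector_cost(n))
```

Under this unit-cost assumption the generic insertelement costing overestimates the build_vector by N-1 instructions, which is the gap the commit closes.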

Second, we need to tell SLP that calls don't preserve vector registers.
Without this, SLP will vectorize scalar code which performs e.g. 4 x
float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the
costing works out that this is in fact the optimal choice - except that
we don't actually have a <2 x float> @exp, and unroll during DAG. This
would be fine (or at least cost neutral) except that the libcall for the
scalar @exp blows all vector registers. So the net effect is we added a
bunch of spills that SLP had no idea about. Thankfully, AArch64 has a
similar problem, and has taught SLP how to reason about spill cost once
the right TTI hook is implemented.

Now, for some implications...

The SLP solution for spill costing has some inaccuracies. In particular,
it basically just guesses whether an intrinsic will be lowered to a call
or not, and can be wrong in both directions. It also has no mechanism to
differentiate on calling convention.

This has the effect of making partial vectorization (i.e. starting in
scalar) more profitable. In practice, the major effect of this is to
make it more likely SLP will vectorize part of a tree in an intersecting
forest, and then vectorize the remaining tree once those uses have been
removed.

This has the effect of biasing us slightly away from strided, or indexed
loads during vectorization - because the scalar cost is more accurately
modeled, and these instructions look relatively less profitable.


Revision tags: llvmorg-19.1.0
# fa8b737a 11-Sep-2024 Philip Reames <preames@rivosinc.com>

[SLP][RISCV] Add test for 3 element build vector feeding reduce

Our costs for build vectors are currently a bit off which inhibits
vectorization. Fix forthcoming.


# 247d3ea8 05-Sep-2024 Philip Reames <preames@rivosinc.com>

[SLP] Expand non-power-of-two bailout in TryToFindDuplicates

This fixes a crash noticed when doing a downstream merge. The
test case has been reduced, and is included in this commit.

The existing bailout for non-power-of-two vectors in TryToFindDuplicates
did not consider the case where the list being vectorized had no
root node. This allowed reshuffled scalars to slip through to code
which does not yet expect to handle it.

This was an existing bug (likely introduced by my ed03070e), but
made easier to hit by 63e8a1b1


# 63e8a1b1 05-Sep-2024 Philip Reames <preames@rivosinc.com>

[SLP] Enable reordering for non-power-of-two vectors (#106638)

This change tries to enable vector reordering during vectorization for
non-power-of-two vectors. Specifically, my goal is to be able to
vectorize reductions whose operands appear in other than identity order.
(i.e. a[1] + a[0] + a[2]). In our standard pass pipeline, Reassociation
effectively canonicalizes towards this form. So for reduction
vectorization to be widely applicable, we need this feature.

This change enables the use of a non-empty ReorderIndices structure -
which is effectively required for out of order loads or gathers - while
leaving the ReuseShuffleIndices mechanism unused and disabled. If I've
understood the code structure, the former is used when describing
implicit shuffles required by the vectorization strategy (i.e. loading
elements 0,1,3,2 in the order 0,1,2,3 and then shuffling later), while
the latter is used when trying to optimize explode/buildvectors (called
gathers in this code).
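
The "load in memory order, shuffle later" idea described above can be illustrated with a small sketch. The helper names `plan_reordered_load` and `apply_plan` are hypothetical, and the mask representation is an assumption made for illustration; the real ReorderIndices structure in SLP may encode the permutation differently.

```python
def plan_reordered_load(offsets):
    """Plan one contiguous load plus a shuffle for out-of-order scalar loads.

    `offsets` lists the element offset each scalar lane reads, in lane order.
    Returns (base, mask), where mask[i] is the index in the contiguous load
    that lane i should receive, or None if the offsets do not form a
    permutation of a consecutive range.
    """
    base = min(offsets)
    mask = [o - base for o in offsets]
    if sorted(mask) != list(range(len(offsets))):
        return None
    return base, mask


def apply_plan(memory, base, mask):
    """Perform the contiguous load, then the shuffle described by mask."""
    wide = memory[base:base + len(mask)]
    return [wide[i] for i in mask]


if __name__ == "__main__":
    a = [10, 20, 30, 40]
    # Lanes want elements 0,1,3,2: load 0,1,2,3 once, then shuffle.
    base, mask = plan_reordered_load([0, 1, 3, 2])
    print(apply_plan(a, base, mask))
    # The reduction a[1] + a[0] + a[2] is a 3-element (non-power-of-two) case.
    print(plan_reordered_load([1, 0, 2]))
```

For a reduction the shuffle can in principle be dropped when the operation is reassociable, since only the set of loaded elements matters, not their lane order.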

I audited all the code enabled by this change, but can't claim to
deeply understand most of it. I added a couple of bailouts in places
which appeared to be difficult to audit and optional optimizations. I've
tried to do so in the least risky way I can, but am not completely
confident in this change. Careful review appreciated.


# 2c7786e9 03-Sep-2024 Philip Reames <preames@rivosinc.com>

Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)

This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reassoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
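
The reason the nsz flag matters here can be checked with ordinary IEEE 754 arithmetic. The sketch below uses Python floats (IEEE 754 doubles) rather than LLVM IR, and the helper `same_value` is a name introduced only for this illustration.

```python
import math


def same_value(a: float, b: float) -> bool:
    """Equality that also distinguishes +0.0 from -0.0 (NaN not handled)."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def is_identity(start: float, samples) -> bool:
    """True if start + x reproduces x exactly, sign of zero included."""
    return all(same_value(start + x, x) for x in samples)


if __name__ == "__main__":
    samples = [1.5, -2.0, 0.0, -0.0]
    # -0.0 is a strict identity for fadd: -0.0 + x == x for every sample.
    print(is_identity(-0.0, samples))
    # 0.0 is not: 0.0 + (-0.0) yields +0.0, losing the sign of zero.
    print(is_identity(0.0, samples))
    # Ignoring the sign of zero (what nsz permits), 0.0 works for all samples.
    print(all(0.0 + x == x for x in samples))
```

The only input on which the two start values disagree is -0.0, so once the sign of zero may be ignored, 0.0 is an acceptable neutral value even for an ordered reduction.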


Revision tags: llvmorg-19.1.0-rc4
# 22ba3511 29-Aug-2024 Philip Reames <preames@rivosinc.com>

[RISCV][SLP] Test for <3 x Ty> reductions which require reordering

These tests show a vectorizable reduction where the order of the
reduction has been adjusted so that profitable vectorization requires
a reordering of the computation. We currently have no reordering
in SLP for non-power-of-two vectors, so this doesn't work.

Note that due to reassociation performed in the standard pipeline,
this is actually the canonical form for a reduction reaching SLP.


# ed03070e 27-Aug-2024 Philip Reames <preames@rivosinc.com>

[SLP] Support vectorizing 2^N-1 reductions (#106266)

Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vectors.

Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
   width. If the cost model does not select that VL, return to power of
   two boundaries when halving the search VL. The latter is mostly for
   simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
   non-power of two vectors. This is required to support <3 x Ty> cases.
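
One possible reading of the two changes above, as a sketch of the sequence of candidate widths the search would try. The function `candidate_widths` and its exact rules are assumptions made for illustration; the actual SLPVectorizer search also consults the cost model at each step and may differ in detail.

```python
def is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def bit_floor(x: int) -> int:
    """Largest power of two that is <= x (x must be positive)."""
    return 1 << (x.bit_length() - 1)


def candidate_widths(num_operands: int, allow_non_pow2: bool = True):
    """Widths tried in order, assuming the cost model rejects each one.

    Assumed rules: start at num_operands when it has the form 2^N-1 and
    non-power-of-two vectors are enabled, otherwise at the largest power
    of two. After a 2^N-1 width, fall back to power-of-two boundaries.
    The minimum width is 3 with non-power-of-two support and 4 without.
    """
    min_width = 3 if allow_non_pow2 else 4
    if num_operands < min_width:
        return []
    if allow_non_pow2 and is_pow2(num_operands + 1):
        width = num_operands
    else:
        width = bit_floor(num_operands)
    widths = []
    while width >= min_width:
        widths.append(width)
        # Halve powers of two; snap a 2^N-1 width down to 2^(N-1).
        width = width // 2 if is_pow2(width) else bit_floor(width)
    return widths


if __name__ == "__main__":
    for n in (3, 7, 8, 15):
        print(n, candidate_widths(n))
```

Under these assumptions a 3-operand reduction is only considered at all because the minimum width drops to 3, which matches the <3 x Ty> motivation in the commit message.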

One thing which isn't directly related to this change, but I want to
note for clarity is that the non-power-of-two vectorization appears to
be sensitive to operand order of reduction. I haven't yet fully figured
out why, but I suspect this is non-power-of-two specific.


# acb33a0c 27-Aug-2024 Philip Reames <preames@rivosinc.com>

[RISCV][SLP] Add test coverage for 2^N-1 vector sizes w/FP types

Our cost modeling for FP and integer differs in enough cases that
having both is useful for exercising different logic in SLP.


# 4dda564c 27-Aug-2024 Philip Reames <preames@rivosinc.com>

[RISCV][SLP] Add test coverage for 2^N-1 vector sizes

Mostly copied from the AArch64 coverage for same, but also added
a couple tests for reductions which aren't currently supported.