Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
3133acf1 |
| 13-Dec-2024 |
Han-Kuan Chen <hankuan.chen@sifive.com> |
Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)"
This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.
|
#
82204154 |
| 13-Dec-2024 |
Han-Kuan Chen <hankuan.chen@sifive.com> |
[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
38fffa63 |
| 06-Nov-2024 |
Paul Walker <paul.walker@arm.com> |
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
7f6bbb3c |
| 20-Sep-2024 |
Philip Reames <preames@rivosinc.com> |
[RISCV][TTI] Reduce cost of a build_vector pattern (#108419)
This change is actually two related changes, but they're very hard to
meaningfully separate as the second balances the first, and yet do
[RISCV][TTI] Reduce cost of a build_vector pattern (#108419)
This change is actually two related changes, but they're very hard to
meaningfully separate as the second balances the first, and yet doesn't
do much good on it's own.
First, we can reduce the cost of a build_vector pattern. Our current
costing for this defers to generic insertelement costing which isn't
unreasonable, but also isn't correct. While inserting N elements
requires N-1 slides and N vmv.s.x, doing the full build_vector only
requires N vslide1down. (Note there are other cases that our build
vector lowering can do more cheaply, this is simply the easiest upper
bound which appears to be "good enough" for SLP costing purposes.)
Second, we need to tell SLP that calls don't preserve vector registers.
Without this, SLP will vectorize scalar code which performs e.g. 4 x
float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the
costing works out that this is in fact the optimal choice - except that
we don't actually have a <2 x float> @exp, and unroll during DAG. This
would be fine (or at least cost neutral) except that the libcall for the
scalar @exp blows all vector registers. So the net effect is we added a
bunch of spills that SLP had no idea about. Thankfully, AArch64 has a
similiar problem, and has taught SLP how to reason about spill cost once
the right TTI hook is implemented.
Now, for some implications...
The SLP solution for spill costing has some inaccuracies. In particular,
it basically just guesses whether a intrinsic will be lowered to a call
or not, and can be wrong in both directions. It also has no mechanism to
differentiate on calling convention.
This has the effect of making partial vectorization (i.e. starting in
scalar) more profitable. In practice, the major effect of this is to
make it more like SLP will vectorize part of a tree in an intersecting
forrest, and then vectorize the remaining tree once those uses have been
removed.
This has the effect of biasing us slightly away from strided, or indexed
loads during vectorization - because the scalar cost is more accurately
modeled, and these instructions look relevatively less profitable.
show more ...
|
Revision tags: llvmorg-19.1.0 |
|
#
fa8b737a |
| 11-Sep-2024 |
Philip Reames <preames@rivosinc.com> |
[SLP][RISCV] Add test for 3 element build vector feeding reduce
Our costs for build vectors are currently a bit off which inhibits vectorization. Fix forthcoming.
|
#
247d3ea8 |
| 05-Sep-2024 |
Philip Reames <preames@rivosinc.com> |
[SLP] Expand non-power-of-two bailout in TryToFindDuplicates
This fixes a crash noticed when doing a downstream merge. The test case has been reduced, and is included in this commit.
The existing
[SLP] Expand non-power-of-two bailout in TryToFindDuplicates
This fixes a crash noticed when doing a downstream merge. The test case has been reduced, and is included in this commit.
The existing bailout for non-power-of-two vectors in TryToFindDuplicates did not consider the case where the list being vectorized had no root node. This allowed reshuffled scalars to slip through to code which does not yet expect to handle it.
This was an existing bug (likely introduced by my ed03070e), but made easier to hit by 63e8a1b1
show more ...
|
#
63e8a1b1 |
| 05-Sep-2024 |
Philip Reames <preames@rivosinc.com> |
[SLP] Enable reordering for non-power-of-two vectors (#106638)
This change tries to enable vector reordering during vectorization for
non-power-of-two vectors. Specifically, my goal is to be able t
[SLP] Enable reordering for non-power-of-two vectors (#106638)
This change tries to enable vector reordering during vectorization for
non-power-of-two vectors. Specifically, my goal is to be able to
vectorize reductions whose operands appear in other than identity order.
(i.e. a[1] + a[0] + a[2]). Our standard pass pipeline, Reassociation
effectively canonicalizes towards this form. So for reduction
vectorization to be wildly applicable, we need this feature.
This change enables the use of a non-empty ReorderIndices structure -
which is effectively required for out of order loads or gathers - while
leaving the ReuseShuffleIndices mechanism unused and disabled. If I've
understood the code structure, the former is used when describing
implicit shuffles required by the vectorization strategy (i.e. loading
elements 0,1,3,2 in the order 0,1,2,3 and then shuffling later), while
the later is used when trying to optimize explode/buildvectors (called
gathers in this code).
I audited all the code enabled by this change, but can't claim to
deeply understand most of it. I added a couple of bailouts in places
which appeared to be difficult to audit and optional optimizations. I've
tried to do so in the least risky way I can, but am not completely
confident in this change. Careful review appreciated.
show more ...
|
#
2c7786e9 |
| 03-Sep-2024 |
Philip Reames <preames@rivosinc.com> |
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In gen
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.
This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
show more ...
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
22ba3511 |
| 29-Aug-2024 |
Philip Reames <preames@rivosinc.com> |
[RISCV][SLP] Test for <3 x Ty> reductions which require reordering
These tests show a vectorizable reduction where the order of the reduction has been adjusted so that profitable vectorization requi
[RISCV][SLP] Test for <3 x Ty> reductions which require reordering
These tests show a vectorizable reduction where the order of the reduction has been adjusted so that profitable vectorization requires a reordering of the computation. We currently have no reordering in SLP for non-power-of-two vectors, so this doesn't work.
Note that due to reassociation performed in the standard pipeline, this is actually the canonical form for a reduction reaching SLP.
show more ...
|
#
ed03070e |
| 27-Aug-2024 |
Philip Reames <preames@rivosinc.com> |
[SLP] Support vectorizing 2^N-1 reductions (#106266)
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vector.
Specifically, two
[SLP] Support vectorizing 2^N-1 reductions (#106266)
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vector.
Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
width.
If cost model does not select that VL, return to power of two boundaries
when halfing the search VL. The later is mostly for simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
non-power
of two vectors. This is required to support <3 x Ty> cases.
One thing which isn't directly related to this change, but I want to
note for clarity is that the non-power-of-two vectorization appears to
be sensative to operand order of reduction. I haven't yet fully figured
out why, but I suspect this is non-power-of-two specific.
show more ...
|
#
acb33a0c |
| 27-Aug-2024 |
Philip Reames <preames@rivosinc.com> |
[RISCV][SLP] Add test coverage for 2^N-1 vector sizes w/FP types
Our cost modeling for FP and integer differs in enough cases that having both is useful for exercising different logic in SLP.
|
#
4dda564c |
| 27-Aug-2024 |
Philip Reames <preames@rivosinc.com> |
[RISCV][SLP] Add test coverage for 2^N-1 vector sizes
Mostly copied from the AArch64 coverage for same, but also added a couple tests for reductions which aren't currently supported.
|