vec3-base.ll - OpenGrok history log for /llvm-project/llvm/test/Transforms/SLPVectorizer/RISCV/vec3-base.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 3133acf1	13-Dec-2024	Han-Kuan Chen <hankuan.chen@sifive.com>	Revert "[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)" This reverts commit 82204154b7bd1f8c487c94c7ef00399d776b29f0.
# 82204154	13-Dec-2024	Han-Kuan Chen <hankuan.chen@sifive.com>	[SLP] Make getSameOpcode support different instructions if they have same semantics. (#112181)
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# 38fffa63	06-Nov-2024	Paul Walker <paul.walker@arm.com>	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1
# 7f6bbb3c	20-Sep-2024	Philip Reames <preames@rivosinc.com>	[RISCV][TTI] Reduce cost of a build_vector pattern (#108419) This change is actually two related changes, but they're very hard to meaningfully separate as the second balances the first, and yet doesn't do much good on its own. First, we can reduce the cost of a build_vector pattern. Our current costing for this defers to generic insertelement costing which isn't unreasonable, but also isn't correct. While inserting N elements requires N-1 slides and N vmv.s.x, doing the full build_vector only requires N vslide1down. (Note there are other cases that our build vector lowering can do more cheaply; this is simply the easiest upper bound which appears to be "good enough" for SLP costing purposes.) Second, we need to tell SLP that calls don't preserve vector registers. Without this, SLP will vectorize scalar code which performs e.g. 4 x float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the costing works out that this is in fact the optimal choice - except that we don't actually have a <2 x float> @exp, and unroll during DAG. This would be fine (or at least cost neutral) except that the libcall for the scalar @exp blows all vector registers. So the net effect is we added a bunch of spills that SLP had no idea about. Thankfully, AArch64 has a similar problem, and has taught SLP how to reason about spill cost once the right TTI hook is implemented. Now, for some implications... The SLP solution for spill costing has some inaccuracies. In particular, it basically just guesses whether an intrinsic will be lowered to a call or not, and can be wrong in both directions. It also has no mechanism to differentiate on calling convention. This has the effect of making partial vectorization (i.e. starting in scalar) more profitable. In practice, the major effect of this is to make it more likely that SLP will vectorize part of a tree in an intersecting forest, and then vectorize the remaining tree once those uses have been removed. This has the effect of biasing us slightly away from strided or indexed loads during vectorization, because the scalar cost is more accurately modeled and these instructions look relatively less profitable.
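The instruction counts quoted in commit 7f6bbb3c can be made concrete with a small sketch. It assumes, hypothetically, that every vector instruction has unit cost; it is not the RISC-V TTI cost function, and the function names are illustrative only. Under that assumption the generic insertelement path costs N vmv.s.x plus N-1 slides (2N-1 total), while a single build_vector costs N vslide1down.

```python
# Hypothetical unit-cost model: every vector instruction costs 1.
# This is NOT LLVM's RISC-V TTI costing; it only reproduces the
# instruction counts quoted in the commit message.


def insertelement_cost(n: int) -> int:
    """Build an n-element vector with separate inserts:
    n vmv.s.x moves plus n-1 slides."""
    if n < 1:
        raise ValueError("need at least one element")
    return n + (n - 1)


def build_vector_cost(n: int) -> int:
    """Build the same vector as one build_vector:
    one vslide1down per element."""
    if n < 1:
        raise ValueError("need at least one element")
    return n


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(n, insertelement_cost(n), build_vector_cost(n))
```

For the <3 x Ty> case exercised by vec3-base.ll, this model gives 5 versus 3; the gap grows linearly with N, which is why the generic costing overstated the price of gathers.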
Revision tags: llvmorg-19.1.0
# fa8b737a	11-Sep-2024	Philip Reames <preames@rivosinc.com>	[SLP][RISCV] Add test for 3 element build vector feeding reduce Our costs for build vectors are currently a bit off which inhibits vectorization. Fix forthcoming.
# 247d3ea8	05-Sep-2024	Philip Reames <preames@rivosinc.com>	[SLP] Expand non-power-of-two bailout in TryToFindDuplicates This fixes a crash noticed when doing a downstream merge. The test case has been reduced, and is included in this commit. The existing bailout for non-power-of-two vectors in TryToFindDuplicates did not consider the case where the list being vectorized had no root node. This allowed reshuffled scalars to slip through to code which does not yet expect to handle it. This was an existing bug (likely introduced by my ed03070e), but made easier to hit by 63e8a1b1
# 63e8a1b1	05-Sep-2024	Philip Reames <preames@rivosinc.com>	[SLP] Enable reordering for non-power-of-two vectors (#106638) This change tries to enable vector reordering during vectorization for non-power-of-two vectors. Specifically, my goal is to be able to vectorize reductions whose operands appear in other than identity order (e.g. a[1] + a[0] + a[2]). In our standard pass pipeline, Reassociation effectively canonicalizes towards this form. So for reduction vectorization to be widely applicable, we need this feature. This change enables the use of a non-empty ReorderIndices structure - which is effectively required for out of order loads or gathers - while leaving the ReuseShuffleIndices mechanism unused and disabled. If I've understood the code structure, the former is used when describing implicit shuffles required by the vectorization strategy (i.e. loading elements 0,1,3,2 in the order 0,1,2,3 and then shuffling later), while the latter is used when trying to optimize explode/buildvectors (called gathers in this code). I audited all the code enabled by this change, but can't claim to deeply understand most of it. I added a couple of bailouts in places which appeared to be difficult to audit and optional optimizations. I've tried to do so in the least risky way I can, but am not completely confident in this change. Careful review appreciated.
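The "load in order 0,1,2,3, then shuffle" idea from commit 63e8a1b1 can be sketched as a conceptual model. This is not SLP's ReorderIndices encoding; `reorder_mask` and `shuffle` are hypothetical helpers. Given the element offsets the scalar code reads, in lane order, the sketch computes the mask that turns one consecutive load into that order.

```python
# Conceptual model only: SLP's ReorderIndices is an internal data
# structure whose exact encoding is not reproduced here.


def shuffle(vec, mask):
    """Return a list whose lane i holds vec[mask[i]] (like shufflevector)."""
    return [vec[i] for i in mask]


def reorder_mask(scalar_offsets):
    """Map the offsets read by the scalar code (in lane order) to the
    shuffle mask applied after one consecutive load starting at the
    smallest offset.

    Raises ValueError if the offsets are not a permutation of a
    contiguous range (i.e. no single consecutive load covers them).
    """
    base = min(scalar_offsets)
    expected = list(range(base, base + len(scalar_offsets)))
    if sorted(scalar_offsets) != expected:
        raise ValueError("offsets are not a permutation of a contiguous range")
    return [off - base for off in scalar_offsets]


if __name__ == "__main__":
    a = [10, 20, 30, 40]
    # Scalars read a[0], a[1], a[3], a[2]: load a[0:4], then shuffle.
    print(shuffle(a[0:4], reorder_mask([0, 1, 3, 2])))
    # The reduction a[1] + a[0] + a[2] as a <3 x i32>-style load + shuffle.
    print(sum(shuffle(a[0:3], reorder_mask([1, 0, 2]))))
```

For an integer add reduction the lane order does not change the result, so the shuffle only matters for recognizing the three loads as one consecutive vector load.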
# 2c7786e9	03-Sep-2024	Philip Reames <preames@rivosinc.com>	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reassoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0.
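Why commit 2c7786e9 needs nsz before using 0.0: under IEEE 754 round-to-nearest, -0.0 is the exact additive identity (adding it leaves every non-NaN value unchanged, including -0.0), whereas adding 0.0 turns a -0.0 input into +0.0. The sketch below models an ordered (sequential) fadd reduction with a chosen start value; Python floats are IEEE 754 doubles, and the helper names are hypothetical.

```python
import math


def is_negative_zero(x: float) -> bool:
    """True only for -0.0 (0.0 == -0.0 compares equal, so check the sign bit)."""
    return x == 0.0 and math.copysign(1.0, x) < 0


def fadd_reduce(values, start: float) -> float:
    """Ordered fadd reduction: ((start + v0) + v1) + ..."""
    acc = start
    for v in values:
        acc = acc + v
    return acc


if __name__ == "__main__":
    # Start value -0.0 preserves the sign of a -0.0 input...
    print(is_negative_zero(fadd_reduce([-0.0], -0.0)))
    # ...while start value 0.0 produces +0.0, which is only
    # acceptable when nsz says the sign of zero does not matter.
    print(is_negative_zero(fadd_reduce([-0.0], 0.0)))
    # For non-zero results both start values agree.
    print(fadd_reduce([1.5, 2.25], 0.0), fadd_reduce([1.5, 2.25], -0.0))
```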
Revision tags: llvmorg-19.1.0-rc4
# 22ba3511	29-Aug-2024	Philip Reames <preames@rivosinc.com>	[RISCV][SLP] Test for <3 x Ty> reductions which require reordering These tests show a vectorizable reduction where the order of the reduction has been adjusted so that profitable vectorization requires a reordering of the computation. We currently have no reordering in SLP for non-power-of-two vectors, so this doesn't work. Note that due to reassociation performed in the standard pipeline, this is actually the canonical form for a reduction reaching SLP.
# ed03070e	27-Aug-2024	Philip Reames <preames@rivosinc.com>	[SLP] Support vectorizing 2^N-1 reductions (#106266) Build on the -slp-vectorize-non-power-of-2 experimental option, and support vectorizing reductions with 2^N-1 sized vectors. Specifically, two related changes: 1) When searching for a profitable VL, start with the 2^N-1 reduction width. If the cost model does not select that VL, return to power-of-two boundaries when halving the search VL. The latter is mostly for simplicity. 2) Reduce the minimum reduction width from 4 to 3 when supporting non-power-of-two vectors. This is required to support <3 x Ty> cases. One thing which isn't directly related to this change, but I want to note for clarity, is that the non-power-of-two vectorization appears to be sensitive to the operand order of the reduction. I haven't yet fully figured out why, but I suspect this is non-power-of-two specific.
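The width search described in commit ed03070e can be pictured with a simplified, hypothetical model: try the 2^N-1 width first (allowed down to 3 elements), then fall back to power-of-two widths, halving each time down to a minimum of 4. The function and parameter names are invented for illustration, and the actual search loop in the SLP vectorizer is more involved than this.

```python
# Hypothetical, simplified model of the candidate reduction widths;
# not the actual SLP vectorizer search loop.


def is_pow2(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def candidate_widths(num_ops: int, allow_non_pow2: bool = True,
                     min_pow2_width: int = 4,
                     min_non_pow2_width: int = 3) -> list:
    """Return the widths tried, in order, for a reduction of num_ops values."""
    if num_ops < 1:
        return []
    widths = []
    # 1) Start with the 2^N-1 width when it applies (e.g. 3, 7, 15).
    if (allow_non_pow2 and is_pow2(num_ops + 1)
            and num_ops >= min_non_pow2_width):
        widths.append(num_ops)
    # 2) Otherwise fall back to power-of-two widths, halving each time.
    w = 1 << (num_ops.bit_length() - 1)  # largest power of two <= num_ops
    while w >= min_pow2_width:
        widths.append(w)
        w //= 2
    return widths


if __name__ == "__main__":
    for n in (3, 7, 8, 15):
        print(n, candidate_widths(n))
```

Under these assumptions a 3-element reduction is only reachable through the non-power-of-two path, which is why lowering the minimum width from 4 to 3 matters for <3 x Ty> cases.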
# acb33a0c	27-Aug-2024	Philip Reames <preames@rivosinc.com>	[RISCV][SLP] Add test coverage for 2^N-1 vector sizes w/FP types Our cost modeling for FP and integer differs in enough cases that having both is useful for exercising different logic in SLP.
# 4dda564c	27-Aug-2024	Philip Reames <preames@rivosinc.com>	[RISCV][SLP] Add test coverage for 2^N-1 vector sizes Mostly copied from the AArch64 coverage for same, but also added a couple tests for reductions which aren't currently supported.