History log of /llvm-project/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (Results 1 – 15 of 15)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 9122c523 15-Nov-2024 Pengcheng Wang <wangpengcheng.pp@bytedance.com>

[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)


This is based on other targets like PPC/AArch64 and some experiments.

This PR will only enable bidirectional scheduling and tracking register
pressure.

Disclaimer: I haven't tested it on many cores; maybe we should turn some
of these options into features. I believe downstream users must have tried
this before, so feedback is welcome.
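
For context, the generic machine scheduler exposes these knobs through a
per-target policy hook. A minimal sketch of the relevant bits (illustrative
only, not the literal patch; the helper name is made up):

#include "llvm/CodeGen/MachineScheduler.h"

// Illustrative only: the policy bits a target flips (normally inside its
// TargetSubtargetInfo::overrideSchedPolicy() override) to get bidirectional
// pre-RA scheduling with register pressure tracking.
static void applyBidirectionalPolicy(llvm::MachineSchedPolicy &Policy) {
  // Leaving both OnlyTopDown and OnlyBottomUp false lets GenericScheduler
  // consider candidates from both ends of the region (bidirectional).
  Policy.OnlyTopDown = false;
  Policy.OnlyBottomUp = false;
  // Track register pressure while scheduling so the heuristics can try to
  // avoid increasing it.
  Policy.ShouldTrackPressure = true;
}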



# 97982a8c 05-Nov-2024 dlav-sc <daniil.avdeev@syntacore.com>

[RISCV][CFI] add function epilogue cfi information (#110810)

This patch adds CFI instructions in the function epilogue.

Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret

After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret

This functionality is already present in `riscv-gcc`, but not in `clang`,
and its absence slightly impairs the `lldb` debugging experience, e.g. when
taking a backtrace.
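
For reference, directives such as `.cfi_restore` and `.cfi_def_cfa_offset`
are emitted from a target's frame lowering as CFI_INSTRUCTION pseudos. A
schematic sketch (the helper below is hypothetical, not the code from this
patch):

#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/MC/MCDwarf.h"

using namespace llvm;

// Hypothetical helper: emit ".cfi_restore <Reg>" at MBBI in an epilogue so
// the unwinder knows the callee-saved register has been reloaded.
static void emitCFIRestore(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator MBBI,
                           const DebugLoc &DL, Register Reg) {
  MachineFunction &MF = *MBB.getParent();
  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
  const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
  unsigned CFIIndex = MF.addFrameInst(
      MCCFIInstruction::createRestore(nullptr, TRI.getDwarfRegNum(Reg, true)));
  BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex)
      .setMIFlag(MachineInstr::FrameDestroy);
}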



Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7
# 675e7bd1 21-May-2024 Piyou Chen <piyou.chen@sifive.com>

[RISCV] Support postRA vsetvl insertion pass (#70549)

This patch tries to get rid of the vsetvl implicit vl/vtype def-use chain
and improves register allocation quality by moving the vsetvl insertion
pass to after RVV register allocation.

This benefits the following optimizations:

1. Unblocking scheduler constraints by removing the vl/vtype def-use chain
2. Supporting RVV re-materialization
3. Supporting partial spills

This patch adds a new option `-riscv-vsetvl-after-rvv-regalloc=<1|0>` to
control this feature; it is disabled by default.



Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 286a366d 16-Jan-2024 Luke Lau <luke@igalia.com>

[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501)

vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX
and PseudoVMV_X_S_MX variants with just one pseudo each. These pseudos use
the VR register class (just like the actual instruction), so we now only
have TableGen patterns for vectors of LMUL <= 1.

We now rely on the existing combines that shrink LMUL down to 1 for
vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines
later and just inserting the nodes with the correct type in a later
patch.

The test diff is due to the fact that a PseudoVMV_S_X/PseudoVMV_X_S no
longer carries any information about LMUL, so if it's the only vector
pseudo instruction in a block then it now defaults to LMUL=1.



Revision tags: llvmorg-17.0.6
# cf17a24a 27-Nov-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Use subreg extract for extract_vector_elt when vlen is known (#72666)

This is the first in a planned patch series to teach our vector lowering
how to exploit register boundaries in LMUL>1 types when VLEN is known to
be an exact constant. This corresponds to code compiled by clang with
the -mrvv-vector-bits=zvl option.

For extract_vector_elt, if we have a constant index and a known vlen,
then we can identify which register out of a register group is being
accessed. Given this, we can do a sub-register extract for that
register, and then shift any remaining index.
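
To make the index arithmetic concrete, a plain C++ illustration (not the
lowering code itself) of the split described above:

#include <utility>

// Illustration only: with a known VLEN, split a constant element index
// within an LMUL>1 register group into (m1 sub-register number, index
// within that sub-register).
static std::pair<unsigned, unsigned>
splitExtractIndex(unsigned EltIdx, unsigned EltBits, unsigned VLenBits) {
  unsigned EltsPerVReg = VLenBits / EltBits; // elements per m1 register
  return {EltIdx / EltsPerVReg, EltIdx % EltsPerVReg};
}

// Example: with VLEN=128 and i64 elements there are 2 elements per register,
// so index 5 of an m4 value lives in sub-register 2 at index 1.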

This results in all constant index extracts becoming m1 operations, and
thus eliminates the complexity concern for explode-vector idioms at high
lmul.



Revision tags: llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 7b3bbd83 09-Oct-2023 Jay Foad <jay.foad@amd.com>

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.


# 2501ae58 09-Oct-2023 Jay Foad <jay.foad@amd.com>

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.



# 45a334d3 04-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Generalize reduction tree matching to all integer reductions (#68014) (reapply)

This was reverted in 824251c9b349d859a9169196cd9533c619a715ce due to an issue in a previous patch exposed by this change. Fixed in 199cbec987ee68d70611db8e7961b43c3dbad83e. Original commit message follows.

This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.

A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest it.
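
As an illustration of the scalar shape that now matches (a made-up example,
not the IR from this test): extract every lane and fold left-to-right with
an integer max, which with zbb can become a single vector reduction.

#include <algorithm>
#include <cstdint>

// Made-up illustration of an "exploded" integer reduction: each Lane[i]
// stands for an extractelement of lane i, folded left-to-right with smax.
// Trees like this (for add/and/or/xor/smax/smin/umax/umin) can now be
// matched as a vector reduction.
static int64_t explodedSMax(const int64_t Lane[4]) {
  int64_t M = Lane[0];
  M = std::max(M, Lane[1]); // max(e0, e1)
  M = std::max(M, Lane[2]); // max(max(e0, e1), e2)
  M = std::max(M, Lane[3]); // max(max(max(e0, e1), e2), e3)
  return M;
}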



# 824251c9 04-Oct-2023 Alex Bradbury <asb@igalia.com>

Revert "[RISCV] Generaize reduction tree matching to all integer reductions (#68014)"

This reverts commit 7a0b9daac9edde4293d2e9fdf30d8b35c04d16a6 and
63bbc250440141b1c51593904fba9bdaa6724280.

I'm

Revert "[RISCV] Generaize reduction tree matching to all integer reductions (#68014)"

This reverts commit 7a0b9daac9edde4293d2e9fdf30d8b35c04d16a6 and
63bbc250440141b1c51593904fba9bdaa6724280.

I'm seeing issues (e.g. on the GCC torture suite) where
combineBinOpOfExtractToReduceTree is called when the V extensions aren't
enabled and triggers a crash due to RISCVSubtarget::getElen asserting.

I'll aim to follow up with a minimal reproducer. Although it's pretty
obvious how to avoid this crash with some extra gating, there are a few
options as to where that should be inserted, so I think it's best to
revert and agree on the appropriate fix separately.



# 7a0b9daa 03-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Generalize reduction tree matching to all integer reductions (#68014)

This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.

A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest it.



Revision tags: llvmorg-17.0.2
# f0505c3d 02-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Form vredsum from explode_vector + scalar (left) reduce (#67821)

This change adds two related DAG combines which together will take a
left-reduce scalar add tree of an explode_vector, and will incrementally
form a vector reduction of the vector prefix. If the entire vector is
reduced, the result will be a reduction over the entire vector.

Profitability-wise, this relies on vredsum being cheaper than a pair of
extracts and a scalar add. Given that vredsum is linear in LMUL, and the
vslidedown required for the extract is *also* linear in LMUL, this is
clearly true at higher index values. At N=2 it's a bit questionable,
but I think the vredsum form is probably the better canonical form
anyway.

Note that this only matches left reduces. This happens to be the
motivating example I have (from spec2017 x264). This approach could be
generalized to handle right reduces without much effort, and could be
generalized to handle any reduce whose tree starts with adjacent
elements if desired. The approach fails for a reduce such as (A+C)+(B+D)
because we can't find a root to start the reduce with without scanning
the entire associative add expression. We could maybe explore using
masked reduces for the root node, but that seems of questionable
profitability. (As in, worth questioning - I haven't explored in any
detail.)
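
To spell out the difference in shapes (a made-up 4-lane example, not code
from the patch):

#include <cstdint>

// Left reduce: a chain rooted at element 0, growing one element at a time.
// This is the shape the combine can incrementally turn into a vredsum over
// a vector prefix.
static int64_t leftReduce(const int64_t E[4]) {
  return ((E[0] + E[1]) + E[2]) + E[3];
}

// A shape like (A+C)+(B+D): there is no single contiguous prefix to root
// the reduction at, so the current matcher gives up on it.
static int64_t pairedReduce(const int64_t E[4]) {
  return (E[0] + E[2]) + (E[1] + E[3]);
}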

This is covering up a deficiency in SLP. If SLP encounters the scalar
form of reduce_or(a) + reduce_sum(a), where a is some common
vectorizable tree, SLP will sometimes fail to revisit one of the
reductions after vectorizing the other. Fixing this in SLP is hard, and
there's no good reason not to handle the easy cases in the backend.

Another option here would be to do this in VectorCombine or generic DAG.
I chose not to as the profitability of the non-legal typed prefix cases
is very target dependent. I think this makes sense as a starting point,
even if we move it elsewhere later.

This is currently restricted to add reduces only, but it obviously makes
sense for any associative reduction operator. Once this is approved, I
plan to extend it in this manner. I'm simply staging work in case we
decide to go in another direction.



# cd03d970 29-Sep-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Add test coverage for sum reduction recognition in DAG

And adjust an existing test to not be a simple reduction to preserve test intent.


Revision tags: llvmorg-17.0.1, llvmorg-17.0.0
# e0919b18 13-Sep-2023 Jay Foad <jay.foad@amd.com>

[CodeGen] Renumber slot indexes before register allocation (#66334)

RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
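
The heuristic in question boils down to estimating instruction distance from
the numeric gap between two slot indexes. A simplified illustration (not the
SlotIndexes implementation) of why stale numbering skews it:

// Simplified illustration (not the SlotIndexes implementation): if live
// range "length" is approximated as the numeric gap between two indexes
// divided by an assumed per-instruction spacing, then gaps left by erased
// instructions, or crowded numbers from inserted ones, make the estimate
// drift. Renumbering with uniform spacing keeps the estimate accurate.
static unsigned approxInstrDistance(unsigned LoIndex, unsigned HiIndex,
                                    unsigned SpacingPerInstr) {
  return (HiIndex - LoIndex) / SpacingPerInstr;
}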

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.



# 299d710e 11-Sep-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL

This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP.

Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
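
Making that arithmetic explicit (an illustrative calculation, not the
in-tree cost model):

#include <cstdint>

// Illustrative only: extracting every element of an LMUL-sized register
// group, where each extract pays a vslidedown that is linear in LMUL,
// scales as LMUL^2 * (VLEN/ETYPE).
static uint64_t naiveExplodeCost(uint64_t LMUL, uint64_t VLenBits,
                                 uint64_t EltBits) {
  uint64_t VL = LMUL * (VLenBits / EltBits); // elements in the group
  uint64_t CostPerExtract = LMUL;            // slide distance grows with LMUL
  return VL * CostPerExtract;                // ~ LMUL^2 * VLEN/ETYPE
}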

There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.

It's tempting to think we can do better than going through the stack but, well, I haven't found it yet if it exists. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):

output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles: 20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles: 23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles: 21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles: 22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles: 15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles: 18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles: 12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles: 4304

I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.

Differential Revision: https://reviews.llvm.org/D159375



Revision tags: llvmorg-17.0.0-rc4
# 7c4f4559 01-Sep-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Add test coverage for fully scalarizing a vector

This pattern comes up heavily when partially vectorizing a forest in SLP.