History log of /llvm-project/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (Results 1 – 15 of 15)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 9122c523 15-Nov-2024 Pengcheng Wang <wangpengcheng.pp@bytedance.com>

[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)


This is based on other targets like PPC/AArch64 and some experiments.

This PR will only enable bidirectional scheduling and tracking register
pressure.

Disclaimer: I haven't tested it on many cores; maybe we should turn some
of these options into features. I believe downstream users must have tried
this before, so feedback is welcome.
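
For context, the generic machine scheduler exposes these knobs through a
per-target policy hook. A minimal sketch of the relevant bits (illustrative
only, not the literal patch; the helper name is made up):

#include "llvm/CodeGen/MachineScheduler.h"

// Illustrative only: the policy bits a target flips (normally inside its
// TargetSubtargetInfo::overrideSchedPolicy() override) to get bidirectional
// pre-RA scheduling with register pressure tracking.
static void applyBidirectionalPolicy(llvm::MachineSchedPolicy &Policy) {
  // Leaving both OnlyTopDown and OnlyBottomUp false lets GenericScheduler
  // consider candidates from both ends of the region (bidirectional).
  Policy.OnlyTopDown = false;
  Policy.OnlyBottomUp = false;
  // Track register pressure while scheduling so the heuristics can try to
  // avoid increasing it.
  Policy.ShouldTrackPressure = true;
}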



# 97982a8c 05-Nov-2024 dlav-sc <daniil.avdeev@syntacore.com>

[RISCV][CFI] add function epilogue cfi information (#110810)

This patch adds CFI instructions in the function epilogue.

Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret

After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret

This functionality is already present in `riscv-gcc`, but not in `clang`,
and its absence slightly impairs the `lldb` debugging experience, e.g. when
taking a backtrace.
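
For reference, directives such as `.cfi_restore` and `.cfi_def_cfa_offset`
are emitted from a target's frame lowering as CFI_INSTRUCTION pseudos. A
schematic sketch (the helper below is hypothetical, not the code from this
patch):

#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/MC/MCDwarf.h"

using namespace llvm;

// Hypothetical helper: emit ".cfi_restore <Reg>" at MBBI in an epilogue so
// the unwinder knows the callee-saved register has been reloaded.
static void emitCFIRestore(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator MBBI,
                           const DebugLoc &DL, Register Reg) {
  MachineFunction &MF = *MBB.getParent();
  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
  const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
  unsigned CFIIndex = MF.addFrameInst(
      MCCFIInstruction::createRestore(nullptr, TRI.getDwarfRegNum(Reg, true)));
  BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex)
      .setMIFlag(MachineInstr::FrameDestroy);
}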



Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7
# 675e7bd1 21-May-2024 Piyou Chen <piyou.chen@sifive.com>

[RISCV] Support postRA vsetvl insertion pass (#70549)

This patch tries to get rid of the vsetvl implicit vl/vtype def-use chain
and improves register allocation quality by moving the vsetvl insertion
pass to after RVV register allocation.

This benefits the following optimizations:

1. Unblocking scheduler constraints by removing the vl/vtype def-use chain
2. Supporting RVV re-materialization
3. Supporting partial spills

This patch adds a new option `-riscv-vsetvl-after-rvv-regalloc=<1|0>` to
control this feature; it is disabled by default.



Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 286a366d 16-Jan-2024 Luke Lau <luke@igalia.com>

[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501)

vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX
and PseudoVMV_X_S_MX variants with just one pseudo each. These pseudos use
the VR register class (just like the actual instruction), so we now only
have TableGen patterns for vectors of LMUL <= 1.

We now rely on the existing combines that shrink LMUL down to 1 for
vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines
later and just inserting the nodes with the correct type in a later
patch.

The test diff is due to the fact that a PseudoVMV_S_X/PseudoVMV_X_S no
longer carries any information about LMUL, so if it's the only vector
pseudo instruction in a block then it now defaults to LMUL=1.



Revision tags: llvmorg-17.0.6
# cf17a24a 27-Nov-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Use subreg extract for extract_vector_elt when vlen is known (#72666)

This is the first in a planned patch series to teach our vector lowering
how to exploit register boundaries in LMUL>1 types when VLEN is known to
be an exact constant. This corresponds to code compiled by clang with
the -mrvv-vector-bits=zvl option.

For extract_vector_elt, if we have a constant index and a known vlen,
then we can identify which register out of a register group is being
accessed. Given this, we can do a sub-register extract for that
register, and then shift any remaining index.
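
To make the index arithmetic concrete, a plain C++ illustration (not the
lowering code itself) of the split described above:

#include <utility>

// Illustration only: with a known VLEN, split a constant element index
// within an LMUL>1 register group into (m1 sub-register number, index
// within that sub-register).
static std::pair<unsigned, unsigned>
splitExtractIndex(unsigned EltIdx, unsigned EltBits, unsigned VLenBits) {
  unsigned EltsPerVReg = VLenBits / EltBits; // elements per m1 register
  return {EltIdx / EltsPerVReg, EltIdx % EltsPerVReg};
}

// Example: with VLEN=128 and i64 elements there are 2 elements per register,
// so index 5 of an m4 value lives in sub-register 2 at index 1.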

This results in all constant index extracts becoming m1 operations, and
thus eliminates the complexity concern for explode-vector idioms at high
lmul.



Revision tags: llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 7b3bbd83 09-Oct-2023 Jay Foad <jay.foad@amd.com>

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.


# 2501ae58 09-Oct-2023 Jay Foad <jay.foad@amd.com>

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.



# 45a334d3 04-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Generalize reduction tree matching to all integer reductions (#68014) (reapply)

This was reverted in 824251c9b349d859a9169196cd9533c619a715ce due to an issue in a previous patch exposed by this change. Fixed in 199cbec987ee68d70611db8e7961b43c3dbad83e. Original commit message follows.

This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.

A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest it.
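
As an illustration of the scalar shape that now matches (a made-up example,
not the IR from this test): extract every lane and fold left-to-right with
an integer max, which with zbb can become a single vector reduction.

#include <algorithm>
#include <cstdint>

// Made-up illustration of an "exploded" integer reduction: each Lane[i]
// stands for an extractelement of lane i, folded left-to-right with smax.
// Trees like this (for add/and/or/xor/smax/smin/umax/umin) can now be
// matched as a vector reduction.
static int64_t explodedSMax(const int64_t Lane[4]) {
  int64_t M = Lane[0];
  M = std::max(M, Lane[1]); // max(e0, e1)
  M = std::max(M, Lane[2]); // max(max(e0, e1), e2)
  M = std::max(M, Lane[3]); // max(max(max(e0, e1), e2), e3)
  return M;
}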



# 824251c9 04-Oct-2023 Alex Bradbury <asb@igalia.com>

Revert "[RISCV] Generaize reduction tree matching to all integer reductions (#68014)"

This reverts commit 7a0b9daac9edde4293d2e9fdf30d8b35c04d16a6 and
63bbc250440141b1c51593904fba9bdaa6724280.

I'm

Revert "[RISCV] Generaize reduction tree matching to all integer reductions (#68014)"

This reverts commit 7a0b9daac9edde4293d2e9fdf30d8b35c04d16a6 and
63bbc250440141b1c51593904fba9bdaa6724280.

I'm seeing issues (e.g. on the GCC torture suite) where
combineBinOpOfExtractToReduceTree is called when the V extensions aren't
enabled and triggers a crash due to RISCVSubtarget::getElen asserting.

I'll aim to follow up with a minimal reproducer. Although it's pretty
obvious how to avoid this crash with some extra gating, there are a few
options as to where that should be inserted, so I think it's best to
revert and agree on the appropriate fix separately.



# 7a0b9daa 03-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Generalize reduction tree matching to all integer reductions (#68014)

This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.

A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest it.



Revision tags: llvmorg-17.0.2
# f0505c3d 02-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Form vredsum from explode_vector + scalar (left) reduce (#67821)

This change adds two related DAG combines which together will take a
left-reduce scalar add tree of an explode_vector, and will incrementally
form a vector reduction of the vector prefix. If the entire vector is
reduced, the result will be a reduction over the entire vector.

Profitability-wise, this relies on vredsum being cheaper than a pair of
extracts and a scalar add. Given that vredsum is linear in LMUL, and the
vslidedown required for the extract is *also* linear in LMUL, this is
clearly true at higher index values. At N=2 it's a bit questionable,
but I think the vredsum form is probably the better canonical form
anyway.

Note that this only matches left reduces. This happens to be the
motivating example I have (from spec2017 x264). This approach could be
generalized to handle right reduces without much effort, and could be
generalized to handle any reduce whose tree starts with adjacent
elements if desired. The approach fails for a reduce such as (A+C)+(B+D)
because we can't find a root to start the reduce with without scanning
the entire associative add expression. We could maybe explore using
masked reduces for the root node, but that seems of questionable
profitability. (As in, worth questioning - I haven't explored in any
detail.)
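
To spell out the difference in shapes (a made-up 4-lane example, not code
from the patch):

#include <cstdint>

// Left reduce: a chain rooted at element 0, growing one element at a time.
// This is the shape the combine can incrementally turn into a vredsum over
// a vector prefix.
static int64_t leftReduce(const int64_t E[4]) {
  return ((E[0] + E[1]) + E[2]) + E[3];
}

// A shape like (A+C)+(B+D): there is no single contiguous prefix to root
// the reduction at, so the current matcher gives up on it.
static int64_t pairedReduce(const int64_t E[4]) {
  return (E[0] + E[2]) + (E[1] + E[3]);
}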

This is covering up a deficiency in SLP. If SLP encounters the scalar
form of reduce_or(a) + reduce_sum(a), where a is some common
vectorizable tree, SLP will sometimes fail to revisit one of the
reductions after vectorizing the other. Fixing this in SLP is hard, and
there's no good reason not to handle the easy cases in the backend.

Another option here would be to do this in VectorCombine or generic DAG.
I chose not to as the profitability of the non-legal typed prefix cases
is very target dependent. I think this makes sense as a starting point,
even if we move it elsewhere later.

This is currently restricted to add reduces only, but it obviously makes
sense for any associative reduction operator. Once this is approved, I
plan to extend it in this manner. I'm simply staging work in case we
decide to go in another direction.



# cd03d970 29-Sep-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Add test coverage for sum reduction recognition in DAG

And adjust an existing test to not be a simple reduction to preserve test intent.


Revision tags: llvmorg-17.0.1, llvmorg-17.0.0
# e0919b18 13-Sep-2023 Jay Foad <jay.foad@amd.com>

[CodeGen] Renumber slot indexes before register allocation (#66334)

RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
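
The heuristic in question boils down to estimating instruction distance from
the numeric gap between two slot indexes. A simplified illustration (not the
SlotIndexes implementation) of why stale numbering skews it:

// Simplified illustration (not the SlotIndexes implementation): if live
// range "length" is approximated as the numeric gap between two indexes
// divided by an assumed per-instruction spacing, then gaps left by erased
// instructions, or crowded numbers from inserted ones, make the estimate
// drift. Renumbering with uniform spacing keeps the estimate accurate.
static unsigned approxInstrDistance(unsigned LoIndex, unsigned HiIndex,
                                    unsigned SpacingPerInstr) {
  return (HiIndex - LoIndex) / SpacingPerInstr;
}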

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.



# 299d710e 11-Sep-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Lower fixed vectors extract_vector_elt through stack at high LMUL

This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP.

Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
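
Making that arithmetic explicit (an illustrative calculation, not the
in-tree cost model):

#include <cstdint>

// Illustrative only: extracting every element of an LMUL-sized register
// group, where each extract pays a vslidedown that is linear in LMUL,
// scales as LMUL^2 * (VLEN/ETYPE).
static uint64_t naiveExplodeCost(uint64_t LMUL, uint64_t VLenBits,
                                 uint64_t EltBits) {
  uint64_t VL = LMUL * (VLenBits / EltBits); // elements in the group
  uint64_t CostPerExtract = LMUL;            // slide distance grows with LMUL
  return VL * CostPerExtract;                // ~ LMUL^2 * VLEN/ETYPE
}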

There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.

It's tempting to think we can do better than going through the stack but, well, I haven't found it yet if it exists. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):

output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles: 20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles: 23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles: 21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles: 22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles: 15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles: 18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles: 12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles: 4304

I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.

Differential Revision: https://reviews.llvm.org/D159375



Revision tags: llvmorg-17.0.0-rc4
# 7c4f4559 01-Sep-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Add test coverage for fully scalarizing a vector

This pattern comes up heavily when partially vectorizing a forest in SLP.