Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
b759020c |
| 11-Dec-2024 |
LiqinWeng <liqin.weng@spacemit.com> |
[LV][EVL] Support cast instruction with EVL-vectorization (#108351)
|
Revision tags: llvmorg-19.1.5 |
|
#
82821254 |
| 28-Nov-2024 |
Florian Hahn <flo@fhahn.com> |
[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758)
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/111758
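A minimal, hypothetical sketch (register names assumed, not taken from the patch) of what this enables in the tail-folded vector loop:
vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  ; with IVUpdateMayOverflow == false the increment is known not to wrap,
  ; so it can carry the nuw flag even though the tail is folded
  %index.next = add nuw i64 %index, %step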
|
#
56c091ea |
| 21-Nov-2024 |
Paul Walker <paul.walker@arm.com> |
[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856)
This brings the printing of scalable vector constant splats inline with
their fixed length counterparts.
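For illustration only (constants assumed): a scalable constant splat that used to print as a ConstantExpr now uses the same splat syntax as fixed-length vectors:
  ; before: shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0),
  ;                        <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
  ; after:
  %add = add <vscale x 4 x i32> %x, splat (i32 1)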
|
Revision tags: llvmorg-19.1.4 |
|
#
4480a22c |
| 06-Nov-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641)
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs a transformation that converts
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
It also always uses the predicated reduction select to set the incoming
value of the reduction phi, in order to support out-loop reduction when
tail folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.
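A hedged sketch of the rewrite (vector type and value names are assumptions for illustration):
  ; before: header-mask select between the updated and previous reduction value
  %rdx = select <vscale x 4 x i1> %header.mask, <vscale x 4 x i32> %lhs, <vscale x 4 x i32> %rhs
  ; after: all-true mask plus EVL; lanes past %evl take the RHS instead of undef
  %rdx.evl = call <vscale x 4 x i32> @llvm.vp.merge.nxv4i32(<vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %lhs, <vscale x 4 x i32> %rhs, i32 %evl)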
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
00e40c9b |
| 06-Sep-2024 |
Kolya Panchenko <87679760+nikolaypanchenko@users.noreply.github.com> |
[LV] Support binary and unary operations with EVL-vectorization (#93854)
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operation.
Follow-up patches will extend support for the remaining cases, such as
`FCmp` and `ICmp`.
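An assumed example (types illustrative) of what the new recipe emits for a widened binary operation:
  ; plain widened add produced by VPWidenRecipe
  %add = add <vscale x 4 x i32> %a, %b
  ; EVL-predicated form produced by VPWidenEVLRecipe
  %add.evl = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> splat (i1 true), i32 %evl)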
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
dfde1a72 |
| 28-Aug-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907)
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
4eb30cfb |
| 16-Jul-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.
This addresses one of the TODOs from #76172.
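A hedged sketch (assumed types) of the in-loop reduction this enables under EVL tail folding:
  ; each iteration folds only the first %evl lanes of %vec into the scalar accumulator
  %red.next = call i32 @llvm.vp.reduce.add.nxv4i32(i32 %red, <vscale x 4 x i32> %vec, <vscale x 4 x i1> splat (i1 true), i32 %evl)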
|
#
a00754bb |
| 12-Jul-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV] Fix the cost of min/max reductions. (#98453)
This patch updates the function `getReductionPatternCost` to handle the
cost of min/max reductions by `TTI.getMinMaxReductionCost`.
|
#
99d6c6d9 |
| 05-Jul-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.
Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.
After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.
This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.
Depends on https://github.com/llvm/llvm-project/pull/92525
PR: https://github.com/llvm/llvm-project/pull/92651
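For reference, a simplified, illustrative view (not taken from the patch) of the branch that is now modeled in VPlan:
middle.block:
  ; did the vector loop cover the whole trip count?
  %cmp.n = icmp eq i64 %n, %n.vec
  br i1 %cmp.n, label %exit, label %scalar.ph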
|
#
3808ba78 |
| 20-Jun-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.
Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (that is, in between the compare
used by the terminator and the terminator itself). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.
PR: https://github.com/llvm/llvm-project/pull/95816
|
Revision tags: llvmorg-18.1.8 |
|
#
c46a6e6c |
| 12-Jun-2024 |
Florian Hahn <flo@fhahn.com> |
[LV] Remove unnecessary getRuntimeVF call when computing vector TC.
As Step is VF * UF, there is no need to compute it again, which may require multiple instructions for scalable VFs.
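A rough sketch (VF and UF assumed) of the saving for scalable VFs: Step already equals VF * UF, so the vector trip count can reuse it instead of expanding a second getRuntimeVF:
  %vscale = call i64 @llvm.vscale.i64()
  %step = mul i64 %vscale, 8          ; Step = VF * UF, e.g. (vscale x 4) * 2
  %n.mod.vf = urem i64 %n, %step      ; reuses %step; no second vscale-based expansion
  %n.vec = sub i64 %n, %n.mod.vf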
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4 |
|
#
413a66f3 |
| 04-Apr-2024 |
Alexey Bataev <a.bataev@outlook.com> |
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.
Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support can likewise only take
advantage of predication on memory operations. By giving the Loop Vectorizer
a way to generate Vector Predication intrinsics, which (will) provide a
target-independent way to model predicated vector instructions, these
architectures can make better use of their predication capabilities.
Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (it just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlan transform to replace the wide header predicate
compare with the EVL and updates codegen for loads/stores to use VP
load/store with the EVL.
Another important part of this approach is how the Explicit Vector Length
is computed (VP intrinsics define this vector length parameter as the
Explicit Vector Length, or EVL). We use an experimental intrinsic,
`get_vector_length`, that can be lowered to architecture-specific
instruction(s) to compute the EVL.
The patch also adds a new recipe to emit the instructions that compute the
EVL. Using VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.
Differential Revision: https://reviews.llvm.org/D99750
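A hedged, end-to-end sketch of an EVL tail-folded vector body (the copy loop, types, and names are assumptions, not taken from the patch):
vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %avl = sub i64 %n, %index
  ; EVL for this iteration; lowered to e.g. vsetvli on RISC-V
  %evl = call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  %src.gep = getelementptr i32, ptr %src, i64 %index
  %v = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr %src.gep, <vscale x 4 x i1> splat (i1 true), i32 %evl)
  %dst.gep = getelementptr i32, ptr %dst, i64 %index
  call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %v, ptr %dst.gep, <vscale x 4 x i1> splat (i1 true), i32 %evl)
  %evl.zext = zext i32 %evl to i64
  %index.next = add i64 %index, %evl.zext
  %done = icmp eq i64 %index.next, %n
  br i1 %done, label %middle.block, label %vector.body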
|
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
5ea6a3fc |
| 08-Dec-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Compute scalable VF in preheader for induction increment. (#74762)
UF * VF is loop invariant and can be computed directly in the preheader.
This prepares the code for #74761 and reduces the test changes.
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
8d16c680 |
| 10-Feb-2023 |
Luke Lau <luke@igalia.com> |
[RISCV] Increase default vectorizer LMUL to 2
After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot.
Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that a) doesn't affect register pressure and b) is suitable for the microarchitecture, we would need to teach its heuristics about RISC-V register grouping specifics. Instead this just does the easy, pragmatic thing and changes the default to a safe value that doesn't affect register pressure significantly [1], but should increase throughput and unlock more interleaving.
[1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`:
LMUL=1: 2573 spills
LMUL=2: 2583 spills
LMUL=4: 2819 spills
LMUL=8: 3256 spills
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143723
|
#
15f9cf16 |
| 21-Feb-2023 |
Luke Lau <luke@igalia.com> |
[LV][RISCV] Don't interleave scalable vector loops
It's less clear with scalable vectors than with fixed-length vectors that interleaving exposes more ILP, as scalable vectors can be thought of as a sort of hardware form of interleaving, especially with larger LMULs. This also addresses the unexpected additional unrolling that occurs when using larger LMULs in the loop vectorizer.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144485
|
#
5a115452 |
| 08-Feb-2023 |
Sander de Smalen <sander.desmalen@arm.com> |
Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
Fixed an issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64 but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong.
Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.
This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
|
#
1f01cdda |
| 08-Feb-2023 |
Sander de Smalen <sander.desmalen@arm.com> |
Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."
This patch causes a regression, so reverting it while I investigate the issue.
This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.
|
Revision tags: llvmorg-16.0.0-rc2 |
|
#
e6eb84a1 |
| 06-Feb-2023 |
Sander de Smalen <sander.desmalen@arm.com> |
[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
This is specifically relevant for loops that vectorize using a scalable VF, where the code results in:
  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr ..., i32 %vf.part1
Which InstCombine then changes into:
  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr ..., i64 %vf.part1.zext
D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in.
It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away.
I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D143267
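A small hedged sketch (element type assumed) of the intended result: with the index type taken from DataLayout::getIndexType (i64 on typical 64-bit targets), the non-constant index is computed as i64 from the start and no extend is needed:
  %vscale = call i64 @llvm.vscale.i64()
  %vf.part1 = mul i64 %vscale, 4
  %gep = getelementptr i32, ptr %p, i64 %vf.part1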
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
5b400150 |
| 14-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[LoopVectorize] Convert some tests to opaque pointers (NFC)
For these tests update_test_checks.py had to be rerun.
|
#
be51fa45 |
| 05-Dec-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
45bae1be |
| 04-Aug-2022 |
jacquesguan <Jianjian.Guan@streamcomputing.com> |
[RISCV][test] Add inloop reduction vectorize test. NFC
|