Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
b759020c |
| 11-Dec-2024 |
LiqinWeng <liqin.weng@spacemit.com> |
[LV][EVL] Support cast instruction with EVL-vectorization (#108351)
|
Revision tags: llvmorg-19.1.5 |
|
#
82821254 |
| 28-Nov-2024 |
Florian Hahn <flo@fhahn.com> |
[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758)
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/111758
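A minimal, hypothetical sketch (register names assumed, not taken from the patch) of what this enables in the tail-folded vector loop:
vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  ; with IVUpdateMayOverflow == false the increment is known not to wrap,
  ; so it can carry the nuw flag even though the tail is folded
  %index.next = add nuw i64 %index, %step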
|
#
56c091ea |
| 21-Nov-2024 |
Paul Walker <paul.walker@arm.com> |
[LLVM][IR] Use splat syntax when printing ConstantExpr based splats. (#116856)
This brings the printing of scalable vector constant splats inline with
their fixed length counterparts.
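For illustration only (constants assumed): a scalable constant splat that used to print as a ConstantExpr now uses the same splat syntax as fixed-length vectors:
  ; before: shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0),
  ;                        <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
  ; after:
  %add = add <vscale x 4 x i32> %x, splat (i32 1)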
|
Revision tags: llvmorg-19.1.4 |
|
#
4480a22c |
| 06-Nov-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641)
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs a transformation that converts
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
It also always uses the predicated reduction select to set the incoming
value of the reduction phi, in order to support out-loop reduction when
tail folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.
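A hedged sketch of the rewrite (vector type and value names are assumptions for illustration):
  ; before: header-mask select between the updated and previous reduction value
  %rdx = select <vscale x 4 x i1> %header.mask, <vscale x 4 x i32> %lhs, <vscale x 4 x i32> %rhs
  ; after: all-true mask plus EVL; lanes past %evl take the RHS instead of undef
  %rdx.evl = call <vscale x 4 x i32> @llvm.vp.merge.nxv4i32(<vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %lhs, <vscale x 4 x i32> %rhs, i32 %evl)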
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
00e40c9b |
| 06-Sep-2024 |
Kolya Panchenko <87679760+nikolaypanchenko@users.noreply.github.com> |
[LV] Support binary and unary operations with EVL-vectorization (#93854)
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operation.
Follow-up patches will extend support for the remaining cases, such as
`FCmp` and `ICmp`.
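An assumed example (types illustrative) of what the new recipe emits for a widened binary operation:
  ; plain widened add produced by VPWidenRecipe
  %add = add <vscale x 4 x i32> %a, %b
  ; EVL-predicated form produced by VPWidenEVLRecipe
  %add.evl = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> splat (i1 true), i32 %evl)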
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
dfde1a72 |
| 28-Aug-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV][NFC] Update and clean up the test case LoopVectorize/RISCV/inloop-reduction.ll. (#102907)
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
4eb30cfb |
| 16-Jul-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184)
Following from #87816, add VPReductionEVLRecipe to describe vector
predication reduction.
This addresses one of the TODOs from #76172.
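A hedged sketch (assumed types) of the in-loop reduction this enables under EVL tail folding:
  ; each iteration folds only the first %evl lanes of %vec into the scalar accumulator
  %red.next = call i32 @llvm.vp.reduce.add.nxv4i32(i32 %red, <vscale x 4 x i32> %vec, <vscale x 4 x i1> splat (i1 true), i32 %evl)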
|
#
a00754bb |
| 12-Jul-2024 |
Mel Chen <mel.chen@sifive.com> |
[LV] Fix the cost of min/max reductions. (#98453)
This patch updates the function `getReductionPatternCost` to handle the
cost of min/max reductions by `TTI.getMinMaxReductionCost`.
|
#
99d6c6d9 |
| 05-Jul-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.
Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.
After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.
This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.
Depends on https://github.com/llvm/llvm-project/pull/92525
PR: https://github.com/llvm/llvm-project/pull/92651
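For reference, a simplified, illustrative view (not taken from the patch) of the branch that is now modeled in VPlan:
middle.block:
  ; did the vector loop cover the whole trip count?
  %cmp.n = icmp eq i64 %n, %n.vec
  br i1 %cmp.n, label %exit, label %scalar.ph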
|
#
3808ba78 |
| 20-Jun-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.
Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (that is, in between the compare
used by the terminator and the terminator itself). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.
PR: https://github.com/llvm/llvm-project/pull/95816
|
Revision tags: llvmorg-18.1.8 |
|
#
c46a6e6c |
| 12-Jun-2024 |
Florian Hahn <flo@fhahn.com> |
[LV] Remove unnecessary getRuntimeVF call when computing vector TC.
As Step is VF * UF, there is no need to compute it again, which may require multiple instructions for scalable VFs.
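A rough sketch (VF and UF assumed) of the saving for scalable VFs: Step already equals VF * UF, so the vector trip count can reuse it instead of expanding a second getRuntimeVF:
  %vscale = call i64 @llvm.vscale.i64()
  %step = mul i64 %vscale, 8          ; Step = VF * UF, e.g. (vscale x 4) * 2
  %n.mod.vf = urem i64 %n, %step      ; reuses %step; no second vscale-based expansion
  %n.vec = sub i64 %n, %n.mod.vf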
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4 |
|
#
413a66f3 |
| 04-Apr-2024 |
Alexey Bataev <a.bataev@outlook.com> |
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.
Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support can likewise only take
advantage of predication on memory operations. By giving the Loop Vectorizer
a way to generate Vector Predication intrinsics, which (will) provide a
target-independent way to model predicated vector instructions, these
architectures can make better use of their predication capabilities.
Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (it just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for load/store instructions. The
patch adds a new VPlan transform to replace the wide header predicate
compare with the EVL and updates codegen for loads/stores to use VP
load/store with the EVL.
Another important part of this approach is how the Explicit Vector Length
is computed (VP intrinsics define this vector length parameter as the
Explicit Vector Length, or EVL). We use an experimental intrinsic,
`get_vector_length`, that can be lowered to architecture-specific
instruction(s) to compute the EVL.
The patch also adds a new recipe to emit the instructions that compute the
EVL. Using VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.
Differential Revision: https://reviews.llvm.org/D99750
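A hedged, end-to-end sketch of an EVL tail-folded vector body (the copy loop, types, and names are assumptions, not taken from the patch):
vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %avl = sub i64 %n, %index
  ; EVL for this iteration; lowered to e.g. vsetvli on RISC-V
  %evl = call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  %src.gep = getelementptr i32, ptr %src, i64 %index
  %v = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr %src.gep, <vscale x 4 x i1> splat (i1 true), i32 %evl)
  %dst.gep = getelementptr i32, ptr %dst, i64 %index
  call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %v, ptr %dst.gep, <vscale x 4 x i1> splat (i1 true), i32 %evl)
  %evl.zext = zext i32 %evl to i64
  %index.next = add i64 %index, %evl.zext
  %done = icmp eq i64 %index.next, %n
  br i1 %done, label %middle.block, label %vector.body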
|
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
5ea6a3fc |
| 08-Dec-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Compute scalable VF in preheader for induction increment. (#74762)
UF * VF is loop invariant and can be computed directly in the preheader.
This prepares the code for #74761 and reduces the test changes.
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
8d16c680 |
| 10-Feb-2023 |
Luke Lau <luke@igalia.com> |
[RISCV] Increase default vectorizer LMUL to 2
After some discussion and experimentation, we have seen that changing the default number of vector register bits to LMUL=2 strikes a sweet spot.
Whilst we could be clever here and make the vectorizer smarter about dynamically selecting an LMUL that a) doesn't affect register pressure and b) is suitable for the microarchitecture, we would need to teach its heuristics about RISC-V register grouping specifics. Instead this just does the easy, pragmatic thing and changes the default to a safe value that doesn't affect register pressure significantly [1], but should increase throughput and unlock more interleaving.
[1] Register spilling when compiling sqlite at various levels of `-riscv-v-register-bit-width-lmul`:
LMUL=1: 2573 spills
LMUL=2: 2583 spills
LMUL=4: 2819 spills
LMUL=8: 3256 spills
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143723
|
#
15f9cf16 |
| 21-Feb-2023 |
Luke Lau <luke@igalia.com> |
[LV][RISCV] Don't interleave scalable vector loops
It's less clear with scalable vectors than with fixed-length vectors that interleaving exposes more ILP, as scalable vectors can be thought of as a sort of hardware form of interleaving, especially with larger LMULs. This also addresses the unexpected additional unrolling that occurs when using larger LMULs in the loop vectorizer.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144485
|
#
5a115452 |
| 08-Feb-2023 |
Sander de Smalen <sander.desmalen@arm.com> |
Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
Fixed an issue where 'ConstantInt::get(IndexTy, -Part)' was executed with the wrong type for Part, e.g. IndexTy was i64 but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292', which was obviously wrong.
Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.
This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
|
#
1f01cdda |
| 08-Feb-2023 |
Sander de Smalen <sander.desmalen@arm.com> |
Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."
This patch causes a regression, so reverting it while I investigate the issue.
This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.
|
Revision tags: llvmorg-16.0.0-rc2 |
|
#
e6eb84a1 |
| 06-Feb-2023 |
Sander de Smalen <sander.desmalen@arm.com> |
[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
This is specifically relevant for loops that vectorize using a scalable VF, where the code results in:
  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr ..., i32 %vf.part1
Which InstCombine then changes into:
  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr ..., i64 %vf.part1.zext
D143016 tried to remove these extends, but that only works when the call to llvm.vscale.i32() has a single use. After doing any kind of CSE on these calls the combine no longer kicks in.
It seems more sensible to ask DataLayout what type to use, rather than relying on InstCombine to insert the extend and hoping it can fold it away.
I've only changed this for indices that are not constant, because I vaguely remember there was a reason for sticking with i32. It would also mean patching up loads more tests.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D143267
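A small hedged sketch (element type assumed) of the intended result: with the index type taken from DataLayout::getIndexType (i64 on typical 64-bit targets), the non-constant index is computed as i64 from the start and no extend is needed:
  %vscale = call i64 @llvm.vscale.i64()
  %vf.part1 = mul i64 %vscale, 4
  %gep = getelementptr i32, ptr %p, i64 %vf.part1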
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
5b400150 |
| 14-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[LoopVectorize] Convert some tests to opaque pointers (NFC)
For these tests update_test_checks.py had to be rerun.
|
#
be51fa45 |
| 05-Dec-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
45bae1be |
| 04-Aug-2022 |
jacquesguan <Jianjian.Guan@streamcomputing.com> |
[RISCV][test] Add inloop reduction vectorize test. NFC
|