#
00798354 |
| 13-Jun-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] First step towards VPlan cost modeling. (#92555)
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost
[VPlan] First step towards VPlan cost modeling. (#92555)
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
show more ...
|
Revision tags: llvmorg-18.1.7 |
|
#
577785c5 |
| 18-May-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Remove unused removeLastOperand (NFC).
The last use of the function has been removed a while ago. Remove the unused function.
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5 |
|
#
e2a72fa5 |
| 19-Apr-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://githu
[VPlan] Introduce recipes for VP loads and stores. (#87816)
Introduce new subclasses of VPWidenMemoryRecipe for VP
(vector-predicated) loads and stores to address multiple TODOs from
https://github.com/llvm/llvm-project/pull/76172
Note that the introduction of the new recipes also improves code-gen for
VP gather/scatters by removing the redundant header mask. With the new
approach, it is not sufficient to look at users of the widened canonical
IV to find all uses of the header mask.
In some cases, a widened IV is used instead of separately widening the
canonical IV. To handle that, first collect all VPValues representing header
masks (by looking at users of both the canonical IV and widened inductions
that are canonical) and then checking all users (recursively) of those header
masks.
Depends on https://github.com/llvm/llvm-project/pull/87411.
PR: https://github.com/llvm/llvm-project/pull/87816
show more ...
|
#
a9bafe91 |
| 17-Apr-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411)
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.
This is a first step
[VPlan] Split VPWidenMemoryInstructionRecipe (NFCI). (#87411)
This patch introduces a new VPWidenMemoryRecipe base class and distinct
sub-classes to model loads and stores.
This is a first step in an effort to simplify and modularize code
generation for widened loads and stores and enable adding further more
specialized memory recipes.
PR: https://github.com/llvm/llvm-project/pull/87411
show more ...
|
Revision tags: llvmorg-18.1.4 |
|
#
34777c23 |
| 16-Apr-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Don't mark VPBlendRecipe as phi-like.
VPBlendRecipes don't get lowered to phis and usually do not appear at the beginning of blocks, due to their masks appearing before them.
This effective
[VPlan] Don't mark VPBlendRecipe as phi-like.
VPBlendRecipes don't get lowered to phis and usually do not appear at the beginning of blocks, due to their masks appearing before them.
This effectively relaxes an over-eager verifier message.
Fixes https://github.com/llvm/llvm-project/issues/88297. Fixes https://github.com/llvm/llvm-project/issues/88804.
show more ...
|
#
6254b6dd |
| 15-Apr-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Version VPValue names in VPSlotTracker. (#81411)
This patch restructures the way names for printing VPValues are handled.
It moves the logic to generate names for printing to VPSlotTracker
[VPlan] Version VPValue names in VPSlotTracker. (#81411)
This patch restructures the way names for printing VPValues are handled.
It moves the logic to generate names for printing to VPSlotTracker.
VPSlotTracker will now version names of the same underlying value if it
is used by multiple VPValues, by adding a .V suffix to the name.
This fixes cases where at the moment the same name is printed for
different VPValues.
PR: https://github.com/llvm/llvm-project/pull/81411
show more ...
|
#
413a66f3 |
| 04-Apr-2024 |
Alexey Bataev <a.bataev@outlook.com> |
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.
Currently the Loop
[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new tail-folding mode using EVL. (#76172)
This patch introduces generating VP intrinsics in the Loop Vectorizer.
Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. By having a way for the
Loop Vectorizer to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions. These architectures can make better use of their
predication capabilities.
Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for loads/stores instructions. The
patch adds a new VPlanTransforms to replace the wide header predicate
compare with EVL and updates codegen for load/stores to use VP
store/load with EVL.
Other important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.
Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.
Differential Revision: https://reviews.llvm.org/D99750
show more ...
|
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
911055e3 |
| 26-Feb-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformatState's 'vector' storage for this
[VPlan] Consistently use (Part, 0) for first lane scalar values (#80271)
At the moment, some VPInstructions create only a single scalar value,
but use VPTransformatState's 'vector' storage for this value. Those
values are effectively uniform-per-VF (or in some cases
uniform-across-VF-and-UF). Using the vector/per-part storage doesn't
interact well with other recipes, that more accurately using (Part,
Lane) to look up scalar values and prevents VPInstructions creating
scalars from interacting with other recipes working with scalars.
This PR tries to unify handling of scalars by using (Part, 0) for scalar
values where only the first lane is demanded. This allows using
VPInstructions with other recipes like VPScalarCastRecipe and is also
needed when using VPInstructions in more cases otuside the vector loop
region to generate scalars.
Depends on https://github.com/llvm/llvm-project/pull/80269
show more ...
|
Revision tags: llvmorg-18.1.0-rc3 |
|
#
9923d29c |
| 20-Feb-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Merge main VPlan verifer with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by
[VPlan] Merge main VPlan verifer with HCFG verifier.
Unify VPlan verifiers in verifyVPlanIsValid. This adds verification for various properties on blocks to the verifier used for VPlans generated by the inner loop vectorizer. It also adds def-use checks for the verifier used in the VPlan native path.
This drops the separate flag to enable HCFG verification. Instead, all VPlans are verified once they have been created, if assertions are enabled.
This also removes VPWidenPHIRecipe from VPHeaderPHIRecipe; it is used to model any phi node in the native path.
show more ...
|
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
0ab539fd |
| 26-Jan-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.
This allows creating scala
[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113)
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.
This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.
Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.
PR: https://github.com/llvm/llvm-project/pull/78113
show more ...
|
Revision tags: llvmorg-19-init |
|
#
f18536d6 |
| 01-Jan-2024 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.
https://github.com/llvm/llvm-project/pull/72164
show more ...
|
#
d8607109 |
| 11-Dec-2023 |
Shao-Ce SUN <sunshaoce@outlook.com> |
[NFC][VPlan] Simplify VPValue::removeUser (#74708)
Replaced explicit loops with find + erase.
|
#
a5891fa4 |
| 08-Dec-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPla
[VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)
This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.
The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.
Moved from Phabricator (https://reviews.llvm.org/D157322)
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
a0022719 |
| 06-Nov-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add VPValue::replaceUsesWithIf (NFCI).
Add replaceUsesWithIf helper and use it in a few places.
|
Revision tags: llvmorg-17.0.4 |
|
#
f9306f6d |
| 25-Oct-2023 |
Kazu Hirata <kazu@google.com> |
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
show more ...
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
#
e3afe0b8 |
| 05-May-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow creating widened casts without underlying instruction, introduc
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow creating widened casts without underlying instruction, introduce a new VPWidenCastRecipe that also holds the result type.
This functionality will be used in a follow-up patch to implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149081
show more ...
|
#
b9efffa7 |
| 03-May-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add assignSlot(const VPBasicBlock *) (NFC).
Factor out utility to simplify D147964 as sugested.
|
Revision tags: llvmorg-16.0.3 |
|
#
3157f03a |
| 24-Apr-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add VPValue::isLiveIn() (NFC).
This helps to clarify checks in multiple places.
Suggested as cleanup in D147892.
|
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
194f3dc8 |
| 15-Feb-2023 |
Michael Maitland <michaeltmaitland@gmail.com> |
[VPlan] VPWidenIntOrFpInductionRecipe inherits from VPHeaderPHIRecipe
Differential Revision: https://reviews.llvm.org/D144125
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
d47bdae2 |
| 17-Jan-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Remove duplicated VPValue IDs (NFCI).
At the moment, both VPValue and VPDef have an ID used when casting via classof. This duplication is cumbersome, because it requires adding IDs for new r
[VPlan] Remove duplicated VPValue IDs (NFCI).
At the moment, both VPValue and VPDef have an ID used when casting via classof. This duplication is cumbersome, because it requires adding IDs for new recipes twice and also requires setting them twice. In a few cases, there's only a VPDef ID and no VPValue ID, which can cause same confusion.
To simplify things, remove the VPValue IDs for different recipes. Instead, only retain the generic VPValue ID (= used VPValues without a corresponding defining recipe) and VPVRecipe for VPValues that are defined by recipes that inherit from VPValue.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D140848
show more ...
|
#
133f0174 |
| 17-Jan-2023 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC).
This specialization is not needed any longer as VPRecipeBase inherits from VPUser and getDefiningRecipe returns a VPRecipeBase.
|
Revision tags: llvmorg-15.0.7 |
|
#
0c5df7cd |
| 30-Nov-2022 |
Florian Hahn <flo@fhahn.com> |
Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b.
The updated version fixes a crash by checking the induction ki
Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b.
The updated version fixes a crash by checking the induction kind instead of the opcode; for integer inductions, the step is always added, but the opcode might not be set.
show more ...
|
Revision tags: llvmorg-15.0.6 |
|
#
bf15f1e4 |
| 28-Nov-2022 |
Florian Hahn <flo@fhahn.com> |
Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8.
This triggers an assertion during AArch64 stage2 builds. Revert
Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8.
This triggers an assertion during AArch64 stage2 builds. Revert while I investigate.
See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio
show more ...
|
#
0fa666ec |
| 28-Nov-2022 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a a value for an induction with a different start and step. This transf
[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a a value for an induction with a different start and step. This transformation only needs to be done once (independent of VF/UF) and enables sinking of VPScalarIVStepsRecipe as follow-up.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133758
show more ...
|
#
55f56cdc |
| 16-Nov-2022 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.
Suggested by @Ayal during review of D136068, thanks!
|