History log of /llvm-project/llvm/lib/Transforms/Vectorize/VPlan.cpp (Results 301 – 325 of 360)
Revision Date Author Comments
# bad7d6b3 24-Aug-2020 Francesco Petrogalli <francesco.petrogalli@arm.com>

Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"

Reverting because the commit message doesn't reflect the one agreed on
phabricator at https://reviews.llvm.org/D85794.

This r

Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"

Reverting because the commit message doesn't reflect the one agreed on
phabricator at https://reviews.llvm.org/D85794.

This reverts commit c8d2b065b98fa91139cc7bb1fd1407f032ef252e.


# c8d2b065 07-Aug-2020 Francesco Petrogalli <francesco.petrogalli@arm.com>

[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]

Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.

To guarantee that this change is an NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types.
In these cases, an
assert(!VF.Scalable && "<msg>") has been added to make sure we don't
enter codepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operations _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` would require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
6. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that might require changes at the command line.

Differential Revision: https://reviews.llvm.org/D85794
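
A minimal sketch of the comparison-to-predicate rewrite described above, using a simplified stand-in type (the real `ElementCount` lives in llvm/Support/TypeSize.h and its method names may differ):

```
#include <tuple>

// Simplified stand-in for llvm::ElementCount, only to illustrate the
// predicates that replace raw `unsigned VF` comparisons in this commit.
struct ElementCount {
  unsigned Min;  // known minimum number of lanes
  bool Scalable; // true if the real lane count is a runtime multiple of Min

  static ElementCount getFixed(unsigned N) { return {N, false}; }
  static ElementCount getScalar() { return {1, false}; } // "represent a scalar"

  bool isScalar() const { return !Scalable && Min == 1; } // was: VF == 1
  bool isVector() const { return Scalable || Min > 1; }   // was: VF > 1
  bool operator<(const ElementCount &O) const {           // for ordered containers
    return std::tie(Scalable, Min) < std::tie(O.Scalable, O.Min);
  }
};
```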


# 745bf6cf 06-Aug-2020 David Green <david.green@arm.com>

[LoopVectorizer] Inloop vector reductions

Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128-bit vectors, sign extend the inputs to i32,
multiply them together and sum the result into a 32-bit general
purpose register. So taking 16 i8s as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32-bit registers
together treated as a 64-bit integer (even though MVE does not have a
plain 64-bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with an llvm.experimental.vector.reduce intrinsic when
vectorizing for Arm, occurs inside the loop, not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
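
As a concrete illustration of the loop shape this is aimed at (example mine, not from the patch), the i8 dot product below is the kind of multiply-accumulate an instruction like VMLAVA.s8 can consume: with an in-loop reduction the running sum stays in a single scalar register inside the vector loop instead of living in a vector of partial sums that is only reduced after the loop:

```
#include <cstddef>
#include <cstdint>

// i8 dot product: each product is widened to i32 before being accumulated,
// matching the "sign extend, multiply, sum into a 32-bit register" pattern
// described above.
int32_t dot_i8(const int8_t *A, const int8_t *B, size_t N) {
  int32_t Sum = 0;
  for (size_t I = 0; I < N; ++I)
    Sum += int32_t(A[I]) * int32_t(B[I]);
  return Sum;
}
```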


# 3c39db0c 05-Aug-2020 Jordan Rupprecht <rupprecht@google.com>

Revert "[LoopVectorizer] Inloop vector reductions"

This reverts commit e9761688e41cb979a1fa6a79eb18145a75104933. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:

Revert "[LoopVectorizer] Inloop vector reductions"

This reverts commit e9761688e41cb979a1fa6a79eb18145a75104933. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
return ReductionOperations;
```


# e9761688 05-Aug-2020 David Green <david.green@arm.com>

[LoopVectorizer] Inloop vector reductions

Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128-bit vectors, sign extend the inputs to i32,
multiply them together and sum the result into a 32-bit general
purpose register. So taking 16 i8s as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32-bit registers
together treated as a 64-bit integer (even though MVE does not have a
plain 64-bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with an llvm.experimental.vector.reduce intrinsic when
vectorizing for Arm, occurs inside the loop, not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069


Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# c1034d04 17-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

Follow up of rGe345d547a0d5, and attempt to pacify buildbot:

"error: 'get' is deprecated: The base class version of get with the scalable
argument defaulted to false is deprecated."

Changed VectorType::get() -> FixedVectorType::get().


# e345d547 17-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"

Fixed ARM regression test.

Please see the original commit message rG47650451738c for details.


# d4e183f6 17-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops"

This reverts commit 47650451738c821993c763356854b560a0f9f550
while I investigate the build bot failures.


# 47650451 10-Jun-2020 Sjoerd Meijer <sjoerd.meijer@arm.com>

[LV] Emit @llvm.get.active.mask for tail-folded loops

This emits a new IR intrinsic, @llvm.get.active.mask, for tail-folded vectorised
loops if the intrinsic is supported by the backend, which is checked by
querying the TargetTransform hook emitGetActiveLaneMask.

This intrinsic creates a mask representing active and inactive vector lanes,
which is used by the masked load/store instructions that are created for
tail-folded loops. The semantics of @llvm.get.active.mask are described here in
LangRef:

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics

This intrinsic is also used to provide a hint to the backend. That is, the
second argument of the intrinsic represents the back-edge taken count of the
loop. For MVE, for example, we use that to set up tail-predication, which is a
new form of predication in MVE for vector loops that implicitly predicates the
last vector loop iteration by implicitly setting active/inactive lanes, i.e.
the tail loop is predicated. In order to set up a tail-predicated vector loop,
we need to know the number of data elements processed by the vector loop, which
corresponds to the trip count of the scalar loop, which we can now reconstruct
using @llvm.get.active.mask.

Differential Revision: https://reviews.llvm.org/D79100
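
A scalar model of the lane mask described above (illustrative only; the authoritative semantics are in the LangRef section linked in the message):

```
#include <array>
#include <cstdint>

// Lane I of the mask is active while the scalar index Index + I has not run
// past the loop's back-edge taken count, so the final vector iteration is
// partially predicated instead of needing a scalar epilogue.
template <unsigned VF>
std::array<bool, VF> activeLaneMask(uint64_t Index, uint64_t BackedgeTakenCount) {
  std::array<bool, VF> Mask;
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = Index + I <= BackedgeTakenCount;
  return Mask;
}

// Example: 10 elements, VF = 4, back-edge taken count 9. The third vector
// iteration starts at Index = 8 and gets the mask {1, 1, 0, 0}.
```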


# 13bf6039 22-May-2020 Anh Tuyen Tran <anhtuyen@ca.ibm.com>

Title: [LV] Handle Fold-Tail of loops with vectorization factor equal to 1

Summary:
When handling loops whose VF is 1, fold-tail vectorization sets the
backedge taken count of the original loop with a vector of a single
element. This causes a type mismatch during instruction generation.

The purpose of this patch is to address the case of VF==1.

Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin)

Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn)

Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits)

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D79976


Revision tags: llvmorg-10.0.1-rc1
# 4c8285c7 14-May-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Move emission of \\l\"+\n to dumpBasicBlock (NFC).

The patch standardizes printing of VPRecipes a bit, by hoisting out the
common emission of \\l\"+\n. It simplifies the code and is also a first
step towards untangling printing from DOT format output, with the goal
of making the DOT output optional and providing a more concise debug
output if DOT output is disabled.

Reviewers: gilr, Ayal, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D78883


# a3c964a2 25-Apr-2020 Ayal Zaks <ayal.zaks@intel.com>

[LV] Fix recording of BranchTakenCount for FoldTail

When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing Value2VPValue mapping which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028

Differential Revision: https://reviews.llvm.org/D78847


# 18138e02 13-Apr-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Introduce VPWidenSelectRecipe (NFC).

Widening a select depends on whether the condition is loop invariant or
not. Rather than checking at codegen time, the information can be
recorded at VPlan construction time.

This was suggested as part of D76992, to reduce the reliance on
accessing the original underlying IR values.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77869


# e2a18678 06-Apr-2020 Gil Rapaport <gil.rapaport@intel.com>

[LV] Add VPValue operands to VPBlendRecipe (NFCI)

InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations' ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces VPValues for VPBlendRecipe to use as the values to
blend. The recipe is generated with VPValues wrapping the phi's incoming values
of the scalar phi. This reduces ingredient def-use usage by ILV as a step
towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D77539


# 16784892 06-Apr-2020 Ayal Zaks <ayal.zaks@intel.com>

[LV] FoldTail w/o Primary Induction

Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.

The canonical scalar IV having start = 0 and step = VF*UF, created during
codegen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.

Differential Revision: https://reviews.llvm.org/D77635
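
A small worked model of that widening (mine, not code from the patch):

```
#include <cstdint>
#include <vector>

// Lane values of the widened canonical IV for unroll part `Part` of vector
// iteration `Iter`: the per-part start vector <Part*VF, Part*VF+1, ...>
// advanced by the splat step <VF*UF, ..., VF*UF> on every iteration.
std::vector<uint64_t> canonicalVectorIV(unsigned VF, unsigned UF,
                                        unsigned Part, uint64_t Iter) {
  std::vector<uint64_t> Lanes(VF);
  for (unsigned L = 0; L < VF; ++L)
    Lanes[L] = Iter * VF * UF + Part * VF + L;
  return Lanes;
}

// With VF = 4 and UF = 2, iteration 0 yields {0,1,2,3} and {4,5,6,7};
// iteration 1 adds the step 8, yielding {8,9,10,11} and {12,13,14,15}.
```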


# 90be3c24 06-Apr-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Introduce new VPWidenCallRecipe (NFC).

This patch moves calls to their own recipe, to simplify the transition
to VPUser for operands of VPWidenRecipe, as discussed in D76992.

Subsequently additional information can be added to the recipe rather
than computing it during the execute step.

Reviewers: rengolin, Ayal, gilr, hsaito

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77467


# 49d00824 29-Mar-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).

This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling its
operands as VPValues and also towards breaking it up into a
VPInstruction.

Discussed as part of D74695.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76988


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6
# 32851f8d 21-Mar-2020 Guillaume Chatelet <gchatelet@google.com>

[Alignment][NFC] Deprecate VectorUtils::getAlignment

Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76542


Revision tags: llvmorg-10.0.0-rc5
# fd2c15e6 18-Mar-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Do not print mapping for Value2VPValue.

The latest improvements to VPValue printing make this mapping clear when
printing the operand. Printing the mapping separately is not required
any longer.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76375


# e6a74803 18-Mar-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Use underlying value for printing, if available.

When an underlying value is available, we can use its name for
printing, as discussed in D73078.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76200


Revision tags: llvmorg-10.0.0-rc4
# 40e7bfc4 05-Mar-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Use consecutive numbers to print VPValues instead of addresses.

Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.

This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73078
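
The numbering scheme it describes boils down to something like the following (a toy sketch over a placeholder value type, not the tracker class the patch adds):

```
#include <unordered_map>

// Every value met for the first time (the real tracker visits the VPlan's
// blocks in reverse post order) gets the next consecutive slot, so dumps can
// print small, stable ids instead of object addresses.
template <typename ValueT> class SlotTracker {
  std::unordered_map<const ValueT *, unsigned> Slots;
  unsigned NextSlot = 0;

public:
  // Called once per value during the traversal.
  void assignSlot(const ValueT *V) {
    if (Slots.try_emplace(V, NextSlot).second)
      ++NextSlot;
  }
  // Used while printing; returns -1 for values that were never numbered.
  int getSlot(const ValueT *V) const {
    auto It = Slots.find(V);
    return It == Slots.end() ? -1 : int(It->second);
  }
};
```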


Revision tags: llvmorg-10.0.0-rc3
# 05afa555 03-Mar-2020 Florian Hahn <flo@fhahn.com>

[VPlan] Add getPlan() to VPBlockBase.

This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.

VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D74445
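
A toy model of the ownership pattern described above (simplified types, not the actual VPBlockBase/VPlan interface, which also has to step out of nested regions):

```
// The owning-plan pointer lives only on the plan's entry/root block; every
// other block reaches it by walking parent links, so moving a block (or a
// whole subtree of blocks) never requires re-pointing each one at its plan.
struct Plan;

struct Block {
  Plan *OwningPlan = nullptr; // set only on the entry block of the plan
  Block *Parent = nullptr;    // enclosing block, null at the top level

  Block *getRoot() {
    Block *B = this;
    while (B->Parent)
      B = B->Parent;
    return B;
  }
  Plan *getPlan() { return getRoot()->OwningPlan; }
};
```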


Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2
# 8647a72c 28-Nov-2019 Gil Rapaport <gil.rapaport@intel.com>

[LV] VPValues for memory operation pointers (NFCI)

Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.

Differential revision: https://reviews.llvm.org/D70865


# 948e7452 09-Dec-2019 Evgeniy Brevnov <evgueni.brevnov@gmail.com>

[LV][NFC] Keep dominator tree up to date during vectorization.


# d62bf161 28-Dec-2019 Gil Rapaport <gil.rapaport@intel.com>

[LV] Use getMask() when printing recipe [NFCI]

Use dedicated API for getting the mask instead of duplicating it.

Differential Revision: https://reviews.llvm.org/D71964

