#
bad7d6b3 |
| 24-Aug-2020 |
Francesco Petrogalli <francesco.petrogalli@arm.com> |
Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"
Reverting because the commit message doesn't reflect the one agreed on phabricator at https://reviews.llvm.org/D85794.
This r
Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]"
Reverting because the commit message doesn't reflect the one agreed on phabricator at https://reviews.llvm.org/D85794.
This reverts commit c8d2b065b98fa91139cc7bb1fd1407f032ef252e.
show more ...
|
#
c8d2b065 |
| 07-Aug-2020 |
Francesco Petrogalli <francesco.petrogalli@arm.com> |
[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` a
[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`. * Add `<` operator to `ElementCount` to be able to use `llvm::SmallSetVector<ElementCount, ...>`. * Bits and pieces around printing the ElementCount to string streams. * Added a static method to `ElementCount` to represent a scalar.
To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places:
1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types In this cases, an assert(!VF.Scalable && "<mgs>") has been added to make sure we don't enter coepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operation _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` woudl require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that migh require changes at command line.
Differential Revision: https://reviews.llvm.org/D85794
show more ...
|
#
745bf6cf |
| 06-Aug-2020 |
David Green <david.green@arm.com> |
[LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them toget
[LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
show more ...
|
#
3c39db0c |
| 05-Aug-2020 |
Jordan Rupprecht <rupprecht@google.com> |
Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e41cb979a1fa6a79eb18145a75104933. It breaks the build:
``` ~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:
Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e41cb979a1fa6a79eb18145a75104933. It breaks the build:
``` ~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>' return ReductionOperations; ```
show more ...
|
#
e9761688 |
| 05-Aug-2020 |
David Green <david.green@arm.com> |
[LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them toget
[LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
show more ...
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
#
c1034d04 |
| 17-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
Follow up of rGe345d547a0d5, and attempt to pacify buildbot:
"error: 'get' is deprecated: The base class version of get with the scalable argument defaulted to false is deprecated."
Changed VectorT
Follow up of rGe345d547a0d5, and attempt to pacify buildbot:
"error: 'get' is deprecated: The base class version of get with the scalable argument defaulted to false is deprecated."
Changed VectorType::get() -> FixedVectorType::get().
show more ...
|
#
e345d547 |
| 17-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"
Fixed ARM regression test.
Please see the original commit message rG47650451738c for details.
|
#
d4e183f6 |
| 17-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops"
This reverts commit 47650451738c821993c763356854b560a0f9f550 while I investigate the build bot failures.
|
#
47650451 |
| 10-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[LV] Emit @llvm.get.active.mask for tail-folded loops
This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised loops if the intrinsic is supported by the backend, which is checke
[LV] Emit @llvm.get.active.mask for tail-folded loops
This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised loops if the intrinsic is supported by the backend, which is checked by querying TargetTransform hook emitGetActiveLaneMask.
This intrinsic creates a mask representing active and inactive vector lanes, which is used by the masked load/store instructions that are created for tail-folded loops. The semantics of @llvm.get.active.mask are described here in LangRef:
https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics
This intrinsic is also used to provide a hint to the backend. That is, the second argument of the intrinsic represents the back-edge taken count of the loop. For MVE, for example, we use that to set up tail-predication, which is a new form of predication in MVE for vector loops that implicitely predicates the last vector loop iteration by implicitely setting active/inactive lanes, i.e. the tail loop is predicated. In order to set up a tail-predicated vector loop, we need to know the number of data elements processed by the vector loop, which corresponds the the tripcount of the scalar loop, which we can now reconstruct using @llvm.get.active.mask.
Differential Revision: https://reviews.llvm.org/D79100
show more ...
|
#
13bf6039 |
| 22-May-2020 |
Anh Tuyen Tran <anhtuyen@ca.ibm.com> |
Title: [LV] Handle Fold-Tail of loops with vectorizarion factor equal to 1
Summary: When handling loops whose VF is 1, fold-tail vectorization sets the backedge taken count of the original loop with
Title: [LV] Handle Fold-Tail of loops with vectorizarion factor equal to 1
Summary: When handling loops whose VF is 1, fold-tail vectorization sets the backedge taken count of the original loop with a vector of a single element. This causes type-mismatch during instruction generartion.
The purpose of this patch is toto address the case of VF==1.
Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin)
Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn)
Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits)
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D79976
show more ...
|
Revision tags: llvmorg-10.0.1-rc1 |
|
#
4c8285c7 |
| 14-May-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Move emission of \\l\"+\n to dumpBasicBlock (NFC).
The patch standardizes printing of VPRecipes a bit, by hoisting out the common emission of \\l\"+\n. It simplifies the code and is also a f
[VPlan] Move emission of \\l\"+\n to dumpBasicBlock (NFC).
The patch standardizes printing of VPRecipes a bit, by hoisting out the common emission of \\l\"+\n. It simplifies the code and is also a first step towards untangling printing from DOT format output, with the goal of making the DOT output optional and to provide a more concise debug output if DOT output is disabled.
Reviewers: gilr, Ayal, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D78883
show more ...
|
#
a3c964a2 |
| 25-Apr-2020 |
Ayal Zaks <ayal.zaks@intel.com> |
[LV] Fix recording of BranchTakenCount for FoldTail
When folding tail, branch taken count is computed during initial VPlan execution and recorded to be used by the compare computing the loop's mask.
[LV] Fix recording of BranchTakenCount for FoldTail
When folding tail, branch taken count is computed during initial VPlan execution and recorded to be used by the compare computing the loop's mask. This recording should directly set the State, instead of reusing Value2VPValue mapping which serves original Values present prior to vectorization. The branch taken count may be a constant Value, which may be used elsewhere in the loop; trying to employ Value2VPValue for both leads to the issue reported in https://reviews.llvm.org/D76992#inline-721028
Differential Revision: https://reviews.llvm.org/D78847
show more ...
|
#
18138e02 |
| 13-Apr-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Introduce VPWidenSelectRecipe (NFC).
Widening a selects depends on whether the condition is loop invariant or not. Rather than checking during codegen-time, the information can be recorded a
[VPlan] Introduce VPWidenSelectRecipe (NFC).
Widening a selects depends on whether the condition is loop invariant or not. Rather than checking during codegen-time, the information can be recorded at the VPlan construction time.
This was suggested as part of D76992, to reduce the reliance on accessing the original underlying IR values.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77869
show more ...
|
#
e2a18678 |
| 06-Apr-2020 |
Gil Rapaport <gil.rapaport@intel.com> |
[LV] Add VPValue operands to VPBlendRecipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, li
[LV] Add VPValue operands to VPBlendRecipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations ability to modify def-use relations and still have ILV generate the vector code. This commit introduces VPValues for VPBlendRecipe to use as the values to blend. The recipe is generated with VPValues wrapping the phi's incoming values of the scalar phi. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations.
Differential Revision: https://reviews.llvm.org/D77539
show more ...
|
#
16784892 |
| 06-Apr-2020 |
Ayal Zaks <ayal.zaks@intel.com> |
[LV] FoldTail w/o Primary Induction
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector induction for use in fold-tail-with-masking, if a primary induction is absent.
The canoni
[LV] FoldTail w/o Primary Induction
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector induction for use in fold-tail-with-masking, if a primary induction is absent.
The canonical scalar IV having start = 0 and step = VF*UF, created during code -gen to control the vector loop, is widened into a canonical vector IV having start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and step = <VF*UF, VF*UF, ..., VF*UF>.
Differential Revision: https://reviews.llvm.org/D77635
show more ...
|
#
90be3c24 |
| 06-Apr-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Introduce new VPWidenCallRecipe (NFC).
This patch moves calls to their own recipe, to simplify the transition to VPUser for operands of VPWidenRecipe, as discussed in D76992.
Subsequently a
[VPlan] Introduce new VPWidenCallRecipe (NFC).
This patch moves calls to their own recipe, to simplify the transition to VPUser for operands of VPWidenRecipe, as discussed in D76992.
Subsequently additional information can be added to the recipe rather than computing it during the execute step.
Reviewers: rengolin, Ayal, gilr, hsaito
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77467
show more ...
|
#
49d00824 |
| 29-Mar-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
This patch changes VPWidenRecipe to only store a single original IR instruction. This is the first required step towards modeling it
[VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
This patch changes VPWidenRecipe to only store a single original IR instruction. This is the first required step towards modeling it's operands as VPValues and also towards breaking it up into a VPInstruction.
Discussed as part of D74695.
Reviewers: Ayal, gilr, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D76988
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6 |
|
#
32851f8d |
| 21-Mar-2020 |
Guillaume Chatelet <gchatelet@google.com> |
[Alignment][NFC] Deprecate VectorUtils::getAlignment
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/
[Alignment][NFC] Deprecate VectorUtils::getAlignment
Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, rogfer01, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76542
show more ...
|
Revision tags: llvmorg-10.0.0-rc5 |
|
#
fd2c15e6 |
| 18-Mar-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Do not print mapping for Value2VPValue.
The latest improvements to VPValue printing make this mapping clear when printing the operand. Printing the mapping separately is not required any lon
[VPlan] Do not print mapping for Value2VPValue.
The latest improvements to VPValue printing make this mapping clear when printing the operand. Printing the mapping separately is not required any longer.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76375
show more ...
|
#
e6a74803 |
| 18-Mar-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Use underlying value for printing, if available.
When the an underlying value is available, we can use its name for printing, as discussed in D73078.
Reviewers: rengolin, hsaito, Ayal, gilr
[VPlan] Use underlying value for printing, if available.
When the an underlying value is available, we can use its name for printing, as discussed in D73078.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76200
show more ...
|
Revision tags: llvmorg-10.0.0-rc4 |
|
#
40e7bfc4 |
| 05-Mar-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Use consecutive numbers to print VPValues instead of addresses.
Currently when printing VPValues we use the object address, which makes it hard to distinguish VPValues as they usually are la
[VPlan] Use consecutive numbers to print VPValues instead of addresses.
Currently when printing VPValues we use the object address, which makes it hard to distinguish VPValues as they usually are large numbers with varying distance between them.
This patch adds a simple slot tracker, similar to the ModuleSlotTracker used for IR values. In order to dump a VPValue or anything containing a VPValue, a slot tracker for the enclosing VPlan needs to be created. The existing VPlanPrinter can take care of that for the existing code. We assign consecutive numbers to each VPValue we encounter in a reverse post order traversal of the VPlan.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D73078
show more ...
|
Revision tags: llvmorg-10.0.0-rc3 |
|
#
05afa555 |
| 03-Mar-2020 |
Florian Hahn <flo@fhahn.com> |
[VPlan] Add getPlan() to VPBlockBase.
This patch adds a getPlan accessor to VPBlockBase, which finds the entry block of the plan containing the block and returns the plan set for this block.
VPBloc
[VPlan] Add getPlan() to VPBlockBase.
This patch adds a getPlan accessor to VPBlockBase, which finds the entry block of the plan containing the block and returns the plan set for this block.
VPBlockBase contains a VPlan pointer, but it should only be set for the entry block of a plan. This allows moving blocks without updating the pointer for each moved block and in the future we might introduce a parent relationship between plans and blocks, similar to the one in LLVM IR.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D74445
show more ...
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2 |
|
#
8647a72c |
| 28-Nov-2019 |
Gil Rapaport <gil.rapaport@intel.com> |
[LV] VPValues for memory operation pointers (NFCI)
Memory instruction widening recipes use the pointer operand of their load/store ingredient for generating the needed GEPs, making it difficult to f
[LV] VPValues for memory operation pointers (NFCI)
Memory instruction widening recipes use the pointer operand of their load/store ingredient for generating the needed GEPs, making it difficult to feed these recipes with pointers based on other ingredients or none at all. This patch modifies these recipes to use a VPValue for the pointer instead, in order to reduce ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. The recipes are constructed with VPValues bound to these ingredients, maintaining current behavior.
Differential revision: https://reviews.llvm.org/D70865
show more ...
|
#
948e7452 |
| 09-Dec-2019 |
Evgeniy Brevnov <evgueni.brevnov@gmail.com> |
[LV][NFC] Keep dominator tree up to date during vectorization.
|
#
d62bf161 |
| 28-Dec-2019 |
Gil Rapaport <gil.rapaport@intel.com> |
[LV] Use getMask() when printing recipe [NFCI]
Use dedicated API for getting the mask instead of duplicating it.
Differential Revision: https://reviews.llvm.org/D71964
|