xref: /llvm-project/llvm/docs/VectorizationPlan.rst (revision 309a881dccb82bf1f101cf138bee3e7400968ed8)
1d32e68aeSFlorian Hahn==================
2d32e68aeSFlorian HahnVectorization Plan
3d32e68aeSFlorian Hahn==================
4d32e68aeSFlorian Hahn
5d32e68aeSFlorian Hahn.. contents::
6d32e68aeSFlorian Hahn   :local:
7d32e68aeSFlorian Hahn
8d32e68aeSFlorian HahnAbstract
9d32e68aeSFlorian Hahn========
10d32e68aeSFlorian HahnThe vectorization transformation can be rather complicated, involving several
11d32e68aeSFlorian Hahnpotential alternatives, especially for outer-loops [1]_ but also possibly for
12d32e68aeSFlorian Hahninnermost loops. These alternatives may have significant performance impact,
13d32e68aeSFlorian Hahnboth positive and negative. A cost model is therefore employed to identify the
14d32e68aeSFlorian Hahnbest alternative, including the alternative of avoiding any transformation
15d32e68aeSFlorian Hahnaltogether.
16d32e68aeSFlorian Hahn
17d32e68aeSFlorian HahnThe Vectorization Plan is an explicit model for describing vectorization
18d32e68aeSFlorian Hahncandidates. It serves for both optimizing candidates including estimating their
19d32e68aeSFlorian Hahncost reliably, and for performing their final translation into IR. This
20d32e68aeSFlorian Hahnfacilitates dealing with multiple vectorization candidates.
21d32e68aeSFlorian Hahn
2299de3a68SFlorian HahnCurrent Status
2399de3a68SFlorian Hahn==============
2499de3a68SFlorian HahnVPlan is currently used to drive code-generation in LoopVectorize. VPlans are
2599de3a68SFlorian Hahnconstructed after all cost-based and most legality-related decisions have been
2699de3a68SFlorian Hahntaken. As def-use chains between recipes are now fully modeled in VPlan,
2799de3a68SFlorian HahnVPlan-based analyses and transformations are used to simplify and modularize
2899de3a68SFlorian Hahnthe vectorization process [10]_. Those include transformations to
2999de3a68SFlorian Hahn
3099de3a68SFlorian Hahn1. Legalize the initial VPlan, e.g. by introducing specialized recipes for
3199de3a68SFlorian Hahn   reductions and interleave groups.
3299de3a68SFlorian Hahn
3399de3a68SFlorian Hahn2. Optimize the legalized VPlan, e.g. by removing redundant recipes or
3499de3a68SFlorian Hahn   introducing active-lane-masks.
3599de3a68SFlorian Hahn
3699de3a68SFlorian Hahn3. Apply unroll- and vectorization-factor specific optimizations, e.g. removing
3799de3a68SFlorian Hahn   the backedge to reiterate the vector loop based on VF & UF.
3899de3a68SFlorian Hahn
3999de3a68SFlorian HahnRefer to :numref:`fig-vplan-transform-pipeline` for an overview of the current
4099de3a68SFlorian Hahntransformation pipeline.
4199de3a68SFlorian Hahn
4299de3a68SFlorian HahnNote that some legality checks are already done in VPlan, including checking if
4399de3a68SFlorian Hahnall users of a fixed-order recurrence can be re-ordered. This is implemented as
4499de3a68SFlorian Hahna VPlan-to-VPlan transformation that either applies a valid re-ordering or
4599de3a68SFlorian Hahnbails out marking the VPlan as invalid.
4699de3a68SFlorian Hahn
4799de3a68SFlorian Hahn.. _fig-vplan-transform-pipeline:
4899de3a68SFlorian Hahn.. figure:: ./vplan-transform-pipeline.png
4999de3a68SFlorian Hahn   :width: 800 px
5099de3a68SFlorian Hahn
5199de3a68SFlorian Hahn   VPlan Transformation Pipeline in 2024
5299de3a68SFlorian Hahn
5399de3a68SFlorian Hahn
5499de3a68SFlorian HahnVPlan currently models the complete vector loop, as well as additional parts of
5599de3a68SFlorian Hahnthe vectorization skeleton. Refer to :numref:`fig-vplan-scope` for an overview
5699de3a68SFlorian Hahnof the scope covered by VPlan.
5799de3a68SFlorian Hahn
5899de3a68SFlorian Hahn.. _fig-vplan-scope:
5999de3a68SFlorian Hahn.. figure:: ./vplan-scope.png
6099de3a68SFlorian Hahn   :width: 800 px
6199de3a68SFlorian Hahn
6299de3a68SFlorian Hahn   Scope modeled in VPlan in 2024
6399de3a68SFlorian Hahn
6499de3a68SFlorian Hahn
65d32e68aeSFlorian HahnHigh-level Design
66d32e68aeSFlorian Hahn=================
67d32e68aeSFlorian Hahn
68d32e68aeSFlorian HahnVectorization Workflow
69d32e68aeSFlorian Hahn----------------------
70d32e68aeSFlorian HahnVPlan-based vectorization involves three major steps, taking a "scenario-based
71d32e68aeSFlorian Hahnapproach" to vectorization planning:
72d32e68aeSFlorian Hahn
73d32e68aeSFlorian Hahn1. Legal Step: check if a loop can be legally vectorized; encode constraints and
74d32e68aeSFlorian Hahn   artifacts if so.
75d32e68aeSFlorian Hahn2. Plan Step:
76d32e68aeSFlorian Hahn
77d32e68aeSFlorian Hahn   a. Build initial VPlans following the constraints and decisions taken by
78d32e68aeSFlorian Hahn      Legal Step 1, and compute their cost.
79d32e68aeSFlorian Hahn   b. Apply optimizations to the VPlans, possibly forking additional VPlans.
80d32e68aeSFlorian Hahn      Prune sub-optimal VPlans having relatively high cost.
81d32e68aeSFlorian Hahn3. Execute Step: materialize the best VPlan. Note that this is the only step
82d32e68aeSFlorian Hahn   that modifies the IR.
83d32e68aeSFlorian Hahn
84d32e68aeSFlorian HahnDesign Guidelines
85d32e68aeSFlorian Hahn-----------------
86d32e68aeSFlorian HahnIn what follows, the term "input IR" refers to code that is fed into the
87d32e68aeSFlorian Hahnvectorizer whereas the term "output IR" refers to code that is generated by the
88d32e68aeSFlorian Hahnvectorizer. The output IR contains code that has been vectorized or "widened"
89d32e68aeSFlorian Hahnaccording to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed
90d32e68aeSFlorian Hahnaccording to an Unroll Factor (UF).
91d32e68aeSFlorian HahnThe design of VPlan follows several high-level guidelines:
92d32e68aeSFlorian Hahn
93d32e68aeSFlorian Hahn1. Analysis-like: building and manipulating VPlans must not modify the input IR.
94d32e68aeSFlorian Hahn   In particular, if the best option is not to vectorize at all, the
95d32e68aeSFlorian Hahn   vectorization process terminates before reaching Step 3, and compilation
96d32e68aeSFlorian Hahn   should proceed as if VPlans had not been built.
97d32e68aeSFlorian Hahn
98d32e68aeSFlorian Hahn2. Align Cost & Execute: each VPlan must support both estimating the cost and
99d32e68aeSFlorian Hahn   generating the output IR code, such that the cost estimation evaluates the
100d32e68aeSFlorian Hahn   to-be-generated code reliably.
101d32e68aeSFlorian Hahn
102d32e68aeSFlorian Hahn3. Support vectorizing additional constructs:
103d32e68aeSFlorian Hahn
104d32e68aeSFlorian Hahn   a. Outer-loop vectorization. In particular, VPlan must be able to model the
105d32e68aeSFlorian Hahn      control-flow of the output IR which may include multiple basic-blocks and
106d32e68aeSFlorian Hahn      nested loops.
107d32e68aeSFlorian Hahn   b. SLP vectorization.
108d32e68aeSFlorian Hahn   c. Combinations of the above, including nested vectorization: vectorizing
109d32e68aeSFlorian Hahn      both an inner loop and an outer-loop at the same time (each with its own
110d32e68aeSFlorian Hahn      VF and UF), mixed vectorization: vectorizing a loop with SLP patterns
111d32e68aeSFlorian Hahn      inside [4]_, (re)vectorizing input IR containing vector code.
112d32e68aeSFlorian Hahn   d. Function vectorization [2]_.
113d32e68aeSFlorian Hahn
114d32e68aeSFlorian Hahn4. Support multiple candidates efficiently. In particular, similar candidates
115d32e68aeSFlorian Hahn   related to a range of possible VF's and UF's must be represented efficiently.
116d32e68aeSFlorian Hahn   Potential versioning needs to be supported efficiently.
117d32e68aeSFlorian Hahn
118d32e68aeSFlorian Hahn5. Support vectorizing idioms, such as interleaved groups of strided loads or
119d32e68aeSFlorian Hahn   stores. This is achieved by modeling a sequence of output instructions using
120d32e68aeSFlorian Hahn   a "Recipe", which is responsible for computing its cost and generating its
121d32e68aeSFlorian Hahn   code.
122d32e68aeSFlorian Hahn
123d32e68aeSFlorian Hahn6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
124d32e68aeSFlorian Hahn   such regions may need to be, for example, predicated and linearized, or
125d32e68aeSFlorian Hahn   replicated VF*UF times to handle scalarized and predicated instructions.
126d32e68aeSFlorian Hahn   Innerloops are also modelled as SESE regions.
127d32e68aeSFlorian Hahn
128d32e68aeSFlorian Hahn7. Support instruction-level analysis and transformation, as part of Planning
129d32e68aeSFlorian Hahn   Step 2.b: During vectorization instructions may need to be traversed, moved,
130d32e68aeSFlorian Hahn   replaced by other instructions or be created. For example, vector idiom
131d32e68aeSFlorian Hahn   detection and formation involves searching for and optimizing instruction
132d32e68aeSFlorian Hahn   patterns.
133d32e68aeSFlorian Hahn
134d32e68aeSFlorian HahnDefinitions
135d32e68aeSFlorian Hahn===========
136d32e68aeSFlorian HahnThe low-level design of VPlan comprises of the following classes.
137d32e68aeSFlorian Hahn
138d32e68aeSFlorian Hahn:LoopVectorizationPlanner:
139d32e68aeSFlorian Hahn  A LoopVectorizationPlanner is designed to handle the vectorization of a loop
140d32e68aeSFlorian Hahn  or a loop nest. It can construct, optimize and discard one or more VPlans,
141d32e68aeSFlorian Hahn  each VPlan modelling a distinct way to vectorize the loop or the loop nest.
142d32e68aeSFlorian Hahn  Once the best VPlan is determined, including the best VF and UF, this VPlan
143d32e68aeSFlorian Hahn  drives the generation of output IR.
144d32e68aeSFlorian Hahn
145d32e68aeSFlorian Hahn:VPlan:
146d32e68aeSFlorian Hahn  A model of a vectorized candidate for a given input IR loop or loop nest. This
147d32e68aeSFlorian Hahn  candidate is represented using a Hierarchical CFG. VPlan supports estimating
148d32e68aeSFlorian Hahn  the cost and driving the generation of the output IR code it represents.
149d32e68aeSFlorian Hahn
150d32e68aeSFlorian Hahn:Hierarchical CFG:
151d32e68aeSFlorian Hahn  A control-flow graph whose nodes are basic-blocks or Hierarchical CFG's. The
152d32e68aeSFlorian Hahn  Hierarchical CFG data structure is similar to the Tile Tree [5]_, where
153d32e68aeSFlorian Hahn  cross-Tile edges are lifted to connect Tiles instead of the original
154d32e68aeSFlorian Hahn  basic-blocks as in Sharir [6]_, promoting the Tile encapsulation. The terms
155d32e68aeSFlorian Hahn  Region and Block are used rather than Tile [5]_ to avoid confusion with loop
156d32e68aeSFlorian Hahn  tiling.
157d32e68aeSFlorian Hahn
158d32e68aeSFlorian Hahn:VPBlockBase:
159d32e68aeSFlorian Hahn  The building block of the Hierarchical CFG. A pure-virtual base-class of
160d32e68aeSFlorian Hahn  VPBasicBlock and VPRegionBlock, see below. VPBlockBase models the hierarchical
161d32e68aeSFlorian Hahn  control-flow relations with other VPBlocks. Note that in contrast to the IR
162d32e68aeSFlorian Hahn  BasicBlock, a VPBlockBase models its control-flow successors and predecessors
163d32e68aeSFlorian Hahn  directly, rather than through a Terminator branch or through predecessor
164d32e68aeSFlorian Hahn  branches that "use" the VPBlockBase.
165d32e68aeSFlorian Hahn
166d32e68aeSFlorian Hahn:VPBasicBlock:
167d32e68aeSFlorian Hahn  VPBasicBlock is a subclass of VPBlockBase, and serves as the leaves of the
168d32e68aeSFlorian Hahn  Hierarchical CFG. It represents a sequence of output IR instructions that will
169d32e68aeSFlorian Hahn  appear consecutively in an output IR basic-block. The instructions of this
170d32e68aeSFlorian Hahn  basic-block originate from one or more VPBasicBlocks. VPBasicBlock holds a
171d32e68aeSFlorian Hahn  sequence of zero or more VPRecipes that model the cost and generation of the
172d32e68aeSFlorian Hahn  output IR instructions.
173d32e68aeSFlorian Hahn
174d32e68aeSFlorian Hahn:VPRegionBlock:
175d32e68aeSFlorian Hahn  VPRegionBlock is a subclass of VPBlockBase. It models a collection of
176d32e68aeSFlorian Hahn  VPBasicBlocks and VPRegionBlocks which form a SESE subgraph of the output IR
177d32e68aeSFlorian Hahn  CFG. A VPRegionBlock may indicate that its contents are to be replicated a
178d32e68aeSFlorian Hahn  constant number of times when output IR is generated, effectively representing
179d32e68aeSFlorian Hahn  a loop with constant trip-count that will be completely unrolled. This is used
180d32e68aeSFlorian Hahn  to support scalarized and predicated instructions with a single model for
181d32e68aeSFlorian Hahn  multiple candidate VF's and UF's.
182d32e68aeSFlorian Hahn
183d32e68aeSFlorian Hahn:VPRecipeBase:
184d32e68aeSFlorian Hahn  A pure-virtual base class modeling a sequence of one or more output IR
185d32e68aeSFlorian Hahn  instructions, possibly based on one or more input IR instructions. These
186d32e68aeSFlorian Hahn  input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe
187d32e68aeSFlorian Hahn  may specify how its ingredients are to be transformed to produce the output IR
188d32e68aeSFlorian Hahn  instructions; e.g., cloned once, replicated multiple times or widened
189d32e68aeSFlorian Hahn  according to selected VF.
190d32e68aeSFlorian Hahn
191d32e68aeSFlorian Hahn:VPValue:
192d32e68aeSFlorian Hahn  The base of VPlan's def-use relations class hierarchy. When instantiated, it
193d32e68aeSFlorian Hahn  models a constant or a live-in Value in VPlan. It has users, which are of type
194d32e68aeSFlorian Hahn  VPUser, but no operands.
195d32e68aeSFlorian Hahn
196d32e68aeSFlorian Hahn:VPUser:
197d32e68aeSFlorian Hahn  A VPUser represents an entity that uses a number of VPValues as operands.
198d32e68aeSFlorian Hahn  VPUser is similar in some aspects to LLVM's User class.
199d32e68aeSFlorian Hahn
200d32e68aeSFlorian Hahn:VPDef:
201d32e68aeSFlorian Hahn  A VPDef represents an entity that defines zero, one or multiple VPValues.
202d32e68aeSFlorian Hahn  It is used to model the fact that recipes in VPlan can define multiple
203d32e68aeSFlorian Hahn  VPValues.
204d32e68aeSFlorian Hahn
205d32e68aeSFlorian Hahn:VPInstruction:
20699de3a68SFlorian Hahn  A VPInstruction is a recipe characterized by a single opcode and optional
20799de3a68SFlorian Hahn  flags, free of ingredients or other meta-data. VPInstructions also extend
20899de3a68SFlorian Hahn  LLVM IR's opcodes with idiomatic operations that enrich the Vectorizer's
20999de3a68SFlorian Hahn  semantics.
210d32e68aeSFlorian Hahn
211d32e68aeSFlorian Hahn:VPTransformState:
212d32e68aeSFlorian Hahn  Stores information used for generating output IR, passed from
213d32e68aeSFlorian Hahn  LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
214d32e68aeSFlorian Hahn  additional information down to VPBlocks and VPRecipes.
215d32e68aeSFlorian Hahn
216d32e68aeSFlorian HahnThe Planning Process and VPlan Roadmap
217d32e68aeSFlorian Hahn======================================
218d32e68aeSFlorian Hahn
219d32e68aeSFlorian HahnTransforming the Loop Vectorizer to use VPlan follows a staged approach. First,
22099de3a68SFlorian HahnVPlan was only used to record the final vectorization decisions, and to execute
22199de3a68SFlorian Hahnthem: the Hierarchical CFG models the planned control-flow, and Recipes capture
22299de3a68SFlorian Hahndecisions taken inside basic-blocks. Currently, VPlan is used also as the basis
223d32e68aeSFlorian Hahnfor taking these decisions, effectively turning them into a series of
224d32e68aeSFlorian HahnVPlan-to-VPlan algorithms. Finally, VPlan will support the planning process
225d32e68aeSFlorian Hahnitself including cost-based analyses for making these decisions, to fully
226d32e68aeSFlorian Hahnsupport compositional and iterative decision making.
227d32e68aeSFlorian Hahn
228d32e68aeSFlorian HahnSome decisions are local to an instruction in the loop, such as whether to widen
229d32e68aeSFlorian Hahnit into a vector instruction or replicate it, keeping the generated instructions
230d32e68aeSFlorian Hahnin place. Other decisions, however, involve moving instructions, replacing them
231d32e68aeSFlorian Hahnwith other instructions, and/or introducing new instructions. For example, a
232d32e68aeSFlorian Hahncast may sink past a later instruction and be widened to handle first-order
233d32e68aeSFlorian Hahnrecurrence; an interleave group of strided gathers or scatters may effectively
234d32e68aeSFlorian Hahnmove to one place where they are replaced with shuffles and a common wide vector
235d32e68aeSFlorian Hahnload or store; new instructions may be introduced to compute masks, shuffle the
236d32e68aeSFlorian Hahnelements of vectors, and pack scalar values into vectors or vice-versa.
237d32e68aeSFlorian Hahn
238d32e68aeSFlorian HahnIn order for VPlan to support making instruction-level decisions and analyses,
239d32e68aeSFlorian Hahnit needs to model the relevant instructions along with their def/use relations.
240d32e68aeSFlorian HahnThis too follows a staged approach: first, the new instructions that compute
241d32e68aeSFlorian Hahnmasks are modeled as VPInstructions, along with their induced def/use subgraph.
242d32e68aeSFlorian HahnThis effectively models masks in VPlan, facilitating VPlan-based predication.
243d32e68aeSFlorian HahnNext, the logic embedded within each Recipe for generating its instructions at
244d32e68aeSFlorian HahnVPlan execution time, will instead take part in the planning process by modeling
245d32e68aeSFlorian Hahnthem as VPInstructions. Finally, only logic that applies to instructions as a
246d32e68aeSFlorian Hahngroup will remain in Recipes, such as interleave groups and potentially other
247d32e68aeSFlorian Hahnidiom groups having synergistic cost.
248d32e68aeSFlorian Hahn
249d32e68aeSFlorian HahnRelated LLVM components
250d32e68aeSFlorian Hahn-----------------------
251d32e68aeSFlorian Hahn1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
252d32e68aeSFlorian Hahn   tree, where TSLP [3]_ adds Plan Step 2.b.
253d32e68aeSFlorian Hahn
254d32e68aeSFlorian Hahn2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
255d32e68aeSFlorian Hahn   Polly [7]_.
256d32e68aeSFlorian Hahn
257d32e68aeSFlorian Hahn3. Loop Vectorizer: the Vectorization Plan aims to upgrade the infrastructure of
258d32e68aeSFlorian Hahn   the Loop Vectorizer and extend it to handle outer loops [8]_, [9]_.
259d32e68aeSFlorian Hahn
260d32e68aeSFlorian HahnReferences
261d32e68aeSFlorian Hahn----------
262d32e68aeSFlorian Hahn.. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
263d32e68aeSFlorian Hahn    Nuzman and Ayal Zaks, PACT 2008.
264d32e68aeSFlorian Hahn
265d32e68aeSFlorian Hahn.. [2] "Proposal for function vectorization and loop vectorization with function
266d32e68aeSFlorian Hahn    calls", Xinmin Tian, [`cfe-dev
267d32e68aeSFlorian Hahn    <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>`_].,
268d32e68aeSFlorian Hahn    March 2, 2016.
269d32e68aeSFlorian Hahn    See also `review <https://reviews.llvm.org/D22792>`_.
270d32e68aeSFlorian Hahn
271d32e68aeSFlorian Hahn.. [3] "Throttling Automatic Vectorization: When Less is More", Vasileios
272d32e68aeSFlorian Hahn    Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015.
273d32e68aeSFlorian Hahn
274d32e68aeSFlorian Hahn.. [4] "Exploiting mixed SIMD parallelism by reducing data reorganization
275d32e68aeSFlorian Hahn    overhead", Hao Zhou and Jingling Xue, CGO 2016.
276d32e68aeSFlorian Hahn
277d32e68aeSFlorian Hahn.. [5] "Register Allocation via Hierarchical Graph Coloring", David Callahan and
278d32e68aeSFlorian Hahn    Brian Koblenz, PLDI 1991
279d32e68aeSFlorian Hahn
280d32e68aeSFlorian Hahn.. [6] "Structural analysis: A new approach to flow analysis in optimizing
281d32e68aeSFlorian Hahn    compilers", M. Sharir, Journal of Computer Languages, Jan. 1980
282d32e68aeSFlorian Hahn
283d32e68aeSFlorian Hahn.. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma
284d32e68aeSFlorian Hahn    thesis, 2011.
285d32e68aeSFlorian Hahn
286d32e68aeSFlorian Hahn.. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
287d32e68aeSFlorian Hahn    European LLVM Developers' Meeting 2017.
288d32e68aeSFlorian Hahn
289d32e68aeSFlorian Hahn.. [9] "Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
290d32e68aeSFlorian Hahn    Auto-Vectorization", Intel Vectorizer Team, LLVM Developers' Meeting 2016.
29199de3a68SFlorian Hahn
292*309a881dSFlorian Hahn.. [10] "VPlan: Status Update and Future Roadmap", Ayal Zaks and Florian Hahn,
293*309a881dSFlorian Hahn    LLVM Developers' Meeting 2023, https://www.youtube.com/watch?v=SzGP4PgMuLE
294