@c Copyright (c) 2006 Free Software Foundation, Inc.
@c Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@c ---------------------------------------------------------------------
@c Loop Representation
@c ---------------------------------------------------------------------

@node Loop Analysis and Representation
@chapter Analysis and Representation of Loops

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and on RTL, as well as the interfaces to loop-related
analyses (induction variable analysis and number of iterations
analysis).

@menu
* Loop representation::		Representation and analysis of loops.
* Loop querying::		Getting information about loops.
* Loop manipulation::		Loop manipulation functions.
* LCSSA::			Loop-closed SSA form.
* Scalar evolutions::   	Induction variables on GIMPLE.
* loop-iv::			Induction variables on RTL.
* Number of iterations::	Number of iterations analysis.
* Dependency analysis::		Data dependency analysis.
* Lambda::			Linear loop transformations framework.
@end menu

@node Loop representation
@section Loop representation
@cindex Loop representation
@cindex Loop analysis

This chapter describes the representation of loops in GCC, and the
functions that can be used to build, modify and analyze this
representation.  Most of the interfaces and data structures are
declared in @file{cfgloop.h}.  At the moment, loop structures are
analyzed and this information is updated only by the optimization
passes that deal with loops, but some efforts are being made to make it
available throughout most of the optimization passes.

In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is branching in the middle of the loop.
The representation of loops in GCC, however, allows only loops with a
single latch.  During loop analysis, headers of such loops are split
and forwarder blocks are created in order to disambiguate their
structures.  A heuristic based on profile information is used to
determine whether the latches correspond to sub-loops or to control
flow in a single loop.  This means that the analysis sometimes changes
the CFG, and if you run it in the middle of an optimization pass, you
must be able to deal with the new blocks.

The body of a loop is the set of blocks that are dominated by its
header and reachable from its latch against the direction of the edges
in the CFG.  The loops are organized in a containment hierarchy (tree)
such that all the loops immediately contained inside loop L are the
children of L in the tree.  This tree is represented by the
@code{struct loops} structure.  The root of this tree is a fake loop
that contains all blocks in the function.  Each of the loops is
represented in a @code{struct loop} structure.  Each loop is assigned
an index (the @code{num} field of the @code{struct loop} structure),
and the pointer to the loop is stored in the corresponding entry of the
@code{parray} field of the loops structure.  The index of a sub-loop is
always greater than the index of its super-loop.  The indices do not
have to be contiguous; there may be empty (@code{NULL}) entries in the
@code{parray}, created by deleting loops.  The index of a loop never
changes.  The first unused index is stored in the @code{num} field of
the loops structure.

Each basic block contains a reference to the innermost loop it belongs
to (@code{loop_father}).  For this reason, it is only possible to have
one @code{struct loops} structure initialized at a time for each CFG.
It is recommended to use the global variable @code{current_loops} to
contain the @code{struct loops} structure, especially if the loop
structures are updated throughout several passes.  Many of the loop
manipulation functions assume that dominance information is up to date.
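
For instance, a pass can visit every recorded loop by walking the
@code{parray} field directly.  The following is a minimal sketch that
uses only the fields described above; @code{process_loop} stands for a
hypothetical per-loop callback, and the fake root loop is assumed to
sit at index 0:

@smallexample
/* Sketch: visit all loops recorded in current_loops, skipping the
   NULL entries created by deleting loops.  Index 0 is assumed to be
   the fake root loop that contains the whole function.  */
unsigned i;

for (i = 1; i < current_loops->num; i++)
  @{
    struct loop *loop = current_loops->parray[i];

    if (!loop)
      continue;

    process_loop (loop);  /* Hypothetical callback.  */
  @}
@end smallexample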

Loops are analyzed by the @code{loop_optimizer_init} function.  The
argument of this function is a set of flags represented in an integer
bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

@itemize
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in
such a way that each loop has only one entry edge, and additionally,
the source block of this entry edge has only one successor.  This
creates a natural place where the code can be moved out of the loop,
and ensures that the entry edge of the loop leads from its immediate
super-loop.
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
force the latch block of each loop to have only one successor.  This
ensures that the latch of the loop does not belong to any of its
sub-loops, and makes manipulation of the loops significantly easier.
Most of the loop manipulation functions assume that the loops are in
this shape.  Note that with this flag, the ``normal'' loop without any
control flow inside and with one exit consists of two basic blocks.
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
edges in the strongly connected components that are not natural loops
(have more than one entry block) are marked with the
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
flag is not set for blocks and edges that belong to natural loops that
are in such an irreducible region (but it is set for the entry and exit
edges of such a loop, if they lead to/from this region).
@item @code{LOOPS_HAVE_MARKED_SINGLE_EXITS}: If a loop has exactly one
exit edge, this edge is stored in the @code{single_exit} field of the
loop structure.  @code{NULL} is stored there otherwise.
@end itemize

These properties may also be computed/enforced later, using the
functions @code{create_preheaders}, @code{force_single_succ_latches},
@code{mark_irreducible_loops} and @code{mark_single_exit_loops}.

The memory occupied by the loop structures should be freed with the
@code{loop_optimizer_finalize} function.
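
As an illustration, a loop pass might set up and tear down the loop
structures as in the following sketch.  The exact signatures of
@code{loop_optimizer_init} and @code{loop_optimizer_finalize} differ
between releases, so the forms used below are assumptions; see
@file{cfgloop.h} for the real interface:

@smallexample
/* Sketch (assumed signatures): request preheaders and simple
   latches, work on the loops, then release the structures.  */
current_loops = loop_optimizer_init (LOOPS_HAVE_PREHEADERS
                                     | LOOPS_HAVE_SIMPLE_LATCHES);
if (current_loops)
  @{
    /* ... analyze and transform the loops here ...  */

    loop_optimizer_finalize (current_loops);
    current_loops = NULL;
  @}
@end smallexample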

The CFG manipulation functions in general do not update loop
structures.  Specialized versions that additionally do so are provided
for the most common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop}
function can be used to clean up the CFG while updating the loop
structures, provided that @code{current_loops} is set.

@node Loop querying
@section Loop querying
@cindex Loop querying

The functions to query information about loops are declared in
@file{cfgloop.h}.  Some of the information can be taken directly from
the structures.  The @code{loop_father} field of each basic block
points to the innermost loop that the block belongs to.  The most
useful fields of the loop structure (which are kept up to date at all
times) are:

@itemize
@item @code{header}, @code{latch}: Header and latch basic blocks of the
loop.
@item @code{num_nodes}: Number of basic blocks in the loop (including
the basic blocks of the sub-loops).
@item @code{depth}: The depth of the loop in the loops tree, i.e., the
number of super-loops of the loop.
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
sub-loop, and the sibling of the loop in the loops tree.
@item @code{single_exit}: The exit edge of the loop, if the loop has
exactly one exit and the loops were analyzed with
@code{LOOPS_HAVE_MARKED_SINGLE_EXITS}.
@end itemize

There are other fields in the loop structures, many of them used only by
some of the passes, or not updated during CFG changes; in general, they
should not be accessed directly.

The most important functions to query loop structures are:

@itemize
@item @code{flow_loops_dump}: Dumps the information about loops to a
file.
@item @code{verify_loop_structure}: Checks consistency of the loop
structures.
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
@item @code{loop_preheader_edge}: If loops have preheaders, returns
the preheader edge of a loop.
@item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
another loop.
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
to a loop (including its sub-loops).
@item @code{find_common_loop}: Finds the common super-loop of two loops.
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
the given depth.
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
number of insns in the loop, on GIMPLE and on RTL.
@item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
loop.
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
with the @code{EDGE_LOOP_EXIT} flag.
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
@code{get_loop_body_in_bfs_order}: Enumerate the basic blocks in the
loop in depth-first search order in the reversed CFG, in the order
given by the dominance relation, and in breadth-first search order,
respectively (see the sketch after this list).
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
@item @code{just_once_each_iteration_p}: Returns true if the basic block
is executed exactly once during each iteration of a loop (that is, it
does not belong to a sub-loop, and it dominates the latch of the loop).
@end itemize
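
For example, the body of a loop can be walked with @code{get_loop_body}
and each block classified with @code{just_once_each_iteration_p}.  The
following is a minimal sketch, assuming the loop structures and
dominance information are up to date:

@smallexample
/* Sketch: count the blocks of LOOP that execute exactly once per
   iteration.  get_loop_body returns a freshly allocated array of
   loop->num_nodes basic blocks that must be freed by the caller.  */
basic_block *body = get_loop_body (loop);
unsigned i, count = 0;

for (i = 0; i < loop->num_nodes; i++)
  if (just_once_each_iteration_p (loop, body[i]))
    count++;

free (body);
@end smallexample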

@node Loop manipulation
@section Loop manipulation
@cindex Loop manipulation

The loops tree can be manipulated using the following functions:

@itemize
@item @code{flow_loop_tree_node_add}: Adds a node to the tree.
@item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
@item @code{add_bb_to_loop}: Adds a basic block to a loop.
@item @code{remove_bb_from_loops}: Removes a basic block from loops.
@end itemize

Specialized versions of several low-level CFG functions that also
update loop structures are provided:

@itemize
@item @code{loop_split_edge_with}: Splits an edge, and places a
specified RTL code on it.  On GIMPLE, the function can still be used,
but the code must be @code{NULL}.
@item @code{bsi_insert_on_edge_immediate_loop}: Inserts code on an
edge, splitting it if necessary.  Only works on GIMPLE.
@item @code{remove_path}: Removes an edge and all blocks it dominates.
@item @code{loop_commit_inserts}: Commits insertions scheduled on
edges, and sets loops for the new blocks.  This function can only be
used on GIMPLE.
@item @code{split_loop_exit_edge}: Splits an exit edge of the loop,
ensuring that PHI node arguments remain in the loop (this ensures that
loop-closed SSA form is preserved).  Only useful on GIMPLE.
@end itemize

Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on
non-innermost loops, they are mostly untested in that case, and at the
moment, they are only reliable for the innermost loops:

@itemize
@item @code{create_iv}: Creates a new induction variable.  Only works
on GIMPLE.  @code{standard_iv_increment_position} can be used to find a
suitable place for the iv increment.
@item @code{duplicate_loop_to_header_edge},
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
on GIMPLE) duplicate the body of the loop a prescribed number of times
on one of the edges entering the loop header, thus performing either
loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
loop.
@item @code{loop_version}, @code{tree_ssa_loop_version}: These
functions create a copy of a loop, and a branch before them that
selects one of them depending on the prescribed condition.  This is
useful for optimizations that need to verify some assumptions at run
time (one of the copies of the loop is usually left unchanged, while
the other one is transformed in some way).
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
extra iterations to make the number of iterations divisible by the
unroll factor, updating the exit condition, and removing the exits that
now cannot be taken.  Works only on GIMPLE.
@end itemize

@node LCSSA
@section Loop-closed SSA form
@cindex LCSSA
@cindex Loop-closed SSA form

Throughout the loop optimizations on the tree level, one extra
condition is enforced on the SSA form: no SSA name is used outside of
the loop in which it is defined.  The SSA form satisfying this
condition is called ``loop-closed SSA form'' -- LCSSA.  To enforce
LCSSA, PHI nodes must be created at the exits of the loops for the SSA
names that are used outside of them.  Only the real operands (not
virtual SSA names) are held in LCSSA, in order to save memory.
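
For illustration, the following pseudocode (in the style used elsewhere
in this chapter) shows how a value computed in a loop and used after it
is rewritten into LCSSA by adding a PHI node on the loop exit:

@smallexample
/* Before rewriting into LCSSA.  */
while (...)
  @{
    x_1 = phi (x_0, x_2);
    x_2 = x_1 + 1;
  @}
use (x_2);

/* After rewriting into LCSSA; the use outside the loop goes through
   a PHI node created on the loop exit.  */
while (...)
  @{
    x_1 = phi (x_0, x_2);
    x_2 = x_1 + 1;
  @}
x_3 = phi (x_2);
use (x_3);
@end smallexample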

There are various benefits of LCSSA:

@itemize
@item Many optimizations (value range analysis, final value
replacement) are interested in the values that are defined in the loop
and used outside of it, i.e., exactly those for which we create new PHI
nodes.
@item In induction variable analysis, it is not necessary to specify
the loop in which the analysis should be performed -- the scalar
evolution analysis always returns the results with respect to the loop
in which the SSA name is defined.
@item It makes updating of SSA form during loop transformations simpler.
Without LCSSA, operations like loop unrolling may force creation of PHI
nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
updated locally.  However, since we only keep real operands in LCSSA, we
cannot use this advantage (we could have local updating of real
operands, but it is not much more efficient than to use generic SSA form
updating for it as well; the amount of changes to SSA is the same).
@end itemize

However, it also means LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
@code{rewrite_into_loop_closed_ssa} is used to rewrite the SSA form to
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
LCSSA is preserved.

@node Scalar evolutions
@section Scalar evolutions
@cindex Scalar evolutions
@cindex IV analysis on GIMPLE

Scalar evolutions (SCEV) are used to represent the results of induction
variable analysis on GIMPLE.  They enable us to represent variables with
complicated behavior in a simple and consistent way (we only use it to
express values of polynomial induction variables, but it is possible to
extend it).  The interfaces to SCEV analysis are declared in
@file{tree-scalar-evolution.h}.  To use scalar evolution analysis,
@code{scev_initialize} must be called first; to stop using SCEV,
@code{scev_finalize} should be called.  SCEV analysis caches results in
order to save time and memory.  This cache, however, is invalidated by
most of the loop transformations, including removal of code.  If such a
transformation is performed, @code{scev_reset} must be called to clean
the caches.

Given an SSA name, its behavior in loops can be analyzed using the
@code{analyze_scalar_evolution} function.  The returned SCEV, however,
does not have to be fully analyzed and it may contain references to
other SSA names defined in the loop.  To resolve these (potentially
recursive) references, the @code{instantiate_parameters} or
@code{resolve_mixers} functions must be used.
@code{instantiate_parameters} is useful when you use the results of
SCEV only for some analysis, and when you work with a whole nest of
loops at once.  It will try replacing all SSA names by their SCEV in
all loops, including the super-loops of the current loop, thus
providing complete information about the behavior of the variable in
the loop nest.  @code{resolve_mixers} is useful if you work with only
one loop at a time, and if you possibly need to create code based on
the value of the induction variable.  It will only resolve the SSA
names defined in the current loop, leaving the SSA names defined
outside unchanged, even if their evolution in the outer loops is known.
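
The following sketch shows how these functions fit together for an SSA
name @code{name} defined in @code{loop}.  The exact argument list of
@code{scev_initialize} and the comparison against @code{chrec_dont_know}
are assumptions here; see @file{tree-scalar-evolution.h} for the real
interface:

@smallexample
/* Sketch (assumed signatures): analyze NAME in LOOP and expand
   references to other SSA names defined in the loop nest.  */
tree chrec, expanded;

scev_initialize (current_loops);

chrec = analyze_scalar_evolution (loop, name);
expanded = instantiate_parameters (loop, chrec);

if (expanded != chrec_dont_know)
  @{
    /* EXPANDED may now contain POLYNOMIAL_CHREC nodes describing
       the evolution of NAME in the loop nest.  */
  @}

/* After a transformation that changes the CFG or removes code:  */
scev_reset ();

/* Once the analysis is no longer needed:  */
scev_finalize ();
@end smallexample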

The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
chrec has three arguments -- base, step and loop (both base and step
may contain further polynomial chrecs).  The type of the expression and
of the base and step must be the same.  A variable has evolution
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
loop) equivalent to @code{x_1} in the following example:

@smallexample
while (...)
  @{
    x_1 = phi (base, x_2);
    x_2 = x_1 + step;
  @}
@end smallexample

Note that this includes the language restrictions on the operations.
For example, if we compile C code and @code{x} has a signed type, then
an overflow in the addition would cause undefined behavior, and we may
assume that this does not happen.  Hence, the value with this SCEV
cannot overflow (which restricts the number of iterations of such a
loop).
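
As a concrete illustration, in a loop such as the following one
(assuming the loop has number 1), @code{i} has the scalar evolution
@code{@{0, +, 1@}_1} and the index expression @code{4 * i} has
@code{@{0, +, 4@}_1}:

@smallexample
for (i = 0; i < 100; i++)
  a[4 * i] = 0;
@end smallexample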

In many cases, one wants to restrict the attention just to affine
induction variables.  In this case, the extra expressive power of SCEV
is not useful, and may complicate the optimizations.  The
@code{simple_iv} function may then be used to analyze a value -- the
result is a loop-invariant base and step.

@node loop-iv
@section IV analysis on RTL
@cindex IV analysis on RTL

The induction variable analysis on RTL is simpler; it only allows
analysis of affine induction variables, and only in one loop at a time.
The interface is declared in @file{cfgloop.h}.  Before analyzing
induction variables in a loop L, the @code{iv_analysis_loop_init}
function must be called on L.  After the analysis (possibly calling
@code{iv_analysis_loop_init} for several loops) is finished,
@code{iv_analysis_done} should be called.  The following functions can
be used to access the results of the analysis:

@itemize
@item @code{iv_analyze}: Analyzes a single register used in the given
insn.  If no use of the register in this insn is found, the following
insns are scanned, so that this function can be called on the insn
returned by @code{get_condition}.
@item @code{iv_analyze_result}: Analyzes the result of the assignment
in the given insn.
@item @code{iv_analyze_expr}: Analyzes a more complicated expression.
All its operands are analyzed by @code{iv_analyze}, and hence they must
be used in the specified insn or one of the following insns.
@end itemize

The description of the induction variable is provided in @code{struct
rtx_iv}.  In order to handle subregs, the representation is a bit
complicated; if the value of the @code{extend} field is not
@code{UNKNOWN}, the value of the induction variable in the i-th
iteration is

@smallexample
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
@end smallexample

with the following exception:  if @code{first_special} is true, then the
value in the first iteration (when @code{i} is zero) is @code{delta +
mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
then @code{first_special} must be false, @code{delta} must be 0,
@code{mult} must be 1, and the value in the i-th iteration is

@smallexample
subreg_@{mode@} (base + i * step)
@end smallexample

The function @code{get_iv_value} can be used to perform these
calculations.
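
A typical use of the RTL induction variable analysis therefore looks
like the following sketch; the argument lists shown are assumptions, so
consult @file{cfgloop.h} for the exact interface:

@smallexample
/* Sketch: analyze register REG as used in INSN inside LOOP, then
   compute its value in the iteration given by ITERATION (an rtx).  */
struct rtx_iv iv;
rtx value;

iv_analysis_loop_init (loop);

if (iv_analyze (insn, reg, &iv))
  value = get_iv_value (&iv, iteration);

/* ... possibly analyze further loops ...  */

iv_analysis_done ();
@end smallexample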

@node Number of iterations
@section Number of iterations analysis
@cindex Number of iterations analysis

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  In many
cases, it is not possible to determine the number of iterations
unconditionally -- the determined number is correct only if some
assumptions are satisfied.  The analysis tries to verify these
conditions using the information contained in the program; if it fails,
the conditions are returned together with the result.  The following
information and conditions are provided by the analysis:

@itemize
@item @code{assumptions}: If this condition is false, the rest of
the information is invalid.
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
this condition is true, the loop exits in the first iteration.
@item @code{infinite}: If this condition is true, the loop is infinite.
This condition is only available on RTL.  On GIMPLE, conditions for
finiteness of the loop are included in @code{assumptions}.
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
that gives the number of iterations.  The number of iterations is
defined as the number of executions of the loop latch.
@end itemize

Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored in the @code{struct tree_niter_desc}
structure.  The number of iterations before the loop is exited through
a given exit can be determined using the
@code{number_of_iterations_exit} function.  On RTL, the results are
returned in the @code{struct niter_desc} structure.  The corresponding
function is named @code{check_simple_exit}.  There are also functions
that pass through all the exits of a loop and try to find one whose
number of iterations is easy to determine -- @code{find_loop_niter} on
GIMPLE and @code{find_simple_exit} on RTL.  Finally, there are
functions that provide the same information, but additionally cache it,
so that repeated queries for the number of iterations are not so costly
-- @code{number_of_iterations_in_loop} on GIMPLE and
@code{get_simple_loop_desc} on RTL.
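
On GIMPLE, the result is typically used as in the following sketch; the
argument list of @code{number_of_iterations_exit} shown here is an
assumption, so see @file{tree-ssa-loop-niter.c} for the exact
interface:

@smallexample
/* Sketch: determine the number of iterations of LOOP before it is
   left through the exit edge EXIT.  */
struct tree_niter_desc niter;

if (number_of_iterations_exit (loop, exit, &niter, true))
  @{
    /* niter.assumptions must hold for the result to be valid; if
       niter.may_be_zero holds, the loop exits in the first iteration;
       otherwise the latch is executed niter.niter times.  */
  @}
@end smallexample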

Note that some of these functions may behave slightly differently than
others -- some of them return only the expression for the number of
iterations, and fail if there are any assumptions.  The function
@code{number_of_iterations_in_loop} works only for single-exit loops,
and the value it returns for the number of iterations is higher by one
than that of all other functions (i.e., it returns the number of
executions of the exit statement, not of the loop latch).

@node Dependency analysis
@section Data Dependency Analysis
@cindex Data Dependency Analysis

The code for the data dependence analysis can be found in
@file{tree-data-ref.c} and its interface and data structures are
described in @file{tree-data-ref.h}.  The function that computes the
data dependences for all the array and pointer references in a given
loop is @code{compute_data_dependences_for_loop}.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
the first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus if the
vector of data references is of size @code{n}, the vector containing
the dependence relations will contain @code{n*n} elements.  However, if
the analyzed loop contains side effects, such as calls that can
potentially interfere with the data references in the currently
analyzed loop, the analysis stops while scanning the loop body for data
references, and inserts a single @code{chrec_dont_know} in the
dependence relation array.
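
A call sequence for this function might look like the following sketch.
The concrete vector type and the exact argument list are assumptions
here (they have changed across releases), so check
@file{tree-data-ref.h} before relying on them:

@smallexample
/* Sketch (assumed interface): compute all dependences for LOOP.  */
VEC (data_reference_p, heap) *datarefs
  = VEC_alloc (data_reference_p, heap, 10);
VEC (ddr_p, heap) *dependence_relations
  = VEC_alloc (ddr_p, heap, 10);

compute_data_dependences_for_loop (loop, true,
                                   &datarefs, &dependence_relations);

/* ... inspect the dependence relations here ...  */

free_dependence_relations (dependence_relations);
free_data_refs (datarefs);
@end smallexample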

The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution
order, and the data references of each statement are pushed at the end
of the data reference array.  Two data references occur syntactically
in the program in the same order as in the array of data references.
This syntactic order is important in some classical data dependence
tests, and mapping this order to the elements of this array avoids
costly queries to the loop body representation.

Three types of data references are currently handled: @code{ARRAY_REF},
@code{INDIRECT_REF} and @code{COMPONENT_REF}.  The data structure for a
data reference is @code{data_reference}, where @code{data_reference_p}
is the name of a pointer type to the data reference structure.  The
structure contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object
of the data reference and its access functions.  These access functions
represent the evolution of the data reference in the loop relative to
its base, in keeping with the classical meaning of the data reference
access function for the support of arrays.  For example, for a reference
@code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
one for each array subscript, are:
@code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.

@item @code{first_location_in_loop}: Provides information about the first
location accessed by the data reference in the loop and about the access
function used to represent evolution relative to this location.  This data
is used to support pointers, and is not used for arrays (for which we
have base objects).  Pointer accesses are represented as a one-dimensional
access that starts from the first location accessed in the loop.  For
example:

@smallexample
      for1 i
         for2 j
          *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @code{@{0, +, 4B@}_for2}
relative to @code{p + i}.  The access functions of the array are
@code{@{i_init, +, i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we can't
prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows:
If both @code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object};
if they are equal, apply dependence tests (use access functions based on
base_objects).
Else if both @code{a} and @code{b} are represented as pointers, compare
@code{a.first_location} and @code{b.first_location};
if they are equal, apply dependence tests (use access functions based on
first location).
However, if @code{a} and @code{b} are represented differently, only try
to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize

The structure describing the relation between two data references is
@code{data_dependence_relation} and the shorter name for a pointer to
such a structure is @code{ddr_p}.  This structure contains:

@itemize
@item a pointer to each data reference,
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
if the analysis has proved that there is no dependence between these two
data references, @code{chrec_dont_know} if the analysis was not able to
determine any useful result and potentially there could exist a
dependence between these data references, and @code{are_dependent} is
set to @code{NULL_TREE} if there exists a dependence relation between
the data references, and the description of this dependence relation is
given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
arrays,
@item a boolean that determines whether the dependence relation can be
represented by a classical distance vector,
@item an array @code{subscripts} that contains a description of each
subscript of the data references.  Given two array accesses, a
subscript is the tuple composed of the access functions for a given
dimension.  For example, given @code{A[f1][f2][f3]} and
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
g2), (f3, g3)}.
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
classical representations of the data dependences under the form of
direction and distance dependence vectors,
@item an array of loops @code{loop_nest} that contains the loops to
which the distance and direction vectors refer.
@end itemize

Several functions for pretty printing the information extracted by the
data dependence analysis are available: @code{dump_ddrs} prints with
maximum verbosity the details of a data dependence relations array,
@code{dump_dist_dir_vectors} prints only the classical distance and
direction vectors for a data dependence relations array, and
@code{dump_data_references} prints the details of the data references
contained in a data reference array.

@node Lambda
@section Linear loop transformations framework
@cindex Linear loop transformations framework

Lambda is a framework that allows transformations of loops using
non-singular matrix-based transformations of the iteration space and
loop bounds.  This allows compositions of skewing, scaling, interchange,
and reversal transformations.  These transformations are often used to
improve cache behavior or to remove inner-loop dependencies in order to
allow parallelization and vectorization to take place.

To perform these transformations, Lambda requires that the loopnest be
converted into an internal form that can be matrix transformed easily.
To do this conversion, the function
@code{gcc_loopnest_to_lambda_loopnest} is provided.  If the loop cannot
be transformed using lambda, this function will return NULL.

Once a @code{lambda_loopnest} is obtained from the conversion function,
it can be transformed by using @code{lambda_loopnest_transform}, which
takes a transformation matrix to apply.  Note that it is up to the
caller to verify that the transformation matrix is legal to apply to the
loop (dependence respecting, etc.).  Lambda simply applies whatever
matrix it is given.  It can be extended to make legal matrices out of
any non-singular matrix, but this is not currently implemented.
Legality of a matrix for a given loopnest can be verified using
@code{lambda_transform_legal_p}.

Given a transformed loopnest, conversion back into GCC IR is done by
@code{lambda_loopnest_to_gcc_loopnest}.  This function will modify the
loops so that they match the transformed loopnest.
