xref: /netbsd-src/external/gpl3/gcc.old/dist/gcc/doc/loop.texi (revision 8feb0f0b7eaff0608f8350bbfa3098827b4bb91b)
@c Copyright (C) 2006-2020 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@c ---------------------------------------------------------------------
@c Loop Representation
@c ---------------------------------------------------------------------

@node Loop Analysis and Representation
@chapter Analysis and Representation of Loops

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and on RTL, as well as the interfaces to loop-related analyses
(induction variable analysis and number of iterations analysis).

@menu
* Loop representation::         Representation and analysis of loops.
* Loop querying::               Getting information about loops.
* Loop manipulation::           Loop manipulation functions.
* LCSSA::                       Loop-closed SSA form.
* Scalar evolutions::           Induction variables on GIMPLE.
* loop-iv::                     Induction variables on RTL.
* Number of iterations::        Number of iterations analysis.
* Dependency analysis::         Data dependency analysis.
@end menu

@node Loop representation
@section Loop representation
@cindex Loop representation
@cindex Loop analysis

This chapter describes the representation of loops in GCC, and the
functions that can be used to build, modify and analyze this
representation.  Most of the interfaces and data structures are declared
in @file{cfgloop.h}.  Loop structures are analyzed, and this information
is discarded or updated at the discretion of individual passes.  Still,
most of the generic CFG manipulation routines are aware of loop
structures and try to keep them up-to-date.  This way, an increasing
part of the compilation pipeline is set up to maintain loop structures
across passes, which allows meta information to be attached to
individual loops for consumption by later passes.

In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is branching in the middle of the loop.
The representation of loops in GCC however allows only loops with a
single latch.  During loop analysis, headers of such loops are split and
forwarder blocks are created in order to disambiguate their structures.
A heuristic based on profile information and the structure of the
induction variables in the loops is used to determine whether the
latches correspond to sub-loops or to control flow in a single loop.
This means that the analysis sometimes changes the CFG, and if you run
it in the middle of an optimization pass, you must be able to deal with
the new blocks.  You may avoid CFG changes by passing the
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery; note
however that most other loop manipulation functions will not work
correctly for loops with multiple latch edges (the functions that only
query membership of blocks in loops and subloop relationships, or
enumerate and test loop exits, can be expected to work).

The body of the loop is the set of blocks that are dominated by its
header and reachable from its latch against the direction of edges in
the CFG@.  The loops are organized in a containment hierarchy (tree)
such that all the loops immediately contained inside loop L are the
children of L in the tree.  This tree is represented by the
@code{struct loops} structure.  The root of this tree is a fake loop
that contains all blocks in the function.  Each of the loops is
represented in a @code{struct loop} structure.  Each loop is assigned an
index (the @code{num} field of the @code{struct loop} structure), and
the pointer to the loop is stored in the corresponding field of the
@code{larray} vector in the loops structure.  The indices do not have to
be contiguous; there may be empty (@code{NULL}) entries in the
@code{larray} created by deleting loops.  Also, there is no guarantee on
the relative order of a loop and its subloops in the numbering.  The
index of a loop never changes.

The entries of the @code{larray} field should not be accessed directly.
The function @code{get_loop} returns the loop description for a loop
with the given index.  The @code{number_of_loops} function returns the
number of loops in the function.  To traverse all loops, use the
@code{FOR_EACH_LOOP} macro.  The @code{flags} argument of the macro is
used to determine the direction of traversal and the set of loops
visited.  Each loop is guaranteed to be visited exactly once, regardless
of the changes to the loop tree, and the loops may be removed during the
traversal.  Newly created loops are never traversed; if they need to be
visited, this must be done separately after their creation.

Each basic block contains the reference to the innermost loop it belongs
to (@code{loop_father}).  For this reason, it is only possible to have
one @code{struct loops} structure initialized at the same time for each
CFG@.  The global variable @code{current_loops} contains the
@code{struct loops} structure.  Many of the loop manipulation functions
assume that dominance information is up-to-date.

The loops are analyzed through the @code{loop_optimizer_init} function.
The argument of this function is a set of flags represented in an
integer bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

@itemize
@item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
changes to the CFG will be performed in the loop analysis; in
particular, loops with multiple latch edges will not be disambiguated.
If a loop has multiple latches, its latch block is set to NULL@.  Most
of the loop manipulation functions will not work for loops in this
shape.  No other flags that require CFG changes can be passed to
@code{loop_optimizer_init}.
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
a way that each loop has only one entry edge, and additionally, the
source block of this entry edge has only one successor.  This creates a
natural place where the code can be moved out of the loop, and ensures
that the entry edge of the loop leads from its immediate super-loop.
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
force the latch block of each loop to have only one successor.  This
ensures that the latch of the loop does not belong to any of its
sub-loops, and makes manipulating the loops significantly easier.  Most
of the loop manipulation functions assume that the loops are in this
shape.  Note that with this flag, the ``normal'' loop without any
control flow inside and with one exit consists of two basic blocks.
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
edges in the strongly connected components that are not natural loops
(have more than one entry block) are marked with the
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
flag is not set for blocks and edges that belong to natural loops that
are in such an irreducible region (but it is set for the entry and exit
edges of such a loop, if they lead to/from this region).
@item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
and updated for each loop.  This makes some functions (e.g.,
@code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
@code{single_exit}) can be used only if the lists of exits are
recorded.
@end itemize

These properties may also be computed/enforced later, using the
functions @code{create_preheaders}, @code{force_single_succ_latches},
@code{mark_irreducible_loops} and @code{record_loop_exits}.
The properties can be queried using @code{loops_state_satisfies_p}.

The memory occupied by the loops structures should be freed with the
@code{loop_optimizer_finalize} function.  When loop structures are set
up to be preserved across passes, this function reduces the information
to be kept up-to-date to a minimum (only
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} remains set).

The CFG manipulation functions in general do not update loop structures.
Specialized versions that additionally do so are provided for the most
common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop} function can
be used to clean up the CFG while updating the loop structures, provided
that @code{current_loops} is set.

At the moment, loop structure is preserved from the start of GIMPLE
loop optimizations until the end of RTL loop optimizations.  During
this time a loop can be tracked by its @code{struct loop} and number.
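
The typical shape of a pass using this infrastructure can be sketched as
follows.  This fragment is merely illustrative and only makes sense
inside GCC itself; @code{LI_FROM_INNERMOST} stands for one of the
traversal flags accepted by @code{FOR_EACH_LOOP}.

@smallexample
loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);

struct loop *loop;
FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
  @{
    /* Inspect or transform LOOP here.  */
  @}

loop_optimizer_finalize ();
@end smallexample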

@node Loop querying
@section Loop querying
@cindex Loop querying

The functions to query the information about loops are declared in
@file{cfgloop.h}.  Some of the information can be taken directly from
the structures.  The @code{loop_father} field of each basic block
contains the innermost loop to which the block belongs.  The most useful
fields of the loop structure (that are kept up-to-date at all times)
are:

@itemize
@item @code{header}, @code{latch}: Header and latch basic blocks of the
loop.
@item @code{num_nodes}: Number of basic blocks in the loop (including
the basic blocks of the sub-loops).
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
sub-loop, and the sibling of the loop in the loops tree.
@end itemize

There are other fields in the loop structures, many of them used only by
some of the passes, or not updated during CFG changes; in general, they
should not be accessed directly.

The most important functions to query loop structures are:

@itemize
@item @code{loop_depth}: The depth of the loop in the loops tree, i.e.,
the number of super-loops of the loop.
@item @code{flow_loops_dump}: Dumps the information about loops to a
file.
@item @code{verify_loop_structure}: Checks consistency of the loop
structures.
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
@item @code{loop_preheader_edge}: If loops have preheaders, returns
the preheader edge of a loop.
@item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
another loop.
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
to a loop (including its sub-loops).
@item @code{find_common_loop}: Finds the common super-loop of two loops.
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
the given depth.
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
number of insns in the loop, on GIMPLE and on RTL.
@item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
loop.
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
with the @code{EDGE_LOOP_EXIT} flag.
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
@code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
loop in depth-first search order in the reversed CFG, ordered by the
dominance relation, and in breadth-first search order, respectively.
@item @code{single_exit}: Returns the single exit edge of the loop, or
@code{NULL} if the loop has more than one exit.  You can only use this
function if the @code{LOOPS_HAVE_RECORDED_EXITS} property is set.
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
@item @code{just_once_each_iteration_p}: Returns true if the basic block
is executed exactly once during each iteration of a loop (that is, it
does not belong to a sub-loop, and it dominates the latch of the loop).
@end itemize
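
The tree-walking queries above are easy to get wrong.  The following
self-contained C sketch models just the parent links of the loop tree
and reimplements @code{loop_depth}, @code{flow_loop_nested_p} and
@code{find_common_loop} in miniature; @code{struct toy_loop} and the
@code{toy_} functions are invented for this example and are not GCC's
actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the loop tree: each loop records only its immediate
   super-loop (NULL for the root loop that covers the whole function).  */
struct toy_loop
{
  struct toy_loop *outer;
};

/* Number of super-loops, like loop_depth (the root has depth 0).  */
static int
toy_loop_depth (struct toy_loop *l)
{
  int d = 0;
  for (struct toy_loop *s = l->outer; s; s = s->outer)
    d++;
  return d;
}

/* Like flow_loop_nested_p: is INNER strictly contained in OUTER?  */
static int
toy_loop_nested_p (struct toy_loop *outer, struct toy_loop *inner)
{
  for (struct toy_loop *s = inner->outer; s; s = s->outer)
    if (s == outer)
      return 1;
  return 0;
}

/* Like find_common_loop: the deepest loop containing both A and B
   (a loop counts as containing itself here).  */
static struct toy_loop *
toy_find_common_loop (struct toy_loop *a, struct toy_loop *b)
{
  int da = toy_loop_depth (a), db = toy_loop_depth (b);
  while (da > db) { a = a->outer; da--; }   /* bring A up to B's depth */
  while (db > da) { b = b->outer; db--; }   /* bring B up to A's depth */
  while (a != b)  { a = a->outer; b = b->outer; }
  return a;
}
```

Walking to a common depth first, as above, is the standard trick for
lowest-common-ancestor queries on parent-linked trees.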

@node Loop manipulation
@section Loop manipulation
@cindex Loop manipulation

The loops tree can be manipulated using the following functions:

@itemize
@item @code{flow_loop_tree_node_add}: Adds a node to the tree.
@item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
@item @code{add_bb_to_loop}: Adds a basic block to a loop.
@item @code{remove_bb_from_loops}: Removes a basic block from loops.
@end itemize

Most low-level CFG functions update loops automatically.  The following
functions handle some more complicated cases of CFG manipulations:

@itemize
@item @code{remove_path}: Removes an edge and all blocks it dominates.
@item @code{split_loop_exit_edge}: Splits the exit edge of a loop,
ensuring that PHI node arguments remain in the loop (this ensures that
loop-closed SSA form is preserved).  Only useful on GIMPLE.
@end itemize

Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on non-innermost
loops, they are mostly untested in that case, and at the moment, they
are only reliable for the innermost loops:

@itemize
@item @code{create_iv}: Creates a new induction variable.  Only works on
GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
suitable place for the iv increment.
@item @code{duplicate_loop_to_header_edge},
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
on GIMPLE) duplicate the body of the loop a prescribed number of times
on one of the edges entering the loop header, thus performing either
loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
loop.
@item @code{loop_version}: This function creates a copy of a loop, and
a branch before them that selects one of them depending on the
prescribed condition.  This is useful for optimizations that need to
verify some assumptions at runtime (one of the copies of the loop is
usually left unchanged, while the other one is transformed in some way).
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
extra iterations to make the number of iterations divisible by the
unroll factor, updating the exit condition, and removing the exits that
now cannot be taken.  Works only on GIMPLE.
@end itemize

@node LCSSA
@section Loop-closed SSA form
@cindex LCSSA
@cindex Loop-closed SSA form

Throughout the loop optimizations on tree level, one extra condition is
enforced on the SSA form:  No SSA name is used outside of the loop in
which it is defined.  The SSA form satisfying this condition is called
``loop-closed SSA form'' -- LCSSA@.  To enforce LCSSA, PHI nodes must be
created at the exits of the loops for the SSA names that are used
outside of them.  Only the real operands (not virtual SSA names) are
held in LCSSA, in order to save memory.

There are various benefits of LCSSA:

@itemize
@item Many optimizations (value range analysis, final value
replacement) are interested in the values that are defined in the loop
and used outside of it, i.e., exactly those for which we create new PHI
nodes.
@item In induction variable analysis, it is not necessary to specify the
loop in which the analysis should be performed -- the scalar evolution
analysis always returns the results with respect to the loop in which
the SSA name is defined.
@item It makes updating of SSA form during loop transformations simpler.
Without LCSSA, operations like loop unrolling may force creation of PHI
nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
updated locally.  However, since we only keep real operands in LCSSA, we
cannot use this advantage (we could have local updating of real
operands, but it is not much more efficient than using generic SSA form
updating for it as well; the amount of changes to SSA is the same).
@end itemize

However, it also means LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
@code{rewrite_into_loop_closed_ssa} is used to rewrite the SSA form to
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
LCSSA is preserved.
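
As an illustration, consider a hypothetical loop that accumulates into
@code{sum_1} and uses the result after the loop (the names and block
labels below are made up for this example).  Rewriting into LCSSA
inserts the exit PHI node @code{sum_3}, so that the only use of a name
defined in the loop outside of it is as an argument of that PHI node:

@smallexample
# Before LCSSA:                # After LCSSA:
loop:                          loop:
  sum_1 = PHI <0, sum_2>         sum_1 = PHI <0, sum_2>
  sum_2 = sum_1 + a_1            sum_2 = sum_1 + a_1
  if (cond) goto loop            if (cond) goto loop
exit:                          exit:
                                 sum_3 = PHI <sum_1>
  use (sum_1)                    use (sum_3)
@end smallexample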

@node Scalar evolutions
@section Scalar evolutions
@cindex Scalar evolutions
@cindex IV analysis on GIMPLE

Scalar evolutions (SCEV) are used to represent results of induction
variable analysis on GIMPLE@.  They enable us to represent variables
with complicated behavior in a simple and consistent way (we only use
them to express values of polynomial induction variables, but it is
possible to extend them).  The interfaces to SCEV analysis are declared
in @file{tree-scalar-evolution.h}.  To use the scalar evolutions
analysis, @code{scev_initialize} must be called first; to stop using
SCEV, call @code{scev_finalize}.  SCEV analysis caches results in order
to save time and memory.  This cache however is invalidated by most of
the loop transformations, including removal of code.  If such a
transformation is performed, @code{scev_reset} must be called to clean
the caches.

Given an SSA name, its behavior in loops can be analyzed using the
@code{analyze_scalar_evolution} function.  The returned SCEV however
does not have to be fully analyzed and it may contain references to
other SSA names defined in the loop.  To resolve these (potentially
recursive) references, the @code{instantiate_parameters} or
@code{resolve_mixers} functions must be used.
@code{instantiate_parameters} is useful when you use the results of SCEV
only for some analysis, and when you work with a whole nest of loops at
once.  It will try replacing all SSA names by their SCEV in all loops,
including the super-loops of the current loop, thus providing complete
information about the behavior of the variable in the loop nest.
@code{resolve_mixers} is useful if you work with only one loop at a
time, and if you possibly need to create code based on the value of the
induction variable.  It will only resolve the SSA names defined in the
current loop, leaving the SSA names defined outside unchanged, even if
their evolution in the outer loops is known.

The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
chrec has three arguments -- base, step and loop (both base and step may
contain further polynomial chrecs).  The type of the expression and of
the base and step must be the same.  A variable has evolution
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
loop) equivalent to @code{x_1} in the following example:

@smallexample
while (@dots{})
  @{
    x_1 = phi (base, x_2);
    x_2 = x_1 + step;
  @}
@end smallexample

Note that this includes the language restrictions on the operations.
For example, if we compile C code and @code{x} has signed type, then the
overflow in addition would cause undefined behavior, and we may assume
that this does not happen.  Hence, the value with this SCEV cannot
overflow (which restricts the number of iterations of such a loop).
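
In other words, @code{POLYNOMIAL_CHREC(base, step, loop)} has the
closed-form value @code{base + i * step} in the i-th iteration of the
loop.  The following self-contained C sketch (function names invented
for this example) checks that the closed form agrees with the phi-based
recurrence shown above:

```c
#include <assert.h>

/* Closed form of POLYNOMIAL_CHREC (base, step, loop): the value of the
   variable in the i-th iteration, counting from i = 0.  */
static int
chrec_value (int base, int step, int i)
{
  return base + i * step;
}

/* Simulate  x_1 = phi (base, x_2);  x_2 = x_1 + step;  for N iterations
   and return the value of x_1 observed in the last iteration.  */
static int
simulate_iv (int base, int step, int n)
{
  int x_1 = base;               /* value on entry (first iteration) */
  for (int i = 1; i < n; i++)
    x_1 = x_1 + step;           /* x_2 flows back into x_1 */
  return x_1;
}
```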

In many cases, one wants to restrict the attention just to affine
induction variables.  Then the extra expressive power of SCEV is not
useful, and may complicate the optimizations.  In this case, the
@code{simple_iv} function may be used to analyze a value -- the result
is a loop-invariant base and step.

@node loop-iv
@section IV analysis on RTL
@cindex IV analysis on RTL

The induction variable analysis on RTL is simpler; it only allows
analysis of affine induction variables, and only in one loop at a time.
The interface is declared in @file{cfgloop.h}.  Before analyzing
induction variables in a loop L, the @code{iv_analysis_loop_init}
function must be called on L.  After the analysis (possibly calling
@code{iv_analysis_loop_init} for several loops) is finished,
@code{iv_analysis_done} should be called.  The following functions can
be used to access the results of the analysis:

@itemize
@item @code{iv_analyze}: Analyzes a single register used in the given
insn.  If no use of the register in this insn is found, the following
insns are scanned, so that this function can be called on the insn
returned by @code{get_condition}.
@item @code{iv_analyze_result}: Analyzes the result of the assignment in
the given insn.
@item @code{iv_analyze_expr}: Analyzes a more complicated expression.
All its operands are analyzed by @code{iv_analyze}, and hence they must
be used in the specified insn or one of the following insns.
@end itemize

The description of the induction variable is provided in @code{struct
rtx_iv}.  In order to handle subregs, the representation is a bit
complicated; if the value of the @code{extend} field is not
@code{UNKNOWN}, the value of the induction variable in the i-th
iteration is

@smallexample
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
@end smallexample

with the following exception:  if @code{first_special} is true, then the
value in the first iteration (when @code{i} is zero) is @code{delta +
mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
then @code{first_special} must be false, @code{delta} must be 0,
@code{mult} must be 1, and the value in the i-th iteration is

@smallexample
subreg_@{mode@} (base + i * step)
@end smallexample

The function @code{get_iv_value} can be used to perform these
calculations.
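
The formula above can be evaluated directly in plain C for a concrete
choice of modes.  This self-contained sketch (names invented for the
example; it is not the real @code{get_iv_value}) takes @code{mode} to be
QImode (8 bits) inside a 64-bit @code{extend_mode} with sign extension,
which shows why the narrowing subreg matters: the wide value wraps at
the 8-bit boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate  delta + mult * sign_extend (subreg_{QImode} (base + i * step))
   for a sign-extending rtx_iv, with QImode = 8 bits and a 64-bit
   extend_mode.  */
static int64_t
iv_value_qi_sext (int64_t delta, int64_t mult,
                  int64_t base, int64_t step, int64_t i)
{
  uint8_t narrow = (uint8_t) (base + i * step); /* subreg_{QImode} */
  int64_t wide = (int8_t) narrow;               /* sign extension  */
  return delta + mult * wide;
}
```

With @code{base = 100}, @code{step = 10}, the third iteration computes
130, which an 8-bit signed subreg sees as @minus{}126.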

@node Number of iterations
@section Number of iterations analysis
@cindex Number of iterations analysis

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  The
number of iterations of a loop in GCC is defined as the number of
executions of the loop latch.  In many cases, it is not possible to
determine the number of iterations unconditionally -- the determined
number is correct only if some assumptions are satisfied.  The analysis
tries to verify these conditions using the information contained in the
program; if it fails, the conditions are returned together with the
result.  The following information and conditions are provided by the
analysis:

@itemize
@item @code{assumptions}: If this condition is false, the rest of
the information is invalid.
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
this condition is true, the loop exits in the first iteration.
@item @code{infinite}: If this condition is true, the loop is infinite.
This condition is only available on RTL@.  On GIMPLE, conditions for
finiteness of the loop are included in @code{assumptions}.
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
that gives the number of iterations.  The number of iterations is
defined as the number of executions of the loop latch.
@end itemize

Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored in the @code{struct tree_niter_desc}
structure.  The number of iterations before the loop is exited through a
given exit can be determined using the @code{number_of_iterations_exit}
function.  On RTL, the results are returned in the @code{struct
niter_desc} structure.  The corresponding function is named
@code{check_simple_exit}.  There are also functions that pass through
all the exits of a loop and try to find one whose number of iterations
is easy to determine -- @code{find_loop_niter} on GIMPLE and
@code{find_simple_exit} on RTL@.  Finally, there are functions that
provide the same information, but additionally cache it, so that
repeated queries for the number of iterations are not so costly --
@code{number_of_latch_executions} on GIMPLE and
@code{get_simple_loop_desc} on RTL.

Note that some of these functions may behave slightly differently than
others -- some of them return only the expression for the number of
iterations, and fail if there are some assumptions.  The function
@code{number_of_latch_executions} works only for single-exit loops.
The function @code{number_of_cond_exit_executions} can be used to
determine the number of executions of the exit condition of a
single-exit loop (i.e., the @code{number_of_latch_executions} increased
by one).
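
The latch-based definition is worth internalizing: for a loop whose body
runs @code{n} times, the latch (back edge) is taken only @code{n - 1}
times, while the exit condition is tested @code{n} times.  This
self-contained C sketch (names invented for the example) counts both
events for a simple @code{do}-@code{while} loop and checks the
``plus one'' relationship described above:

```c
#include <assert.h>

/* Run the loop  do { i++; } while (i < n);  starting from i = 0, and
   count how often the back edge (latch) is taken and how often the
   exit condition is evaluated.  */
static void
count_loop_events (int n, int *latch_execs, int *cond_execs)
{
  *latch_execs = 0;
  *cond_execs = 0;
  int i = 0;
  do
    {
      i++;                  /* loop body */
      (*cond_execs)++;      /* exit condition is tested */
      if (!(i < n))
        break;              /* exit edge taken */
      (*latch_execs)++;     /* back edge (latch) taken */
    }
  while (1);
}
```

For @code{n = 5} the body runs five times, the latch four times, and the
condition five times, matching
@code{number_of_cond_exit_executions == number_of_latch_executions + 1}.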

On GIMPLE, the following constraint flags affect the semantics of some
interfaces of the number of iterations analyzer:

@itemize
@item @code{LOOP_C_INFINITE}: If this constraint flag is set, the loop
is known to be infinite.  APIs like @code{number_of_iterations_exit} can
return false directly without doing any analysis.
@item @code{LOOP_C_FINITE}: If this constraint flag is set, the loop is
known to be finite; in other words, the loop's number of iterations can
be computed with @code{assumptions} being true.
@end itemize

4871debfc3dSmrgGenerally, the constraint flags are set/cleared by consumers which are
4881debfc3dSmrgloop optimizers.  It's also the consumers' responsibility to set/clear
4891debfc3dSmrgconstraints correctly.  Failing to do that might result in hard to track
4901debfc3dSmrgdown bugs in scev/niter consumers.  One typical use case is vectorizer:
4911debfc3dSmrgit drives number of iterations analyzer by setting @code{LOOP_C_FINITE}
4921debfc3dSmrgand vectorizes possibly infinite loop by versioning loop with analysis
4931debfc3dSmrgresult.  In return, constraints set by consumers can also help number of
4941debfc3dSmrgiterations analyzer in following optimizers.  For example, @code{niter}
4951debfc3dSmrgof a loop versioned under @code{assumptions} is valid unconditionally.
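
As a purely source-level sketch of the versioning idea (not the
vectorizer's actual code), the specialized copy below is only entered
when the assumption holds, so inside that copy its iteration count is
valid unconditionally:

@smallexample
#include <assert.h>

/* The 4-way unrolled copy is guarded by the assumption (n is a
   positive multiple of 4) under which the iteration count n / 4 is
   known to be valid; otherwise the original loop runs.  */
static int
sum (const int *a, int n)
{
  int s = 0;
  if (n > 0 && n % 4 == 0)           /* assumptions hold */
    for (int i = 0; i < n; i += 4)   /* executes exactly n / 4 times */
      s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
  else                               /* fallback: original loop */
    for (int i = 0; i < n; i++)
      s += a[i];
  return s;
}

int
main (void)
{
  int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  assert (sum (a, 8) == 36);  /* versioned path */
  assert (sum (a, 7) == 28);  /* fallback path */
  return 0;
}
@end smallexample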

Other constraints may be added in the future; for example, a constraint
indicating that a loop's latch must roll, so that @code{may_be_zero}
would be false unconditionally.

@node Dependency analysis
@section Data Dependency Analysis
@cindex Data Dependency Analysis

The code for the data dependence analysis can be found in
@file{tree-data-ref.c} and its interface and data structures are
described in @file{tree-data-ref.h}.  The function that computes the
data dependences for all the array and pointer references in a given
loop is @code{compute_data_dependences_for_loop}.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
the first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus, if the
vector of data references is of size @code{n}, the vector containing the
dependence relations will contain @code{n*n} elements.  However, if the
analyzed loop contains side effects, such as calls that potentially can
interfere with the data references in the analyzed loop, the analysis
stops while scanning the loop body for data references, and inserts a
single @code{chrec_dont_know} in the dependence relation array.
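
For instance, a loop of the following shape would stop the scan: the
call may read or write arbitrary memory, so a single
@code{chrec_dont_know} is recorded instead of the pairwise relations.
The function name here is only illustrative:

@smallexample
#include <assert.h>

static int g;

/* An opaque call: from the analyzer's point of view it may touch any
   memory reachable from the loop.  */
static void
mystery (void)
{
  g++;
}

int
main (void)
{
  int a[4];
  for (int i = 0; i < 4; i++)
    {
      a[i] = i;        /* data reference found by the scan */
      mystery ();      /* potential interference: the scan would stop
                          here and record chrec_dont_know */
    }
  assert (g == 4 && a[3] == 3);
  return 0;
}
@end smallexample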

The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution order,
and the data references of each statement are pushed at the end of the
data reference array.  Two data references thus occur in the array in
the same order as they syntactically occur in the program.  This
syntactic order is important in some classical data dependence tests,
and mapping this order to the elements of this array avoids costly
queries to the loop body representation.

Three types of data references are currently handled: @code{ARRAY_REF},
@code{INDIRECT_REF} and @code{COMPONENT_REF}@.  The data structure for a
data reference is @code{data_reference}, where @code{data_reference_p}
is the name of a pointer to the data reference structure.  The structure
contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object
of the data reference and its access functions.  These access functions
represent the evolution of the data reference in the loop relative to
its base, in keeping with the classical meaning of the data reference
access function for the support of arrays.  For example, for a reference
@code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
one for each array subscript, are:
@code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.

@item @code{first_location_in_loop}: Provides information about the first
location accessed by the data reference in the loop and about the access
function used to represent the evolution relative to this location.  This
data is used to support pointers, and is not used for arrays (for which
we have base objects).  Pointer accesses are represented as a
one-dimensional access that starts from the first location accessed in
the loop.  For example:

@smallexample
      for1 i
         for2 j
          *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @code{@{0, +, 4B@}_for2}
relative to @code{p + i}.  The access functions of the array are
@code{@{i_init, +, i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we cannot
prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows:
if both @code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object};
if they are equal, apply dependence tests (use access functions based on
base_objects).
Otherwise, if both @code{a} and @code{b} are represented as pointers,
compare @code{a.first_location} and @code{b.first_location};
if they are equal, apply dependence tests (use access functions based on
first location).
However, if @code{a} and @code{b} are represented differently, only try
to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize
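
The comparison strategy above can be sketched as follows; the types and
helper here are a toy model for illustration only, not GCC's internal
API:

@smallexample
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum dr_repr { DR_ARRAY, DR_POINTER };
struct toy_dr
{
  enum dr_repr repr;     /* which representation is filled in */
  const char *base;      /* base_object or first_location */
  bool is_decl;          /* base is a distinct declared object */
};

/* Returns true when the pair needs dependence testing, false when the
   references are known not to overlap.  */
static bool
needs_dependence_tests (const struct toy_dr *a, const struct toy_dr *b)
{
  if (a->repr == b->repr)
    /* Same representation: run the subscript tests only when the
       bases (base_object, or first_location) are equal.  */
    return strcmp (a->base, b->base) == 0;
  /* Different representations: assume a dependence unless the bases
     are provably distinct objects.  */
  return !(a->is_decl && b->is_decl && strcmp (a->base, b->base) != 0);
}

int
main (void)
{
  struct toy_dr a1 = { DR_ARRAY, "a", true };
  struct toy_dr a2 = { DR_ARRAY, "a", true };
  struct toy_dr p = { DR_POINTER, "p", false };

  assert (needs_dependence_tests (&a1, &a2));  /* equal bases */
  assert (needs_dependence_tests (&a1, &p));   /* cannot prove disjoint */
  return 0;
}
@end smallexample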

The structure describing the relation between two data references is
@code{data_dependence_relation} and the shorter name for a pointer to
such a structure is @code{ddr_p}.  This structure contains:

@itemize
@item a pointer to each data reference,
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
if the analysis has proved that there is no dependence between these two
data references, to @code{chrec_dont_know} if the analysis was not able
to determine any useful result and potentially there could exist a
dependence between these data references, and to @code{NULL_TREE} if
there exists a dependence relation between the data references; in the
last case, the description of this dependence relation is given in the
@code{subscripts}, @code{dir_vects}, and @code{dist_vects} arrays,
@item a boolean that determines whether the dependence relation can be
represented by a classical distance vector,
@item an array @code{subscripts} that contains a description of each
subscript of the data references.  Given two array accesses, a
subscript is the tuple composed of the access functions for a given
dimension.  For example, given @code{A[f1][f2][f3]} and
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
g2), (f3, g3)}.
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
classical representations of the data dependences in the form of
direction and distance dependence vectors,
@item an array of loops @code{loop_nest} that contains the loops to
which the distance and direction vectors refer.
@end itemize
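
As a concrete instance of a distance vector: for the loop below, the
write of @code{A[i + 1]} in iteration @code{i} and the read of the same
element in iteration @code{i + 1} give the distance vector @code{(1)}
and the direction vector @code{(<)}.  The instrumentation arrays are
only for this example:

@smallexample
#include <assert.h>

int
main (void)
{
  int A[8] = { 0 };
  int write_iter[8], read_iter[8];

  for (int k = 0; k < 8; k++)
    write_iter[k] = read_iter[k] = -1;

  for (int i = 0; i < 7; i++)
    {
      read_iter[i] = i;       /* A[i] is read in iteration i */
      A[i + 1] = A[i] + 1;
      write_iter[i + 1] = i;  /* A[i + 1] is written in iteration i */
    }

  /* Each element written in one iteration is read one iteration
     later: the dependence distance is 1 for every element.  */
  for (int k = 1; k < 7; k++)
    assert (read_iter[k] - write_iter[k] == 1);
  return 0;
}
@end smallexample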

Several functions for pretty printing the information extracted by the
data dependence analysis are available: @code{dump_ddrs} prints the
details of a data dependence relations array with maximum verbosity,
@code{dump_dist_dir_vectors} prints only the classical distance and
direction vectors for a data dependence relations array, and
@code{dump_data_references} prints the details of the data references
contained in a data reference array.