1@c markers: BUG TODO 2 3@c Copyright (C) 1988-2018 Free Software Foundation, Inc. 4@c This is part of the GCC manual. 5@c For copying conditions, see the file gcc.texi. 6 7@node Passes 8@chapter Passes and Files of the Compiler 9@cindex passes and files of the compiler 10@cindex files and passes of the compiler 11@cindex compiler passes and files 12@cindex pass dumps 13 14This chapter is dedicated to giving an overview of the optimization and 15code generation passes of the compiler. In the process, it describes 16some of the language front end interface, though this description is no 17where near complete. 18 19@menu 20* Parsing pass:: The language front end turns text into bits. 21* Gimplification pass:: The bits are turned into something we can optimize. 22* Pass manager:: Sequencing the optimization passes. 23* Tree SSA passes:: Optimizations on a high-level representation. 24* RTL passes:: Optimizations on a low-level representation. 25* Optimization info:: Dumping optimization information from passes. 26@end menu 27 28@node Parsing pass 29@section Parsing pass 30@cindex GENERIC 31@findex lang_hooks.parse_file 32The language front end is invoked only once, via 33@code{lang_hooks.parse_file}, to parse the entire input. The language 34front end may use any intermediate language representation deemed 35appropriate. The C front end uses GENERIC trees (@pxref{GENERIC}), plus 36a double handful of language specific tree codes defined in 37@file{c-common.def}. The Fortran front end uses a completely different 38private representation. 39 40@cindex GIMPLE 41@cindex gimplification 42@cindex gimplifier 43@cindex language-independent intermediate representation 44@cindex intermediate representation lowering 45@cindex lowering, language-dependent intermediate representation 46At some point the front end must translate the representation used in the 47front end to a representation understood by the language-independent 48portions of the compiler. Current practice takes one of two forms. 49The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function, 50and uses the gimplifier callbacks to convert the language-specific tree 51nodes directly to GIMPLE before passing the function off to be compiled. 52The Fortran front end converts from a private representation to GENERIC, 53which is later lowered to GIMPLE when the function is compiled. Which 54route to choose probably depends on how well GENERIC (plus extensions) 55can be made to match up with the source language and necessary parsing 56data structures. 57 58BUG: Gimplification must occur before nested function lowering, 59and nested function lowering must be done by the front end before 60passing the data off to cgraph. 61 62TODO: Cgraph should control nested function lowering. It would 63only be invoked when it is certain that the outer-most function 64is used. 65 66TODO: Cgraph needs a gimplify_function callback. It should be 67invoked when (1) it is certain that the function is used, (2) 68warning flags specified by the user require some amount of 69compilation in order to honor, (3) the language indicates that 70semantic analysis is not complete until gimplification occurs. 71Hum@dots{} this sounds overly complicated. Perhaps we should just 72have the front end gimplify always; in most cases it's only one 73function call. 74 75The front end needs to pass all function definitions and top level 76declarations off to the middle-end so that they can be compiled and 77emitted to the object file. For a simple procedural language, it is 78usually most convenient to do this as each top level declaration or 79definition is seen. There is also a distinction to be made between 80generating functional code and generating complete debug information. 81The only thing that is absolutely required for functional code is that 82function and data @emph{definitions} be passed to the middle-end. For 83complete debug information, function, data and type declarations 84should all be passed as well. 85 86@findex rest_of_decl_compilation 87@findex rest_of_type_compilation 88@findex cgraph_finalize_function 89In any case, the front end needs each complete top-level function or 90data declaration, and each data definition should be passed to 91@code{rest_of_decl_compilation}. Each complete type definition should 92be passed to @code{rest_of_type_compilation}. Each function definition 93should be passed to @code{cgraph_finalize_function}. 94 95TODO: I know rest_of_compilation currently has all sorts of 96RTL generation semantics. I plan to move all code generation 97bits (both Tree and RTL) to compile_function. Should we hide 98cgraph from the front ends and move back to rest_of_compilation 99as the official interface? Possibly we should rename all three 100interfaces such that the names match in some meaningful way and 101that is more descriptive than "rest_of". 102 103The middle-end will, at its option, emit the function and data 104definitions immediately or queue them for later processing. 105 106@node Gimplification pass 107@section Gimplification pass 108 109@cindex gimplification 110@cindex GIMPLE 111@dfn{Gimplification} is a whimsical term for the process of converting 112the intermediate representation of a function into the GIMPLE language 113(@pxref{GIMPLE}). The term stuck, and so words like ``gimplification'', 114``gimplify'', ``gimplifier'' and the like are sprinkled throughout this 115section of code. 116 117While a front end may certainly choose to generate GIMPLE directly if 118it chooses, this can be a moderately complex process unless the 119intermediate language used by the front end is already fairly simple. 120Usually it is easier to generate GENERIC trees plus extensions 121and let the language-independent gimplifier do most of the work. 122 123@findex gimplify_function_tree 124@findex gimplify_expr 125@findex lang_hooks.gimplify_expr 126The main entry point to this pass is @code{gimplify_function_tree} 127located in @file{gimplify.c}. From here we process the entire 128function gimplifying each statement in turn. The main workhorse 129for this pass is @code{gimplify_expr}. Approximately everything 130passes through here at least once, and it is from here that we 131invoke the @code{lang_hooks.gimplify_expr} callback. 132 133The callback should examine the expression in question and return 134@code{GS_UNHANDLED} if the expression is not a language specific 135construct that requires attention. Otherwise it should alter the 136expression in some way to such that forward progress is made toward 137producing valid GIMPLE@. If the callback is certain that the 138transformation is complete and the expression is valid GIMPLE, it 139should return @code{GS_ALL_DONE}. Otherwise it should return 140@code{GS_OK}, which will cause the expression to be processed again. 141If the callback encounters an error during the transformation (because 142the front end is relying on the gimplification process to finish 143semantic checks), it should return @code{GS_ERROR}. 144 145@node Pass manager 146@section Pass manager 147 148The pass manager is located in @file{passes.c}, @file{tree-optimize.c} 149and @file{tree-pass.h}. 150It processes passes as described in @file{passes.def}. 151Its job is to run all of the individual passes in the correct order, 152and take care of standard bookkeeping that applies to every pass. 153 154The theory of operation is that each pass defines a structure that 155represents everything we need to know about that pass---when it 156should be run, how it should be run, what intermediate language 157form or on-the-side data structures it needs. We register the pass 158to be run in some particular order, and the pass manager arranges 159for everything to happen in the correct order. 160 161The actuality doesn't completely live up to the theory at present. 162Command-line switches and @code{timevar_id_t} enumerations must still 163be defined elsewhere. The pass manager validates constraints but does 164not attempt to (re-)generate data structures or lower intermediate 165language form based on the requirements of the next pass. Nevertheless, 166what is present is useful, and a far sight better than nothing at all. 167 168Each pass should have a unique name. 169Each pass may have its own dump file (for GCC debugging purposes). 170Passes with a name starting with a star do not dump anything. 171Sometimes passes are supposed to share a dump file / option name. 172To still give these unique names, you can use a prefix that is delimited 173by a space from the part that is used for the dump file / option name. 174E.g. When the pass name is "ud dce", the name used for dump file/options 175is "dce". 176 177TODO: describe the global variables set up by the pass manager, 178and a brief description of how a new pass should use it. 179I need to look at what info RTL passes use first@enddots{} 180 181@node Tree SSA passes 182@section Tree SSA passes 183 184The following briefly describes the Tree optimization passes that are 185run after gimplification and what source files they are located in. 186 187@itemize @bullet 188@item Remove useless statements 189 190This pass is an extremely simple sweep across the gimple code in which 191we identify obviously dead code and remove it. Here we do things like 192simplify @code{if} statements with constant conditions, remove 193exception handling constructs surrounding code that obviously cannot 194throw, remove lexical bindings that contain no variables, and other 195assorted simplistic cleanups. The idea is to get rid of the obvious 196stuff quickly rather than wait until later when it's more work to get 197rid of it. This pass is located in @file{tree-cfg.c} and described by 198@code{pass_remove_useless_stmts}. 199 200@item OpenMP lowering 201 202If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers 203OpenMP constructs into GIMPLE. 204 205Lowering of OpenMP constructs involves creating replacement 206expressions for local variables that have been mapped using data 207sharing clauses, exposing the control flow of most synchronization 208directives and adding region markers to facilitate the creation of the 209control flow graph. The pass is located in @file{omp-low.c} and is 210described by @code{pass_lower_omp}. 211 212@item OpenMP expansion 213 214If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands 215parallel regions into their own functions to be invoked by the thread 216library. The pass is located in @file{omp-low.c} and is described by 217@code{pass_expand_omp}. 218 219@item Lower control flow 220 221This pass flattens @code{if} statements (@code{COND_EXPR}) 222and moves lexical bindings (@code{BIND_EXPR}) out of line. After 223this pass, all @code{if} statements will have exactly two @code{goto} 224statements in its @code{then} and @code{else} arms. Lexical binding 225information for each statement will be found in @code{TREE_BLOCK} rather 226than being inferred from its position under a @code{BIND_EXPR}. This 227pass is found in @file{gimple-low.c} and is described by 228@code{pass_lower_cf}. 229 230@item Lower exception handling control flow 231 232This pass decomposes high-level exception handling constructs 233(@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form 234that explicitly represents the control flow involved. After this 235pass, @code{lookup_stmt_eh_region} will return a non-negative 236number for any statement that may have EH control flow semantics; 237examine @code{tree_can_throw_internal} or @code{tree_can_throw_external} 238for exact semantics. Exact control flow may be extracted from 239@code{foreach_reachable_handler}. The EH region nesting tree is defined 240in @file{except.h} and built in @file{except.c}. The lowering pass 241itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}. 242 243@item Build the control flow graph 244 245This pass decomposes a function into basic blocks and creates all of 246the edges that connect them. It is located in @file{tree-cfg.c} and 247is described by @code{pass_build_cfg}. 248 249@item Find all referenced variables 250 251This pass walks the entire function and collects an array of all 252variables referenced in the function, @code{referenced_vars}. The 253index at which a variable is found in the array is used as a UID 254for the variable within this function. This data is needed by the 255SSA rewriting routines. The pass is located in @file{tree-dfa.c} 256and is described by @code{pass_referenced_vars}. 257 258@item Enter static single assignment form 259 260This pass rewrites the function such that it is in SSA form. After 261this pass, all @code{is_gimple_reg} variables will be referenced by 262@code{SSA_NAME}, and all occurrences of other variables will be 263annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have 264been inserted as necessary for each basic block. This pass is 265located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}. 266 267@item Warn for uninitialized variables 268 269This pass scans the function for uses of @code{SSA_NAME}s that 270are fed by default definition. For non-parameter variables, such 271uses are uninitialized. The pass is run twice, before and after 272optimization (if turned on). In the first pass we only warn for uses that are 273positively uninitialized; in the second pass we warn for uses that 274are possibly uninitialized. The pass is located in @file{tree-ssa.c} 275and is defined by @code{pass_early_warn_uninitialized} and 276@code{pass_late_warn_uninitialized}. 277 278@item Dead code elimination 279 280This pass scans the function for statements without side effects whose 281result is unused. It does not do memory life analysis, so any value 282that is stored in memory is considered used. The pass is run multiple 283times throughout the optimization process. It is located in 284@file{tree-ssa-dce.c} and is described by @code{pass_dce}. 285 286@item Dominator optimizations 287 288This pass performs trivial dominator-based copy and constant propagation, 289expression simplification, and jump threading. It is run multiple times 290throughout the optimization process. It is located in @file{tree-ssa-dom.c} 291and is described by @code{pass_dominator}. 292 293@item Forward propagation of single-use variables 294 295This pass attempts to remove redundant computation by substituting 296variables that are used once into the expression that uses them and 297seeing if the result can be simplified. It is located in 298@file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}. 299 300@item Copy Renaming 301 302This pass attempts to change the name of compiler temporaries involved in 303copy operations such that SSA->normal can coalesce the copy away. When compiler 304temporaries are copies of user variables, it also renames the compiler 305temporary to the user variable resulting in better use of user symbols. It is 306located in @file{tree-ssa-copyrename.c} and is described by 307@code{pass_copyrename}. 308 309@item PHI node optimizations 310 311This pass recognizes forms of PHI inputs that can be represented as 312conditional expressions and rewrites them into straight line code. 313It is located in @file{tree-ssa-phiopt.c} and is described by 314@code{pass_phiopt}. 315 316@item May-alias optimization 317 318This pass performs a flow sensitive SSA-based points-to analysis. 319The resulting may-alias, must-alias, and escape analysis information 320is used to promote variables from in-memory addressable objects to 321non-aliased variables that can be renamed into SSA form. We also 322update the @code{VDEF}/@code{VUSE} memory tags for non-renameable 323aggregates so that we get fewer false kills. The pass is located 324in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}. 325 326Interprocedural points-to information is located in 327@file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}. 328 329@item Profiling 330 331This pass instruments the function in order to collect runtime block 332and value profiling data. Such data may be fed back into the compiler 333on a subsequent run so as to allow optimization based on expected 334execution frequencies. The pass is located in @file{tree-profile.c} and 335is described by @code{pass_ipa_tree_profile}. 336 337@item Static profile estimation 338 339This pass implements series of heuristics to guess propababilities 340of branches. The resulting predictions are turned into edge profile 341by propagating branches across the control flow graphs. 342The pass is located in @file{tree-profile.c} and is described by 343@code{pass_profile}. 344 345@item Lower complex arithmetic 346 347This pass rewrites complex arithmetic operations into their component 348scalar arithmetic operations. The pass is located in @file{tree-complex.c} 349and is described by @code{pass_lower_complex}. 350 351@item Scalar replacement of aggregates 352 353This pass rewrites suitable non-aliased local aggregate variables into 354a set of scalar variables. The resulting scalar variables are 355rewritten into SSA form, which allows subsequent optimization passes 356to do a significantly better job with them. The pass is located in 357@file{tree-sra.c} and is described by @code{pass_sra}. 358 359@item Dead store elimination 360 361This pass eliminates stores to memory that are subsequently overwritten 362by another store, without any intervening loads. The pass is located 363in @file{tree-ssa-dse.c} and is described by @code{pass_dse}. 364 365@item Tail recursion elimination 366 367This pass transforms tail recursion into a loop. It is located in 368@file{tree-tailcall.c} and is described by @code{pass_tail_recursion}. 369 370@item Forward store motion 371 372This pass sinks stores and assignments down the flowgraph closer to their 373use point. The pass is located in @file{tree-ssa-sink.c} and is 374described by @code{pass_sink_code}. 375 376@item Partial redundancy elimination 377 378This pass eliminates partially redundant computations, as well as 379performing load motion. The pass is located in @file{tree-ssa-pre.c} 380and is described by @code{pass_pre}. 381 382Just before partial redundancy elimination, if 383@option{-funsafe-math-optimizations} is on, GCC tries to convert 384divisions to multiplications by the reciprocal. The pass is located 385in @file{tree-ssa-math-opts.c} and is described by 386@code{pass_cse_reciprocal}. 387 388@item Full redundancy elimination 389 390This is a simpler form of PRE that only eliminates redundancies that 391occur on all paths. It is located in @file{tree-ssa-pre.c} and 392described by @code{pass_fre}. 393 394@item Loop optimization 395 396The main driver of the pass is placed in @file{tree-ssa-loop.c} 397and described by @code{pass_loop}. 398 399The optimizations performed by this pass are: 400 401Loop invariant motion. This pass moves only invariants that 402would be hard to handle on RTL level (function calls, operations that expand to 403nontrivial sequences of insns). With @option{-funswitch-loops} it also moves 404operands of conditions that are invariant out of the loop, so that we can use 405just trivial invariantness analysis in loop unswitching. The pass also includes 406store motion. The pass is implemented in @file{tree-ssa-loop-im.c}. 407 408Canonical induction variable creation. This pass creates a simple counter 409for number of iterations of the loop and replaces the exit condition of the 410loop using it, in case when a complicated analysis is necessary to determine 411the number of iterations. Later optimizations then may determine the number 412easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}. 413 414Induction variable optimizations. This pass performs standard induction 415variable optimizations, including strength reduction, induction variable 416merging and induction variable elimination. The pass is implemented in 417@file{tree-ssa-loop-ivopts.c}. 418 419Loop unswitching. This pass moves the conditional jumps that are invariant 420out of the loops. To achieve this, a duplicate of the loop is created for 421each possible outcome of conditional jump(s). The pass is implemented in 422@file{tree-ssa-loop-unswitch.c}. 423 424Loop splitting. If a loop contains a conditional statement that is 425always true for one part of the iteration space and false for the other 426this pass splits the loop into two, one dealing with one side the other 427only with the other, thereby removing one inner-loop conditional. The 428pass is implemented in @file{tree-ssa-loop-split.c}. 429 430The optimizations also use various utility functions contained in 431@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and 432@file{cfgloopmanip.c}. 433 434Vectorization. This pass transforms loops to operate on vector types 435instead of scalar types. Data parallelism across loop iterations is exploited 436to group data elements from consecutive iterations into a vector and operate 437on them in parallel. Depending on available target support the loop is 438conceptually unrolled by a factor @code{VF} (vectorization factor), which is 439the number of elements operated upon in parallel in each iteration, and the 440@code{VF} copies of each scalar operation are fused to form a vector operation. 441Additional loop transformations such as peeling and versioning may take place 442to align the number of iterations, and to align the memory accesses in the 443loop. 444The pass is implemented in @file{tree-vectorizer.c} (the main driver), 445@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts 446and general loop utilities), @file{tree-vect-slp} (loop-aware SLP 447functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. 448Analysis of data references is in @file{tree-data-ref.c}. 449 450SLP Vectorization. This pass performs vectorization of straight-line code. The 451pass is implemented in @file{tree-vectorizer.c} (the main driver), 452@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and 453@file{tree-vect-data-refs.c}. 454 455Autoparallelization. This pass splits the loop iteration space to run 456into several threads. The pass is implemented in @file{tree-parloops.c}. 457 458Graphite is a loop transformation framework based on the polyhedral 459model. Graphite stands for Gimple Represented as Polyhedra. The 460internals of this infrastructure are documented in 461@w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on 462this representation are implemented in the various @file{graphite-*} 463files. 464 465@item Tree level if-conversion for vectorizer 466 467This pass applies if-conversion to simple loops to help vectorizer. 468We identify if convertible loops, if-convert statements and merge 469basic blocks in one big block. The idea is to present loop in such 470form so that vectorizer can have one to one mapping between statements 471and available vector operations. This pass is located in 472@file{tree-if-conv.c} and is described by @code{pass_if_conversion}. 473 474@item Conditional constant propagation 475 476This pass relaxes a lattice of values in order to identify those 477that must be constant even in the presence of conditional branches. 478The pass is located in @file{tree-ssa-ccp.c} and is described 479by @code{pass_ccp}. 480 481A related pass that works on memory loads and stores, and not just 482register values, is located in @file{tree-ssa-ccp.c} and described by 483@code{pass_store_ccp}. 484 485@item Conditional copy propagation 486 487This is similar to constant propagation but the lattice of values is 488the ``copy-of'' relation. It eliminates redundant copies from the 489code. The pass is located in @file{tree-ssa-copy.c} and described by 490@code{pass_copy_prop}. 491 492A related pass that works on memory copies, and not just register 493copies, is located in @file{tree-ssa-copy.c} and described by 494@code{pass_store_copy_prop}. 495 496@item Value range propagation 497 498This transformation is similar to constant propagation but 499instead of propagating single constant values, it propagates 500known value ranges. The implementation is based on Patterson's 501range propagation algorithm (Accurate Static Branch Prediction by 502Value Range Propagation, J. R. C. Patterson, PLDI '95). In 503contrast to Patterson's algorithm, this implementation does not 504propagate branch probabilities nor it uses more than a single 505range per SSA name. This means that the current implementation 506cannot be used for branch prediction (though adapting it would 507not be difficult). The pass is located in @file{tree-vrp.c} and is 508described by @code{pass_vrp}. 509 510@item Folding built-in functions 511 512This pass simplifies built-in functions, as applicable, with constant 513arguments or with inferable string lengths. It is located in 514@file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}. 515 516@item Split critical edges 517 518This pass identifies critical edges and inserts empty basic blocks 519such that the edge is no longer critical. The pass is located in 520@file{tree-cfg.c} and is described by @code{pass_split_crit_edges}. 521 522@item Control dependence dead code elimination 523 524This pass is a stronger form of dead code elimination that can 525eliminate unnecessary control flow statements. It is located 526in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}. 527 528@item Tail call elimination 529 530This pass identifies function calls that may be rewritten into 531jumps. No code transformation is actually applied here, but the 532data and control flow problem is solved. The code transformation 533requires target support, and so is delayed until RTL@. In the 534meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility. 535The pass is located in @file{tree-tailcall.c} and is described by 536@code{pass_tail_calls}. The RTL transformation is handled by 537@code{fixup_tail_calls} in @file{calls.c}. 538 539@item Warn for function return without value 540 541For non-void functions, this pass locates return statements that do 542not specify a value and issues a warning. Such a statement may have 543been injected by falling off the end of the function. This pass is 544run last so that we have as much time as possible to prove that the 545statement is not reachable. It is located in @file{tree-cfg.c} and 546is described by @code{pass_warn_function_return}. 547 548@item Leave static single assignment form 549 550This pass rewrites the function such that it is in normal form. At 551the same time, we eliminate as many single-use temporaries as possible, 552so the intermediate language is no longer GIMPLE, but GENERIC@. The 553pass is located in @file{tree-outof-ssa.c} and is described by 554@code{pass_del_ssa}. 555 556@item Merge PHI nodes that feed into one another 557 558This is part of the CFG cleanup passes. It attempts to join PHI nodes 559from a forwarder CFG block into another block with PHI nodes. The 560pass is located in @file{tree-cfgcleanup.c} and is described by 561@code{pass_merge_phi}. 562 563@item Return value optimization 564 565If a function always returns the same local variable, and that local 566variable is an aggregate type, then the variable is replaced with the 567return value for the function (i.e., the function's DECL_RESULT). This 568is equivalent to the C++ named return value optimization applied to 569GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by 570@code{pass_nrv}. 571 572@item Return slot optimization 573 574If a function returns a memory object and is called as @code{var = 575foo()}, this pass tries to change the call so that the address of 576@code{var} is sent to the caller to avoid an extra memory copy. This 577pass is located in @code{tree-nrv.c} and is described by 578@code{pass_return_slot}. 579 580@item Optimize calls to @code{__builtin_object_size} 581 582This is a propagation pass similar to CCP that tries to remove calls 583to @code{__builtin_object_size} when the size of the object can be 584computed at compile-time. This pass is located in 585@file{tree-object-size.c} and is described by 586@code{pass_object_sizes}. 587 588@item Loop invariant motion 589 590This pass removes expensive loop-invariant computations out of loops. 591The pass is located in @file{tree-ssa-loop.c} and described by 592@code{pass_lim}. 593 594@item Loop nest optimizations 595 596This is a family of loop transformations that works on loop nests. It 597includes loop interchange, scaling, skewing and reversal and they are 598all geared to the optimization of data locality in array traversals 599and the removal of dependencies that hamper optimizations such as loop 600parallelization and vectorization. The pass is located in 601@file{tree-loop-linear.c} and described by 602@code{pass_linear_transform}. 603 604@item Removal of empty loops 605 606This pass removes loops with no code in them. The pass is located in 607@file{tree-ssa-loop-ivcanon.c} and described by 608@code{pass_empty_loop}. 609 610@item Unrolling of small loops 611 612This pass completely unrolls loops with few iterations. The pass 613is located in @file{tree-ssa-loop-ivcanon.c} and described by 614@code{pass_complete_unroll}. 615 616@item Predictive commoning 617 618This pass makes the code reuse the computations from the previous 619iterations of the loops, especially loads and stores to memory. 620It does so by storing the values of these computations to a bank 621of temporary variables that are rotated at the end of loop. To avoid 622the need for this rotation, the loop is then unrolled and the copies 623of the loop body are rewritten to use the appropriate version of 624the temporary variable. This pass is located in @file{tree-predcom.c} 625and described by @code{pass_predcom}. 626 627@item Array prefetching 628 629This pass issues prefetch instructions for array references inside 630loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and 631described by @code{pass_loop_prefetch}. 632 633@item Reassociation 634 635This pass rewrites arithmetic expressions to enable optimizations that 636operate on them, like redundancy elimination and vectorization. The 637pass is located in @file{tree-ssa-reassoc.c} and described by 638@code{pass_reassoc}. 639 640@item Optimization of @code{stdarg} functions 641 642This pass tries to avoid the saving of register arguments into the 643stack on entry to @code{stdarg} functions. If the function doesn't 644use any @code{va_start} macros, no registers need to be saved. If 645@code{va_start} macros are used, the @code{va_list} variables don't 646escape the function, it is only necessary to save registers that will 647be used in @code{va_arg} macros. For instance, if @code{va_arg} is 648only used with integral types in the function, floating point 649registers don't need to be saved. This pass is located in 650@code{tree-stdarg.c} and described by @code{pass_stdarg}. 651 652@end itemize 653 654@node RTL passes 655@section RTL passes 656 657The following briefly describes the RTL generation and optimization 658passes that are run after the Tree optimization passes. 659 660@itemize @bullet 661@item RTL generation 662 663@c Avoiding overfull is tricky here. 664The source files for RTL generation include 665@file{stmt.c}, 666@file{calls.c}, 667@file{expr.c}, 668@file{explow.c}, 669@file{expmed.c}, 670@file{function.c}, 671@file{optabs.c} 672and @file{emit-rtl.c}. 673Also, the file 674@file{insn-emit.c}, generated from the machine description by the 675program @code{genemit}, is used in this pass. The header file 676@file{expr.h} is used for communication within this pass. 677 678@findex genflags 679@findex gencodes 680The header files @file{insn-flags.h} and @file{insn-codes.h}, 681generated from the machine description by the programs @code{genflags} 682and @code{gencodes}, tell this pass which standard names are available 683for use and which patterns correspond to them. 684 685@item Generation of exception landing pads 686 687This pass generates the glue that handles communication between the 688exception handling library routines and the exception handlers within 689the function. Entry points in the function that are invoked by the 690exception handling library are called @dfn{landing pads}. The code 691for this pass is located in @file{except.c}. 692 693@item Control flow graph cleanup 694 695This pass removes unreachable code, simplifies jumps to next, jumps to 696jump, jumps across jumps, etc. The pass is run multiple times. 697For historical reasons, it is occasionally referred to as the ``jump 698optimization pass''. The bulk of the code for this pass is in 699@file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c} 700and @file{jump.c}. 701 702@item Forward propagation of single-def values 703 704This pass attempts to remove redundant computation by substituting 705variables that come from a single definition, and 706seeing if the result can be simplified. It performs copy propagation 707and addressing mode selection. The pass is run twice, with values 708being propagated into loops only on the second run. The code is 709located in @file{fwprop.c}. 710 711@item Common subexpression elimination 712 713This pass removes redundant computation within basic blocks, and 714optimizes addressing modes based on cost. The pass is run twice. 715The code for this pass is located in @file{cse.c}. 716 717@item Global common subexpression elimination 718 719This pass performs two 720different types of GCSE depending on whether you are optimizing for 721size or not (LCM based GCSE tends to increase code size for a gain in 722speed, while Morel-Renvoise based GCSE does not). 723When optimizing for size, GCSE is done using Morel-Renvoise Partial 724Redundancy Elimination, with the exception that it does not try to move 725invariants out of loops---that is left to the loop optimization pass. 726If MR PRE GCSE is done, code hoisting (aka unification) is also done, as 727well as load motion. 728If you are optimizing for speed, LCM (lazy code motion) based GCSE is 729done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM 730based GCSE also does loop invariant code motion. We also perform load 731and store motion when optimizing for speed. 732Regardless of which type of GCSE is used, the GCSE pass also performs 733global constant and copy propagation. 734The source file for this pass is @file{gcse.c}, and the LCM routines 735are in @file{lcm.c}. 736 737@item Loop optimization 738 739This pass performs several loop related optimizations. 740The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain 741generic loop analysis and manipulation code. Initialization and finalization 742of loop structures is handled by @file{loop-init.c}. 743A loop invariant motion pass is implemented in @file{loop-invariant.c}. 744Basic block level optimizations---unrolling, and peeling loops--- 745are implemented in @file{loop-unroll.c}. 746Replacing of the exit condition of loops by special machine-dependent 747instructions is handled by @file{loop-doloop.c}. 748 749@item Jump bypassing 750 751This pass is an aggressive form of GCSE that transforms the control 752flow graph of a function by propagating constants into conditional 753branch instructions. The source file for this pass is @file{gcse.c}. 754 755@item If conversion 756 757This pass attempts to replace conditional branches and surrounding 758assignments with arithmetic, boolean value producing comparison 759instructions, and conditional move instructions. In the very last 760invocation after reload/LRA, it will generate predicated instructions 761when supported by the target. The code is located in @file{ifcvt.c}. 762 763@item Web construction 764 765This pass splits independent uses of each pseudo-register. This can 766improve effect of the other transformation, such as CSE or register 767allocation. The code for this pass is located in @file{web.c}. 768 769@item Instruction combination 770 771This pass attempts to combine groups of two or three instructions that 772are related by data flow into single instructions. It combines the 773RTL expressions for the instructions by substitution, simplifies the 774result using algebra, and then attempts to match the result against 775the machine description. The code is located in @file{combine.c}. 776 777@item Mode switching optimization 778 779This pass looks for instructions that require the processor to be in a 780specific ``mode'' and minimizes the number of mode changes required to 781satisfy all users. What these modes are, and what they apply to are 782completely target-specific. The code for this pass is located in 783@file{mode-switching.c}. 784 785@cindex modulo scheduling 786@cindex sms, swing, software pipelining 787@item Modulo scheduling 788 789This pass looks at innermost loops and reorders their instructions 790by overlapping different iterations. Modulo scheduling is performed 791immediately before instruction scheduling. The code for this pass is 792located in @file{modulo-sched.c}. 793 794@item Instruction scheduling 795 796This pass looks for instructions whose output will not be available by 797the time that it is used in subsequent instructions. Memory loads and 798floating point instructions often have this behavior on RISC machines. 799It re-orders instructions within a basic block to try to separate the 800definition and use of items that otherwise would cause pipeline 801stalls. This pass is performed twice, before and after register 802allocation. The code for this pass is located in @file{haifa-sched.c}, 803@file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and 804@file{sched-vis.c}. 805 806@item Register allocation 807 808These passes make sure that all occurrences of pseudo registers are 809eliminated, either by allocating them to a hard register, replacing 810them by an equivalent expression (e.g.@: a constant) or by placing 811them on the stack. This is done in several subpasses: 812 813@itemize @bullet 814@item 815The integrated register allocator (@acronym{IRA}). It is called 816integrated because coalescing, register live range splitting, and hard 817register preferencing are done on-the-fly during coloring. It also 818has better integration with the reload/LRA pass. Pseudo-registers spilled 819by the allocator or the reload/LRA have still a chance to get 820hard-registers if the reload/LRA evicts some pseudo-registers from 821hard-registers. The allocator helps to choose better pseudos for 822spilling based on their live ranges and to coalesce stack slots 823allocated for the spilled pseudo-registers. IRA is a regional 824register allocator which is transformed into Chaitin-Briggs allocator 825if there is one region. By default, IRA chooses regions using 826register pressure but the user can force it to use one region or 827regions corresponding to all loops. 828 829Source files of the allocator are @file{ira.c}, @file{ira-build.c}, 830@file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c}, 831@file{ira-emit.c}, @file{ira-lives}, plus header files @file{ira.h} 832and @file{ira-int.h} used for the communication between the allocator 833and the rest of the compiler and between the IRA files. 834 835@cindex reloading 836@item 837Reloading. This pass renumbers pseudo registers with the hardware 838registers numbers they were allocated. Pseudo registers that did not 839get hard registers are replaced with stack slots. Then it finds 840instructions that are invalid because a value has failed to end up in 841a register, or has ended up in a register of the wrong kind. It fixes 842up these instructions by reloading the problematical values 843temporarily into registers. Additional instructions are generated to 844do the copying. 845 846The reload pass also optionally eliminates the frame pointer and inserts 847instructions to save and restore call-clobbered registers around calls. 848 849Source files are @file{reload.c} and @file{reload1.c}, plus the header 850@file{reload.h} used for communication between them. 851 852@cindex Local Register Allocator (LRA) 853@item 854This pass is a modern replacement of the reload pass. Source files 855are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c}, 856@file{lra-constraints.c}, @file{lra-eliminations.c}, 857@file{lra-lives.c}, @file{lra-remat.c}, @file{lra-spills.c}, the 858header @file{lra-int.h} used for communication between them, and the 859header @file{lra.h} used for communication between LRA and the rest of 860compiler. 861 862Unlike the reload pass, intermediate LRA decisions are reflected in 863RTL as much as possible. This reduces the number of target-dependent 864macros and hooks, leaving instruction constraints as the primary 865source of control. 866 867LRA is run on targets for which TARGET_LRA_P returns true. 868@end itemize 869 870@item Basic block reordering 871 872This pass implements profile guided code positioning. If profile 873information is not available, various types of static analysis are 874performed to make the predictions normally coming from the profile 875feedback (IE execution frequency, branch probability, etc). It is 876implemented in the file @file{bb-reorder.c}, and the various 877prediction routines are in @file{predict.c}. 878 879@item Variable tracking 880 881This pass computes where the variables are stored at each 882position in code and generates notes describing the variable locations 883to RTL code. The location lists are then generated according to these 884notes to debug information if the debugging information format supports 885location lists. The code is located in @file{var-tracking.c}. 886 887@item Delayed branch scheduling 888 889This optional pass attempts to find instructions that can go into the 890delay slots of other instructions, usually jumps and calls. The code 891for this pass is located in @file{reorg.c}. 892 893@item Branch shortening 894 895On many RISC machines, branch instructions have a limited range. 896Thus, longer sequences of instructions must be used for long branches. 897In this pass, the compiler figures out what how far each instruction 898will be from each other instruction, and therefore whether the usual 899instructions, or the longer sequences, must be used for each branch. 900The code for this pass is located in @file{final.c}. 901 902@item Register-to-stack conversion 903 904Conversion from usage of some hard registers to usage of a register 905stack may be done at this point. Currently, this is supported only 906for the floating-point registers of the Intel 80387 coprocessor. The 907code for this pass is located in @file{reg-stack.c}. 908 909@item Final 910 911This pass outputs the assembler code for the function. The source files 912are @file{final.c} plus @file{insn-output.c}; the latter is generated 913automatically from the machine description by the tool @file{genoutput}. 914The header file @file{conditions.h} is used for communication between 915these files. 916 917@item Debugging information output 918 919This is run after final because it must output the stack slot offsets 920for pseudo registers that did not get hard registers. Source files 921are @file{dbxout.c} for DBX symbol table format, @file{dwarfout.c} for 922DWARF symbol table format, files @file{dwarf2out.c} and @file{dwarf2asm.c} 923for DWARF2 symbol table format, and @file{vmsdbgout.c} for VMS debug 924symbol table format. 925 926@end itemize 927 928@node Optimization info 929@section Optimization info 930@include optinfo.texi 931