@c markers: CROSSREF BUG TODO

@c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
@c 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software
@c Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@node Passes
@chapter Passes and Files of the Compiler
@cindex passes and files of the compiler
@cindex files and passes of the compiler
@cindex compiler passes and files

This chapter is dedicated to giving an overview of the optimization and
code generation passes of the compiler. In the process, it describes
some of the language front end interface, though this description is
nowhere near complete.

@menu
* Parsing pass::         The language front end turns text into bits.
* Gimplification pass::  The bits are turned into something we can optimize.
* Pass manager::         Sequencing the optimization passes.
* Tree SSA passes::      Optimizations on a high-level representation.
* RTL passes::           Optimizations on a low-level representation.
@end menu

@node Parsing pass
@section Parsing pass
@cindex GENERIC
@findex lang_hooks.parse_file
The language front end is invoked only once, via
@code{lang_hooks.parse_file}, to parse the entire input. The language
front end may use any intermediate language representation deemed
appropriate. The C front end uses GENERIC trees (CROSSREF), plus
a double handful of language-specific tree codes defined in
@file{c-common.def}. The Fortran front end uses a completely different
private representation.

@cindex GIMPLE
@cindex gimplification
@cindex gimplifier
@cindex language-independent intermediate representation
@cindex intermediate representation lowering
@cindex lowering, language-dependent intermediate representation
At some point the front end must translate the representation used in the
front end to a representation understood by the language-independent
portions of the compiler. Current practice takes one of two forms.
The C front end manually invokes the gimplifier (CROSSREF) on each function,
and uses the gimplifier callbacks to convert the language-specific tree
nodes directly to GIMPLE (CROSSREF) before passing the function off to
be compiled.
The Fortran front end converts from a private representation to GENERIC,
which is later lowered to GIMPLE when the function is compiled. Which
route to choose probably depends on how well GENERIC (plus extensions)
can be made to match up with the source language and necessary parsing
data structures.

BUG: Gimplification must occur before nested function lowering,
and nested function lowering must be done by the front end before
passing the data off to cgraph.

TODO: Cgraph should control nested function lowering. It would
only be invoked when it is certain that the outer-most function
is used.

TODO: Cgraph needs a gimplify_function callback. It should be
invoked when (1) it is certain that the function is used, (2)
warning flags specified by the user require some amount of
compilation in order to honor, (3) the language indicates that
semantic analysis is not complete until gimplification occurs.
Hum@dots{} this sounds overly complicated. Perhaps we should just
have the front end gimplify always; in most cases it's only one
function call.

The front end needs to pass all function definitions and top level
declarations off to the middle-end so that they can be compiled and
emitted to the object file.
For a simple procedural language, it is
usually most convenient to do this as each top level declaration or
definition is seen. There is also a distinction to be made between
generating functional code and generating complete debug information.
The only thing that is absolutely required for functional code is that
function and data @emph{definitions} be passed to the middle-end. For
complete debug information, function, data and type declarations
should all be passed as well.

@findex rest_of_decl_compilation
@findex rest_of_type_compilation
@findex cgraph_finalize_function
In any case, each complete top-level function or data declaration,
and each data definition, should be passed to
@code{rest_of_decl_compilation}. Each complete type definition should
be passed to @code{rest_of_type_compilation}. Each function definition
should be passed to @code{cgraph_finalize_function}.

TODO: I know rest_of_compilation currently has all sorts of
RTL generation semantics. I plan to move all code generation
bits (both Tree and RTL) to compile_function. Should we hide
cgraph from the front ends and move back to rest_of_compilation
as the official interface? Possibly we should rename all three
interfaces such that the names match in some meaningful way and
that is more descriptive than "rest_of".

The middle-end will, at its option, emit the function and data
definitions immediately or queue them for later processing.

@node Gimplification pass
@section Gimplification pass

@cindex gimplification
@cindex GIMPLE
@dfn{Gimplification} is a whimsical term for the process of converting
the intermediate representation of a function into the GIMPLE language
(CROSSREF). The term stuck, and so words like ``gimplification'',
``gimplify'', ``gimplifier'' and the like are sprinkled throughout this
section of code.

@cindex GENERIC
While a front end may certainly choose to generate GIMPLE directly if
it chooses, this can be a moderately complex process unless the
intermediate language used by the front end is already fairly simple.
Usually it is easier to generate GENERIC trees plus extensions
and let the language-independent gimplifier do most of the work.

@findex gimplify_function_tree
@findex gimplify_expr
@findex lang_hooks.gimplify_expr
The main entry point to this pass is @code{gimplify_function_tree}
located in @file{gimplify.c}. From here we process the entire
function gimplifying each statement in turn. The main workhorse
for this pass is @code{gimplify_expr}. Approximately everything
passes through here at least once, and it is from here that we
invoke the @code{lang_hooks.gimplify_expr} callback.

The callback should examine the expression in question and return
@code{GS_UNHANDLED} if the expression is not a language-specific
construct that requires attention. Otherwise it should alter the
expression in some way such that forward progress is made toward
producing valid GIMPLE@. If the callback is certain that the
transformation is complete and the expression is valid GIMPLE, it
should return @code{GS_ALL_DONE}. Otherwise it should return
@code{GS_OK}, which will cause the expression to be processed again.
If the callback encounters an error during the transformation (because
the front end is relying on the gimplification process to finish
semantic checks), it should return @code{GS_ERROR}.

@node Pass manager
@section Pass manager

The pass manager is located in @file{passes.c}, @file{tree-optimize.c}
and @file{tree-pass.h}.
Its job is to run all of the individual passes in the correct order,
and take care of standard bookkeeping that applies to every pass.

The theory of operation is that each pass defines a structure that
represents everything we need to know about that pass---when it
should be run, how it should be run, what intermediate language
form or on-the-side data structures it needs. We register the pass
to be run in some particular order, and the pass manager arranges
for everything to happen in the correct order.

The actuality doesn't completely live up to the theory at present.
Command-line switches and @code{timevar_id_t} enumerations must still
be defined elsewhere. The pass manager validates constraints but does
not attempt to (re-)generate data structures or lower intermediate
language form based on the requirements of the next pass. Nevertheless,
what is present is useful, and a far sight better than nothing at all.

Each pass should have a unique name.
Each pass may have its own dump file (for GCC debugging purposes).
Passes with a name starting with a star do not dump anything.
Sometimes passes are supposed to share a dump file / option name.
To still give these unique names, you can use a prefix that is delimited
by a space from the part that is used for the dump file / option name.
For example, when the pass name is "ud dce", the name used for the
dump file and options is "dce".

TODO: describe the global variables set up by the pass manager,
and a brief description of how a new pass should use it.
I need to look at what info RTL passes use first@enddots{}

@node Tree SSA passes
@section Tree SSA passes

The following briefly describes the Tree optimization passes that are
run after gimplification and what source files they are located in.

@itemize @bullet
@item Remove useless statements

This pass is an extremely simple sweep across the gimple code in which
we identify obviously dead code and remove it.
Here we do things like
simplify @code{if} statements with constant conditions, remove
exception handling constructs surrounding code that obviously cannot
throw, remove lexical bindings that contain no variables, and other
assorted simplistic cleanups. The idea is to get rid of the obvious
stuff quickly rather than wait until later when it's more work to get
rid of it. This pass is located in @file{tree-cfg.c} and described by
@code{pass_remove_useless_stmts}.

@item Mudflap declaration registration

If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth
-fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is
enabled, we generate code to register some variable declarations with
the mudflap runtime. Specifically, the runtime tracks the lifetimes of
those variable declarations that have their addresses taken, or whose
bounds are unknown at compile time (@code{extern}). This pass generates
new exception handling constructs (@code{try}/@code{finally}), and so
must run before those are lowered. In addition, the pass enqueues
declarations of static variables whose lifetimes extend to the entire
program. The pass is located in @file{tree-mudflap.c} and is described
by @code{pass_mudflap_1}.

@item OpenMP lowering

If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers
OpenMP constructs into GIMPLE@.

Lowering of OpenMP constructs involves creating replacement
expressions for local variables that have been mapped using data
sharing clauses, exposing the control flow of most synchronization
directives and adding region markers to facilitate the creation of the
control flow graph. The pass is located in @file{omp-low.c} and is
described by @code{pass_lower_omp}.

@item OpenMP expansion

If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
parallel regions into their own functions to be invoked by the thread
library.
The pass is located in @file{omp-low.c} and is described by
@code{pass_expand_omp}.

@item Lower control flow

This pass flattens @code{if} statements (@code{COND_EXPR})
and moves lexical bindings (@code{BIND_EXPR}) out of line. After
this pass, all @code{if} statements will have exactly two @code{goto}
statements in their @code{then} and @code{else} arms. Lexical binding
information for each statement will be found in @code{TREE_BLOCK} rather
than being inferred from its position under a @code{BIND_EXPR}. This
pass is found in @file{gimple-low.c} and is described by
@code{pass_lower_cf}.

@item Lower exception handling control flow

This pass decomposes high-level exception handling constructs
(@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form
that explicitly represents the control flow involved. After this
pass, @code{lookup_stmt_eh_region} will return a non-negative
number for any statement that may have EH control flow semantics;
examine @code{tree_can_throw_internal} or @code{tree_can_throw_external}
for exact semantics. Exact control flow may be extracted from
@code{foreach_reachable_handler}. The EH region nesting tree is defined
in @file{except.h} and built in @file{except.c}. The lowering pass
itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}.

@item Build the control flow graph

This pass decomposes a function into basic blocks and creates all of
the edges that connect them. It is located in @file{tree-cfg.c} and
is described by @code{pass_build_cfg}.

@item Find all referenced variables

This pass walks the entire function and collects an array of all
variables referenced in the function, @code{referenced_vars}. The
index at which a variable is found in the array is used as a UID
for the variable within this function. This data is needed by the
SSA rewriting routines.
The pass is located in @file{tree-dfa.c}
and is described by @code{pass_referenced_vars}.

@item Enter static single assignment form

This pass rewrites the function such that it is in SSA form. After
this pass, all @code{is_gimple_reg} variables will be referenced by
@code{SSA_NAME}, and all occurrences of other variables will be
annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have
been inserted as necessary for each basic block. This pass is
located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}.

@item Warn for uninitialized variables

This pass scans the function for uses of @code{SSA_NAME}s that
are fed by a default definition. For non-parameter variables, such
uses are uninitialized. The pass is run twice, before and after
optimization (if turned on). In the first pass we only warn for uses that are
positively uninitialized; in the second pass we warn for uses that
are possibly uninitialized. The pass is located in @file{tree-ssa.c}
and is defined by @code{pass_early_warn_uninitialized} and
@code{pass_late_warn_uninitialized}.

@item Dead code elimination

This pass scans the function for statements without side effects whose
result is unused. It does not do memory life analysis, so any value
that is stored in memory is considered used. The pass is run multiple
times throughout the optimization process. It is located in
@file{tree-ssa-dce.c} and is described by @code{pass_dce}.

@item Dominator optimizations

This pass performs trivial dominator-based copy and constant propagation,
expression simplification, and jump threading. It is run multiple times
throughout the optimization process. It is located in @file{tree-ssa-dom.c}
and is described by @code{pass_dominator}.

@item Forward propagation of single-use variables

This pass attempts to remove redundant computation by substituting
variables that are used once into the expression that uses them and
seeing if the result can be simplified. It is located in
@file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}.

@item Copy Renaming

This pass attempts to change the name of compiler temporaries involved in
copy operations such that SSA->normal can coalesce the copy away. When compiler
temporaries are copies of user variables, it also renames the compiler
temporary to the user variable resulting in better use of user symbols. It is
located in @file{tree-ssa-copyrename.c} and is described by
@code{pass_copyrename}.

@item PHI node optimizations

This pass recognizes forms of PHI inputs that can be represented as
conditional expressions and rewrites them into straight line code.
It is located in @file{tree-ssa-phiopt.c} and is described by
@code{pass_phiopt}.

@item May-alias optimization

This pass performs a flow sensitive SSA-based points-to analysis.
The resulting may-alias, must-alias, and escape analysis information
is used to promote variables from in-memory addressable objects to
non-aliased variables that can be renamed into SSA form. We also
update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
aggregates so that we get fewer false kills. The pass is located
in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}.

Interprocedural points-to information is located in
@file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}.

@item Profiling

This pass rewrites the function in order to collect runtime block
and value profiling data. Such data may be fed back into the compiler
on a subsequent run so as to allow optimization based on expected
execution frequencies.
The pass is located in @file{predict.c} and
is described by @code{pass_profile}.

@item Lower complex arithmetic

This pass rewrites complex arithmetic operations into their component
scalar arithmetic operations. The pass is located in @file{tree-complex.c}
and is described by @code{pass_lower_complex}.

@item Scalar replacement of aggregates

This pass rewrites suitable non-aliased local aggregate variables into
a set of scalar variables. The resulting scalar variables are
rewritten into SSA form, which allows subsequent optimization passes
to do a significantly better job with them. The pass is located in
@file{tree-sra.c} and is described by @code{pass_sra}.

@item Dead store elimination

This pass eliminates stores to memory that are subsequently overwritten
by another store, without any intervening loads. The pass is located
in @file{tree-ssa-dse.c} and is described by @code{pass_dse}.

@item Tail recursion elimination

This pass transforms tail recursion into a loop. It is located in
@file{tree-tailcall.c} and is described by @code{pass_tail_recursion}.

@item Forward store motion

This pass sinks stores and assignments down the flowgraph closer to their
use point. The pass is located in @file{tree-ssa-sink.c} and is
described by @code{pass_sink_code}.

@item Partial redundancy elimination

This pass eliminates partially redundant computations, as well as
performing load motion. The pass is located in @file{tree-ssa-pre.c}
and is described by @code{pass_pre}.

Just before partial redundancy elimination, if
@option{-funsafe-math-optimizations} is on, GCC tries to convert
divisions to multiplications by the reciprocal. The pass is located
in @file{tree-ssa-math-opts.c} and is described by
@code{pass_cse_reciprocal}.

@item Full redundancy elimination

This is a simpler form of PRE that only eliminates redundancies that
occur on all paths. It is located in @file{tree-ssa-pre.c} and
described by @code{pass_fre}.

@item Loop optimization

The main driver of the pass is placed in @file{tree-ssa-loop.c}
and described by @code{pass_loop}.

The optimizations performed by this pass are:

Loop invariant motion. This pass moves only invariants that
would be hard to handle on RTL level (function calls, operations that expand to
nontrivial sequences of insns). With @option{-funswitch-loops} it also moves
operands of conditions that are invariant out of the loop, so that we can use
just trivial invariantness analysis in loop unswitching. The pass also includes
store motion. The pass is implemented in @file{tree-ssa-loop-im.c}.

Canonical induction variable creation. This pass creates a simple counter
for number of iterations of the loop and replaces the exit condition of the
loop using it, in cases where a complicated analysis is necessary to determine
the number of iterations. Later optimizations then may determine the number
easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}.

Induction variable optimizations. This pass performs standard induction
variable optimizations, including strength reduction, induction variable
merging and induction variable elimination. The pass is implemented in
@file{tree-ssa-loop-ivopts.c}.

Loop unswitching. This pass moves the conditional jumps that are invariant
out of the loops. To achieve this, a duplicate of the loop is created for
each possible outcome of conditional jump(s). The pass is implemented in
@file{tree-ssa-loop-unswitch.c}.
This pass should eventually replace the
RTL level loop unswitching in @file{loop-unswitch.c}, but
the RTL level pass is not yet completely redundant due to deficiencies
in tree level alias analysis.

The optimizations also use various utility functions contained in
@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
@file{cfgloopmanip.c}.

Vectorization. This pass transforms loops to operate on vector types
instead of scalar types. Data parallelism across loop iterations is exploited
to group data elements from consecutive iterations into a vector and operate
on them in parallel. Depending on available target support the loop is
conceptually unrolled by a factor @code{VF} (vectorization factor), which is
the number of elements operated upon in parallel in each iteration, and the
@code{VF} copies of each scalar operation are fused to form a vector operation.
Additional loop transformations such as peeling and versioning may take place
to align the number of iterations, and to align the memory accesses in the
loop.
The pass is implemented in @file{tree-vectorizer.c} (the main driver),
@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
and general loop utilities), @file{tree-vect-slp.c} (loop-aware SLP
functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
Analysis of data references is in @file{tree-data-ref.c}.

SLP Vectorization. This pass performs vectorization of straight-line code. The
pass is implemented in @file{tree-vectorizer.c} (the main driver),
@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and
@file{tree-vect-data-refs.c}.

Autoparallelization. This pass splits the loop iteration space to run
in several threads. The pass is implemented in @file{tree-parloops.c}.

Graphite is a loop transformation framework based on the polyhedral
model.
Graphite stands for Gimple Represented as Polyhedra. The
internals of this infrastructure are documented in
@w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on
this representation are implemented in the various @file{graphite-*}
files.

@item Tree level if-conversion for vectorizer

This pass applies if-conversion to simple loops to help the vectorizer.
We identify if-convertible loops, if-convert statements and merge
basic blocks into one big block. The idea is to present the loop in a
form such that the vectorizer can have a one-to-one mapping between
statements and available vector operations. This pass re-introduces
@code{COND_EXPR} at GIMPLE level. This pass is located in
@file{tree-if-conv.c} and is described by @code{pass_if_conversion}.

@item Conditional constant propagation

This pass relaxes a lattice of values in order to identify those
that must be constant even in the presence of conditional branches.
The pass is located in @file{tree-ssa-ccp.c} and is described
by @code{pass_ccp}.

A related pass that works on memory loads and stores, and not just
register values, is located in @file{tree-ssa-ccp.c} and described by
@code{pass_store_ccp}.

@item Conditional copy propagation

This is similar to constant propagation but the lattice of values is
the ``copy-of'' relation. It eliminates redundant copies from the
code. The pass is located in @file{tree-ssa-copy.c} and described by
@code{pass_copy_prop}.

A related pass that works on memory copies, and not just register
copies, is located in @file{tree-ssa-copy.c} and described by
@code{pass_store_copy_prop}.

@item Value range propagation

This transformation is similar to constant propagation but
instead of propagating single constant values, it propagates
known value ranges.
The implementation is based on Patterson's
range propagation algorithm (Accurate Static Branch Prediction by
Value Range Propagation, J. R. C. Patterson, PLDI '95). In
contrast to Patterson's algorithm, this implementation does not
propagate branch probabilities, nor does it use more than a single
range per SSA name. This means that the current implementation
cannot be used for branch prediction (though adapting it would
not be difficult). The pass is located in @file{tree-vrp.c} and is
described by @code{pass_vrp}.

@item Folding built-in functions

This pass simplifies built-in functions, as applicable, with constant
arguments or with inferable string lengths. It is located in
@file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}.

@item Split critical edges

This pass identifies critical edges and inserts empty basic blocks
such that the edge is no longer critical. The pass is located in
@file{tree-cfg.c} and is described by @code{pass_split_crit_edges}.

@item Control dependence dead code elimination

This pass is a stronger form of dead code elimination that can
eliminate unnecessary control flow statements. It is located
in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}.

@item Tail call elimination

This pass identifies function calls that may be rewritten into
jumps. No code transformation is actually applied here, but the
data and control flow problem is solved. The code transformation
requires target support, and so is delayed until RTL@. In the
meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility.
The pass is located in @file{tree-tailcall.c} and is described by
@code{pass_tail_calls}. The RTL transformation is handled by
@code{fixup_tail_calls} in @file{calls.c}.

@item Warn for function return without value

For non-void functions, this pass locates return statements that do
not specify a value and issues a warning. Such a statement may have
been injected by falling off the end of the function. This pass is
run last so that we have as much time as possible to prove that the
statement is not reachable. It is located in @file{tree-cfg.c} and
is described by @code{pass_warn_function_return}.

@item Mudflap statement annotation

If mudflap is enabled, we rewrite some memory accesses with code to
validate that the memory access is correct. In particular, expressions
involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF},
etc.) are replaced by code that checks the selected address range
against the mudflap runtime's database of valid regions. This check
includes an inline lookup into a direct-mapped cache, based on
shift/mask operations of the pointer value, with a fallback function
call into the runtime. The pass is located in @file{tree-mudflap.c} and
is described by @code{pass_mudflap_2}.

@item Leave static single assignment form

This pass rewrites the function such that it is in normal form. At
the same time, we eliminate as many single-use temporaries as possible,
so the intermediate language is no longer GIMPLE, but GENERIC@. The
pass is located in @file{tree-outof-ssa.c} and is described by
@code{pass_del_ssa}.

@item Merge PHI nodes that feed into one another

This is part of the CFG cleanup passes. It attempts to join PHI nodes
from a forwarder CFG block into another block with PHI nodes. The
pass is located in @file{tree-cfgcleanup.c} and is described by
@code{pass_merge_phi}.

@item Return value optimization

If a function always returns the same local variable, and that local
variable is an aggregate type, then the variable is replaced with the
return value for the function (i.e., the function's @code{DECL_RESULT}). This
is equivalent to the C++ named return value optimization applied to
GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by
@code{pass_nrv}.

@item Return slot optimization

If a function returns a memory object and is called as @code{var =
foo()}, this pass tries to change the call so that the address of
@code{var} is passed to the callee to avoid an extra memory copy. This
pass is located in @file{tree-nrv.c} and is described by
@code{pass_return_slot}.

@item Optimize calls to @code{__builtin_object_size}

This is a propagation pass similar to CCP that tries to remove calls
to @code{__builtin_object_size} when the size of the object can be
computed at compile-time. This pass is located in
@file{tree-object-size.c} and is described by
@code{pass_object_sizes}.

@item Loop invariant motion

This pass moves expensive loop-invariant computations out of loops.
The pass is located in @file{tree-ssa-loop.c} and described by
@code{pass_lim}.

@item Loop nest optimizations

This is a family of loop transformations that works on loop nests. It
includes loop interchange, scaling, skewing and reversal and they are
all geared to the optimization of data locality in array traversals
and the removal of dependencies that hamper optimizations such as loop
parallelization and vectorization. The pass is located in
@file{tree-loop-linear.c} and described by
@code{pass_linear_transform}.

@item Removal of empty loops

This pass removes loops with no code in them. The pass is located in
@file{tree-ssa-loop-ivcanon.c} and described by
@code{pass_empty_loop}.

@item Unrolling of small loops

This pass completely unrolls loops with few iterations. The pass
is located in @file{tree-ssa-loop-ivcanon.c} and described by
@code{pass_complete_unroll}.

@item Predictive commoning

This pass makes the code reuse the computations from the previous
iterations of the loops, especially loads and stores to memory.
It does so by storing the values of these computations to a bank
of temporary variables that are rotated at the end of the loop. To avoid
the need for this rotation, the loop is then unrolled and the copies
of the loop body are rewritten to use the appropriate version of
the temporary variable. This pass is located in @file{tree-predcom.c}
and described by @code{pass_predcom}.

@item Array prefetching

This pass issues prefetch instructions for array references inside
loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and
described by @code{pass_loop_prefetch}.

@item Reassociation

This pass rewrites arithmetic expressions to enable optimizations that
operate on them, like redundancy elimination and vectorization. The
pass is located in @file{tree-ssa-reassoc.c} and described by
@code{pass_reassoc}.

@item Optimization of @code{stdarg} functions

This pass tries to avoid the saving of register arguments into the
stack on entry to @code{stdarg} functions. If the function doesn't
use any @code{va_start} macros, no registers need to be saved. If
@code{va_start} macros are used and the @code{va_list} variables don't
escape the function, it is only necessary to save registers that will
be used in @code{va_arg} macros. For instance, if @code{va_arg} is
only used with integral types in the function, floating point
registers don't need to be saved. This pass is located in
@file{tree-stdarg.c} and described by @code{pass_stdarg}.

@end itemize

@node RTL passes
@section RTL passes

The following briefly describes the RTL generation and optimization
passes that are run after the Tree optimization passes.

@itemize @bullet
@item RTL generation

@c Avoiding overfull is tricky here.
The source files for RTL generation include
@file{stmt.c},
@file{calls.c},
@file{expr.c},
@file{explow.c},
@file{expmed.c},
@file{function.c},
@file{optabs.c}
and @file{emit-rtl.c}.
Also, the file
@file{insn-emit.c}, generated from the machine description by the
program @code{genemit}, is used in this pass.  The header file
@file{expr.h} is used for communication within this pass.

@findex genflags
@findex gencodes
The header files @file{insn-flags.h} and @file{insn-codes.h},
generated from the machine description by the programs @code{genflags}
and @code{gencodes}, tell this pass which standard names are available
for use and which patterns correspond to them.

@item Generation of exception landing pads

This pass generates the glue that handles communication between the
exception handling library routines and the exception handlers within
the function.  Entry points in the function that are invoked by the
exception handling library are called @dfn{landing pads}.  The code
for this pass is located in @file{except.c}.

@item Control flow graph cleanup

This pass removes unreachable code and simplifies jumps to the next
instruction, jumps to jumps, jumps across jumps, etc.  The pass is run
multiple times.
For historical reasons, it is occasionally referred to as the ``jump
optimization pass''.  The bulk of the code for this pass is in
@file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c}
and @file{jump.c}.
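
The jump patterns named above can be pictured at the source level with
explicit @code{goto} statements.  This is only an analogy, since the
pass operates on the RTL control flow graph; the function names
@code{sign_with_jumps} and @code{sign_cleaned} are hypothetical and the
cleaned version is written by hand.

```c
/* Written with the kinds of redundant control flow the pass targets.  */
int
sign_with_jumps (int x)
{
  int r;

  if (x < 0)
    goto negative;
  goto nonnegative;     /* conditional jump across a jump */
negative:
  r = -1;
  goto done;
nonnegative:
  r = (x > 0);
  goto out;             /* jump to a jump: OUT only jumps to DONE */
  r = 42;               /* unreachable code */
out:
  goto done;            /* jump to the next instruction */
done:
  return r;
}

/* The same computation with the redundant jumps and the unreachable
   assignment removed by hand.  */
int
sign_cleaned (int x)
{
  if (x < 0)
    return -1;
  return x > 0;
}
```

Both functions return @minus{}1, 0 or 1 according to the sign of their
argument.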

@item Forward propagation of single-def values

This pass attempts to remove redundant computation by substituting
variables that come from a single definition, and
seeing if the result can be simplified.  It performs copy propagation
and addressing mode selection.  The pass is run twice, with values
being propagated into loops only on the second run.  The code is
located in @file{fwprop.c}.

@item Common subexpression elimination

This pass removes redundant computation within basic blocks, and
optimizes addressing modes based on cost.  The pass is run twice.
The code for this pass is located in @file{cse.c}.

@item Global common subexpression elimination

This pass performs one of two
types of GCSE, depending on whether you are optimizing for
size or not (LCM based GCSE tends to increase code size for a gain in
speed, while Morel-Renvoise based GCSE does not).
When optimizing for size, GCSE is done using Morel-Renvoise Partial
Redundancy Elimination, with the exception that it does not try to move
invariants out of loops---that is left to the loop optimization pass.
If MR PRE GCSE is done, code hoisting (aka unification) is also done, as
well as load motion.
If you are optimizing for speed, LCM (lazy code motion) based GCSE is
done.  LCM is based on the work of Knoop, R@"uthing, and Steffen.  LCM
based GCSE also does loop invariant code motion.  We also perform load
and store motion when optimizing for speed.
Regardless of which type of GCSE is used, the GCSE pass also performs
global constant and copy propagation.
The source file for this pass is @file{gcse.c}, and the LCM routines
are in @file{lcm.c}.

@item Loop optimization

This pass performs several loop related optimizations.
The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain
generic loop analysis and manipulation code.
Initialization and finalization
of loop structures is handled by @file{loop-init.c}.
A loop invariant motion pass is implemented in @file{loop-invariant.c}.
Basic block level optimizations---unrolling, peeling and unswitching loops---
are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
Replacement of the exit condition of loops by special machine-dependent
instructions is handled by @file{loop-doloop.c}.

@item Jump bypassing

This pass is an aggressive form of GCSE that transforms the control
flow graph of a function by propagating constants into conditional
branch instructions.  The source file for this pass is @file{gcse.c}.

@item If conversion

This pass attempts to replace conditional branches and surrounding
assignments with arithmetic, boolean value producing comparison
instructions, and conditional move instructions.  In the very last
invocation after reload, it will generate predicated instructions
when supported by the target.  The code is located in @file{ifcvt.c}.

@item Web construction

This pass splits independent uses of each pseudo-register.  This can
improve the effect of other transformations, such as CSE or register
allocation.  The code for this pass is located in @file{web.c}.

@item Instruction combination

This pass attempts to combine groups of two or three instructions that
are related by data flow into single instructions.  It combines the
RTL expressions for the instructions by substitution, simplifies the
result using algebra, and then attempts to match the result against
the machine description.  The code is located in @file{combine.c}.

@item Register movement

This pass looks for cases where matching constraints would force an
instruction to need a reload, and this reload would be a
register-to-register move.  It then attempts to change the registers
used by the instruction to avoid the move instruction.
The code is
located in @file{regmove.c}.

@item Mode switching optimization

This pass looks for instructions that require the processor to be in a
specific ``mode'' and minimizes the number of mode changes required to
satisfy all users.  What these modes are and what they apply to is
completely target-specific.  The code for this pass is located in
@file{mode-switching.c}.

@cindex modulo scheduling
@cindex sms, swing, software pipelining
@item Modulo scheduling

This pass looks at innermost loops and reorders their instructions
by overlapping different iterations.  Modulo scheduling is performed
immediately before instruction scheduling.  The code for this pass is
located in @file{modulo-sched.c}.

@item Instruction scheduling

This pass looks for instructions whose output will not be available by
the time that it is used in subsequent instructions.  Memory loads and
floating point instructions often have this behavior on RISC machines.
It re-orders instructions within a basic block to try to separate the
definition and use of items that otherwise would cause pipeline
stalls.  This pass is performed twice, before and after register
allocation.  The code for this pass is located in @file{haifa-sched.c},
@file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and
@file{sched-vis.c}.

@item Register allocation

These passes make sure that all occurrences of pseudo registers are
eliminated, either by allocating them to a hard register, replacing
them by an equivalent expression (e.g.@: a constant) or by placing
them on the stack.  This is done in several subpasses:

@itemize @bullet
@item
Register move optimizations.  This pass makes some simple RTL code
transformations which improve the subsequent register allocation.  The
source file is @file{regmove.c}.

@item
The integrated register allocator (@acronym{IRA}).
It is called
integrated because coalescing, register live range splitting, and hard
register preferencing are done on-the-fly during coloring.  It also
has better integration with the reload pass.  Pseudo-registers spilled
by the allocator or by reload still have a chance to get
hard-registers if reload evicts some pseudo-registers from
hard-registers.  The allocator helps to choose better pseudos for
spilling based on their live ranges and to coalesce stack slots
allocated for the spilled pseudo-registers.  IRA is a regional
register allocator which becomes a Chaitin-Briggs allocator
if there is only one region.  By default, IRA chooses regions using
register pressure but the user can force it to use one region or
regions corresponding to all loops.

Source files of the allocator are @file{ira.c}, @file{ira-build.c},
@file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c},
@file{ira-emit.c}, @file{ira-lives.c}, plus header files @file{ira.h}
and @file{ira-int.h} used for the communication between the allocator
and the rest of the compiler and between the IRA files.

@cindex reloading
@item
Reloading.  This pass renumbers pseudo registers with the hard
register numbers they were allocated.  Pseudo registers that did not
get hard registers are replaced with stack slots.  Then it finds
instructions that are invalid because a value has failed to end up in
a register, or has ended up in a register of the wrong kind.  It fixes
up these instructions by reloading the problematical values
temporarily into registers.  Additional instructions are generated to
do the copying.

The reload pass also optionally eliminates the frame pointer and inserts
instructions to save and restore call-clobbered registers around calls.

Source files are @file{reload.c} and @file{reload1.c}, plus the header
@file{reload.h} used for communication between them.
@end itemize

@item Basic block reordering

This pass implements profile guided code positioning.  If profile
information is not available, various types of static analysis are
performed to make the predictions normally coming from the profile
feedback (i.e.@: execution frequency, branch probability, etc.).  It is
implemented in the file @file{bb-reorder.c}, and the various
prediction routines are in @file{predict.c}.

@item Variable tracking

This pass computes where the variables are stored at each
position in code and adds notes describing the variable locations
to the RTL code.  Location lists are then generated from these
notes and written to the debug information if the debugging
information format supports location lists.  The code is located in
@file{var-tracking.c}.

@item Delayed branch scheduling

This optional pass attempts to find instructions that can go into the
delay slots of other instructions, usually jumps and calls.  The code
for this pass is located in @file{reorg.c}.

@item Branch shortening

On many RISC machines, branch instructions have a limited range.
Thus, longer sequences of instructions must be used for long branches.
In this pass, the compiler figures out how far each instruction
will be from each other instruction, and therefore whether the usual
instructions, or the longer sequences, must be used for each branch.
The code for this pass is located in @file{final.c}.

@item Register-to-stack conversion

Conversion from usage of some hard registers to usage of a register
stack may be done at this point.  Currently, this is supported only
for the floating-point registers of the Intel 80387 coprocessor.  The
code for this pass is located in @file{reg-stack.c}.

@item Final

This pass outputs the assembler code for the function.
The source files
are @file{final.c} plus @file{insn-output.c}; the latter is generated
automatically from the machine description by the tool @code{genoutput}.
The header file @file{conditions.h} is used for communication between
these files.  If mudflap is enabled, the queue of deferred declarations
and any addressed constants (e.g., string literals) are processed by
@code{mudflap_finish_file} into a synthetic constructor function
containing calls into the mudflap runtime.

@item Debugging information output

This is run after final because it must output the stack slot offsets
for pseudo registers that did not get hard registers.  Source files
are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for
SDB symbol table format, @file{dwarfout.c} for DWARF symbol table
format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2
symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table
format.

@end itemize

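As a source-level picture of what the if conversion pass in the list
above aims for, the sketch below computes a maximum once with a
conditional branch and once with a comparison that produces a boolean
value followed by plain arithmetic.  The pass itself rewrites RTL, not
C; the function names @code{max_branch} and @code{max_branchless} are
hypothetical, the second version is written by hand, and it assumes
two's complement integers.

```c
/* Branching form: a conditional branch selects the result.  */
int
max_branch (int a, int b)
{
  if (a > b)
    return a;
  return b;
}

/* Branch-free form: the comparison yields 0 or 1, which is turned
   into a mask of all zeros or all ones (two's complement assumed)
   and used to select A or B arithmetically.  */
int
max_branchless (int a, int b)
{
  int take_a = a > b;
  int mask = -take_a;

  return b ^ ((a ^ b) & mask);
}
```

On a target with conditional move instructions the same selection can
be done with a single conditional move in place of the mask arithmetic.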