xref: /llvm-project/llvm/docs/Passes.rst (revision 18f8106f310ee702046a11f360af47947c030d2e)
1====================================
2LLVM's Analysis and Transform Passes
3====================================
4
5.. contents::
6    :local:
7
8.. toctree::
9   :hidden:
10
11   KernelInfo
12
13Introduction
14============
15.. warning:: This document is not updated frequently, and the list of passes
16  is most likely incomplete. It is possible to list passes known by the opt
17  tool using ``opt -print-passes``.
18
19This document serves as a high level summary of the optimization features that
20LLVM provides.  Optimizations are implemented as Passes that traverse some
21portion of a program to either collect information or transform the program.
22The table below divides the passes that LLVM provides into three categories.
23Analysis passes compute information that other passes can use or for debugging
24or program visualization purposes.  Transform passes can use (or invalidate)
25the analysis passes.  Transform passes all mutate the program in some way.
26Utility passes provides some utility but don't otherwise fit categorization.
27For example passes to extract functions to bitcode or write a module to bitcode
28are neither analysis nor transform passes.  The table of contents above
29provides a quick summary of each pass and links to the more complete pass
30description later in the document.
31
32Analysis Passes
33===============
34
35This section describes the LLVM Analysis Passes.
36
37``aa-eval``: Exhaustive Alias Analysis Precision Evaluator
38----------------------------------------------------------
39
40This is a simple N^2 alias analysis accuracy evaluator.  Basically, for each
41function in the program, it simply queries to see how the alias analysis
42implementation answers alias queries between each pair of pointers in the
43function.
44
45This is inspired and adapted from code by: Naveen Neelakantam, Francesco
46Spadini, and Wojciech Stryjewski.
47
48``basic-aa``: Basic Alias Analysis (stateless AA impl)
49------------------------------------------------------
50
51A basic alias analysis pass that implements identities (two different globals
52cannot alias, etc), but does no stateful analysis.
53
54``basiccg``: Basic CallGraph Construction
55-----------------------------------------
56
57Yet to be written.
58
59.. _passes-da:
60
61``da``: Dependence Analysis
62---------------------------
63
64Dependence analysis framework, which is used to detect dependences in memory
65accesses.
66
67``domfrontier``: Dominance Frontier Construction
68------------------------------------------------
69
70This pass is a simple dominator construction algorithm for finding forward
71dominator frontiers.
72
73``domtree``: Dominator Tree Construction
74----------------------------------------
75
76This pass is a simple dominator construction algorithm for finding forward
77dominators.
78
79
80``dot-callgraph``: Print Call Graph to "dot" file
81-------------------------------------------------
82
83This pass, only available in ``opt``, prints the call graph into a ``.dot``
84graph.  This graph can then be processed with the "dot" tool to convert it to
85postscript or some other suitable format.
86
87``dot-cfg``: Print CFG of function to "dot" file
88------------------------------------------------
89
90This pass, only available in ``opt``, prints the control flow graph into a
91``.dot`` graph.  This graph can then be processed with the :program:`dot` tool
92to convert it to postscript or some other suitable format.
93Additionally the ``-cfg-func-name=<substring>`` option can be used to filter the
94functions that are printed. All functions that contain the specified substring
95will be printed.
96
97``dot-cfg-only``: Print CFG of function to "dot" file (with no function bodies)
98-------------------------------------------------------------------------------
99
100This pass, only available in ``opt``, prints the control flow graph into a
101``.dot`` graph, omitting the function bodies.  This graph can then be processed
102with the :program:`dot` tool to convert it to postscript or some other suitable
103format.
104Additionally the ``-cfg-func-name=<substring>`` option can be used to filter the
105functions that are printed. All functions that contain the specified substring
106will be printed.
107
108``dot-dom``: Print dominance tree of function to "dot" file
109-----------------------------------------------------------
110
111This pass, only available in ``opt``, prints the dominator tree into a ``.dot``
112graph.  This graph can then be processed with the :program:`dot` tool to
113convert it to postscript or some other suitable format.
114
115``dot-dom-only``: Print dominance tree of function to "dot" file (with no function bodies)
116------------------------------------------------------------------------------------------
117
118This pass, only available in ``opt``, prints the dominator tree into a ``.dot``
119graph, omitting the function bodies.  This graph can then be processed with the
120:program:`dot` tool to convert it to postscript or some other suitable format.
121
122``dot-post-dom``: Print postdominance tree of function to "dot" file
123--------------------------------------------------------------------
124
125This pass, only available in ``opt``, prints the post dominator tree into a
126``.dot`` graph.  This graph can then be processed with the :program:`dot` tool
127to convert it to postscript or some other suitable format.
128
129``dot-post-dom-only``: Print postdominance tree of function to "dot" file (with no function bodies)
130---------------------------------------------------------------------------------------------------
131
132This pass, only available in ``opt``, prints the post dominator tree into a
133``.dot`` graph, omitting the function bodies.  This graph can then be processed
134with the :program:`dot` tool to convert it to postscript or some other suitable
135format.
136
137``globals-aa``: Simple mod/ref analysis for globals
138---------------------------------------------------
139
140This simple pass provides alias and mod/ref information for global values that
141do not have their address taken, and keeps track of whether functions read or
142write memory (are "pure").  For this simple (but very common) case, we can
143provide pretty accurate and useful information.
144
145``instcount``: Counts the various types of ``Instruction``\ s
146-------------------------------------------------------------
147
148This pass collects the count of all instructions and reports them.
149
150``iv-users``: Induction Variable Users
151--------------------------------------
152
153Bookkeeping for "interesting" users of expressions computed from induction
154variables.
155
156``kernel-info``: GPU Kernel Info
157--------------------------------
158
159Reports various statistics for codes compiled for GPUs.  This pass is
160:doc:`documented separately<KernelInfo>`.
161
162``lazy-value-info``: Lazy Value Information Analysis
163----------------------------------------------------
164
165Interface for lazy computation of value constraint information.
166
167``lint``: Statically lint-checks LLVM IR
168----------------------------------------
169
170This pass statically checks for common and easily-identified constructs which
171produce undefined or likely unintended behavior in LLVM IR.
172
173It is not a guarantee of correctness, in two ways.  First, it isn't
174comprehensive.  There are checks which could be done statically which are not
175yet implemented.  Some of these are indicated by TODO comments, but those
176aren't comprehensive either.  Second, many conditions cannot be checked
177statically.  This pass does no dynamic instrumentation, so it can't check for
178all possible problems.
179
180Another limitation is that it assumes all code will be executed.  A store
181through a null pointer in a basic block which is never reached is harmless, but
182this pass will warn about it anyway.
183
184Optimization passes may make conditions that this pass checks for more or less
185obvious.  If an optimization pass appears to be introducing a warning, it may
186be that the optimization pass is merely exposing an existing condition in the
187code.
188
189This code may be run before :ref:`instcombine <passes-instcombine>`.  In many
190cases, instcombine checks for the same kinds of things and turns instructions
191with undefined behavior into unreachable (or equivalent).  Because of this,
192this pass makes some effort to look through bitcasts and so on.
193
194``loops``: Natural Loop Information
195-----------------------------------
196
197This analysis is used to identify natural loops and determine the loop depth of
198various nodes of the CFG.  Note that the loops identified may actually be
199several natural loops that share the same header node... not just a single
200natural loop.
201
202``memdep``: Memory Dependence Analysis
203--------------------------------------
204
205An analysis that determines, for a given memory operation, what preceding
206memory operations it depends on.  It builds on alias analysis information, and
207tries to provide a lazy, caching interface to a common kind of alias
208information query.
209
210``print<module-debuginfo>``: Decodes module-level debug info
211------------------------------------------------------------
212
213This pass decodes the debug info metadata in a module and prints it to standard output in a
214(sufficiently-prepared-) human-readable form.
215
216``postdomtree``: Post-Dominator Tree Construction
217-------------------------------------------------
218
219This pass is a simple post-dominator construction algorithm for finding
220post-dominators.
221
222``print-alias-sets``: Alias Set Printer
223---------------------------------------
224
225Yet to be written.
226
227``print-callgraph``: Print a call graph
228---------------------------------------
229
230This pass, only available in ``opt``, prints the call graph to standard error
231in a human-readable form.
232
233``print-callgraph-sccs``: Print SCCs of the Call Graph
234------------------------------------------------------
235
236This pass, only available in ``opt``, prints the SCCs of the call graph to
237standard error in a human-readable form.
238
239``print-cfg-sccs``: Print SCCs of each function CFG
240---------------------------------------------------
241
242This pass, only available in ``opt``, prints the SCCs of each function CFG to
243standard error in a human-readable fom.
244
245``function(print)``: Print function to stderr
246---------------------------------------------
247
248The ``PrintFunctionPass`` class is designed to be pipelined with other
249``FunctionPasses``, and prints out the functions of the module as they are
250processed.
251
252``module(print)``: Print module to stderr
253-----------------------------------------
254
255This pass simply prints out the entire module when it is executed.
256
257``regions``: Detect single entry single exit regions
258----------------------------------------------------
259
260The ``RegionInfo`` pass detects single entry single exit regions in a function,
261where a region is defined as any subgraph that is connected to the remaining
262graph at only two spots.  Furthermore, a hierarchical region tree is built.
263
264.. _passes-scalar-evolution:
265
266``scalar-evolution``: Scalar Evolution Analysis
267-----------------------------------------------
268
269The ``ScalarEvolution`` analysis can be used to analyze and categorize scalar
270expressions in loops.  It specializes in recognizing general induction
271variables, representing them with the abstract and opaque ``SCEV`` class.
272Given this analysis, trip counts of loops and other important properties can be
273obtained.
274
275This analysis is primarily useful for induction variable substitution and
276strength reduction.
277
278``scev-aa``: ScalarEvolution-based Alias Analysis
279-------------------------------------------------
280
281Simple alias analysis implemented in terms of ``ScalarEvolution`` queries.
282
283This differs from traditional loop dependence analysis in that it tests for
284dependencies within a single iteration of a loop, rather than dependencies
285between different iterations.
286
287``ScalarEvolution`` has a more complete understanding of pointer arithmetic
288than ``BasicAliasAnalysis``' collection of ad-hoc analyses.
289
290``stack-safety``: Stack Safety Analysis
291---------------------------------------
292
293The ``StackSafety`` analysis can be used to determine if stack allocated
294variables can be considered safe from memory access bugs.
295
296This analysis' primary purpose is to be used by sanitizers to avoid unnecessary
297instrumentation of safe variables.
298
299Transform Passes
300================
301
302This section describes the LLVM Transform Passes.
303
304``adce``: Aggressive Dead Code Elimination
305------------------------------------------
306
307ADCE aggressively tries to eliminate code.  This pass is similar to :ref:`DCE
308<passes-dce>` but it assumes that values are dead until proven otherwise.  This
309is similar to :ref:`SCCP <passes-sccp>`, except applied to the liveness of
310values.
311
312``always-inline``: Inliner for ``always_inline`` functions
313----------------------------------------------------------
314
315A custom inliner that handles only functions that are marked as "always
316inline".
317
318``argpromotion``: Promote 'by reference' arguments to scalars
319-------------------------------------------------------------
320
321This pass promotes "by reference" arguments to be "by value" arguments.  In
322practice, this means looking for internal functions that have pointer
323arguments.  If it can prove, through the use of alias analysis, that an
324argument is *only* loaded, then it can pass the value into the function instead
325of the address of the value.  This can cause recursive simplification of code
326and lead to the elimination of allocas (especially in C++ template code like
327the STL).
328
329This pass also handles aggregate arguments that are passed into a function,
330scalarizing them if the elements of the aggregate are only loaded.  Note that
331it refuses to scalarize aggregates which would require passing in more than
332three operands to the function, because passing thousands of operands for a
333large array or structure is unprofitable!
334
335Note that this transformation could also be done for arguments that are only
336stored to (returning the value instead), but does not currently.  This case
337would be best handled when and if LLVM starts supporting multiple return values
338from functions.
339
340``block-placement``: Profile Guided Basic Block Placement
341---------------------------------------------------------
342
343This pass is a very simple profile guided basic block placement algorithm.  The
344idea is to put frequently executed blocks together at the start of the function
345and hopefully increase the number of fall-through conditional branches.  If
346there is no profile information for a particular function, this pass basically
347orders blocks in depth-first order.
348
349``break-crit-edges``: Break critical edges in CFG
350-------------------------------------------------
351
352Break all of the critical edges in the CFG by inserting a dummy basic block.
353It may be "required" by passes that cannot deal with critical edges.  This
354transformation obviously invalidates the CFG, but can update forward dominator
355(set, immediate dominators, tree, and frontier) information.
356
357``codegenprepare``: Optimize for code generation
358------------------------------------------------
359
360This pass munges the code in the input function to better prepare it for
361SelectionDAG-based code generation.  This works around limitations in its
362basic-block-at-a-time approach.  It should eventually be removed.
363
364``constmerge``: Merge Duplicate Global Constants
365------------------------------------------------
366
367Merges duplicate global constants together into a single constant that is
368shared.  This is useful because some passes (i.e., TraceValues) insert a lot of
369string constants into the program, regardless of whether or not an existing
370string is available.
371
372.. _passes-dce:
373
374``dce``: Dead Code Elimination
375------------------------------
376
377Dead code elimination is similar to dead instruction elimination, but it
378rechecks instructions that were used by removed instructions to see if they
379are newly dead.
380
381``deadargelim``: Dead Argument Elimination
382------------------------------------------
383
384This pass deletes dead arguments from internal functions.  Dead argument
385elimination removes arguments which are directly dead, as well as arguments
386only passed into function calls as dead arguments of other functions.  This
387pass also deletes dead arguments in a similar way.
388
389This pass is often useful as a cleanup pass to run after aggressive
390interprocedural passes, which add possibly-dead arguments.
391
392``dse``: Dead Store Elimination
393-------------------------------
394
395A trivial dead store elimination that only considers basic-block local
396redundant stores.
397
398.. _passes-function-attrs:
399
400``function-attrs``: Deduce function attributes
401----------------------------------------------
402
403A simple interprocedural pass which walks the call-graph, looking for functions
404which do not access or only read non-local memory, and marking them
405``readnone``/``readonly``.  In addition, it marks function arguments (of
406pointer type) "``nocapture``" if a call to the function does not create any
407copies of the pointer value that outlive the call.  This more or less means
408that the pointer is only dereferenced, and not returned from the function or
409stored in a global.  This pass is implemented as a bottom-up traversal of the
410call-graph.
411
412``globaldce``: Dead Global Elimination
413--------------------------------------
414
415This transform is designed to eliminate unreachable internal globals from the
416program.  It uses an aggressive algorithm, searching out globals that are known
417to be alive.  After it finds all of the globals which are needed, it deletes
418whatever is left over.  This allows it to delete recursive chunks of the
419program which are unreachable.
420
421``globalopt``: Global Variable Optimizer
422----------------------------------------
423
424This pass transforms simple global variables that never have their address
425taken.  If obviously true, it marks read/write globals as constant, deletes
426variables only stored to, etc.
427
428``gvn``: Global Value Numbering
429-------------------------------
430
431This pass performs global value numbering to eliminate fully and partially
432redundant instructions.  It also performs redundant load elimination.
433
434.. _passes-indvars:
435
436``indvars``: Canonicalize Induction Variables
437---------------------------------------------
438
439This transformation analyzes and transforms the induction variables (and
440computations derived from them) into simpler forms suitable for subsequent
441analysis and transformation.
442
443This transformation makes the following changes to each loop with an
444identifiable induction variable:
445
446* All loops are transformed to have a *single* canonical induction variable
447  which starts at zero and steps by one.
448* The canonical induction variable is guaranteed to be the first PHI node in
449  the loop header block.
450* Any pointer arithmetic recurrences are raised to use array subscripts.
451
452If the trip count of a loop is computable, this pass also makes the following
453changes:
454
455* The exit condition for the loop is canonicalized to compare the induction
456  value against the exit value.  This turns loops like:
457
458  .. code-block:: c++
459
460    for (i = 7; i*i < 1000; ++i)
461
462    into
463
464  .. code-block:: c++
465
466    for (i = 0; i != 25; ++i)
467
468* Any use outside of the loop of an expression derived from the indvar is
469  changed to compute the derived value outside of the loop, eliminating the
470  dependence on the exit value of the induction variable.  If the only purpose
471  of the loop is to compute the exit value of some derived expression, this
472  transformation will make the loop dead.
473
474This transformation should be followed by strength reduction after all of the
475desired loop transformations have been performed.  Additionally, on targets
476where it is profitable, the loop could be transformed to count down to zero
477(the "do loop" optimization).
478
479``inline``: Function Integration/Inlining
480-----------------------------------------
481
482Bottom-up inlining of functions into callees.
483
484.. _passes-instcombine:
485
486``instcombine``: Combine redundant instructions
487-----------------------------------------------
488
489Combine instructions to form fewer, simple instructions.  This pass does not
490modify the CFG. This pass is where algebraic simplification happens.
491
492This pass combines things like:
493
494.. code-block:: llvm
495
496  %Y = add i32 %X, 1
497  %Z = add i32 %Y, 1
498
499into:
500
501.. code-block:: llvm
502
503  %Z = add i32 %X, 2
504
505This is a simple worklist driven algorithm.
506
507This pass guarantees that the following canonicalizations are performed on the
508program:
509
510#. If a binary operator has a constant operand, it is moved to the right-hand
511   side.
512#. Bitwise operators with constant operands are always grouped so that shifts
513   are performed first, then ``or``\ s, then ``and``\ s, then ``xor``\ s.
514#. Compare instructions are converted from ``<``, ``>``, ``≤``, or ``≥`` to
515   ``=`` or ``≠`` if possible.
516#. All ``cmp`` instructions on boolean values are replaced with logical
517   operations.
518#. ``add X, X`` is represented as ``mul X, 2`` ⇒ ``shl X, 1``
519#. Multiplies with a constant power-of-two argument are transformed into
520   shifts.
521#. … etc.
522
523This pass can also simplify calls to specific well-known function calls (e.g.
524runtime library functions).  For example, a call ``exit(3)`` that occurs within
525the ``main()`` function can be transformed into simply ``return 3``. Whether or
526not library calls are simplified is controlled by the
527:ref:`-function-attrs <passes-function-attrs>` pass and LLVM's knowledge of
528library calls on different targets.
529
530.. _passes-aggressive-instcombine:
531
532``aggressive-instcombine``: Combine expression patterns
533--------------------------------------------------------
534
535Combine expression patterns to form expressions with fewer, simple instructions.
536
537For example, this pass reduce width of expressions post-dominated by TruncInst
538into smaller width when applicable.
539
540It differs from instcombine pass in that it can modify CFG and contains pattern
541optimization that requires higher complexity than the O(1), thus, it should run fewer
542times than instcombine pass.
543
544``internalize``: Internalize Global Symbols
545-------------------------------------------
546
547This pass loops over all of the functions in the input module, looking for a
548main function.  If a main function is found, all other functions and all global
549variables with initializers are marked as internal.
550
551``ipsccp``: Interprocedural Sparse Conditional Constant Propagation
552-------------------------------------------------------------------
553
554An interprocedural variant of :ref:`Sparse Conditional Constant Propagation
555<passes-sccp>`.
556
557``ir-normalizer``: Transforms IR into a normal form that's easier to diff
558----------------------------------------------------------------------------
559
560This pass aims to transform LLVM Modules into a normal form by reordering and
561renaming instructions while preserving the same semantics. The normalizer makes
562it easier to spot semantic differences while diffing two modules which have
563undergone two different passes.
564
565``jump-threading``: Jump Threading
566----------------------------------
567
568Jump threading tries to find distinct threads of control flow running through a
569basic block.  This pass looks at blocks that have multiple predecessors and
570multiple successors.  If one or more of the predecessors of the block can be
571proven to always cause a jump to one of the successors, we forward the edge
572from the predecessor to the successor by duplicating the contents of this
573block.
574
575An example of when this can occur is code like this:
576
577.. code-block:: c++
578
579  if () { ...
580    X = 4;
581  }
582  if (X < 3) {
583
584In this case, the unconditional branch at the end of the first if can be
585revectored to the false side of the second if.
586
587.. _passes-lcssa:
588
589``lcssa``: Loop-Closed SSA Form Pass
590------------------------------------
591
592This pass transforms loops by placing phi nodes at the end of the loops for all
593values that are live across the loop boundary.  For example, it turns the left
594into the right code:
595
596.. code-block:: c++
597
598  for (...)                for (...)
599      if (c)                   if (c)
600          X1 = ...                 X1 = ...
601      else                     else
602          X2 = ...                 X2 = ...
603      X3 = phi(X1, X2)         X3 = phi(X1, X2)
604  ... = X3 + 4              X4 = phi(X3)
605                              ... = X4 + 4
606
607This is still valid LLVM; the extra phi nodes are purely redundant, and will be
608trivially eliminated by ``InstCombine``.  The major benefit of this
609transformation is that it makes many other loop optimizations, such as
610``LoopUnswitch``\ ing, simpler. You can read more in the
611:ref:`loop terminology section for the LCSSA form <loop-terminology-lcssa>`.
612
613.. _passes-licm:
614
615``licm``: Loop Invariant Code Motion
616------------------------------------
617
618This pass performs loop invariant code motion, attempting to remove as much
619code from the body of a loop as possible.  It does this by either hoisting code
620into the preheader block, or by sinking code to the exit blocks if it is safe.
621This pass also promotes must-aliased memory locations in the loop to live in
622registers, thus hoisting and sinking "invariant" loads and stores.
623
624Hoisting operations out of loops is a canonicalization transform. It enables
625and simplifies subsequent optimizations in the middle-end. Rematerialization
626of hoisted instructions to reduce register pressure is the responsibility of
627the back-end, which has more accurate information about register pressure and
628also handles other optimizations than LICM that increase live-ranges.
629
630This pass uses alias analysis for two purposes:
631
632#. Moving loop invariant loads and calls out of loops.  If we can determine
633   that a load or call inside of a loop never aliases anything stored to, we
634   can hoist it or sink it like any other instruction.
635
636#. Scalar Promotion of Memory.  If there is a store instruction inside of the
637   loop, we try to move the store to happen AFTER the loop instead of inside of
638   the loop.  This can only happen if a few conditions are true:
639
640   #. The pointer stored through is loop invariant.
641   #. There are no stores or loads in the loop which *may* alias the pointer.
642      There are no calls in the loop which mod/ref the pointer.
643
644   If these conditions are true, we can promote the loads and stores in the
645   loop of the pointer to use a temporary alloca'd variable.  We then use the
646   :ref:`mem2reg <passes-mem2reg>` functionality to construct the appropriate
647   SSA form for the variable.
648
649``loop-deletion``: Delete dead loops
650------------------------------------
651
652This file implements the Dead Loop Deletion Pass.  This pass is responsible for
653eliminating loops with non-infinite computable trip counts that have no side
654effects or volatile instructions, and do not contribute to the computation of
655the function's return value.
656
657.. _passes-loop-extract:
658
659``loop-extract``: Extract loops into new functions
660--------------------------------------------------
661
662A pass wrapper around the ``ExtractLoop()`` scalar transformation to extract
663each top-level loop into its own new function.  If the loop is the *only* loop
664in a given function, it is not touched.  This is a pass most useful for
665debugging via bugpoint.
666
667``loop-reduce``: Loop Strength Reduction
668----------------------------------------
669
670This pass performs a strength reduction on array references inside loops that
671have as one or more of their components the loop induction variable.  This is
672accomplished by creating a new value to hold the initial value of the array
673access for the first iteration, and then creating a new GEP instruction in the
674loop to increment the value by the appropriate amount.
675
676.. _passes-loop-rotate:
677
678``loop-rotate``: Rotate Loops
679-----------------------------
680
681A simple loop rotation transformation.  A summary of it can be found in
682:ref:`Loop Terminology for Rotated Loops <loop-terminology-loop-rotate>`.
683
684
685.. _passes-loop-simplify:
686
687``loop-simplify``: Canonicalize natural loops
688---------------------------------------------
689
690This pass performs several transformations to transform natural loops into a
691simpler form, which makes subsequent analyses and transformations simpler and
692more effective. A summary of it can be found in
693:ref:`Loop Terminology, Loop Simplify Form <loop-terminology-loop-simplify>`.
694
695Loop pre-header insertion guarantees that there is a single, non-critical entry
696edge from outside of the loop to the loop header.  This simplifies a number of
697analyses and transformations, such as :ref:`LICM <passes-licm>`.
698
699Loop exit-block insertion guarantees that all exit blocks from the loop (blocks
700which are outside of the loop that have predecessors inside of the loop) only
701have predecessors from inside of the loop (and are thus dominated by the loop
702header).  This simplifies transformations such as store-sinking that are built
703into LICM.
704
705This pass also guarantees that loops will have exactly one backedge.
706
707Note that the :ref:`simplifycfg <passes-simplifycfg>` pass will clean up blocks
708which are split out but end up being unnecessary, so usage of this pass should
709not pessimize generated code.
710
711This pass obviously modifies the CFG, but updates loop information and
712dominator information.
713
714``loop-unroll``: Unroll loops
715-----------------------------
716
717This pass implements a simple loop unroller.  It works best when loops have
718been canonicalized by the :ref:`indvars <passes-indvars>` pass, allowing it to
719determine the trip counts of loops easily.
720
721``loop-unroll-and-jam``: Unroll and Jam loops
722---------------------------------------------
723
724This pass implements a simple unroll and jam classical loop optimisation pass.
725It transforms loop from:
726
727.. code-block:: c++
728
729  for i.. i+= 1              for i.. i+= 4
730    for j..                    for j..
731      code(i, j)                 code(i, j)
732                                 code(i+1, j)
733                                 code(i+2, j)
734                                 code(i+3, j)
735                             remainder loop
736
737Which can be seen as unrolling the outer loop and "jamming" (fusing) the inner
738loops into one. When variables or loads can be shared in the new inner loop, this
739can lead to significant performance improvements. It uses
740:ref:`Dependence Analysis <passes-da>` for proving the transformations are safe.
741
742``lower-global-dtors``: Lower global destructors
743------------------------------------------------
744
745This pass lowers global module destructors (``llvm.global_dtors``) by creating
746wrapper functions that are registered as global constructors in
747``llvm.global_ctors`` and which contain a call to ``__cxa_atexit`` to register
748their destructor functions.
749
750``lower-atomic``: Lower atomic intrinsics to non-atomic form
751------------------------------------------------------------
752
753This pass lowers atomic intrinsics to non-atomic form for use in a known
754non-preemptible environment.
755
756The pass does not verify that the environment is non-preemptible (in general
757this would require knowledge of the entire call graph of the program including
758any libraries which may not be available in bitcode form); it simply lowers
759every atomic intrinsic.
760
761``lower-invoke``: Lower invokes to calls, for unwindless code generators
762------------------------------------------------------------------------
763
764This transformation is designed for use by code generators which do not yet
765support stack unwinding.  This pass converts ``invoke`` instructions to
766``call`` instructions, so that any exception-handling ``landingpad`` blocks
767become dead code (which can be removed by running the ``-simplifycfg`` pass
768afterwards).
769
770``lower-switch``: Lower ``SwitchInst``\ s to branches
771-----------------------------------------------------
772
773Rewrites switch instructions with a sequence of branches, which allows targets
774to get away with not implementing the switch instruction until it is
775convenient.
776
777.. _passes-mem2reg:
778
779``mem2reg``: Promote Memory to Register
780---------------------------------------
781
782This file promotes memory references to be register references.  It promotes
783alloca instructions which only have loads and stores as uses.  An ``alloca`` is
784transformed by using dominator frontiers to place phi nodes, then traversing
785the function in depth-first order to rewrite loads and stores as appropriate.
786This is just the standard SSA construction algorithm to construct "pruned" SSA
787form.
788
789``memcpyopt``: MemCpy Optimization
790----------------------------------
791
792This pass performs various transformations related to eliminating ``memcpy``
793calls, or transforming sets of stores into ``memset``\ s.
794
795``mergefunc``: Merge Functions
796------------------------------
797
798This pass looks for equivalent functions that are mergeable and folds them.
799
800Total-ordering is introduced among the functions set: we define comparison
801that answers for every two functions which of them is greater. It allows to
802arrange functions into the binary tree.
803
804For every new function we check for equivalent in tree.
805
806If equivalent exists we fold such functions. If both functions are overridable,
807we move the functionality into a new internal function and leave two
808overridable thunks to it.
809
810If there is no equivalent, then we add this function to tree.
811
812Lookup routine has O(log(n)) complexity, while whole merging process has
813complexity of O(n*log(n)).
814
815Read
816:doc:`this <MergeFunctions>`
817article for more details.
818
819``mergereturn``: Unify function exit nodes
820------------------------------------------
821
822Ensure that functions have at most one ``ret`` instruction in them.
823Additionally, it keeps track of which node is the new exit node of the CFG.
824
825``partial-inliner``: Partial Inliner
826------------------------------------
827
828This pass performs partial inlining, typically by inlining an ``if`` statement
829that surrounds the body of the function.
830
831``reassociate``: Reassociate expressions
832----------------------------------------
833
834This pass reassociates commutative expressions in an order that is designed to
835promote better constant propagation, GCSE, :ref:`LICM <passes-licm>`, PRE, etc.
836
837For example: 4 + (x + 5) ⇒ x + (4 + 5)
838
839In the implementation of this algorithm, constants are assigned rank = 0,
840function arguments are rank = 1, and other values are assigned ranks
841corresponding to the reverse post order traversal of current function (starting
842at 2), which effectively gives values in deep loops higher rank than values not
843in loops.
844
845``rel-lookup-table-converter``: Relative lookup table converter
846---------------------------------------------------------------
847
848This pass converts lookup tables to PIC-friendly relative lookup tables.
849
850``reg2mem``: Demote all values to stack slots
851---------------------------------------------
852
853This file demotes all registers to memory references.  It is intended to be the
854inverse of :ref:`mem2reg <passes-mem2reg>`.  By converting to ``load``
855instructions, the only values live across basic blocks are ``alloca``
856instructions and ``load`` instructions before ``phi`` nodes.  It is intended
857that this should make CFG hacking much easier.  To make later hacking easier,
858the entry block is split into two, such that all introduced ``alloca``
859instructions (and nothing else) are in the entry block.
860
861``sroa``: Scalar Replacement of Aggregates
862------------------------------------------
863
864The well-known scalar replacement of aggregates transformation.  This transform
865breaks up ``alloca`` instructions of aggregate type (structure or array) into
866individual ``alloca`` instructions for each member if possible.  Then, if
867possible, it transforms the individual ``alloca`` instructions into nice clean
868scalar SSA form.
869
870.. _passes-sccp:
871
872``sccp``: Sparse Conditional Constant Propagation
873-------------------------------------------------
874
875Sparse conditional constant propagation and merging, which can be summarized
876as:
877
878* Assumes values are constant unless proven otherwise
879* Assumes BasicBlocks are dead unless proven otherwise
880* Proves values to be constant, and replaces them with constants
881* Proves conditional branches to be unconditional
882
883Note that this pass has a habit of making definitions be dead.  It is a good
884idea to run a :ref:`DCE <passes-dce>` pass sometime after running this pass.
885
886.. _passes-simplifycfg:
887
888``simplifycfg``: Simplify the CFG
889---------------------------------
890
891Performs dead code elimination and basic block merging.  Specifically:
892
893* Removes basic blocks with no predecessors.
894* Merges a basic block into its predecessor if there is only one and the
895  predecessor only has one successor.
896* Eliminates PHI nodes for basic blocks with a single predecessor.
897* Eliminates a basic block that only contains an unconditional branch.
898
899``sink``: Code sinking
900----------------------
901
902This pass moves instructions into successor blocks, when possible, so that they
903aren't executed on paths where their results aren't needed.
904
905.. _passes-simple-loop-unswitch:
906
907``simple-loop-unswitch``: Unswitch loops
908----------------------------------------
909
910This pass transforms loops that contain branches on loop-invariant conditions
911to have multiple loops.  For example, it turns the left into the right code:
912
913.. code-block:: c++
914
915  for (...)                  if (lic)
916      A                          for (...)
917      if (lic)                       A; B; C
918          B                  else
919      C                          for (...)
920                                     A; C
921
922This can increase the size of the code exponentially (doubling it every time a
923loop is unswitched) so we only unswitch if the resultant code will be smaller
924than a threshold.
925
926This pass expects :ref:`LICM <passes-licm>` to be run before it to hoist
927invariant conditions out of the loop, to make the unswitching opportunity
928obvious.
929
930``strip``: Strip all symbols from a module
931------------------------------------------
932
933Performs code stripping.  This transformation can delete:
934
935* names for virtual registers
936* symbols for internal globals and functions
937* debug information
938
939Note that this transformation makes code much less readable, so it should only
940be used in situations where the strip utility would be used, such as reducing
941code size or making it harder to reverse engineer code.
942
943``strip-dead-debug-info``: Strip debug info for unused symbols
944--------------------------------------------------------------
945
946Performs code stripping. Similar to strip, but only strips debug info for
947unused symbols.
948
949``strip-dead-prototypes``: Strip Unused Function Prototypes
950-----------------------------------------------------------
951
952This pass loops over all of the functions in the input module, looking for dead
953declarations and removes them.  Dead declarations are declarations of functions
954for which no implementation is available (i.e., declarations for unused library
955functions).
956
957``strip-debug-declare``: Strip all ``llvm.dbg.declare`` intrinsics and
958``#dbg_declare`` records.
959-------------------------------------------------------------------
960
961Performs code stripping. Similar to strip, but only strips
962``llvm.dbg.declare`` intrinsics.
963
964``strip-nondebug``: Strip all symbols, except dbg symbols, from a module
965------------------------------------------------------------------------
966
967Performs code stripping. Similar to strip, but dbg info is preserved.
968
969``tailcallelim``: Tail Call Elimination
970---------------------------------------
971
972This file transforms calls of the current function (self recursion) followed by
973a return instruction with a branch to the entry of the function, creating a
974loop.  This pass also implements the following extensions to the basic
975algorithm:
976
977#. Trivial instructions between the call and return do not prevent the
978   transformation from taking place, though currently the analysis cannot
979   support moving any really useful instructions (only dead ones).
980#. This pass transforms functions that are prevented from being tail recursive
981   by an associative expression to use an accumulator variable, thus compiling
982   the typical naive factorial or fib implementation into efficient code.
983#. TRE is performed if the function returns void, if the return returns the
984   result returned by the call, or if the function returns a run-time constant
985   on all exits from the function.  It is possible, though unlikely, that the
986   return returns something else (like constant 0), and can still be TRE'd.  It
987   can be TRE'd if *all other* return instructions in the function return the
988   exact same value.
989#. If it can prove that callees do not access their caller stack frame, they
990   are marked as eligible for tail call elimination (by the code generator).
991
992Utility Passes
993==============
994
995This section describes the LLVM Utility Passes.
996
997``deadarghaX0r``: Dead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE)
998-----------------------------------------------------------------------
999
1000Same as dead argument elimination, but deletes arguments to functions which are
1001external.  This is only for use by :doc:`bugpoint <Bugpoint>`.
1002
1003``extract-blocks``: Extract Basic Blocks From Module (for bugpoint use)
1004-----------------------------------------------------------------------
1005
1006This pass is used by bugpoint to extract all blocks from the module into their
1007own functions.
1008
1009``instnamer``: Assign names to anonymous instructions
1010-----------------------------------------------------
1011
1012This is a little utility pass that gives instructions names, this is mostly
1013useful when diffing the effect of an optimization because deleting an unnamed
1014instruction can change all other instruction numbering, making the diff very
1015noisy.
1016
1017.. _passes-verify:
1018
1019``verify``: Module Verifier
1020---------------------------
1021
1022Verifies an LLVM IR code.  This is useful to run after an optimization which is
1023undergoing testing.  Note that llvm-as verifies its input before emitting
1024bitcode, and also that malformed bitcode is likely to make LLVM crash.  All
1025language front-ends are therefore encouraged to verify their output before
1026performing optimizing transformations.
1027
1028#. Both of a binary operator's parameters are of the same type.
1029#. Verify that the indices of mem access instructions match other operands.
1030#. Verify that arithmetic and other things are only performed on first-class
1031   types.  Verify that shifts and logicals only happen on integrals f.e.
1032#. All of the constants in a switch statement are of the correct type.
1033#. The code is in valid SSA form.
1034#. It is illegal to put a label into any other type (like a structure) or to
1035   return one.
1036#. Only phi nodes can be self referential: ``%x = add i32 %x``, ``%x`` is
1037   invalid.
1038#. PHI nodes must have an entry for each predecessor, with no extras.
1039#. PHI nodes must be the first thing in a basic block, all grouped together.
1040#. PHI nodes must have at least one entry.
1041#. All basic blocks should only end with terminator insts, not contain them.
1042#. The entry node to a function must not have predecessors.
1043#. All Instructions must be embedded into a basic block.
1044#. Functions cannot take a void-typed parameter.
1045#. Verify that a function's argument list agrees with its declared type.
1046#. It is illegal to specify a name for a void value.
1047#. It is illegal to have an internal global value with no initializer.
1048#. It is illegal to have a ``ret`` instruction that returns a value that does
1049   not agree with the function return value type.
1050#. Function call argument types match the function prototype.
1051#. All other things that are tested by asserts spread about the code.
1052
1053Note that this does not provide full security verification (like Java), but
1054instead just tries to ensure that code is well-formed.
1055
1056.. _passes-view-cfg:
1057
1058``view-cfg``: View CFG of function
1059----------------------------------
1060
1061Displays the control flow graph using the GraphViz tool.
1062Additionally the ``-cfg-func-name=<substring>`` option can be used to filter the
1063functions that are displayed. All functions that contain the specified substring
1064will be displayed.
1065
1066``view-cfg-only``: View CFG of function (with no function bodies)
1067-----------------------------------------------------------------
1068
1069Displays the control flow graph using the GraphViz tool, but omitting function
1070bodies.
1071Additionally the ``-cfg-func-name=<substring>`` option can be used to filter the
1072functions that are displayed. All functions that contain the specified substring
1073will be displayed.
1074
1075``view-dom``: View dominance tree of function
1076---------------------------------------------
1077
1078Displays the dominator tree using the GraphViz tool.
1079
1080``view-dom-only``: View dominance tree of function (with no function bodies)
1081----------------------------------------------------------------------------
1082
1083Displays the dominator tree using the GraphViz tool, but omitting function
1084bodies.
1085
1086``view-post-dom``: View postdominance tree of function
1087------------------------------------------------------
1088
1089Displays the post dominator tree using the GraphViz tool.
1090
1091``view-post-dom-only``: View postdominance tree of function (with no function bodies)
1092-------------------------------------------------------------------------------------
1093
1094Displays the post dominator tree using the GraphViz tool, but omitting function
1095bodies.
1096
1097``transform-warning``: Report missed forced transformations
1098-----------------------------------------------------------
1099
1100Emits warnings about not yet applied forced transformations (e.g. from
1101``#pragma omp simd``).
1102