<!--===- docs/Overview.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Overview of Compiler Phases

```{contents}
---
local:
---
```
The Flang compiler transforms Fortran source code into an executable file.
This transformation proceeds in three high-level phases: analysis, lowering,
and code generation/linking.

The first high-level phase (analysis) transforms Fortran source code into a
decorated parse tree and a symbol table.  During this phase, all errors in the
user's program are detected and reported.

The second high-level phase (lowering) converts the decorated parse tree and
symbol table into the Fortran Intermediate Representation (FIR), which is a
dialect of LLVM's Multi-Level Intermediate Representation (MLIR).  It then
runs a series of passes on the FIR code that verify its validity, perform
optimizations, and finally transform it into LLVM's Intermediate
Representation (LLVM IR).

The third high-level phase generates machine code and invokes a linker to
produce an executable file.

This document describes the first two high-level phases, breaking each one
down into more detailed phases.

For each detailed phase, it lists the inputs and outputs, along with how to
produce a readable version of the outputs.

Each detailed phase produces either correct output or fatal errors.

## Analysis

This high-level phase validates that the program is correct and creates all of
the information needed for lowering.

### Prescan and Preprocess

See [Preprocessing.md](Preprocessing.md).

**Input:** Fortran source and header files, command line macro definitions,
  set of enabled compiler directives (to be treated as directives rather than
  comments).

**Output:**
- A "cooked" character stream: the entire program as a contiguous stream of
  normalized Fortran source.
  Extraneous whitespace and comments are removed (except comments that are
  enabled compiler directives) and case is normalized.  Also,
  directives are processed and macros are expanded.
- Provenance information mapping each character back to the source it came from.
  Subsequent phases use this wherever they need source locations, for example
  in error messages, optimization reports, and debugging information.

**Entry point:** `parser::Parsing::Prescan`

**Commands:**
 - `flang -fc1 -E src.f90` dumps the cooked character stream
 - `flang -fc1 -fdebug-dump-provenance src.f90` dumps provenance
   information

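
To make the relationship between the cooked character stream and the provenance information concrete, here is a toy sketch in Python. It is not Flang's prescanner: the function name `cook`, the `(line, column)` provenance tuples, lower-casing as the case normalization, and the simplified rules (free-form source only, no character literals, no directives, no macros, no continuation lines) are all assumptions made for illustration.

```python
def cook(source):
    """Toy prescan: return (cooked_text, provenance).

    provenance[i] is the 1-based (line, column) in `source` that produced
    cooked_text[i]. Hypothetical simplification: everything after '!' is a
    comment, and there are no directives, macros, or continuation lines.
    """
    cooked = []
    provenance = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        code = line.split("!", 1)[0]        # drop a trailing comment
        pending_space = False               # a blank run is waiting to be emitted
        emitted = False                     # has this line produced any output yet?
        for col_no, ch in enumerate(code, start=1):
            if ch.isspace():
                pending_space = emitted     # skip leading blanks, collapse the rest
                continue
            if pending_space:
                cooked.append(" ")
                provenance.append((line_no, col_no - 1))
                pending_space = False
            cooked.append(ch.lower())       # normalize case (assumed: lower case)
            provenance.append((line_no, col_no))
            emitted = True
        if emitted:                         # comment-only and blank lines vanish
            cooked.append("\n")
            provenance.append((line_no, len(code) + 1))
    return "".join(cooked), provenance
```

Every character of the cooked text has exactly one provenance entry, so a later phase that finds a problem at some offset in the cooked stream can report it against the original line and column.
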
### Parsing

**Input:** Cooked character stream

**Output:** A parse tree for each Fortran program unit in the source code
representing a syntactically correct program, rooted at the program unit.  See:
[Parsing.md](Parsing.md) and [ParserCombinators.md](ParserCombinators.md).

**Entry point:** `parser::Parsing::Parse`

**Commands:**
  - `flang -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
  - `flang -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
  - `flang -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
  - `flang -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree

### Semantic processing

**Input:** the parse tree, the cooked character stream, and provenance
information

**Output:**
* a symbol table
* a modified parse tree
* module files (see [ModFiles.md](ModFiles.md))
* the intrinsic procedure table
* the target characteristics
* the runtime derived type tables (see [RuntimeTypeInfo.md](RuntimeTypeInfo.md))

**Entry point:** `semantics::Semantics::Perform`

For more detail on semantic analysis, see [Semantics.md](Semantics.md).
Semantic processing performs several tasks:
* validates labels (see [LabelResolution.md](LabelResolution.md))
* canonicalizes DO statements
* canonicalizes OpenACC and OpenMP code
* resolves names, building a tree of scopes and symbols
* rewrites the parse tree to correct parsing mistakes (when needed) once semantic information is available to clarify the program's meaning
* checks the validity of declarations
* analyzes expressions and statements, emitting error messages where appropriate
* creates module files if the source code contains modules
  (see [ModFiles.md](ModFiles.md))

In the course of semantic analysis, the compiler:
* creates the symbol table
* decorates the parse tree with semantic information (such as pointers into the symbol table)
* creates the intrinsic procedure table
* folds constant expressions

At the end of semantic processing, all validation of the user's program is complete.  This is the last detailed phase of analysis processing.

**Commands:**
  - `flang -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
  - `flang -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
  - `flang -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table

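
The name-resolution task listed above builds a tree of scopes and symbols. The following Python sketch shows one plausible shape for such a tree. It is purely illustrative: the `Scope` class and its `nested`, `declare`, and `resolve` methods are hypothetical names, not Flang's classes, and the sketch ignores Fortran specifics such as case-insensitive names, USE association, and implicit typing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Scope:
    """One node in a toy scope tree (hypothetical; not Flang's data structure)."""
    name: str
    parent: Optional["Scope"] = None
    symbols: Dict[str, str] = field(default_factory=dict)
    children: List["Scope"] = field(default_factory=list)

    def nested(self, name: str) -> "Scope":
        """Create a child scope, e.g. a subprogram inside a module."""
        child = Scope(name, parent=self)
        self.children.append(child)
        return child

    def declare(self, symbol: str, description: str) -> None:
        """Add a symbol to this scope, rejecting duplicate declarations."""
        if symbol in self.symbols:
            raise ValueError(f"'{symbol}' is already declared in '{self.name}'")
        self.symbols[symbol] = description

    def resolve(self, symbol: str) -> Tuple["Scope", str]:
        """Look in this scope first, then in each enclosing scope."""
        scope: Optional["Scope"] = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope, scope.symbols[symbol]
            scope = scope.parent
        raise LookupError(f"'{symbol}' is not declared in or around '{self.name}'")
```

A name that is declared in an enclosing scope resolves to that outer declaration, while an undeclared name raises an error, which loosely mirrors how name resolution both decorates the program and reports mistakes.
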
## Lowering

Lowering takes the parse tree and symbol table produced by analysis and
produces LLVM IR.

### Create the lowering bridge

**Inputs:**
  - the parse tree
  - the symbol table
  - the default KINDs for intrinsic types (specified by default or command line option)
  - the intrinsic procedure table (created during semantic processing)
  - the target characteristics (created during semantic processing)
  - the cooked character stream
  - the target triple: CPU type, vendor, operating system
  - the mapping from Fortran KIND values to FIR KIND values

The lowering bridge is a container that holds all of the information needed for lowering.

**Output:** A container with all of the information needed for lowering

**Entry point:** `lower::LoweringBridge::create`

### Initial lowering

**Input:** the lowering bridge

**Output:** A FIR representation of the program.

**Entry point:** `lower::LoweringBridge::lower`

The compiler takes the information in the lowering bridge and creates a
pre-FIR tree (PFT).  The PFT is a list of programs and modules.  The programs
and modules contain lists of function-like units.  The function-like units
contain a list of evaluations.  All of these contain pointers back into the
parse tree.  The compiler walks the PFT generating FIR.

**Commands:**
  - `flang -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
  - `flang -fc1 -emit-mlir src.f90` dumps the FIR to the file `src.mlir`

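
The nesting of the PFT described above (programs and modules, then function-like units, then evaluations, each pointing back into the parse tree) can be pictured with a small Python model. This is a sketch under stated assumptions, not Flang's PFT: the class names, the `parse_node` string standing in for a parse tree pointer, and the `walk` function are hypothetical, and the walk only reports visit order rather than generating FIR.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Evaluation:
    parse_node: str                     # stand-in for a pointer into the parse tree


@dataclass
class FunctionLikeUnit:
    parse_node: str
    evaluations: List[Evaluation] = field(default_factory=list)


@dataclass
class ProgramOrModule:
    parse_node: str
    units: List[FunctionLikeUnit] = field(default_factory=list)


def walk(pft: List[ProgramOrModule]) -> Iterator[Tuple[int, str]]:
    """Visit every node in nesting order, yielding (depth, parse_node)."""
    for top in pft:
        yield 0, top.parse_node
        for unit in top.units:
            yield 1, unit.parse_node
            for evaluation in unit.evaluations:
                yield 2, evaluation.parse_node
```

A real lowering walk would emit IR at each visited node; the sketch only shows the traversal order that the three levels of lists imply.
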
### Transformation passes

**Input:** initial version of the FIR code

**Output:** An LLVM IR representation of the program

**Entry point:** `mlir::PassManager::run`

The compiler then runs a series of passes over the FIR code.  The first is a
verification pass.  It is followed by a series of transformation passes that
perform various optimizations and transformations.  The final pass creates an
LLVM IR representation of the program.

**Commands:**
  - `flang -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
  - `flang -fc1 -emit-llvm src.f90` dumps the LLVM IR to `src.ll`

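
The ordering described above (verify first, then transform, then convert) is the usual shape of a pass pipeline. The Python sketch below illustrates that shape only. It does not use MLIR: the tuple-based "operations", the three pass functions, and `run_pipeline` with its optional `trace` list are hypothetical stand-ins, with `trace` playing a role loosely similar to dumping the IR after each pass.

```python
def verify(ops):
    """Verification pass: reject malformed input, otherwise return it unchanged."""
    for op in ops:
        if not isinstance(op, tuple) or len(op) != 2:
            raise ValueError(f"malformed operation: {op!r}")
    return ops


def drop_noops(ops):
    """Stand-in transformation pass: remove operations that do nothing."""
    return [op for op in ops if op[0] != "nop"]


def to_target(ops):
    """Stand-in final pass: convert each operation to a lower-level form."""
    return [f"{name} {arg}" for name, arg in ops]


def run_pipeline(ops, passes, trace=None):
    """Run each pass in order, optionally recording the result after each one."""
    for current_pass in passes:
        ops = current_pass(ops)
        if trace is not None:
            trace.append((current_pass.__name__, ops))
    return ops
```

Putting verification first means that later passes can assume well-formed input, and recording the intermediate results makes it possible to see which pass introduced a given change.
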
## Object code generation and linking

After the LLVM IR is created, the flang driver invokes LLVM's existing
infrastructure to generate object code and then invokes a linker to create the
executable file.
191