<!--===- docs/Semantics.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Semantic Analysis

```{contents}
---
local:
---
```

The semantic analysis pass takes the parse tree of a syntactically correct
Fortran program and determines whether the program is legal by enforcing
the constraints of the language.

The input is a parse tree with a `Program` node at the root
and a "cooked" character stream: a contiguous stream of characters
containing a normalized form of the Fortran source.

If the program is not legal, the semantic pass produces a list of
errors associated with the program.

If the program is legal, the semantic pass produces a (possibly modified)
parse tree for the semantically correct program, with each name mapped to a symbol
and each expression fully analyzed.

All user errors are detected either prior to or during semantic analysis.
After it completes successfully, the program should compile with no error messages.
There may still be warnings or informational messages.

## Phases of Semantic Analysis

1. [Validate labels](#validate-labels) -
   Check all constraints on labels and branches
2. [Rewrite DO loops](#rewrite-do-loops) -
   Convert all occurrences of `LabelDoStmt` to `DoConstruct`
3. [Name resolution](#name-resolution) -
   Analyze names and declarations, build a tree of Scopes containing Symbols,
   and fill in the `Name::symbol` data member in the parse tree
4. [Rewrite parse tree](#rewrite-parse-tree) -
   Fix incorrect parses based on symbol information
5. [Expression analysis](#expression-analysis) -
   Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
   `Variable::typedExpr` with analyzed expressions; fix incorrect parses
   based on the result of this analysis
6. [Statement semantics](#statement-semantics) -
   Perform remaining semantic checks on the execution parts of subprograms
7. [Write module files](#write-module-files) -
   If no errors have occurred, write out `.mod` files for modules and submodules

If phase 1 or phase 2 encounters an error in any of the program units,
compilation terminates. Otherwise, phases 3-6 are all performed even if
errors occur.
Module files are written (phase 7) only if there are no errors.

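The control flow between phases can be sketched roughly as follows. This is a minimal C++17 illustration, not flang's actual driver: `PhaseFn`, `PhaseLog`, and `RunPhasesSketch` are hypothetical names invented here, and each callback is assumed to return `true` when its phase reported no errors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one semantic phase: returns true when the phase
// reported no errors.
using PhaseFn = std::function<bool()>;

struct PhaseLog {
  std::vector<int> ran;          // 1-based numbers of the phases that ran
  bool wroteModuleFiles{false};  // whether phase 7 ran
};

// Runs the seven phases in order and returns true only if no phase reported
// an error.
bool RunPhasesSketch(const std::array<PhaseFn, 7> &phases, PhaseLog &log) {
  for (const PhaseFn &phase : phases) {
    assert(phase);  // every phase must be callable
  }
  // Phases 1-2: any error terminates compilation immediately.
  for (std::size_t i = 0; i < 2; ++i) {
    log.ran.push_back(static_cast<int>(i + 1));
    if (!phases[i]()) {
      return false;
    }
  }
  // Phases 3-6: all run, even after an earlier one has reported errors.
  bool ok = true;
  for (std::size_t i = 2; i < 6; ++i) {
    log.ran.push_back(static_cast<int>(i + 1));
    ok = phases[i]() && ok;  // call first so the phase is never skipped
  }
  // Phase 7: module files are written only when nothing went wrong.
  if (ok) {
    log.ran.push_back(7);
    ok = phases[6]();
    log.wroteModuleFiles = true;
  }
  return ok;
}
```

Writing `phases[i]() && ok` rather than `ok && phases[i]()` matters here: the second form would short-circuit and skip later phases after the first error.
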
### Validate labels

Perform semantic checks related to labels and branches:
- check that any labels that are referenced are defined and in scope
- check for branches into loop bodies
- check that labeled `DO` loops are properly nested
- check labels in data transfer statements

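As an illustration of the first check only, the sketch below collects the labels defined in one scoping unit and then reports branch targets that are never defined. It is a simplified model under stated assumptions: `LabeledStmtSketch` and `UndefinedTargets` are hypothetical names, the statements are flattened into a list, and scoping, loop nesting, and data transfer statements are ignored.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical, flattened view of the statements in one scoping unit.
struct LabeledStmtSketch {
  std::optional<int> label;   // statement label, if any
  std::optional<int> target;  // label referenced by a branch, if any
};

// Returns the referenced labels that no statement defines, in order of first
// reference and without repeats.
std::vector<int> UndefinedTargets(const std::vector<LabeledStmtSketch> &stmts) {
  // First pass: collect definitions, since a branch may refer forward.
  std::set<int> defined;
  for (const auto &stmt : stmts) {
    if (stmt.label) {
      // Fortran labels are one to five digits, not all zero.
      assert(*stmt.label >= 1 && *stmt.label <= 99999);
      defined.insert(*stmt.label);
    }
  }
  // Second pass: check every reference against the collected definitions.
  std::vector<int> missing;
  std::set<int> reported;
  for (const auto &stmt : stmts) {
    if (stmt.target && defined.count(*stmt.target) == 0 &&
        reported.insert(*stmt.target).second) {
      missing.push_back(*stmt.target);
    }
  }
  return missing;
}
```

Two passes are used because a `GOTO` may legally precede the statement that carries its target label.
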
### Rewrite DO loops

This phase normalizes the parse tree by replacing every unstructured (labeled)
`DO` loop with a `DO` construct.

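The idea behind this normalization can be sketched on a toy statement list: a labeled `DO` opens a loop, and the statement carrying the matching label closes every open loop that names it. This is a sketch under assumptions, not flang's rewriting code; `FlatStmtSketch`, `LoopNode`, and `NestLabelDos` are hypothetical names, and the input is assumed to have already passed label validation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat statement: `doLabel` is nonzero for "DO <label> ...",
// `label` is nonzero when the statement itself carries a label.
struct FlatStmtSketch {
  std::string text;
  int label{0};
  int doLabel{0};
};

// Hypothetical structured node: a plain statement or a loop with a body.
struct LoopNode {
  std::string text;
  bool isLoop{false};
  std::vector<LoopNode> body;
};

std::vector<LoopNode> NestLabelDos(const std::vector<FlatStmtSketch> &flat) {
  struct Open {
    int label;
    LoopNode loop;
  };
  std::vector<Open> open;     // stack of loops still waiting for their label
  std::vector<LoopNode> top;  // statements outside any loop
  auto emit = [&](LoopNode node) {
    (open.empty() ? top : open.back().loop.body).push_back(std::move(node));
  };
  for (const auto &stmt : flat) {
    if (stmt.doLabel != 0) {
      open.push_back({stmt.doLabel, LoopNode{stmt.text, true}});
      continue;
    }
    emit(LoopNode{stmt.text});  // the terminal statement stays inside the loop
    // One labeled statement may terminate several nested loops at once.
    while (stmt.label != 0 && !open.empty() && open.back().label == stmt.label) {
      LoopNode done = std::move(open.back().loop);
      open.pop_back();
      emit(std::move(done));
    }
  }
  assert(open.empty());  // guaranteed by the earlier label validation
  return top;
}
```

The inner `while` loop covers the case where two nested loops share one terminating label, as in `DO 10 I = ...` followed by `DO 10 J = ...`.
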
### Name resolution

The name resolution phase walks the parse tree and constructs the symbol table.

The symbol table consists of a tree of `Scope` objects rooted at the global scope.
The global scope is owned by the `SemanticsContext` object.
It contains a `Scope` for each program unit in the compilation.

Each `Scope` in the scope tree contains child scopes representing other scopes
lexically nested in it.
Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
declared in that scope. (All names in the symbol table are represented as
`CharBlock` objects, i.e. as substrings of the cooked character stream.)

All `Symbol` objects are owned by the symbol table data structures.
Outside of the symbol table classes they should be accessed as `Symbol *` or
`Symbol &`, because they can't be created, copied, or moved there.
The `Symbol` class has functions and data common across all symbols, and a
`details` field that contains more information specific to that kind of symbol.
Many symbols also have types, represented by `DeclTypeSpec`.
Types are also owned by scopes.

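One common way to model "common data plus kind-specific details" in C++17 is a `std::variant` member, sketched below. The class and the three details structs are hypothetical and far smaller than the real ones; the public constructor exists only so the sketch can be exercised on its own.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical kind-specific payloads.
struct ObjectDetailsSketch {
  int rank{0};
};
struct SubprogramDetailsSketch {
  bool isFunction{false};
};
struct ModuleDetailsSketch {};

class DetailedSymbol {
public:
  using Details = std::variant<ObjectDetailsSketch, SubprogramDetailsSketch,
                               ModuleDetailsSketch>;

  DetailedSymbol(std::string name, Details details)
      : name_{std::move(name)}, details_{std::move(details)} {}
  // Symbols are referred to by pointer or reference, never copied or moved.
  DetailedSymbol(const DetailedSymbol &) = delete;
  DetailedSymbol &operator=(const DetailedSymbol &) = delete;

  const std::string &name() const { return name_; }

  // True if the symbol currently holds details of kind D.
  template <typename D> bool has() const {
    return std::holds_alternative<D>(details_);
  }
  // Pointer to the details of kind D, or nullptr if the kind differs.
  template <typename D> const D *detailsIf() const {
    return std::get_if<D>(&details_);
  }

private:
  std::string name_;  // data common to all symbols
  Details details_;   // data specific to one kind of symbol
};

std::string Describe(const DetailedSymbol &symbol) {
  if (const auto *object = symbol.detailsIf<ObjectDetailsSketch>()) {
    return symbol.name() + ": object of rank " + std::to_string(object->rank);
  }
  if (const auto *subp = symbol.detailsIf<SubprogramDetailsSketch>()) {
    return symbol.name() + (subp->isFunction ? ": function" : ": subroutine");
  }
  assert(symbol.has<ModuleDetailsSketch>());
  return symbol.name() + ": module";
}
```
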
Name resolution happens on the parse tree in this order:
1. Process the specification of a program unit:
   1. Create a new scope for the unit
   2. Create a symbol for each contained subprogram containing just the name
   3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
   4. Process the specification part of the unit
2. Apply the same process recursively to nested subprograms
3. Process the execution part of the program unit
4. Process the execution parts of nested subprograms recursively

After the completion of this phase, every `Name` corresponds to a `Symbol`
unless an error occurred.

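The ordering above amounts to two recursive walks: one over all specification parts, then one over all execution parts. The sketch below only records that order in a log; `UnitSketch` and the three functions are hypothetical, and the `"spec"` entry stands for both the opening statement and the specification part.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical program unit: a name plus its contained subprograms.
struct UnitSketch {
  std::string name;
  std::vector<UnitSketch> contained;
};

// Steps 1 and 2: specification processing, applied recursively.
void ResolveSpecs(const UnitSketch &unit, std::vector<std::string> &log) {
  log.push_back("scope " + unit.name);  // 1.1 create a scope
  for (const UnitSketch &child : unit.contained) {
    log.push_back("stub " + child.name);  // 1.2 name-only symbol
  }
  log.push_back("spec " + unit.name);  // 1.3 and 1.4
  for (const UnitSketch &child : unit.contained) {
    ResolveSpecs(child, log);  // 2. nested subprograms
  }
}

// Steps 3 and 4: execution parts, applied recursively.
void ResolveExecs(const UnitSketch &unit, std::vector<std::string> &log) {
  log.push_back("exec " + unit.name);
  for (const UnitSketch &child : unit.contained) {
    ResolveExecs(child, log);
  }
}

std::vector<std::string> ResolveNamesSketch(const UnitSketch &unit) {
  std::vector<std::string> log;
  ResolveSpecs(unit, log);
  ResolveExecs(unit, log);
  return log;
}
```

In this model every execution part is visited only after every specification part, and the name-only stubs from step 1.2 exist before any sibling subprogram is processed, so a reference from one contained subprogram to another can already find a symbol.
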
### Rewrite parse tree

The parser cannot build a completely correct parse tree without symbol information.
This phase corrects mis-parses based on symbols:
- Array element assignments may be parsed as statement functions: `a(i) = ...`
- Namelist group names without `NML=` may be parsed as format expressions
- A file unit number expression may be parsed as a character variable

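The first correction in the list can be sketched as an in-place replacement of one parse tree alternative by another once the symbol is known. The node types, `FixMisparsedStmtFunctions`, and the `arrayNames` set (standing in for a symbol table query) are all hypothetical; flang's parse tree nodes are considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical parse tree nodes for the two readings of "a(i) = ...".
struct StmtFunctionSketch {
  std::string name;
  std::vector<std::string> args;
  std::string body;
};
struct AssignmentSketch {
  std::string lhs;
  std::string rhs;
};
using StmtSketch = std::variant<StmtFunctionSketch, AssignmentSketch>;

std::string JoinArgs(const std::vector<std::string> &args) {
  std::string out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i > 0) {
      out += ", ";
    }
    out += args[i];
  }
  return out;
}

// Rewrites statement functions whose name resolved to an array into
// assignments to an array element; everything else is left untouched.
void FixMisparsedStmtFunctions(std::vector<StmtSketch> &stmts,
                               const std::set<std::string> &arrayNames) {
  for (StmtSketch &stmt : stmts) {
    if (auto *fn = std::get_if<StmtFunctionSketch>(&stmt)) {
      assert(!fn->name.empty());
      if (arrayNames.count(fn->name) != 0) {
        // Build the replacement first; assigning to `stmt` destroys `*fn`.
        AssignmentSketch fixed{fn->name + "(" + JoinArgs(fn->args) + ")",
                               std::move(fn->body)};
        stmt = std::move(fixed);
      }
    }
  }
}
```
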
This phase also produces an internal error if it finds a `Name` that does not
have its `symbol` data member filled in. This error is suppressed if other
errors have occurred because in that case a `Name` corresponding to an erroneous
symbol may not be resolved.

### Expression analysis

Expressions that occur in the specification part are analyzed during name
resolution, for example initial values, array bounds, and type parameters.
Any remaining expressions are analyzed in this phase.

For each `Variable` and top-level `Expr` (i.e. one that is not nested below
another `Expr` in the parse tree) the analyzed form of the expression is saved
in the `typedExpr` data member. After this phase has completed, the analyzed
expression can be accessed using `semantics::GetExpr()`.

This phase also corrects mis-parses based on the result of expression analysis:
- An expression like `a(b)` is parsed as a function reference but may need
  to be rewritten to an array element reference (if `a` is an object entity)
  or to a structure constructor (if `a` is a derived type)
- An expression like `a(b:c)` is parsed as an array section but may need to be
  rewritten as a substring if `a` is an object with type `CHARACTER`

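The two decisions above reduce to a small classification once the properties of `a` are known. The sketch below is a hypothetical model (`NameInfoSketch`, `RefKindSketch`, and both functions are invented here); it also assumes that only a scalar `CHARACTER` object yields a substring, since `a(b:c)` on a character array remains an array section.

```cpp
#include <cassert>

// Hypothetical summary of what name resolution learned about `a`.
enum class NameKindSketch { Object, DerivedType, Procedure };
struct NameInfoSketch {
  NameKindSketch kind;
  bool isCharacter{false};
  int rank{0};
};

enum class RefKindSketch {
  FunctionReference,
  ArrayElement,
  StructureConstructor,
  ArraySection,
  Substring
};

// `a(b)` is initially parsed as a function reference.
RefKindSketch ResolveCallLike(const NameInfoSketch &a) {
  switch (a.kind) {
  case NameKindSketch::Object:
    return RefKindSketch::ArrayElement;
  case NameKindSketch::DerivedType:
    return RefKindSketch::StructureConstructor;
  case NameKindSketch::Procedure:
    break;
  }
  return RefKindSketch::FunctionReference;
}

// `a(b:c)` is initially parsed as an array section.
RefKindSketch ResolveTripletLike(const NameInfoSketch &a) {
  assert(a.kind == NameKindSketch::Object);  // only objects can be sliced
  return (a.isCharacter && a.rank == 0) ? RefKindSketch::Substring
                                        : RefKindSketch::ArraySection;
}
```
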
144eaff2004Ssameeran joshi
145eaff2004Ssameeran joshiMultiple independent checkers driven by the `SemanticsVisitor` framework
146eaff2004Ssameeran joshiperform the remaining semantic checks.
147eaff2004Ssameeran joshiBy this phase, all names and expressions that can be successfully resolved
148eaff2004Ssameeran joshihave been. But there may be names without symbols or expressions without
149eaff2004Ssameeran joshianalyzed form if errors occurred earlier.
150eaff2004Ssameeran joshi
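The notion of several independent checkers sharing one traversal can be sketched with a variadic template. This is not the `SemanticsVisitor` implementation; `VisitorSketch`, `BaseCheckSketch`, the two node types, and the two toy rules are hypothetical and exist only to show how each checker reacts to the node types it cares about and ignores the rest.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical parse tree nodes.
struct AssignStmtSketch {
  bool targetIsNamedConstant;
};
struct DoStmtSketch {
  int step;
};

// Default behavior: ignore any node a checker has no overload for.
struct BaseCheckSketch {
  template <typename N> void Enter(const N &) {}
};

struct NamedConstantCheck : BaseCheckSketch {
  using BaseCheckSketch::Enter;
  explicit NamedConstantCheck(std::vector<std::string> &m) : messages{m} {}
  void Enter(const AssignStmtSketch &stmt) {
    if (stmt.targetIsNamedConstant) {
      messages.push_back("assignment to a named constant");
    }
  }
  std::vector<std::string> &messages;
};

struct DoStepCheck : BaseCheckSketch {
  using BaseCheckSketch::Enter;
  explicit DoStepCheck(std::vector<std::string> &m) : messages{m} {}
  void Enter(const DoStmtSketch &stmt) {
    if (stmt.step == 0) {
      messages.push_back("DO step is zero");
    }
  }
  std::vector<std::string> &messages;
};

// Offers every visited node to every checker, in declaration order.
template <typename... Checkers> class VisitorSketch {
public:
  explicit VisitorSketch(Checkers... checkers)
      : checkers_{std::move(checkers)...} {}
  template <typename N> void Visit(const N &node) {
    std::apply([&](auto &...checker) { (checker.Enter(node), ...); },
               checkers_);
  }

private:
  std::tuple<Checkers...> checkers_;
};
```

Adding a checker in this model means adding one type to the template argument list; the existing checkers do not need to know about it.
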
### Initialization processing

Fortran supports many means of specifying static initializers for variables,
object pointers, and procedure pointers, as well as default initializers for
derived type object components, pointers, and type parameters.

Non-pointer static initializers of variables and named constants are
scanned, analyzed, folded, scalar-expanded, and validated as they are
traversed during declaration processing in name resolution.
So are the default initializers of non-pointer object components in
non-parameterized derived types.
Named constant arrays with implied shapes take their actual shape from
the initialization expression.

Default initializers of non-pointer components and type parameters
in distinct parameterized derived type instantiations are similarly
processed as those instances are created, as their expressions may
depend on the values of type parameters.
Error messages produced during parameterized derived type instantiation
are decorated with contextual attachments that point to the declarations
or other type specifications that caused the instantiation.

Static initializations in `DATA` statements are collected, validated,
and converted into static initialization in the symbol table, as if
the initialized objects had used the newer style of static initialization
in their entity declarations.

All statically initialized pointers, and default component initializers for
pointers, are processed late in name resolution after all specification parts
have been traversed.
This allows for forward references even in the presence of `IMPLICIT NONE`.
Object pointer initializers in parameterized derived type instantiations are
also cloned and folded at this late stage.
Validation of pointer initializers takes place later in declaration
checking (below).

### Declaration checking

Whenever possible, the enforcement of constraints and "shalls" pertaining to
properties of symbols is deferred to a single read-only pass over the symbol table
that takes place after all name resolution and typing is complete.

### Write module files

Separate compilation information is written out on successful compilation
of modules and submodules. These are used as input to name resolution
in program units that `USE` the modules.

Module files are stripped-down Fortran source for the module.
Parts that aren't needed to compile dependent program units (e.g. action statements)
are omitted.

The module file for module `m` is named `m.mod` and the module file for
submodule `s` of module `m` is named `m-s.mod`.

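The naming rule can be written down as a small helper. `ModFileNameSketch` is a hypothetical function that encodes only the two cases stated above; it does not model output directories or any other detail of how flang locates module files.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the module file name for a module, or for a submodule when the
// name of its ancestor module is supplied.
std::string ModFileNameSketch(
    const std::string &name,
    const std::optional<std::string> &ancestor = std::nullopt) {
  assert(!name.empty());
  return ancestor ? *ancestor + "-" + name + ".mod" : name + ".mod";
}
```

For example, `ModFileNameSketch("m")` yields `m.mod`, and `ModFileNameSketch("s", std::string{"m"})` yields `m-s.mod`.
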