<!--===- docs/Semantics.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Semantic Analysis

```{contents}
---
local:
---
```

The semantic analysis pass determines whether a syntactically correct Fortran
program is legal by enforcing the constraints of the language.

The input is a parse tree with a `Program` node at the root
and a "cooked" character stream: a contiguous stream of characters
containing a normalized form of the Fortran source.

If the program is not legal, the result of the semantic pass is a list of
errors associated with the program.

If the program is legal, the semantic pass produces a (possibly modified)
parse tree for the semantically correct program with each name mapped to a symbol
and each expression fully analyzed.

All user errors are detected either prior to or during semantic analysis.
After it completes successfully the program should compile with no error messages.
There may still be warnings or informational messages.

## Phases of Semantic Analysis

1. [Validate labels](#validate-labels) -
   Check all constraints on labels and branches
2. [Rewrite DO loops](#rewrite-do-loops) -
   Convert all occurrences of `LabelDoStmt` to `DoConstruct`.
3. [Name resolution](#name-resolution) -
   Analyze names and declarations, build a tree of Scopes containing Symbols,
   and fill in the `Name::symbol` data member in the parse tree
4. [Rewrite parse tree](#rewrite-parse-tree) -
   Fix incorrect parses based on symbol information
5. [Expression analysis](#expression-analysis) -
   Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
   `Variable::typedExpr` with analyzed expressions; fix incorrect parses
   based on the result of this analysis
6. [Statement semantics](#statement-semantics) -
   Perform remaining semantic checks on the execution parts of subprograms
7. [Write module files](#write-module-files) -
   If no errors have occurred, write out `.mod` files for modules and submodules

If phase 1 or phase 2 encounters an error in any of the program units,
compilation terminates. Otherwise, phases 3-6 are all performed even if
errors occur.
Module files are written (phase 7) only if there are no errors.

### Validate labels

Perform semantic checks related to labels and branches:
- check that any labels that are referenced are defined and in scope
- check branches into loop bodies
- check that labeled `DO` loops are properly nested
- check labels in data transfer statements

### Rewrite DO loops

This phase normalizes the parse tree by removing all unstructured `DO` loops
and replacing them with `DO` constructs.

### Name resolution

The name resolution phase walks the parse tree and constructs the symbol table.

The symbol table consists of a tree of `Scope` objects rooted at the global scope.
The global scope is owned by the `SemanticsContext` object.
It contains a `Scope` for each program unit in the compilation.

Each `Scope` in the scope tree contains child scopes representing other scopes
lexically nested in it.
Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
declared in that scope. (All names in the symbol table are represented as
`CharBlock` objects, i.e. as substrings of the cooked character stream.)

All `Symbol` objects are owned by the symbol table data structures.
They should be accessed as `Symbol *` or `Symbol &` outside of the symbol
table classes as they can't be created, copied, or moved.
The `Symbol` class has functions and data common across all symbols, and a
`details` field that contains more information specific to that type of symbol.
Many symbols also have types, represented by `DeclTypeSpec`.
Types are also owned by scopes.

Name resolution happens on the parse tree in this order:
1. Process the specification of a program unit:
   1. Create a new scope for the unit
   2. Create a symbol for each contained subprogram containing just the name
   3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
   4. Process the specification part of the unit
2. Apply the same process recursively to nested subprograms
3. Process the execution part of the program unit
4. Process the execution parts of nested subprograms recursively

After the completion of this phase, every `Name` corresponds to a `Symbol`
unless an error occurred.

### Rewrite parse tree

The parser cannot build a completely correct parse tree without symbol information.
This phase corrects mis-parses based on symbols:
- Array element assignments may be parsed as statement functions: `a(i) = ...`
- Namelist group names without `NML=` may be parsed as format expressions
- A file unit number expression may be parsed as a character variable

This phase also produces an internal error if it finds a `Name` that does not
have its `symbol` data member filled in. This error is suppressed if other
errors have occurred because in that case a `Name` corresponding to an erroneous
symbol may not be resolved.

### Expression analysis

Expressions that occur in the specification part are analyzed during name
resolution, for example, initial values, array bounds, type parameters.
Any remaining expressions are analyzed in this phase.
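
Because expression analysis runs after name resolution, ambiguous syntax can be reinterpreted using symbol information, as in the mis-parse corrections listed later in this section. The sketch below illustrates that idea for a reference parsed as `a(b)`. The enums `SymbolKind` and `RefKind` and the function `ClassifyCall` are hypothetical simplifications introduced only for illustration; they are not flang's actual classes or API.

```cpp
// Hypothetical, simplified stand-ins; these are not flang's real types.
enum class SymbolKind { Procedure, ObjectEntity, DerivedType };
enum class RefKind { FunctionReference, ArrayElement, StructureConstructor };

// Decide how a reference that the parser produced for `a(b)` should be
// interpreted, given what the name `a` resolved to.
RefKind ClassifyCall(SymbolKind kind) {
  switch (kind) {
  case SymbolKind::ObjectEntity:
    return RefKind::ArrayElement; // `a` is an object: array element reference
  case SymbolKind::DerivedType:
    return RefKind::StructureConstructor; // `a` is a derived type
  case SymbolKind::Procedure:
    break;
  }
  return RefKind::FunctionReference; // the parser's original reading stands
}
```

In this simplified model the decision depends only on the kind of symbol; a real implementation would have more cases to consider.
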

For each `Variable` and top-level `Expr` (i.e. one that is not nested below
another `Expr` in the parse tree) the analyzed form of the expression is saved
in the `typedExpr` data member. After this phase has completed, the analyzed
expression can be accessed using `semantics::GetExpr()`.

This phase also corrects mis-parses based on the result of expression analysis:
- An expression like `a(b)` is parsed as a function reference but may need
  to be rewritten to an array element reference (if `a` is an object entity)
  or to a structure constructor (if `a` is a derived type)
- An expression like `a(b:c)` is parsed as an array section but may need to be
  rewritten as a substring if `a` is an object with type CHARACTER

### Statement semantics

Multiple independent checkers driven by the `SemanticsVisitor` framework
perform the remaining semantic checks.
By this phase, all names and expressions that can be successfully resolved
have been. But there may be names without symbols or expressions without
analyzed form if errors occurred earlier.

### Initialization processing

Fortran supports many means of specifying static initializers for variables,
object pointers, and procedure pointers, as well as default initializers for
derived type object components, pointers, and type parameters.

Non-pointer static initializers of variables and named constants are
scanned, analyzed, folded, scalar-expanded, and validated as they are
traversed during declaration processing in name resolution.
So are the default initializers of non-pointer object components in
non-parameterized derived types.
Named constant arrays with implied shapes take their actual shape from
the initialization expression.

Default initializers of non-pointer components and type parameters
in distinct parameterized
derived type instantiations are similarly processed as those instances
are created, as their expressions may depend on the values of type
parameters.
Error messages produced during parameterized derived type instantiation
are decorated with contextual attachments that point to the declarations
or other type specifications that caused the instantiation.

Static initializations in `DATA` statements are collected, validated,
and converted into static initialization in the symbol table, as if
the initialized objects had used the newer style of static initialization
in their entity declarations.

All statically initialized pointers, and default component initializers for
pointers, are processed late in name resolution after all specification parts
have been traversed.
This allows for forward references even in the presence of `IMPLICIT NONE`.
Object pointer initializers in parameterized derived type instantiations are
also cloned and folded at this late stage.
Validation of pointer initializers takes place later in declaration
checking (below).

### Declaration checking

Whenever possible, the enforcement of constraints and "shalls" pertaining to
properties of symbols is deferred to a single read-only pass over the symbol table
that takes place after all name resolution and typing is complete.

### Write module files

Separate compilation information is written out on successful compilation
of modules and submodules. These are used as input to name resolution
in program units that `USE` the modules.

Module files are stripped-down Fortran source for the module.
Parts that aren't needed to compile dependent program units (e.g. action statements)
are omitted.

The module file for module `m` is named `m.mod` and the module file for
submodule `s` of module `m` is named `m-s.mod`.
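
The naming rule above can be expressed as a small function. This is a minimal sketch, assuming a hypothetical helper named `ModFileName` that takes the module name and an optional submodule name as plain strings; it is not how flang itself computes these names.

```cpp
#include <string>

// Hypothetical helper, not flang's implementation: builds the module file
// name for a module, or for a submodule when `submodule` is non-empty.
std::string ModFileName(const std::string &module,
                        const std::string &submodule = "") {
  std::string name{module};
  if (!submodule.empty()) {
    name += "-" + submodule; // submodule `s` of module `m` gives "m-s"
  }
  return name + ".mod";
}
```

Under these assumptions, `ModFileName("m")` yields `m.mod` and `ModFileName("m", "s")` yields `m-s.mod`, matching the two cases described above.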