[//]: # ($NetBSD: README.md,v 1.9 2022/07/08 20:27:36 rillig Exp $)

# Introduction

Lint1 analyzes a single translation unit of C code.

* It reads the output of the C preprocessor, retaining the comments.
* The lexer in `scan.l` and `lex.c` splits the input into tokens.
* The parser in `cgram.y` creates types and expressions from the tokens.
* It checks declarations in `decl.c`.
* It checks initializations in `init.c`.
* It checks types and expressions in `tree.c`.

To see how a specific lint message is triggered, read the corresponding unit
test in `tests/usr.bin/xlint/lint1/msg_???.c`.

# Features

## Type checking

Lint has stricter type checking than most C compilers.

In _strict bool mode_, lint treats `bool` as a type that is incompatible with
other scalar types, as in C#, Go and Java.
See the test `d_c99_bool_strict.c` for details.

Lint warns about type conversions that may result in alignment problems.
See the test `msg_135.c` for examples.

## Control flow analysis

Lint roughly tracks the control flow inside a single function.
It doesn't follow `goto` statements precisely though;
instead, it assumes that each label is reachable.
See the test `msg_193.c` for examples.

## Error handling

Lint tries to continue parsing and checking even after seeing errors.
This part of lint is not robust though, so expect some crashes here,
as variables may not be properly initialized or may be null pointers.
The cleanup after handling a parse error is often incomplete.
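
As a rough illustration of the type checking and control flow features
described above, the following sketch shows the kind of code they concern.
The function names are hypothetical, and the comments describe what the text
above suggests lint looks at, not verified lint output.

```c
#include <stdbool.h>
#include <stddef.h>

/* Strict bool mode is described as keeping bool apart from other scalars. */
bool
has_items(int count)
{
    bool implicit = count;          /* int mixed with bool */
    bool explicit_cmp = count != 0; /* bool produced by a comparison */

    return implicit && explicit_cmp;
}

/* The label below is only entered via goto. */
int
contains_negative(const int *values, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (values[i] < 0)
            goto found;
    }
    return 0;
found:
    /* Follows a return, but lint assumes that each label is reachable. */
    return 1;
}
```

Both functions are valid C. The first line of `has_items` is the kind of
mixing that strict bool mode concerns, while the comparison form keeps the
types apart.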

## Configurable diagnostic messages

Whether lint prints a message and whether each message is an error, a warning
or just informational depends on several things:

* The language level, with its possible values:
    * traditional C (`-t`)
    * migration from traditional C and C90 (default)
    * C90 (`-s`)
    * C99 (`-S`)
    * C11 (`-Ac11`)
* In GCC mode (`-g`), lint allows several GNU extensions,
  reducing the number of printed messages.
* In strict bool mode (`-T`), lint issues errors when `bool` is mixed with
  other scalar types, reusing the existing messages 107 and 211, while also
  defining new messages that are specific to strict bool mode.
* The option `-a` performs the check for lossy conversions from large integer
  types; the option `-aa` extends this check to small integer types as well,
  reusing the same message ID.
* The option `-X` suppresses arbitrary messages by their message ID.
* The option `-q` enables additional queries that are not suitable as regular
  warnings but may be interesting to look at on a case-by-case basis.

# Fundamental types

Lint mainly analyzes expressions (`tnode_t`), which are formed from operators
(`op_t`) and their operands (`tnode_t`).
Each node has a type (`type_t`) and a few other properties.

## type_t

The elementary types are `int`, `_Bool`, `unsigned long`, `pointer` and so on,
as defined in `tspec_t`.

Actual types like `int`, `const char *` are created by `gettyp(INT)`,
or by deriving new types from existing types, using `block_derive_pointer`,
`block_derive_array` and `block_derive_function`.
(See [below](#memory-management) for the meaning of the prefix `block_`.)

After a type has been created, it should not be modified anymore.
Ideally all references to types would be `const`, but that's a lot of work.
Before modifying a type,
it needs to be copied using `block_dup_type` or `expr_dup_type`.
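
The copy-before-modify rule for types can be pictured with a minimal sketch.
Everything prefixed `toy_` or `TOY_` is a hypothetical stand-in invented for
this illustration; lint's real `type_t` has more members, and its real
functions allocate from block or expression memory rather than with `malloc`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; not lint's actual tspec_t and type_t. */
typedef enum { TOY_INT, TOY_CHAR, TOY_PTR } toy_tspec;

typedef struct toy_type {
    toy_tspec t;                    /* elementary type */
    bool is_const;                  /* qualifier */
    const struct toy_type *subt;    /* pointed-to type, for TOY_PTR */
} toy_type;

/* A shared type that is never modified after its creation. */
static const toy_type toy_char_type = { TOY_CHAR, false, NULL };

/* Copy a type so that the copy can be modified safely. */
toy_type *
toy_dup_type(const toy_type *tp)
{
    toy_type *ntp = malloc(sizeof(*ntp));

    if (ntp == NULL)
        abort();
    *ntp = *tp;
    return ntp;
}

/* Derive "pointer to tp" as a new type, leaving tp untouched. */
toy_type *
toy_derive_pointer(const toy_type *tp)
{
    toy_type *ntp = malloc(sizeof(*ntp));

    if (ntp == NULL)
        abort();
    ntp->t = TOY_PTR;
    ntp->is_const = false;
    ntp->subt = tp;
    return ntp;
}

/* Build "const char *" without touching the shared char type. */
toy_type *
toy_const_char_pointer(void)
{
    toy_type *cchar = toy_dup_type(&toy_char_type);

    cchar->is_const = true;         /* only the copy is modified */
    return toy_derive_pointer(cchar);
}
```

The shared `toy_char_type` stays unqualified no matter how many derived types
are built from it, which is the property the rule above protects.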

## tnode_t

When lint parses an expression,
it builds a tree of nodes representing the AST.
Each node has an operator that defines which other members may be accessed.
The operators and their properties are defined in `ops.def`.
Some examples of operators:

| Operator | Meaning                                                 |
|----------|---------------------------------------------------------|
| CON      | compile-time constant in `tn_val`                       |
| NAME     | references the identifier in `tn_sym`                   |
| UPLUS    | the unary operator `+tn_left`                           |
| PLUS     | the binary operator `tn_left + tn_right`                |
| CALL     | a function call, typically CALL(LOAD(NAME("function"))) |
| ICALL    | an indirect function call                               |
| CVT      | an implicit conversion or an explicit cast              |

See `debug_node` for how to interpret the members of `tnode_t`.

## sym_t

There is a single symbol table (`symtab`) for the whole translation unit.
This means that the same identifier may appear multiple times.
To distinguish the identifiers, each symbol has a block level.
Symbols from inner scopes are added to the beginning of the table,
so they are found first when looking for the identifier.

# Memory management

## Block scope

The memory that is allocated by the `block_*_alloc` functions is freed at the
end of analyzing the block, that is, after the closing `}`.
See `compound_statement_rbrace:` in `cgram.y`.

## Expression scope

The memory that is allocated by the `expr_*_alloc` functions is freed at the
end of analyzing the expression.
See `expr_free_all`.

# Null pointers

* Expressions can be null.
    * This typically happens in case of syntax errors or other errors.
* The subtype of a pointer, array or function is never null.
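
Since expressions can be null, code that walks an expression tree needs to
check for null before accessing a node. The following sketch shows one way
to do that on a toy tree. The `toy_` names are hypothetical and only mirror
three of the operators from the table above; this is not lint's `tnode_t`.

```c
#include <stddef.h>

/* Toy stand-ins for a few operators; not lint's op_t. */
typedef enum { TOY_CON, TOY_UPLUS, TOY_PLUS } toy_op;

typedef struct toy_node {
    toy_op op;
    long val;                       /* used by TOY_CON only */
    const struct toy_node *left;    /* used by TOY_UPLUS and TOY_PLUS */
    const struct toy_node *right;   /* used by TOY_PLUS only */
} toy_node;

/*
 * Fold a constant expression. Returns 0 on success and -1 if any
 * subexpression is null, as may happen after a syntax error.
 */
int
toy_fold(const toy_node *tn, long *result)
{
    long l, r;

    if (tn == NULL)
        return -1;

    switch (tn->op) {
    case TOY_CON:
        *result = tn->val;
        return 0;
    case TOY_UPLUS:
        return toy_fold(tn->left, result);
    case TOY_PLUS:
        if (toy_fold(tn->left, &l) != 0 || toy_fold(tn->right, &r) != 0)
            return -1;
        *result = l + r;
        return 0;
    }
    return -1;                      /* unknown operator */
}
```

As in `tnode_t`, the operator decides which members are meaningful: the
function reads `val` only for `TOY_CON` and `right` only for `TOY_PLUS`.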

# Common variable names

| Name | Type      | Meaning                                              |
|------|-----------|------------------------------------------------------|
| t    | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR`           |
| tp   | `type_t`  | a complete type such as `pointer to array[3] of int` |
| stp  | `type_t`  | the subtype of a pointer, array or function          |
| tn   | `tnode_t` | a tree node, mostly used for expressions             |
| op   | `op_t`    | an operator used in an expression                    |
| ln   | `tnode_t` | the left-hand operand of a binary operator           |
| rn   | `tnode_t` | the right-hand operand of a binary operator          |
| sym  | `sym_t`   | a symbol from the symbol table                       |

# Abbreviations in variable names

| Abbr | Expanded                                    |
|------|---------------------------------------------|
| l    | left                                        |
| r    | right                                       |
| o    | old (during type conversions)               |
| n    | new (during type conversions)               |
| op   | operator                                    |
| arg  | the number of the argument, for diagnostics |

# Debugging

Useful breakpoints are:

| Function/Code       | File    | Remarks                                              |
|---------------------|---------|------------------------------------------------------|
| build_binary        | tree.c  | Creates an expression for a unary or binary operator |
| initialization_expr | init.c  | Checks a single initializer                          |
| expr                | tree.c  | Checks a full expression                             |
| typeok              | tree.c  | Checks two types for compatibility                   |
| vwarning_at         | err.c   | Prints a warning                                     |
| verror_at           | err.c   | Prints an error                                      |
| assert_failed       | err.c   | Prints the location of a failed assertion            |
| `switch (yyn)`      | cgram.c | Reduction of a grammar rule                          |

# Tests

The tests are in `tests/usr.bin/xlint`.
By default, each test is run with the lint flags `-g` for GNU mode,
`-S` for C99 mode and `-w` to report warnings as errors.

Each test can override the lint flags using comments of the following forms:

* `/* lint1-flags: -tw */` replaces the default flags.
* `/* lint1-extra-flags: -p */` adds to the default flags.

Most tests check the diagnostics that lint generates.
They do this by placing `expect` comments near the location of the diagnostic.
The comment `/* expect+1: ... */` expects a diagnostic to be generated for the
code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated
for the code 5 lines above.
Each `expect` comment must be on a single line.
At the start and the end of the comment, the placeholder `...` stands for an
arbitrary sequence of characters.
There may be other code or comments in the same line of the `.c` file.

Each diagnostic has its own test `msg_???.c` that triggers the corresponding
diagnostic.
Most other tests focus on a single feature.

## Adding a new test

1. Run `make add-test NAME=test_name`.
2. Run `cd ../../../tests/usr.bin/xlint/lint1`.
3. Sort the `FILES` lines in `Makefile`.
4. Make the test generate the desired diagnostics.
5. Run `./accept.sh test_name` until it no longer complains.
6. Run `cd ../../..`.
7. Run `cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`.
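
To make the comment forms from the Tests section concrete, a test file might
look roughly like the following sketch. The function `answer` is
hypothetical, and the message text in the `expect` comment is an assumption
about what message 193 says; copy the exact text from lint's actual output,
and add further `expect` comments if the chosen flags produce more
diagnostics.

```c
/* lint1-extra-flags: -p */

/* Hypothetical test: a statement that follows an unconditional return. */

int
answer(void)
{
    return 42;
    /* expect+1: warning: statement not reached [193] */
    return 0;
}
```

The `expect+1` comment sits on the line directly above the statement that is
expected to trigger the diagnostic. An `expect-1` comment placed on the line
after that statement would refer to the same line of code.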