xref: /llvm-project/clang/docs/ConstantInterpreter.rst (revision e4009ed3d68ba8d9e78721ce5afc2b3a7edd6f36)
1====================
2Constant Interpreter
3====================
4
5.. contents::
6   :local:
7
8Introduction
9============
10
11The constexpr interpreter aims to replace the existing tree evaluator in
12clang, improving performance on constructs which are executed inefficiently
13by the evaluator. The interpreter is activated using the following flags:
14
15* ``-fexperimental-new-constant-interpreter`` enables the interpreter,
16  emitting an error if an unsupported feature is encountered
17
18Bytecode Compilation
19====================
20
21Bytecode compilation is handled in ``Compiler.h`` for statements
22and for expressions. The compiler has two different
23backends: one to generate bytecode for functions (``ByteCodeEmitter``) and
24one to directly evaluate expressions as they are compiled, without
25generating bytecode (``EvalEmitter``). All functions are compiled to
26bytecode, while toplevel expressions used in constant contexts are directly
27evaluated since the bytecode would never be reused. This mechanism aims to
28pave the way towards replacing the evaluator, improving its performance on
29functions and loops, while being just as fast on single-use toplevel
30expressions.
31
32The interpreter relies on stack-based, strongly-typed opcodes. The glue
33logic between the code generator, along with the enumeration and
34description of opcodes, can be found in ``Opcodes.td``. The opcodes are
35implemented as generic template methods in ``Interp.h`` and instantiated
36with the relevant primitive types by the interpreter loop or by the
37evaluating emitter.
38
39Primitive Types
40---------------
41
42* ``PT_{U|S}int{8|16|32|64}``
43
44  Signed or unsigned integers of a specific bit width, implemented using
45  the ```Integral``` type.
46
47* ``PT_IntAP{S}``
48
49  Signed or unsigned integers of an arbitrary, but fixed width used to
50  implement integral types which are required by the target, but are not
51  supported by the host. Under the hood, they rely on ``APInt``. The
52  ``Integral`` specialisation for these types is required by opcodes to
53  share an implementation with fixed integrals.
54
55* ``PT_Bool``
56
57  Representation for boolean types, essentially a 1-bit unsigned
58  ``Integral``.
59
60* ``PT_Float``
61
62  Arbitrary, but fixed precision floating point numbers. Could be
63  specialised in the future similarly to integers in order to improve
64  floating point performance.
65
66* ``PT_Ptr``
67
68  Pointer type, defined in ``"Pointer.h"``. The most common type of
69  pointer is a "BlockPointer", which points to an ``interp::Block``.
70  But other pointer types exist, such as typeid pointers or
71  integral pointers.
72
73* ``PT_FnPtr``
74
75  Function pointer type, can also be a null function pointer. Defined
76  in ``"FunctionPointer.h"``.
77
78* ``PT_MemberPtr``
79
80  Member pointer type, can also be a null member pointer. Defined
81  in ``"MemberPointer.h"``
82
83Composite types
84---------------
85
86The interpreter distinguishes two kinds of composite types: arrays and
87records (structs and classes). Unions are represented as records, except
88at most a single field can be marked as active. The contents of inactive
89fields are kept until they are reactivated and overwritten.
90Complex numbers (``_Complex``) and vectors
91(``__attribute((vector_size(16)))``) are treated as arrays.
92
93
94Bytecode Execution
95==================
96
97Bytecode is executed using a stack-based interpreter. The execution
98context consists of an ``InterpStack``, along with a chain of
99``InterpFrame`` objects storing the call frames. Frames are built by
100call instructions and destroyed by return instructions. They perform
101one allocation to reserve space for all locals in a single block.
102These objects store all the required information to emit stack traces
103whenever evaluation fails.
104
105Memory Organisation
106===================
107
108Memory management in the interpreter relies on 3 data structures: ``Block``
109objects which store the data and associated inline metadata, ``Pointer``
110objects which refer to or into blocks, and ``Descriptor`` structures which
111describe blocks and subobjects nested inside blocks.
112
113Blocks
114------
115
116Blocks contain data interleaved with metadata. They are allocated either
117statically in the code generator (globals, static members, dummy parameter
118values etc.) or dynamically in the interpreter, when creating the frame
119containing the local variables of a function. Blocks are associated with a
120descriptor that characterises the entire allocation, along with a few
121additional attributes:
122
123* ``IsStatic`` indicates whether the block has static duration in the
124  interpreter, i.e. it is not a local in a frame.
125
126* ``DeclID`` identifies each global declaration (it is set to an invalid
127  and irrelevant value for locals) in order to prevent illegal writes and
128  reads involving globals and temporaries with static storage duration.
129
130Static blocks are never deallocated, but local ones might be deallocated
131even when there are live pointers to them. Pointers are only valid as
132long as the blocks they point to are valid, so a block with pointers to
133it whose lifetime ends is kept alive until all pointers to it go out of
134scope. Since the frame is destroyed on function exit, such blocks are
135turned into a ``DeadBlock`` and copied to storage managed by the
136interpreter itself, not the frame. Reads and writes to these blocks are
137illegal and cause an appropriate diagnostic to be emitted. When the last
138pointer goes out of scope, dead blocks are also deallocated.
139
140The lifetime of blocks is managed through 3 methods stored in the
141descriptor of the block:
142
143* **CtorFn**: initializes the metadata which is store in the block,
144  alongside actual data. Invokes the default constructors of objects
145  which are not trivial (``Pointer``, ``RealFP``, etc.)
146
147* **DtorFn**: invokes the destructors of non-trivial objects.
148
149* **MoveFn**: moves a block to dead storage.
150
151Non-static blocks track all the pointers into them through an intrusive
152doubly-linked list, required to adjust and invalidate all pointers when
153transforming a block into a dead block. If the lifetime of an object ends,
154all pointers to it are invalidated, emitting the appropriate diagnostics when
155dereferenced.
156
157The interpreter distinguishes 3 different kinds of blocks:
158
159* **Primitives**
160
161  A block containing a single primitive with no additional metadata.
162
163* **Arrays of primitives**
164
165  An array of primitives contains a pointer to an ``InitMap`` storage as its
166  first field: the initialisation map is a bit map indicating all elements of
167  the array which were initialised. If the pointer is null, no elements were
168  initialised, while a value of ``(InitMap*)-1`` indicates that the object was
169  fully initialised. When all fields are initialised, the map is deallocated
170  and replaced with that token.
171
172  Array elements are stored sequentially, without padding, after the pointer
173  to the map.
174
175* **Arrays of composites and records**
176
177  Each element in an array of composites is preceded by an ``InlineDescriptor``
178  which stores the attributes specific to the field and not the whole
179  allocation site. Descriptors and elements are stored sequentially in the
180  block.
181  Records are laid out identically to arrays of composites: each field and base
182  class is preceded by an inline descriptor. The ``InlineDescriptor``
183  has the following fields:
184
185   * **Offset**: byte offset into the array or record, used to step back to the
186     parent array or record.
187   * **IsConst**: flag indicating if the field is const-qualified.
188   * **IsInitialized**: flag indicating whether the field or element was
189     initialized. For non-primitive fields, this is only relevant to determine
190     the dynamic type of objects during construction.
191   * **IsBase**: flag indicating whether the record is a base class. In that
192     case, the offset can be used to identify the derived class.
193   * **IsActive**: indicates if the field is the active field of a union.
194   * **IsMutable**: indicates if the field is marked as mutable.
195
196Inline descriptors are filled in by the `CtorFn` of blocks, which leaves storage
197in an uninitialised, but valid state.
198
199Descriptors
200-----------
201
202Descriptors are generated at bytecode compilation time and contain information
203required to determine if a particular memory access is allowed in constexpr.
204They also carry all the information required to emit a diagnostic involving
205a memory access, such as the declaration which originates the block.
206Currently there is a single kind of descriptor encoding information for all
207block types.
208
209Pointers
210--------
211
212Pointers, implemented in ``Pointer.h`` are represented as a tagged union.
213
214 * **BlockPointer**: used to reference memory allocated and managed by the
215   interpreter, being the only pointer kind which allows dereferencing in the
216   interpreter
217 * **TypeIDPointer**: tracks information for the opaque type returned by
218   ``typeid``
219 * **IntegralPointer**: a pointer formed from an integer,
220   think ``(int*)123``.
221
222Besides the previously mentioned union, a number of other pointer-like types
223have their own type:
224
225 * **FunctionPointer** tracks functions.
226 * **MemberPointer** tracks C++ object members
227
228BlockPointer
229~~~~~~~~~~~~
230
231Block pointers track a ``Pointee``, the block to which they point, along
232with a ``Base`` and an ``Offset``. The base identifies the innermost field,
233while the offset points to an array element relative to the base (including
234one-past-end pointers). The offset identifies the array element or field
235which is referenced, while the base points to the outer object or array which
236contains the field. These two fields allow all pointers to be uniquely
237identified, disambiguated and characterised.
238
239As an example, consider the following structure:
240
241.. code-block:: c
242
243    struct A {
244        struct B {
245            int x;
246            int y;
247        } b;
248        struct C {
249            int a;
250            int b;
251        } c[2];
252        int z;
253    };
254    constexpr A a;
255
256On the target, ``&a`` and ``&a.b.x`` are equal. So are ``&a.c[0]`` and
257``&a.c[0].a``. In the interpreter, all these pointers must be
258distinguished since the are all allowed to address distinct range of
259memory.
260
261In the interpreter, the object would require 240 bytes of storage and
262would have its field interleaved with metadata. The pointers which can
263be derived to the object are illustrated in the following diagram:
264
265::
266
267      0   16  32  40  56  64  80  96  112 120 136 144 160 176 184 200 208 224 240
268  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
269  + B | D | D | x | D | y | D | D | D | a | D | b | D | D | a | D | b | D | z |
270  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
271      ^   ^   ^       ^       ^   ^   ^       ^       ^   ^       ^       ^
272      |   |   |       |       |   |   |   &a.c[0].b   |   |   &a.c[1].b   |
273      a   |&a.b.x   &a.y    &a.c  |&a.c[0].a          |&a.c[1].a          |
274        &a.b                   &a.c[0]            &a.c[1]               &a.z
275
276The ``Base`` offset of all pointers points to the start of a field or
277an array and is preceded by an inline descriptor (unless ``Base`` is
278zero, pointing to the root). All the relevant attributes can be read
279from either the inline descriptor or the descriptor of the block.
280
281
282Array elements are identified by the ``Offset`` field of pointers,
283pointing to past the inline descriptors for composites and before
284the actual data in the case of primitive arrays. The ``Offset``
285points to the offset where primitives can be read from. As an example,
286``a.c + 1`` would have the same base as ``a.c`` since it is an element
287of ``a.c``, but its offset would point to ``&a.c[1]``. The
288array-to-pointer decay operation adjusts a pointer to an array (where
289the offset is equal to the base) to a pointer to the first element.
290
291TypeInfoPointer
292~~~~~~~~~~~~~~~
293
294``TypeInfoPointer`` tracks two types: the type assigned to
295``std::type_info`` and the type which was passed to ``typeinfo``.
296It is part of the taged union in ``Pointer``.
297