1==================== 2Constant Interpreter 3==================== 4 5.. contents:: 6 :local: 7 8Introduction 9============ 10 11The constexpr interpreter aims to replace the existing tree evaluator in 12clang, improving performance on constructs which are executed inefficiently 13by the evaluator. The interpreter is activated using the following flags: 14 15* ``-fexperimental-new-constant-interpreter`` enables the interpreter, 16 emitting an error if an unsupported feature is encountered 17 18Bytecode Compilation 19==================== 20 21Bytecode compilation is handled in ``Compiler.h`` for statements 22and for expressions. The compiler has two different 23backends: one to generate bytecode for functions (``ByteCodeEmitter``) and 24one to directly evaluate expressions as they are compiled, without 25generating bytecode (``EvalEmitter``). All functions are compiled to 26bytecode, while toplevel expressions used in constant contexts are directly 27evaluated since the bytecode would never be reused. This mechanism aims to 28pave the way towards replacing the evaluator, improving its performance on 29functions and loops, while being just as fast on single-use toplevel 30expressions. 31 32The interpreter relies on stack-based, strongly-typed opcodes. The glue 33logic between the code generator, along with the enumeration and 34description of opcodes, can be found in ``Opcodes.td``. The opcodes are 35implemented as generic template methods in ``Interp.h`` and instantiated 36with the relevant primitive types by the interpreter loop or by the 37evaluating emitter. 38 39Primitive Types 40--------------- 41 42* ``PT_{U|S}int{8|16|32|64}`` 43 44 Signed or unsigned integers of a specific bit width, implemented using 45 the ```Integral``` type. 46 47* ``PT_IntAP{S}`` 48 49 Signed or unsigned integers of an arbitrary, but fixed width used to 50 implement integral types which are required by the target, but are not 51 supported by the host. Under the hood, they rely on ``APInt``. The 52 ``Integral`` specialisation for these types is required by opcodes to 53 share an implementation with fixed integrals. 54 55* ``PT_Bool`` 56 57 Representation for boolean types, essentially a 1-bit unsigned 58 ``Integral``. 59 60* ``PT_Float`` 61 62 Arbitrary, but fixed precision floating point numbers. Could be 63 specialised in the future similarly to integers in order to improve 64 floating point performance. 65 66* ``PT_Ptr`` 67 68 Pointer type, defined in ``"Pointer.h"``. The most common type of 69 pointer is a "BlockPointer", which points to an ``interp::Block``. 70 But other pointer types exist, such as typeid pointers or 71 integral pointers. 72 73* ``PT_FnPtr`` 74 75 Function pointer type, can also be a null function pointer. Defined 76 in ``"FunctionPointer.h"``. 77 78* ``PT_MemberPtr`` 79 80 Member pointer type, can also be a null member pointer. Defined 81 in ``"MemberPointer.h"`` 82 83Composite types 84--------------- 85 86The interpreter distinguishes two kinds of composite types: arrays and 87records (structs and classes). Unions are represented as records, except 88at most a single field can be marked as active. The contents of inactive 89fields are kept until they are reactivated and overwritten. 90Complex numbers (``_Complex``) and vectors 91(``__attribute((vector_size(16)))``) are treated as arrays. 92 93 94Bytecode Execution 95================== 96 97Bytecode is executed using a stack-based interpreter. The execution 98context consists of an ``InterpStack``, along with a chain of 99``InterpFrame`` objects storing the call frames. Frames are built by 100call instructions and destroyed by return instructions. They perform 101one allocation to reserve space for all locals in a single block. 102These objects store all the required information to emit stack traces 103whenever evaluation fails. 104 105Memory Organisation 106=================== 107 108Memory management in the interpreter relies on 3 data structures: ``Block`` 109objects which store the data and associated inline metadata, ``Pointer`` 110objects which refer to or into blocks, and ``Descriptor`` structures which 111describe blocks and subobjects nested inside blocks. 112 113Blocks 114------ 115 116Blocks contain data interleaved with metadata. They are allocated either 117statically in the code generator (globals, static members, dummy parameter 118values etc.) or dynamically in the interpreter, when creating the frame 119containing the local variables of a function. Blocks are associated with a 120descriptor that characterises the entire allocation, along with a few 121additional attributes: 122 123* ``IsStatic`` indicates whether the block has static duration in the 124 interpreter, i.e. it is not a local in a frame. 125 126* ``DeclID`` identifies each global declaration (it is set to an invalid 127 and irrelevant value for locals) in order to prevent illegal writes and 128 reads involving globals and temporaries with static storage duration. 129 130Static blocks are never deallocated, but local ones might be deallocated 131even when there are live pointers to them. Pointers are only valid as 132long as the blocks they point to are valid, so a block with pointers to 133it whose lifetime ends is kept alive until all pointers to it go out of 134scope. Since the frame is destroyed on function exit, such blocks are 135turned into a ``DeadBlock`` and copied to storage managed by the 136interpreter itself, not the frame. Reads and writes to these blocks are 137illegal and cause an appropriate diagnostic to be emitted. When the last 138pointer goes out of scope, dead blocks are also deallocated. 139 140The lifetime of blocks is managed through 3 methods stored in the 141descriptor of the block: 142 143* **CtorFn**: initializes the metadata which is store in the block, 144 alongside actual data. Invokes the default constructors of objects 145 which are not trivial (``Pointer``, ``RealFP``, etc.) 146 147* **DtorFn**: invokes the destructors of non-trivial objects. 148 149* **MoveFn**: moves a block to dead storage. 150 151Non-static blocks track all the pointers into them through an intrusive 152doubly-linked list, required to adjust and invalidate all pointers when 153transforming a block into a dead block. If the lifetime of an object ends, 154all pointers to it are invalidated, emitting the appropriate diagnostics when 155dereferenced. 156 157The interpreter distinguishes 3 different kinds of blocks: 158 159* **Primitives** 160 161 A block containing a single primitive with no additional metadata. 162 163* **Arrays of primitives** 164 165 An array of primitives contains a pointer to an ``InitMap`` storage as its 166 first field: the initialisation map is a bit map indicating all elements of 167 the array which were initialised. If the pointer is null, no elements were 168 initialised, while a value of ``(InitMap*)-1`` indicates that the object was 169 fully initialised. When all fields are initialised, the map is deallocated 170 and replaced with that token. 171 172 Array elements are stored sequentially, without padding, after the pointer 173 to the map. 174 175* **Arrays of composites and records** 176 177 Each element in an array of composites is preceded by an ``InlineDescriptor`` 178 which stores the attributes specific to the field and not the whole 179 allocation site. Descriptors and elements are stored sequentially in the 180 block. 181 Records are laid out identically to arrays of composites: each field and base 182 class is preceded by an inline descriptor. The ``InlineDescriptor`` 183 has the following fields: 184 185 * **Offset**: byte offset into the array or record, used to step back to the 186 parent array or record. 187 * **IsConst**: flag indicating if the field is const-qualified. 188 * **IsInitialized**: flag indicating whether the field or element was 189 initialized. For non-primitive fields, this is only relevant to determine 190 the dynamic type of objects during construction. 191 * **IsBase**: flag indicating whether the record is a base class. In that 192 case, the offset can be used to identify the derived class. 193 * **IsActive**: indicates if the field is the active field of a union. 194 * **IsMutable**: indicates if the field is marked as mutable. 195 196Inline descriptors are filled in by the `CtorFn` of blocks, which leaves storage 197in an uninitialised, but valid state. 198 199Descriptors 200----------- 201 202Descriptors are generated at bytecode compilation time and contain information 203required to determine if a particular memory access is allowed in constexpr. 204They also carry all the information required to emit a diagnostic involving 205a memory access, such as the declaration which originates the block. 206Currently there is a single kind of descriptor encoding information for all 207block types. 208 209Pointers 210-------- 211 212Pointers, implemented in ``Pointer.h`` are represented as a tagged union. 213 214 * **BlockPointer**: used to reference memory allocated and managed by the 215 interpreter, being the only pointer kind which allows dereferencing in the 216 interpreter 217 * **TypeIDPointer**: tracks information for the opaque type returned by 218 ``typeid`` 219 * **IntegralPointer**: a pointer formed from an integer, 220 think ``(int*)123``. 221 222Besides the previously mentioned union, a number of other pointer-like types 223have their own type: 224 225 * **FunctionPointer** tracks functions. 226 * **MemberPointer** tracks C++ object members 227 228BlockPointer 229~~~~~~~~~~~~ 230 231Block pointers track a ``Pointee``, the block to which they point, along 232with a ``Base`` and an ``Offset``. The base identifies the innermost field, 233while the offset points to an array element relative to the base (including 234one-past-end pointers). The offset identifies the array element or field 235which is referenced, while the base points to the outer object or array which 236contains the field. These two fields allow all pointers to be uniquely 237identified, disambiguated and characterised. 238 239As an example, consider the following structure: 240 241.. code-block:: c 242 243 struct A { 244 struct B { 245 int x; 246 int y; 247 } b; 248 struct C { 249 int a; 250 int b; 251 } c[2]; 252 int z; 253 }; 254 constexpr A a; 255 256On the target, ``&a`` and ``&a.b.x`` are equal. So are ``&a.c[0]`` and 257``&a.c[0].a``. In the interpreter, all these pointers must be 258distinguished since the are all allowed to address distinct range of 259memory. 260 261In the interpreter, the object would require 240 bytes of storage and 262would have its field interleaved with metadata. The pointers which can 263be derived to the object are illustrated in the following diagram: 264 265:: 266 267 0 16 32 40 56 64 80 96 112 120 136 144 160 176 184 200 208 224 240 268 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 269 + B | D | D | x | D | y | D | D | D | a | D | b | D | D | a | D | b | D | z | 270 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ 271 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ 272 | | | | | | | &a.c[0].b | | &a.c[1].b | 273 a |&a.b.x &a.y &a.c |&a.c[0].a |&a.c[1].a | 274 &a.b &a.c[0] &a.c[1] &a.z 275 276The ``Base`` offset of all pointers points to the start of a field or 277an array and is preceded by an inline descriptor (unless ``Base`` is 278zero, pointing to the root). All the relevant attributes can be read 279from either the inline descriptor or the descriptor of the block. 280 281 282Array elements are identified by the ``Offset`` field of pointers, 283pointing to past the inline descriptors for composites and before 284the actual data in the case of primitive arrays. The ``Offset`` 285points to the offset where primitives can be read from. As an example, 286``a.c + 1`` would have the same base as ``a.c`` since it is an element 287of ``a.c``, but its offset would point to ``&a.c[1]``. The 288array-to-pointer decay operation adjusts a pointer to an array (where 289the offset is equal to the base) to a pointer to the first element. 290 291TypeInfoPointer 292~~~~~~~~~~~~~~~ 293 294``TypeInfoPointer`` tracks two types: the type assigned to 295``std::type_info`` and the type which was passed to ``typeinfo``. 296It is part of the taged union in ``Pointer``. 297