1============================================ 2Implementation plans for ``-fbounds-safety`` 3============================================ 4 5.. contents:: 6 :local: 7 8Gradual updates with experimental flag 9====================================== 10 11The feature will be implemented as a series of smaller PRs and we will guard our 12implementation with an experimental flag ``-fexperimental-bounds-safety`` until 13the usable model is fully available. Once the model is ready for use, we will 14expose the flag ``-fbounds-safety``. 15 16Possible patch sets 17------------------- 18 19* External bounds annotations and the (late) parsing logic. 20* Internal bounds annotations (wide pointers) and their parsing logic. 21* Clang code generation for wide pointers with debug information. 22* Pointer cast semantics involving bounds annotations (this could be divided 23 into multiple sub-PRs). 24* CFG analysis for pairs of related pointer and count assignments and the likes. 25* Bounds check expressions in AST and the Clang code generation (this could also 26 be divided into multiple sub-PRs). 27 28Proposed implementation 29======================= 30 31External bounds annotations 32--------------------------- 33 34The bounds annotations are C type attributes appertaining to pointer types. If 35an attribute is added to the position of a declaration attribute, e.g., ``int 36*ptr __counted_by(size)``, the attribute appertains to the outermost pointer 37type of the declaration (``int *``). 38 39New sugar types 40--------------- 41 42An external bounds annotation creates a type sugar of the underlying pointer 43types. We will introduce a new sugar type, ``DynamicBoundsPointerType`` to 44represent ``__counted_by`` or ``__sized_by``. Using ``AttributedType`` would not 45be sufficient because the type needs to hold the count or size expression as 46well as some metadata necessary for analysis, while this type may be implemented 47through inheritance from ``AttributedType``. Treating the annotations as type 48sugars means two types with incompatible external bounds annotations may be 49considered canonically the same types. This is sometimes necessary, for example, 50to make the ``__counted_by`` and friends not participate in function 51overloading. However, this design requires a separate logic to walk through the 52entire type hierarchy to check type compatibility of bounds annotations. 53 54Late parsing for C 55------------------ 56 57A bounds annotation such as ``__counted_by(count)`` can be added to type of a 58struct field declaration where count is another field of the same struct 59declared later. Similarly, the annotation may apply to type of a function 60parameter declaration which precedes the parameter count in the same function. 61This means parsing the argument of bounds annotations must be done after the 62parser has the whole context of a struct or a function declaration. Clang has 63late parsing logic for C++ declaration attributes that require late parsing, 64while the C declaration attributes and C/C++ type attributes do not have the 65same logic. This requires introducing late parsing logic for C/C++ type 66attributes. 67 68Internal bounds annotations 69--------------------------- 70 71``__indexable`` and ``__bidi_indexable`` alter pointer representations to be 72equivalent to a struct with the pointer and the corresponding bounds fields. 73Despite this difference in their representations, they are still pointers in 74terms of types of operations that are allowed and their semantics. For instance, 75a pointer dereference on a ``__bidi_indexable`` pointer will return the 76dereferenced value same as plain C pointers, modulo the extra bounds checks 77being performed before dereferencing the wide pointer. This means mapping the 78wide pointers to struct types with equivalent layout won’t be sufficient. To 79represent the wide pointers in Clang AST, we add an extra field in the 80PointerType class to indicate the internal bounds of the pointer. This ensures 81pointers of different representations are mapped to different canonical types 82while they are still treated as pointers. 83 84In LLVM IR, wide pointers will be emitted as structs of equivalent 85representations. Clang CodeGen will handle them as Aggregate in 86``TypeEvaluationKind (TEK)``. ``AggExprEmitter`` was extended to handle pointer 87operations returning wide pointers. Alternatively, a new ``TEK`` and an 88expression emitter dedicated to wide pointers could be introduced. 89 90Default bounds annotations 91-------------------------- 92 93The model may implicitly add ``__bidi_indexable`` or ``__single`` depending on 94the context of the declaration that has the pointer type. ``__bidi_indexable`` 95implicitly adds to local variables, while ``__single`` implicitly adds to 96pointer types specifying struct fields, function parameters, or global 97variables. This means the parser may first create the pointer type without any 98default pointer attribute and then recreate the type once the parser has the 99declaration context and determined the default attribute accordingly. 100 101This also requires the parser to reset the type of the declaration with the 102newly created type with the right default attribute. 103 104Promotion expression 105-------------------- 106 107A new expression will be introduced to represent the conversion from a pointer 108with an external bounds annotation, such as ``__counted_by``, to 109``__bidi_indexable``. This type of conversion cannot be handled by normal 110CastExprs because it requires an extra subexpression(s) to provide the bounds 111information necessary to create a wide pointer. 112 113Bounds check expression 114----------------------- 115 116Bounds checks are part of semantics defined in the ``-fbounds-safety`` language 117model. Hence, exposing the bounds checks and other semantic actions in the AST 118is desirable. A new expression for bounds checks has been added to the AST. The 119bounds check expression has a ``BoundsCheckKind`` to indicate the kind of checks 120and has the additional sub-expressions that are necessary to perform the check 121according to the kind. 122 123Paired assignment check 124----------------------- 125 126``-fbounds-safety`` enforces that variables or fields related with the same 127external bounds annotation (e.g., ``buf`` and ``count`` related with 128``__counted_by`` in the example below) must be updated side by side within the 129same basic block and without side effect in between. 130 131.. code-block:: c 132 133 typedef struct { 134 int *__counted_by(count) buf; size_t count; 135 } sized_buf_t; 136 137 void alloc_buf(sized_buf_t *sbuf, sized_t nelems) { 138 sbuf->buf = (int *)malloc(sizeof(int) * nelems); 139 sbuf->count = nelems; 140 } 141 142To implement this rule, the compiler requires a linear representation of 143statements to understand the ordering and the adjacency between the two or more 144assignments. The Clang CFG is used to implement this analysis as Clang CFG 145provides a linear view of statements within each ``CFGBlock`` (Clang 146``CFGBlock`` represents a single basic block in a source-level CFG). 147 148Bounds check optimizations 149-------------------------- 150 151In ``-fbounds-safety``, the Clang frontend emits run-time checks for every 152memory dereference if the type system or analyses in the frontend couldn’t 153verify its bounds safety. The implementation relies on LLVM optimizations to 154remove redundant run-time checks. Using this optimization strategy, if the 155original source code already has bounds checks, the fewer additional checks 156``-fbounds-safety`` will introduce. The LLVM ``ConstraintElimination`` pass is 157design to remove provable redundant checks (please check Florian Hahn’s 158presentation in 2021 LLVM Dev Meeting and the implementation to learn more). In 159the following example, ``-fbounds-safety`` implicitly adds the redundant bounds 160checks that the optimizer can remove: 161 162.. code-block:: c 163 164 void fill_array_with_indices(int *__counted_by(count) p, size_t count) { 165 for (size_t i = 0; i < count; ++i) { 166 // implicit bounds checks: 167 // if (p + i < p || p + i + 1 > p + count) trap(); 168 p[i] = i; 169 } 170 } 171 172``ConstraintElimination`` collects the following facts and determines if the 173bounds checks can be safely removed: 174 175* Inside the for-loop, ``0 <= i < count``, hence ``1 <= i + 1 <= count``. 176* Pointer arithmetic ``p + count`` in the if-condition doesn’t wrap. 177* ``-fbounds-safety`` treats pointer arithmetic overflow as deterministically 178 two’s complement computation, not an undefined behavior. Therefore, 179 getelementptr does not typically have inbounds keyword. However, the compiler 180 does emit inbounds for ``p + count`` in this case because 181 ``__counted_by(count)`` has the invariant that p has at least as many as 182 elements as count. Using this information, ``ConstraintElimination`` is able 183 to determine ``p + count`` doesn’t wrap. 184* Accordingly, ``p + i`` and ``p + i + 1`` also don’t wrap. 185* Therefore, ``p <= p + i`` and ``p + i + 1 <= p + count``. 186* The if-condition simplifies to false and becomes dead code that the subsequent 187 optimization passes can remove. 188 189``OptRemarks`` can be utilized to provide insights into performance tuning. It 190has the capability to report on checks that it cannot eliminate, possibly with 191reasons, allowing programmers to adjust their code to unlock further 192optimizations. 193 194Debugging 195========= 196 197Internal bounds annotations 198--------------------------- 199 200Internal bounds annotations change a pointer into a wide pointer. The debugger 201needs to understand that wide pointers are essentially pointers with a struct 202layout. To handle this, a wide pointer is described as a record type in the 203debug info. The type name has a special name prefix (e.g., 204``__bounds_safety$bidi_indexable``) which can be recognized by a debug info 205consumer to provide support that goes beyond showing the internal structure of 206the wide pointer. There are no DWARF extensions needed to support wide pointers. 207In our implementation, LLDB recognizes wide pointer types by name and 208reconstructs them as wide pointer Clang AST types for use in the expression 209evaluator. 210 211External bounds annotations 212--------------------------- 213 214Similar to internal bounds annotations, external bound annotations are described 215as a typedef to their underlying pointer type in the debug info, and the bounds 216are encoded as strings in the typedef’s name (e.g., 217``__bounds_safety$counted_by:N``). 218 219Recognizing ``-fbounds-safety`` traps 220------------------------------------- 221 222Clang emits debug info for ``-fbounds-safety`` traps as inlined functions, where 223the function name encodes the error message. LLDB implements a frame recognizer 224to surface a human-readable error cause to the end user. A debug info consumer 225that is unaware of this sees an inlined function whose name encodes an error 226message (e.g., : ``__bounds_safety$Bounds check failed``). 227 228Expression Parsing 229------------------ 230 231In our implementation, LLDB’s expression evaluator does not enable the 232``-fbounds-safety`` language option because it’s currently unable to fully 233reconstruct the pointers with external bounds annotations, and also because the 234evaluator operates in C++ mode, utilizing C++ reference types, while 235``-fbounds-safety`` does not currently support C++. This means LLDB’s expression 236evaluator can only evaluate a subset of the ``-fbounds-safety`` language model. 237Specifically, it’s capable of evaluating the wide pointers that already exist in 238the source code. All other expressions are evaluated according to C/C++ 239semantics. 240 241C++ support 242=========== 243 244C++ has multiple options to write code in a bounds-safe manner, such as 245following the bounds-safety core guidelines and/or using hardened libc++ along 246with the `C++ Safe Buffer model 247<https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734>`_. However, these 248techniques may require ABI changes and may not be applicable to code 249interoperating with C. When the ABI of an existing program needs to be preserved 250and for headers shared between C and C++, ``-fbounds-safety`` offers a potential 251solution. 252 253``-fbounds-safety`` is not currently supported in C++, but we believe the 254general approach would be applicable for future efforts. 255