xref: /llvm-project/clang/docs/BoundsSafetyImplPlans.rst (revision f47778455d28285341d0eeceb53ae7cd95d07d36)
1============================================
2Implementation plans for ``-fbounds-safety``
3============================================
4
5.. contents::
6   :local:
7
8Gradual updates with experimental flag
9======================================
10
11The feature will be implemented as a series of smaller PRs and we will guard our
12implementation with an experimental flag ``-fexperimental-bounds-safety`` until
13the usable model is fully available. Once the model is ready for use, we will
14expose the flag ``-fbounds-safety``.
15
16Possible patch sets
17-------------------
18
19* External bounds annotations and the (late) parsing logic.
20* Internal bounds annotations (wide pointers) and their parsing logic.
21* Clang code generation for wide pointers with debug information.
22* Pointer cast semantics involving bounds annotations (this could be divided
23  into multiple sub-PRs).
24* CFG analysis for pairs of related pointer and count assignments and the likes.
25* Bounds check expressions in AST and the Clang code generation (this could also
26  be divided into multiple sub-PRs).
27
28Proposed implementation
29=======================
30
31External bounds annotations
32---------------------------
33
34The bounds annotations are C type attributes appertaining to pointer types. If
35an attribute is added to the position of a declaration attribute, e.g., ``int
36*ptr __counted_by(size)``, the attribute appertains to the outermost pointer
37type of the declaration (``int *``).
38
39New sugar types
40---------------
41
42An external bounds annotation creates a type sugar of the underlying pointer
43types. We will introduce a new sugar type, ``DynamicBoundsPointerType`` to
44represent ``__counted_by`` or ``__sized_by``. Using ``AttributedType`` would not
45be sufficient because the type needs to hold the count or size expression as
46well as some metadata necessary for analysis, while this type may be implemented
47through inheritance from ``AttributedType``. Treating the annotations as type
48sugars means two types with incompatible external bounds annotations may be
49considered canonically the same types. This is sometimes necessary, for example,
50to make the ``__counted_by`` and friends not participate in function
51overloading. However, this design requires a separate logic to walk through the
52entire type hierarchy to check type compatibility of bounds annotations.
53
54Late parsing for C
55------------------
56
57A bounds annotation such as ``__counted_by(count)`` can be added to type of a
58struct field declaration where count is another field of the same struct
59declared later. Similarly, the annotation may apply to type of a function
60parameter declaration which precedes the parameter count in the same function.
61This means parsing the argument of bounds annotations must be done after the
62parser has the whole context of a struct or a function declaration. Clang has
63late parsing logic for C++ declaration attributes that require late parsing,
64while the C declaration attributes and C/C++ type attributes do not have the
65same logic. This requires introducing late parsing logic for C/C++ type
66attributes.
67
68Internal bounds annotations
69---------------------------
70
71``__indexable`` and ``__bidi_indexable`` alter pointer representations to be
72equivalent to a struct with the pointer and the corresponding bounds fields.
73Despite this difference in their representations, they are still pointers in
74terms of types of operations that are allowed and their semantics. For instance,
75a pointer dereference on a ``__bidi_indexable`` pointer will return the
76dereferenced value same as plain C pointers, modulo the extra bounds checks
77being performed before dereferencing the wide pointer. This means mapping the
78wide pointers to struct types with equivalent layout won’t be sufficient. To
79represent the wide pointers in Clang AST, we add an extra field in the
80PointerType class to indicate the internal bounds of the pointer. This ensures
81pointers of different representations are mapped to different canonical types
82while they are still treated as pointers.
83
84In LLVM IR, wide pointers will be emitted as structs of equivalent
85representations. Clang CodeGen will handle them as Aggregate in
86``TypeEvaluationKind (TEK)``. ``AggExprEmitter`` was extended to handle pointer
87operations returning wide pointers. Alternatively, a new ``TEK`` and an
88expression emitter dedicated to wide pointers could be introduced.
89
90Default bounds annotations
91--------------------------
92
93The model may implicitly add ``__bidi_indexable`` or ``__single`` depending on
94the context of the declaration that has the pointer type. ``__bidi_indexable``
95implicitly adds to local variables, while ``__single`` implicitly adds to
96pointer types specifying struct fields, function parameters, or global
97variables. This means the parser may first create the pointer type without any
98default pointer attribute and then recreate the type once the parser has the
99declaration context and determined the default attribute accordingly.
100
101This also requires the parser to reset the type of the declaration with the
102newly created type with the right default attribute.
103
104Promotion expression
105--------------------
106
107A new expression will be introduced to represent the conversion from a pointer
108with an external bounds annotation, such as ``__counted_by``, to
109``__bidi_indexable``. This type of conversion cannot be handled by normal
110CastExprs because it requires an extra subexpression(s) to provide the bounds
111information necessary to create a wide pointer.
112
113Bounds check expression
114-----------------------
115
116Bounds checks are part of semantics defined in the ``-fbounds-safety`` language
117model. Hence, exposing the bounds checks and other semantic actions in the AST
118is desirable. A new expression for bounds checks has been added to the AST. The
119bounds check expression has a ``BoundsCheckKind`` to indicate the kind of checks
120and has the additional sub-expressions that are necessary to perform the check
121according to the kind.
122
123Paired assignment check
124-----------------------
125
126``-fbounds-safety`` enforces that variables or fields related with the same
127external bounds annotation (e.g., ``buf`` and ``count`` related with
128``__counted_by`` in the example below) must be updated side by side within the
129same basic block and without side effect in between.
130
131.. code-block:: c
132
133   typedef struct {
134      int *__counted_by(count) buf; size_t count;
135   } sized_buf_t;
136
137   void alloc_buf(sized_buf_t *sbuf, sized_t nelems) {
138      sbuf->buf = (int *)malloc(sizeof(int) * nelems);
139      sbuf->count = nelems;
140   }
141
142To implement this rule, the compiler requires a linear representation of
143statements to understand the ordering and the adjacency between the two or more
144assignments. The Clang CFG is used to implement this analysis as Clang CFG
145provides a linear view of statements within each ``CFGBlock`` (Clang
146``CFGBlock`` represents a single basic block in a source-level CFG).
147
148Bounds check optimizations
149--------------------------
150
151In ``-fbounds-safety``, the Clang frontend emits run-time checks for every
152memory dereference if the type system or analyses in the frontend couldn’t
153verify its bounds safety. The implementation relies on LLVM optimizations to
154remove redundant run-time checks. Using this optimization strategy, if the
155original source code already has bounds checks, the fewer additional checks
156``-fbounds-safety`` will introduce. The LLVM ``ConstraintElimination`` pass is
157design to remove provable redundant checks (please check Florian Hahn’s
158presentation in 2021 LLVM Dev Meeting and the implementation to learn more). In
159the following example, ``-fbounds-safety`` implicitly adds the redundant bounds
160checks that the optimizer can remove:
161
162.. code-block:: c
163
164   void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
165      for (size_t i = 0; i < count; ++i) {
166         // implicit bounds checks:
167         //   if (p + i < p || p + i + 1 > p + count) trap();
168         p[i] = i;
169      }
170   }
171
172``ConstraintElimination`` collects the following facts and determines if the
173bounds checks can be safely removed:
174
175* Inside the for-loop, ``0 <= i < count``, hence ``1 <= i + 1 <= count``.
176* Pointer arithmetic ``p + count`` in the if-condition doesn’t wrap.
177* ``-fbounds-safety`` treats pointer arithmetic overflow as deterministically
178  two’s complement computation, not an undefined behavior. Therefore,
179  getelementptr does not typically have inbounds keyword. However, the compiler
180  does emit inbounds for ``p + count`` in this case because
181  ``__counted_by(count)`` has the invariant that p has at least as many as
182  elements as count. Using this information, ``ConstraintElimination`` is able
183  to determine ``p + count`` doesn’t wrap.
184* Accordingly, ``p + i`` and ``p + i + 1`` also don’t wrap.
185* Therefore, ``p <= p + i`` and ``p + i + 1 <= p + count``.
186* The if-condition simplifies to false and becomes dead code that the subsequent
187  optimization passes can remove.
188
189``OptRemarks`` can be utilized to provide insights into performance tuning. It
190has the capability to report on checks that it cannot eliminate, possibly with
191reasons, allowing programmers to adjust their code to unlock further
192optimizations.
193
194Debugging
195=========
196
197Internal bounds annotations
198---------------------------
199
200Internal bounds annotations change a pointer into a wide pointer. The debugger
201needs to understand that wide pointers are essentially pointers with a struct
202layout. To handle this, a wide pointer is described as a record type in the
203debug info. The type name has a special name prefix (e.g.,
204``__bounds_safety$bidi_indexable``) which can be recognized by a debug info
205consumer to provide support that goes beyond showing the internal structure of
206the wide pointer. There are no DWARF extensions needed to support wide pointers.
207In our implementation, LLDB recognizes wide pointer types by name and
208reconstructs them as wide pointer Clang AST types for use in the expression
209evaluator.
210
211External bounds annotations
212---------------------------
213
214Similar to internal bounds annotations, external bound annotations are described
215as a typedef to their underlying pointer type in the debug info, and the bounds
216are encoded as strings in the typedef’s name (e.g.,
217``__bounds_safety$counted_by:N``).
218
219Recognizing ``-fbounds-safety`` traps
220-------------------------------------
221
222Clang emits debug info for ``-fbounds-safety`` traps as inlined functions, where
223the function name encodes the error message. LLDB implements a frame recognizer
224to surface a human-readable error cause to the end user. A debug info consumer
225that is unaware of this sees an inlined function whose name encodes an error
226message (e.g., : ``__bounds_safety$Bounds check failed``).
227
228Expression Parsing
229------------------
230
231In our implementation, LLDB’s expression evaluator does not enable the
232``-fbounds-safety`` language option because it’s currently unable to fully
233reconstruct the pointers with external bounds annotations, and also because the
234evaluator operates in C++ mode, utilizing C++ reference types, while
235``-fbounds-safety`` does not currently support C++. This means LLDB’s expression
236evaluator can only evaluate a subset of the ``-fbounds-safety`` language model.
237Specifically, it’s capable of evaluating the wide pointers that already exist in
238the source code. All other expressions are evaluated according to C/C++
239semantics.
240
241C++ support
242===========
243
244C++ has multiple options to write code in a bounds-safe manner, such as
245following the bounds-safety core guidelines and/or using hardened libc++ along
246with the `C++ Safe Buffer model
247<https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734>`_. However, these
248techniques may require ABI changes and may not be applicable to code
249interoperating with C. When the ABI of an existing program needs to be preserved
250and for headers shared between C and C++, ``-fbounds-safety`` offers a potential
251solution.
252
253``-fbounds-safety`` is not currently supported in C++, but we believe the
254general approach would be applicable for future efforts.
255