================
C++ Safe Buffers
================

.. contents::
   :local:


Introduction
============

Clang can be used to harden your C++ code against buffer overflows, an otherwise
common security issue with C-based languages.

The solution described in this document is an integrated programming model as
it combines:

- a family of opt-in Clang warnings (``-Wunsafe-buffer-usage``) emitted
  during compilation to help you update your code to encapsulate and propagate
  the bounds information associated with pointers;
- runtime assertions implemented as part of
  `libc++ hardening modes <https://libcxx.llvm.org/Hardening.html>`_
  that eliminate undefined behavior as long as the coding convention
  is followed and the bounds information is therefore available and correct.

The goal of this work is to enable development of bounds-safe C++ code. It is
not a "push-button" solution; depending on your codebase's existing
coding style, significant (even if largely mechanical) changes to your code
may be necessary. However, it allows you to achieve valuable safety guarantees
on security-critical parts of your codebase.

This solution is under active development. It is already useful for its purpose
but more work is being done to improve ergonomics and safety guarantees
and reduce adoption costs.

The solution aligns in spirit with the "Ranges" safety profile
that was `proposed <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3274r0.pdf>`_
by Bjarne Stroustrup for standardization alongside other C++ safety features.


Pre-Requisites
==============

In order to achieve bounds safety, your codebase needs to have access to
well-encapsulated bounds-safe container, view, and iterator types.
If your project uses libc++, standard container and view types such as
``std::vector`` and ``std::span`` can be made bounds-safe by enabling
the "fast" `hardening mode <https://libcxx.llvm.org/Hardening.html>`_
(passing ``-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST`` to your
compiler) or any of the stricter hardening modes.

In order to harden iterators, you'll also need to obtain a libc++ binary
built with ``_LIBCPP_ABI_BOUNDED_ITERATORS`` -- which is a libc++ ABI setting
that needs to be set for your entire target platform if you need to maintain
binary compatibility with the rest of the platform.

A relatively recent version of C++ is recommended. In particular, the very
useful standard view class ``std::span`` requires C++20.

Other implementations of the C++ standard library may provide different
flags to enable such hardening.

If you're using custom containers and views, they will need to be hardened
this way as well, but you don't necessarily need to do this ahead of time.

This approach can theoretically be applied to plain C codebases,
assuming that safe primitives are developed to encapsulate all buffer accesses,
acting as "hardened custom containers" to replace raw pointers.
However, such an approach would be very unergonomic in C, and safety guarantees
would be weaker due to the lack of good encapsulation technology. A better
approach to bounds safety for non-C++ programs,
`-fbounds-safety <https://clang.llvm.org/docs/BoundsSafety.html>`_,
is currently in development.

Technically, safety guarantees cannot be provided without hardening
the entire technology stack, including all of your dependencies.
However, applying such hardening technology to even a small portion
of your code may be significantly better than nothing.
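
To confirm that the hardening flag discussed in this section actually reached
a given translation unit, a small compile-time probe can help. The following
is a minimal sketch, not an official facility: ``libcxx_hardening_mode()`` is
a hypothetical helper, and it assumes that libc++ exposes the
``_LIBCPP_HARDENING_MODE_*`` macros as described in its hardening
documentation. With any other standard library it simply reports that libc++
is not in use.

```cpp
#include <string_view>
#include <version>  // with libc++, any standard header exposes its config macros

// Hypothetical helper (not part of Clang or libc++): names the libc++
// hardening mode this translation unit was compiled with.
// Assumption: the _LIBCPP_HARDENING_MODE_* macros exist as documented.
constexpr std::string_view libcxx_hardening_mode() {
#if !defined(_LIBCPP_VERSION)
  return "not libc++";
#elif !defined(_LIBCPP_HARDENING_MODE)
  return "unknown (libc++ without hardening modes)";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_NONE
  return "none";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
  return "fast";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
  return "extensive";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
  return "debug";
#else
  return "unrecognized";
#endif
}
```

Printing the result once at startup, or comparing it in a ``static_assert``,
is one way to catch a build configuration that silently dropped the flag.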


The Programming Model for C++
=============================

Assuming that hardened container, view, and iterator classes are available,
what remains is to make sure they are used consistently in your code.
Below we define the specific coding convention that needs to be followed
in order to guarantee safety and how the compiler technology
around ``-Wunsafe-buffer-usage`` assists with that.


Buffer operations should never be performed over raw pointers
-------------------------------------------------------------

Every time a memory access is made, a bounds-safe program must guarantee
that the range of accessed memory addresses falls into the boundaries
of the memory allocated for the object that's being accessed.
In order to establish such a guarantee, the information about such valid range
of addresses -- the **bounds information** associated with the accessed address
-- must be formally available every time a memory access is performed.

A raw pointer does not naturally carry any bounds information.
The bounds information for the pointer may be available *somewhere*, but
it is not associated with the pointer in a formal manner, so a memory access
performed through a raw pointer cannot be automatically verified to be
bounds-safe by the compiler.

That said, the Safe Buffers programming model does **not** try to eliminate
**all** pointer usage. Instead it assumes that most pointers point to
individual objects, not buffers, and therefore they typically aren't
associated with buffer overflow risks. For that reason, in order to identify
the code that requires manual intervention, it is desirable to initially shift
the focus away from the pointers themselves, and instead focus on their
**usage patterns**.

The compiler warning ``-Wunsafe-buffer-usage`` is built to assist you
with this step of the process.
A ``-Wunsafe-buffer-usage`` warning is
emitted whenever one of the following **buffer operations** is performed
on a raw pointer:

- array indexing with ``[]``,
- pointer arithmetic,
- bounds-unsafe standard C functions such as ``std::memcpy()``,
- C++ smart pointer operations such as ``std::unique_ptr<T[]>::operator[]()``,
  which unfortunately cannot be made fully safe within the rules of
  the C++ standard (as of C++23).

This is sufficient for identifying each raw buffer pointer in the program at
**at least one point** during its lifetime across your software stack.

For example, both of the following functions are flagged by
``-Wunsafe-buffer-usage`` because ``pointer`` gets identified as an unsafe
buffer pointer. Even though the second function does not directly access
the buffer, the pointer arithmetic operation inside it may easily be
the only formal "hint" in the program that the pointer does indeed point
to a buffer of multiple objects::

  int get_last_element(int *pointer, size_t size) {
    return pointer[size - 1]; // warning: unsafe buffer access
  }

  int *get_last_element_ptr(int *pointer, size_t size) {
    return pointer + (size - 1); // warning: unsafe pointer arithmetic
  }


All buffers need to be encapsulated into safe container and view types
----------------------------------------------------------------------

It immediately follows from the previous requirement that once an unsafe pointer
is identified at any point during its lifetime, it should be immediately wrapped
into a safe container type (if the allocation site is "nearby") or a safe
view type (if the allocation site is "far away"). Not only memory accesses,
but also non-access operations such as pointer arithmetic need to be covered
this way in order to benefit from the respective runtime bounds checks.

If a **container** type (``std::array``, ``std::vector``, ``std::string``)
is used for allocating the buffer, this is the best-case scenario because
the container naturally has access to the correct bounds information for the
buffer, and the runtime bounds checks immediately kick in. Additionally,
the container type may provide automatic lifetime management for the buffer
(which may or may not be desirable).

If a **view** type is used (``std::span``, ``std::string_view``), this typically
means that the bounds information for the "adopted" pointer needs to be passed
to the view's constructor manually. This makes runtime checks immediately
kick in with respect to the provided bounds information, which is an immediate
improvement over the raw pointer. However, this situation is still fundamentally
insufficient for security purposes, because **bounds information provided
this way cannot be guaranteed to be correct**.

For example, the function ``get_last_element()`` we've seen in the previous
section can be made **slightly** safer this way::

  int get_last_element(int *pointer, size_t size) {
    std::span<int> sp(pointer, size);
    return sp[size - 1]; // warning addressed
  }

Here ``std::span`` eliminates the potential concern that the operation
``size - 1`` may overflow when ``size`` is equal to ``0``, leading to a buffer
"underrun". However, such a program does not provide a guarantee that
the variable ``size`` correctly represents the **actual** size of the buffer
pointed to by ``pointer``. The ``std::span`` constructed this way may be
ill-formed. It may fail to protect you from overrunning the original buffer.

The following example demonstrates one of the most dangerous anti-patterns
of this nature::

  void convert_data(int *source_buf, size_t source_size,
                    int *target_buf, size_t target_size) {
    // Terrible: mismatched pointer / size.
    std::span<int> target_span(target_buf, source_size);
    // ...
  }

The second parameter of ``std::span`` should never be the **desired** size
of the buffer. It should always be the **actual** size of the buffer.
Such code often indicates that the original code already contained
a vulnerability -- and the use of a safe view class failed to prevent it.

If ``target_span`` actually needs to be of size ``source_size``, a significantly
safer way to produce such a span would be to build it with the correct size
first, and then resize it to the desired size by calling ``.first()``::

  void convert_data(int *source_buf, size_t source_size,
                    int *target_buf, size_t target_size) {
    // Safer.
    std::span<int> target_span =
        std::span<int>(target_buf, target_size).first(source_size);
    // ...
  }

However, these are still half-measures. This code still accepts the
bounds information from the caller in an **informal** manner, and such bounds
information cannot be guaranteed to be correct.

In order to mitigate problems of this nature in their entirety,
the third guideline is imposed.


Encapsulation of bounds information must be respected continuously
------------------------------------------------------------------

The allocation site of the object is the only reliable source of bounds
information for that object. For objects with long lifespans across
multiple functions or even libraries in the software stack, it is essential
to formally preserve the original bounds information as it's being passed
from one piece of code to another.

Standard container and view classes are designed to preserve bounds information
correctly **by construction**.
However, they offer a number of ways to "break"
encapsulation, which may cause you to temporarily lose track of the correct
bounds information:

- The two-parameter constructor ``std::span(ptr, size)`` allows you to
  assemble an ill-formed ``std::span``.
- Conversely, you can unwrap a container or a view object into a raw pointer
  and a raw size by calling its ``.data()`` and ``.size()`` methods.
- The overloaded ``operator&()`` found on container and iterator classes
  acts similarly to ``.data()`` in this regard; operations such as
  ``&span[0]`` and ``&*span.begin()`` are effectively unsafe.

Additional ``-Wunsafe-buffer-usage`` warnings are emitted when encapsulation
of **standard** containers is broken in this manner. If you're using
non-standard containers, you can achieve a similar effect with facilities
described in the next section: :ref:`customization`.

For example, our previous attempt to address the warning in
``get_last_element()`` has actually introduced a new warning along the way,
which notifies you about the potentially incorrect bounds information
passed into the two-parameter constructor of ``std::span``::

  int get_last_element(int *pointer, size_t size) {
    std::span<int> sp(pointer, size); // warning: unsafe constructor
    return sp[size - 1];
  }

In order to address this warning, you need to make the function receive
the bounds information from the allocation site in a formal manner.
The function doesn't necessarily need to know where the allocation site is;
it simply needs to be able to accept bounds information **when** it's available.
You can achieve this by refactoring the function to accept a ``std::span``
as a parameter::

  int get_last_element(std::span<int> sp) {
    return sp[sp.size() - 1];
  }

This solution puts the responsibility for making sure the span is well-formed
on the **caller**.
They should do the same, so that eventually the
responsibility is placed on the allocation site!

Such a definition is also very ergonomic as it naturally accepts arbitrary
standard containers without any additional code at the call site::

  void use_last_element() {
    std::vector<int> vec { 1, 2, 3 };
    int x = get_last_element(vec);  // x = 3
  }

Such code is naturally bounds-safe because bounds information is passed down
from the allocation site to the buffer access site. Only safe operations
are performed on container types. The containers are never "unforged" into
raw pointer-size pairs and never "reforged" again. This is what ideal
bounds-safe C++ code looks like.


.. _customization:

Backwards Compatibility, Interoperation with Unsafe Code, Customization
=======================================================================

Some of the code changes described above can be somewhat intrusive.
For example, changing a function that previously accepted a pointer and a size
separately, to accept a ``std::span`` instead, may require you to update
every call site of the function. This is often undesirable and sometimes
completely unacceptable when backwards compatibility is required.

In order to facilitate **incremental adoption** of the coding convention
described above, as well as to handle various unusual situations, the compiler
provides two additional facilities to give the user more control over
``-Wunsafe-buffer-usage`` diagnostics:

- ``#pragma clang unsafe_buffer_usage`` to mark code as unsafe and **suppress**
  ``-Wunsafe-buffer-usage`` warnings in that code.
- ``[[clang::unsafe_buffer_usage]]`` to annotate potential sources of
  discontinuity of bounds information -- thus introducing
  **additional** ``-Wunsafe-buffer-usage`` warnings.

In this section we describe these facilities in detail and show how they can
help you with various unusual situations.

Suppress unwanted warnings with ``#pragma clang unsafe_buffer_usage``
---------------------------------------------------------------------

If you really need to write unsafe code, you can always suppress all
``-Wunsafe-buffer-usage`` warnings in a section of code by surrounding
that code with the ``unsafe_buffer_usage`` pragma. For example, if you don't
want to address the warning in our example function ``get_last_element()``,
here is how you can suppress it::

  int get_last_element(int *pointer, size_t size) {
    #pragma clang unsafe_buffer_usage begin
    return pointer[size - 1]; // warning suppressed
    #pragma clang unsafe_buffer_usage end
  }

This behavior is analogous to ``#pragma clang diagnostic`` (`documentation
<https://clang.llvm.org/docs/UsersManual.html#controlling-diagnostics-via-pragmas>`_).
However, ``#pragma clang unsafe_buffer_usage`` is specialized and recommended
over ``#pragma clang diagnostic`` for a number of technical and non-technical
reasons. Most importantly, ``#pragma clang unsafe_buffer_usage`` is more
suitable for security audits because it is significantly simpler and
describes unsafe code in a more formal manner. By contrast,
``#pragma clang diagnostic`` comes with a push/pop syntax (as opposed to
the begin/end syntax) and it offers ways to suppress warnings without
mentioning them by name (such as ``-Weverything``), which can make it
difficult to determine at a glance whether the warning is suppressed
on any given line of code.

There are a few natural reasons to use this pragma:

- In implementations of safe custom containers. You need this because ultimately
  ``-Wunsafe-buffer-usage`` cannot help you verify that your custom container
  is safe. It will naturally remind you to audit your container's implementation
  to make sure it has all the necessary runtime checks, but ultimately you'll
  need to suppress it once the audit is complete.
- In performance-critical code where bounds-safety-related runtime checks
  cause an unacceptable performance regression. The compiler can theoretically
  optimize them away (e.g. replace a repeated bounds check in a loop with
  a single check before the loop) but it is not guaranteed to do that.
- For incremental adoption purposes. If you want to adopt the coding convention
  gradually, you can always surround an entire file with the
  ``unsafe_buffer_usage`` pragma and then "make holes" in it whenever
  you address warnings on specific portions of the code.
- In code that interoperates with unsafe code. This may be code that
  will never follow the programming model (such as plain C code that will
  never be converted to C++) or code that simply hasn't been converted yet.

Interoperation with unsafe code may require a lot of suppressions.
You are encouraged to introduce "unsafe wrapper functions" for various unsafe
operations that you need to perform regularly.

For example, if you regularly receive pointer/size pairs from unsafe code,
you may want to introduce a wrapper function for the unsafe span constructor::

  #pragma clang unsafe_buffer_usage begin

  template <typename T>
  std::span<T> unsafe_forge_span(T *pointer, size_t size) {
    return std::span(pointer, size);
  }

  #pragma clang unsafe_buffer_usage end

Such a wrapper function can be used to suppress warnings about unsafe span
constructor usage in a more ergonomic manner::

  void use_unsafe_c_struct(unsafe_c_struct *s) {
    // No warning here.
    std::span<int> sp = unsafe_forge_span(s->pointer, s->size);
    // ...
  }

The code remains unsafe but it also continues to be nicely readable, and it
proves that ``-Wunsafe-buffer-usage`` has done its best to notify you about
the potential unsafety. A security auditor will need to keep an eye on such
unsafe wrappers. **It is still up to you to confirm that the bounds information
passed into the wrapper is correct.**


Flag bounds information discontinuities with ``[[clang::unsafe_buffer_usage]]``
-------------------------------------------------------------------------------

The clang attribute ``[[clang::unsafe_buffer_usage]]``
(`attribute documentation
<https://clang.llvm.org/docs/AttributeReference.html#unsafe-buffer-usage>`_)
allows the user to annotate various objects, such as functions or member
variables, as incompatible with the Safe Buffers programming model.
You are encouraged to do that for arbitrary reasons, but typically the main
reason to do that is when an unsafe function needs to be provided for
backwards compatibility.

For example, in the previous section we've seen how the example function
``get_last_element()`` needed to have its parameter types changed in order
to preserve the continuity of bounds information when receiving a buffer pointer
from the caller. However, such a change breaks both API and ABI compatibility.
The code that previously used this function will no longer compile, nor link,
until every call site of that function is updated. You can reclaim the
backwards compatibility -- in terms of both API and ABI -- by adding
a "compatibility overload"::

  int get_last_element(std::span<int> sp) {
    return sp[sp.size() - 1];
  }

  [[clang::unsafe_buffer_usage]] // Please use the new function.
  int get_last_element(int *pointer, size_t size) {
    // Avoid code duplication - simply invoke the safe function!
    // The pragma suppresses the unsafe constructor warning.
    #pragma clang unsafe_buffer_usage begin
    return get_last_element(std::span(pointer, size));
    #pragma clang unsafe_buffer_usage end
  }


Such an overload allows the surrounding code to continue to work.
It is both source-compatible and binary-compatible. It is also strictly safer
than the original function because the unsafe buffer access through a raw
pointer is replaced with a safe ``std::span`` access no matter how it's called.
However, because it requires the caller to pass the pointer and the size
separately, it violates our "bounds information continuity" principle. This
means that callers who care about bounds safety need to be encouraged to use
the ``std::span``-based overload instead. Luckily, the attribute
``[[clang::unsafe_buffer_usage]]`` causes a ``-Wunsafe-buffer-usage`` warning
to be displayed at every call site of the compatibility overload in order to
remind the callers to update their code::

  void use_last_element() {
    std::vector<int> vec { 1, 2, 3 };

    // no warning
    int x = get_last_element(vec);

    // warning: this overload introduces unsafe buffer manipulation
    int y = get_last_element(vec.data(), vec.size());
  }

The compatibility overload can be further simplified with the help of the
``unsafe_forge_span()`` wrapper as described in the previous section --
and it even makes the pragmas unnecessary::

  [[clang::unsafe_buffer_usage]] // Please use the new function.
  int get_last_element(int *pointer, size_t size) {
    // Avoid code duplication - simply invoke the safe function!
    return get_last_element(unsafe_forge_span(pointer, size));
  }

Notice how the attribute ``[[clang::unsafe_buffer_usage]]`` does **not**
suppress the warnings within the function on its own.
Similarly, functions whose
entire definitions are covered by ``#pragma clang unsafe_buffer_usage`` do
**not** become automatically annotated with the attribute
``[[clang::unsafe_buffer_usage]]``. They serve two different purposes:

- The pragma says that the function isn't safely **written**;
- The attribute says that the function isn't safe to **use**.

Also notice how we've made an **unsafe** wrapper for a **safe** function.
This is significantly better than making a **safe** wrapper for an **unsafe**
function. In other words, the following solution is significantly more unsafe
and undesirable than the previous solution::

  int get_last_element(std::span<int> sp) {
    // You've just added that attribute, and now you need to
    // immediately suppress the warning that comes with it?
    #pragma clang unsafe_buffer_usage begin
    return get_last_element(sp.data(), sp.size());
    #pragma clang unsafe_buffer_usage end
  }


  [[clang::unsafe_buffer_usage]]
  int get_last_element(int *pointer, size_t size) {
    // This access is still completely unchecked. What's the point of having
    // perfect bounds information if you aren't performing runtime checks?
    #pragma clang unsafe_buffer_usage begin
    return pointer[size - 1];
    #pragma clang unsafe_buffer_usage end
  }

**Structs and classes**, unlike functions, cannot be overloaded. If a struct
contains an unsafe buffer (in the form of a nested array or a pointer/size pair)
then it is typically impossible to replace it with a safe container (such as
``std::array`` or ``std::span`` respectively) without breaking the layout
of the struct and introducing both source and binary incompatibilities with
the surrounding client code.

Additionally, member variables of a class cannot be naturally "hidden" from
client code.
If a class needs to be used by clients who haven't updated to
C++20 yet, you cannot use the C++20-specific ``std::span`` as a member variable
type. If the definition of a struct is shared with plain C code that manipulates
member variables directly, you cannot use any C++-specific types for these
member variables.

In such cases there's usually no backwards-compatible way to use safe types
directly. The best option is usually to discourage the clients from using
member variables directly by annotating the member variables with the attribute
``[[clang::unsafe_buffer_usage]]``, and then to change the interface
of the class to provide safe "accessors" to the unsafe data.

For example, let's assume the worst-case scenario: ``struct foo`` is an unsafe
struct type fully defined in a header shared between plain C code and C++ code::

  struct foo {
    int *pointer;
    size_t size;
  };

In this case you can achieve safety in C++ code by annotating the member
variables as unsafe and encapsulating them into safe accessor methods::

  struct foo {
    [[clang::unsafe_buffer_usage]]
    int *pointer;
    [[clang::unsafe_buffer_usage]]
    size_t size;

    // Avoid showing this code to clients who are unable to digest it.
  #if __cplusplus >= 202002L
    std::span<int> get_pointer_as_span() {
      #pragma clang unsafe_buffer_usage begin
      return std::span(pointer, size);
      #pragma clang unsafe_buffer_usage end
    }

    void set_pointer_from_span(std::span<int> sp) {
      #pragma clang unsafe_buffer_usage begin
      pointer = sp.data();
      size = sp.size();
      #pragma clang unsafe_buffer_usage end
    }

    // Potentially more utility functions.
  #endif
  };

Future Work
===========

The ``-Wunsafe-buffer-usage`` technology is in active development.
The warning
is largely ready for everyday use but it is continuously improved to reduce
unnecessary noise as well as cover some of the trickier unsafe operations.

Fix-It Hints for ``-Wunsafe-buffer-usage``
------------------------------------------

A code transformation tool is in development that can semi-automatically
transform large bodies of code to follow the C++ Safe Buffers programming model.
It can currently be accessed by passing the experimental flag
``-fsafe-buffer-usage-suggestions`` in addition to ``-Wunsafe-buffer-usage``.

Fix-its produced this way currently assume the default approach described
in this document as they suggest standard containers and views (most notably
``std::span`` and ``std::array``) as replacements for raw buffer pointers.
This additionally requires libc++ hardening in order to make the runtime
bounds checks actually happen.

Static Analysis to Identify Suspicious Sources of Bounds Information
--------------------------------------------------------------------

The unsafe constructor ``span(pointer, size)`` is often a necessary evil
when it comes to interoperation with unsafe code. However, passing the
correct bounds information to such a constructor is often difficult.
In order to detect those ``span(target_pointer, source_size)`` anti-patterns,
path-sensitive analysis performed by `the clang static analyzer
<https://clang-analyzer.llvm.org>`_ can be taught to identify situations
when the pointer and the size are coming from "suspiciously different" sources.

Such analysis will be able to identify the source of information with
significantly higher precision than that of the compiler, making it much better
at identifying incorrect bounds information in your code while producing
significantly fewer warnings.
It will also need to bypass
``#pragma clang unsafe_buffer_usage`` suppressions and "see through"
unsafe wrappers such as ``unsafe_forge_span`` -- something that
the static analyzer is naturally capable of doing.