=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (also known as top-byte-ignore, or TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

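The following minimal C sketch shows how a tag occupies the top byte of a
64-bit pointer. Pointers are modelled as plain ``uint64_t`` values so nothing
is dereferenced, and ``tag_pointer``, ``pointer_tag`` and ``untag_pointer`` are
hypothetical helpers, not part of any HWASAN interface.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>

  /* Hypothetical helpers: bits 56-63 hold the tag, bits 0-55 the address. */
  #define TAG_SHIFT 56
  #define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

  /* Replace the top byte of ptr with tag. */
  uint64_t tag_pointer(uint64_t ptr, uint8_t tag) {
      return (ptr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
  }

  /* Read the tag back, like `x0, lsr #56` in the instrumentation listing. */
  uint8_t pointer_tag(uint64_t ptr) {
      return (uint8_t)(ptr >> TAG_SHIFT);
  }

  /* Clear the tag, leaving the address that TBI-enabled hardware uses. */
  uint64_t untag_pointer(uint64_t ptr) {
      return ptr & ADDR_MASK;
  }

For example, tagging ``0x7fff12345678`` with ``0x2d`` yields
``0x2d007fff12345678``, and ``untag_pointer`` recovers the original address.
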
Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned to `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on a tag mismatch.

For a more detailed discussion of this approach, see https://arxiv.org/pdf/1802.09517.pdf.

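The steps above can be modelled in a few lines of C. This is a toy sketch under
stated assumptions, not HWASAN's implementation: it assumes `TG` = 16 and
`TS` = 8, treats addresses as offsets into a simulated 4 KiB region, takes the
tag as a parameter instead of choosing it at random, and ignores short granules
(described in the next subsection). ``tag_object`` and ``access_ok`` are
hypothetical names.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Toy model (not HWASAN code): TG = 16, TS = 8, and "addresses" are
   * offsets into a simulated 4 KiB region. */
  #define TG 16
  #define REGION_SIZE 4096
  #define TAG_SHIFT 56
  #define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

  /* TG=>1 shadow: shadow[i] is the memory tag of granule i. */
  static uint8_t shadow[REGION_SIZE / TG];

  /* Tag the memory of a TG-aligned object and return a pointer to it that
   * carries the same tag. Real HWASAN picks the tag at random. */
  uint64_t tag_object(uint64_t addr, size_t size, uint8_t tag) {
      assert(addr % TG == 0 && addr + size <= REGION_SIZE);
      for (uint64_t a = addr; a < addr + size; a += TG)
          shadow[a / TG] = tag;
      return addr | ((uint64_t)tag << TAG_SHIFT);
  }

  /* The check inserted before a load or store of `size` bytes: every
   * granule touched must carry the pointer's tag. */
  bool access_ok(uint64_t ptr, size_t size) {
      uint8_t ptr_tag = (uint8_t)(ptr >> TAG_SHIFT);
      uint64_t addr = ptr & ADDR_MASK;
      assert(size > 0 && addr + size <= REGION_SIZE);
      for (uint64_t g = addr / TG; g <= (addr + size - 1) / TG; g++)
          if (shadow[g] != ptr_tag)
              return false; /* HWASAN would raise an exception here */
      return true;
  }

With two adjacent objects tagged ``0xa7`` and ``0x3c``, a load that runs off the
end of the first object touches a granule tagged ``0x3c``, so ``access_ok``
reports a mismatch. This is how an overflow is caught without a redzone.
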
Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the accessed bytes
  lie within that size, and the pointer tag is equal to the last byte of
  the granule.

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that memory tags in this range have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the memory tag is the granule size and the pointer tag is stored in the
last byte of the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

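The storage rule and the two-way check for short granules can be sketched in C
on a single simulated granule. `TG` = 16 is assumed, and ``tag_granule`` and
``granule_access_ok`` are hypothetical names; the runtime and the
compiler-generated checks are organized differently.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define TG 16

  /* Tag a granule of which only the first `used` bytes belong to the object.
   * A full granule keeps the tag in shadow; a short one keeps its size in
   * shadow and the real tag in its last byte. */
  void tag_granule(uint8_t granule[TG], uint8_t *shadow_byte, size_t used,
                   uint8_t tag) {
      assert(used >= 1 && used <= TG);
      if (used == TG) {
          *shadow_byte = tag;
      } else {
          *shadow_byte = (uint8_t)used;
          granule[TG - 1] = tag;
      }
  }

  /* Check an access of `size` bytes at `offset` within the granule. */
  bool granule_access_ok(const uint8_t granule[TG], uint8_t shadow_byte,
                         uint8_t ptr_tag, size_t offset, size_t size) {
      if (ptr_tag == shadow_byte)
          return true;                        /* full tag interpretation */
      if (shadow_byte == 0 || shadow_byte >= TG)
          return false;                       /* not a short granule size */
      if (offset + size > shadow_byte)
          return false;                       /* access leaves the valid bytes */
      return granule[TG - 1] == ptr_tag;      /* tag kept in the last byte */
  }

For a 13-byte object, ``tag_granule`` stores 13 in shadow and the tag in byte 15;
a 4-byte load at offset 8 passes, while one at offset 12 fails because it reaches
past the valid bytes. The first test also accepts any pointer whose tag happens to
equal the stored size, which is the ambiguity described above.
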
Instrumentation
===============

Memory Accesses
---------------
In the majority of cases, memory accesses are prefixed with a call to
an outlined instruction sequence that verifies the tags. The code size
and performance overhead of the call are reduced by using a custom calling
convention that

* preserves most registers, and
* is specialized to the register containing the address, and to the type and
  size of the memory access.

Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c
  [...]
  foo:
        str     x30, [sp, #-16]!
        adrp    x9, :got:__hwasan_shadow                // load shadow address from GOT into x9
        ldr     x9, [x9, :got_lo12:__hwasan_shadow]
        bl      __hwasan_check_x0_2_short               // call outlined tag check
                                                        // (arguments: x0 = address, x9 = shadow base;
                                                        // "2" encodes the access type and size)
        ldr     w0, [x0]                                // inline load
        ldr     x30, [sp], #16
        ret

  [...]
  __hwasan_check_x0_2_short:
        ubfx    x16, x0, #4, #52                        // shadow offset
        ldrb    w16, [x9, x16]                          // load shadow tag
        cmp     x16, x0, lsr #56                        // extract address tag, compare with shadow tag
        b.ne    .Ltmp0                                  // jump to short tag handler on mismatch
  .Ltmp1:
        ret
  .Ltmp0:
        cmp     w16, #15                                // is this a short tag?
        b.hi    .Ltmp2                                  // if not, error
        and     x17, x0, #0xf                           // find the address's position in the short granule
        add     x17, x17, #3                            // adjust to the position of the last byte loaded
        cmp     w16, w17                                // check that position is in bounds
        b.ls    .Ltmp2                                  // if not, error
        orr     x16, x0, #0xf                           // compute address of last byte of granule
        ldrb    w16, [x16]                              // load tag from it
        cmp     x16, x0, lsr #56                        // compare with pointer tag
        b.eq    .Ltmp1                                  // if matches, continue
  .Ltmp2:
        stp     x0, x1, [sp, #-256]!                    // save original x0, x1 on stack (they will be overwritten)
        stp     x29, x30, [sp, #232]                    // create frame record
        mov     x1, #2                                  // set x1 to a constant indicating the type of failure
        adrp    x16, :got:__hwasan_tag_mismatch_v2      // call runtime function to save remaining registers and report error
        ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
        br      x16

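For readers less familiar with AArch64, the following C function mirrors
``__hwasan_check_x0_2_short`` step by step, with the corresponding instruction
noted in each comment. It is an illustration under assumptions, not runtime code:
the 4-byte access size is inferred from the ``#3`` (size minus one) in the
listing, the shadow base held in ``x9`` is implicit, and application memory and
shadow are simulated by small arrays indexed by untagged address. The name
``check_x0_4_byte_load`` is hypothetical.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define TG 16
  #define MEM_SIZE 1024
  #define ADDR_MASK ((UINT64_C(1) << 56) - 1)

  static uint8_t mem[MEM_SIZE];          /* simulated application memory */
  static uint8_t shadow[MEM_SIZE / TG];  /* plays the role of __hwasan_shadow */

  /* Returns true where the assembly returns, false where it branches to
   * .Ltmp2 to report a tag mismatch. */
  bool check_x0_4_byte_load(uint64_t x0) {
      uint64_t off = (x0 >> 4) & ((UINT64_C(1) << 52) - 1); /* ubfx x16, x0, #4, #52 */
      assert(off < MEM_SIZE / TG);
      uint8_t mem_tag = shadow[off];                         /* ldrb w16, [x9, x16] */
      uint8_t ptr_tag = (uint8_t)(x0 >> 56);                 /* x0, lsr #56 */
      if (mem_tag == ptr_tag)                                /* cmp; b.ne .Ltmp0 */
          return true;
      if (mem_tag > 15)                                      /* cmp w16, #15; b.hi */
          return false;
      uint64_t last = (x0 & 0xf) + 3;                        /* and; add x17, x17, #3 */
      if (mem_tag <= last)                                   /* cmp w16, w17; b.ls */
          return false;
      /* orr x16, x0, #0xf: hardware ignores the tag (TBI); here it is masked off */
      uint64_t granule_last = (x0 | 0xf) & ADDR_MASK;
      return mem[granule_last] == ptr_tag;                   /* ldrb; cmp; b.eq */
  }

Both ``0xf`` and ``#15`` follow from `TG` = 16: ``x0 & 0xf`` is the offset within
the granule, and a shadow value above 15 cannot be a short granule size.
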
Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be `TG`-aligned.
`free` tags the memory with a different tag.

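A toy allocator illustrates the two heap rules: ``malloc`` returns a
`TG`-aligned block whose memory and pointer share a tag, and ``free`` retags
the memory so that stale pointers no longer match. This is a sketch, not
HWASAN's allocator: it never reuses memory, ``toy_malloc``, ``toy_free`` and
``tags_match`` are hypothetical names, and ``random_tag`` uses a small xorshift
generator as a stand-in for a real random source.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Toy TG-aligned bump allocator (never reuses memory); not HWASAN's
   * allocator. Addresses are offsets into a simulated heap. */
  #define TG 16
  #define HEAP_SIZE 4096
  #define TAG_SHIFT 56
  #define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

  static uint8_t shadow[HEAP_SIZE / TG];
  static uint64_t next_free = 0;
  static uint32_t rng_state = 0x12345678u;

  /* Hypothetical tag source: xorshift32, standing in for a random tag. */
  static uint8_t random_tag(void) {
      rng_state ^= rng_state << 13;
      rng_state ^= rng_state >> 17;
      rng_state ^= rng_state << 5;
      return (uint8_t)rng_state;
  }

  static size_t round_up(size_t size) {
      return (size + TG - 1) / TG * TG;
  }

  static void set_tag(uint64_t addr, size_t size, uint8_t tag) {
      for (uint64_t a = addr; a < addr + size; a += TG)
          shadow[a / TG] = tag;
  }

  uint64_t toy_malloc(size_t size) {
      size_t rounded = round_up(size);
      assert(size > 0 && next_free + rounded <= HEAP_SIZE);
      uint64_t addr = next_free;
      next_free += rounded;
      uint8_t tag = random_tag();
      set_tag(addr, rounded, tag);                /* tag the memory ... */
      return addr | ((uint64_t)tag << TAG_SHIFT); /* ... and the pointer */
  }

  /* free: retag the memory with a tag different from the pointer's. */
  void toy_free(uint64_t ptr, size_t size) {
      uint8_t old_tag = (uint8_t)(ptr >> TAG_SHIFT);
      uint8_t new_tag;
      do {
          new_tag = random_tag();
      } while (new_tag == old_tag);
      set_tag(ptr & ADDR_MASK, round_up(size), new_tag);
  }

  /* The tag check for a 1-byte access through ptr. */
  bool tags_match(uint64_t ptr) {
      uint64_t addr = ptr & ADDR_MASK;
      assert(addr < HEAP_SIZE);
      return shadow[addr / TG] == (uint8_t)(ptr >> TAG_SHIFT);
  }

After ``toy_free``, every granule of the block carries a tag different from the
one in the freed pointer, so a later access through that pointer fails the tag
check.
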
Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
to `TG` and tagging stack memory in the function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `N` allocas would require
maintaining `N` live stack pointers, significantly increasing register
pressure. Instead, we generate a single base tag value in the prologue
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
`ReTag` can be as simple as exclusive-or with the constant `M`.

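With the exclusive-or variant of `ReTag`, the per-alloca tags can be derived as
follows. ``retag`` and ``frame_tags`` are hypothetical names and `TS` = 8 is
assumed; only ``base_tag`` has to stay live, and each alloca's tag is recomputed
from its constant index.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>

  /* ReTag as exclusive-or with the alloca's index (TS = 8 assumed). */
  uint8_t retag(uint8_t base_tag, unsigned m) {
      return (uint8_t)(base_tag ^ m);
  }

  /* Compute the tag of each of the n allocas of one frame from a single
   * base tag chosen in the prologue. */
  void frame_tags(uint8_t base_tag, uint8_t *tags, unsigned n) {
      assert(n <= 256); /* beyond 256, 8-bit XOR tags would repeat */
      for (unsigned m = 0; m < n; m++)
          tags[m] = retag(base_tag, m);
  }

Because XOR with a fixed base is a bijection on 8-bit values, up to 256 allocas in
one frame receive distinct tags; for example, base tag ``0x5a`` gives ``0x5a``,
``0x5b``, ``0x58`` and ``0x59`` for allocas 0 to 3.
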
Stack instrumentation is expected to be a major source of overhead,
but could be made optional.

Globals
-------

Most globals in HWASAN-instrumented code are tagged. This is accomplished
using the following mechanisms:

  * The address of each global has a static tag associated with it. The first
    defined global in a translation unit has a pseudorandom tag associated
    with it, based on the hash of the file path. Each subsequent global's tag
    is obtained by incrementing the previously assigned tag.

  * The global's tag is added to its symbol address in the object file's symbol
    table. This causes the global's address to be tagged when its address is
    taken.

  * When the address of a global is taken directly (i.e. not via the GOT), a special
    instruction sequence needs to be used to add the tag to the address,
    because the tag would otherwise take the address outside of the small code
    model (4GB on AArch64). No changes are required when the address is taken
    via the GOT because the address stored in the GOT will contain the tag.

  * An associated ``hwasan_globals`` section is emitted for each tagged global,
    which records the address of the global, its size and its tag. These
    sections are concatenated by the linker into a single ``hwasan_globals``
    section that is enumerated by the runtime (via an ELF note) when a binary
    is loaded, and the memory is tagged accordingly.

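The static tag assignment can be sketched as below. The text above only states
that the first tag is derived from a hash of the file path and that later tags
increment from it; the FNV-1a hash used here is a stand-in, not necessarily the
hash LLVM uses, and skipping tag 0 on wraparound is this sketch's own
assumption. ``assign_global_tags`` is a hypothetical name.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Stand-in hash (FNV-1a), not necessarily what LLVM uses. */
  static uint8_t path_hash_tag(const char *path) {
      uint32_t h = 2166136261u;               /* FNV-1a offset basis */
      for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
          h ^= *p;
          h *= 16777619u;                     /* FNV-1a prime */
      }
      return (uint8_t)h;
  }

  /* tags[i] receives the tag of the i-th defined global in the file. */
  void assign_global_tags(const char *path, uint8_t *tags, size_t n) {
      assert(path != NULL);
      uint8_t tag = path_hash_tag(path);
      for (size_t i = 0; i < n; i++) {
          if (tag == 0)   /* assumption: skip 0 so no global looks untagged */
              tag = 1;
          tags[i] = tag++;
      }
  }

In this sketch the tags depend only on the path and on definition order, so
recompiling the same file yields the same tags.
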
A complete example is given below:

.. code-block:: none

  // int x = 1; int *f() { return &x; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c

  [...]
  f:
        adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address
        movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag
        add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address
        ret

  [...]
        .data
  .Lx.hwasan:
        .word   1

        .globl  x
        .set x, .Lx.hwasan+0x2d00000000000000

  [...]
        .section        .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
  .Lhwasan.note:
        .word   8                            // namesz
        .word   8                            // descsz
        .word   3                            // NT_LLVM_HWASAN_GLOBALS
        .asciz  "LLVM\000\000\000"
        .word   __start_hwasan_globals-.Lhwasan.note
        .word   __stop_hwasan_globals-.Lhwasan.note

  [...]
        .section        hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
  .Lx.hwasan.descriptor:
        .word   .Lx.hwasan-.Lx.hwasan.descriptor
        .word   0x2d000004                   // tag = 0x2d, size = 4

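The ``hwasan_globals`` descriptor in the listing above can be decoded as follows.
The layout (a 32-bit offset from the descriptor to the global, then a word with
the tag in its top 8 bits and the size in its low 24 bits) is read off the
example; treating the offset as signed, and the names ``global_descriptor``,
``descriptor_tag``, ``descriptor_size`` and ``tagged_global_address``, are
assumptions of this sketch.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical mirror of one hwasan_globals entry from the listing. */
  struct global_descriptor {
      int32_t offset;         /* global address minus descriptor address */
      uint32_t size_and_tag;  /* tag in bits 24-31, size in bits 0-23 */
  };

  uint8_t descriptor_tag(uint32_t word) {
      return (uint8_t)(word >> 24);
  }

  uint32_t descriptor_size(uint32_t word) {
      return word & 0xffffff;
  }

  /* Address of the global with its tag in the top byte, i.e. the value the
   * symbol `x` carries after `.set x, .Lx.hwasan+0x2d00000000000000`. */
  uint64_t tagged_global_address(uint64_t descriptor_addr,
                                 const struct global_descriptor *d) {
      assert(d != NULL);
      uint64_t global = descriptor_addr + (uint64_t)(int64_t)d->offset;
      return global | ((uint64_t)descriptor_tag(d->size_and_tag) << 56);
  }

For the word ``0x2d000004`` this gives tag ``0x2d`` and size 4, matching the
``.set`` line, which places the same tag in the top byte of ``x``.
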
Error reporting
---------------

Errors are generated by the `HLT` instruction and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR attribute, `sanitize_hwaddress`, and a matching
C function attribute. An alternative would be to reuse ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

  * Users may need to disable ASAN but not HWASAN, or vice versa,
    because the tools have different trade-offs and compatibility issues.
  * LLVM (ideally) does not use flags to decide which pass is being used;
    whether ASAN or HWASAN is applied is determined by the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to code that already uses the old attribute.

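At the source level, Clang exposes this split through its ``no_sanitize``
attribute, where ``"hwaddress"`` and ``"address"`` name the two tools
separately. A minimal sketch (the two functions are hypothetical examples):

.. code-block:: c

  #include <assert.h>

  /* Not instrumented by HWASAN; the attribute has no effect on ASAN, which
   * would still instrument it in a -fsanitize=address build. */
  __attribute__((no_sanitize("hwaddress")))
  int read_without_hwasan(const int *p) {
      return *p;
  }

  /* Excluded from both tools by naming both sanitizers. */
  __attribute__((no_sanitize("address", "hwaddress")))
  int read_without_either(const int *p) {
      return *p;
  }

Code that previously carried only ``no_sanitize("address")`` keeps HWASAN
instrumentation until ``"hwaddress"`` is added, which is the migration cost
mentioned above.
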

Comparison with AddressSanitizer
================================

HWASAN:
  * Is less portable than :doc:`AddressSanitizer`
    as it relies on hardware `Address Tagging`_ (AArch64).
    Address Tagging can be emulated with compiler instrumentation,
    but it will require the instrumentation to remove the tags before
    any load or store, which is infeasible in any realistic environment
    that contains non-instrumented code.
  * May have compatibility problems if the target code uses higher
    pointer bits for other purposes.
  * May require changes in the OS kernels (e.g. Linux seems to dislike
    tagged pointers passed from user space:
    https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
  * **Does not require redzones to detect buffer overflows**,
    but the buffer overflow detection is probabilistic, with a roughly
    `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4- and 8-bit `TS`
    respectively).
  * **Does not require quarantine to detect heap-use-after-free,
    or stack-use-after-return**.
    The detection is similarly probabilistic.

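Assuming tags are drawn uniformly from all ``2**TS`` values, a bad access goes
unnoticed only when the neighboring or stale memory tag happens to equal the
pointer tag, which gives the figures quoted above:

.. math::

   P_{\mathrm{miss}} \approx 2^{-\mathrm{TS}}, \qquad
   2^{-4} = \tfrac{1}{16} = 6.25\,\%, \qquad
   2^{-8} = \tfrac{1}{256} \approx 0.39\,\%
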
The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.

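The two overhead sources can be turned into a per-object estimate. This sketch
assumes `TG` = 16 and ignores allocator metadata; ``padded_size`` and
``total_footprint`` are hypothetical helpers.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  #define TG 16

  /* Bytes the object occupies after rounding up to a multiple of TG. */
  size_t padded_size(size_t size) {
      return (size + TG - 1) / TG * TG;
  }

  /* Padded object plus one shadow byte per TG bytes. */
  size_t total_footprint(size_t size) {
      size_t padded = padded_size(size);
      return padded + padded / TG;
  }

A 13-byte object thus occupies 16 bytes plus 1 shadow byte, and a 100-byte object
112 plus 7. With `TG` = 16 the shadow alone costs 1/16 of tagged memory, compared
with 1/8 for AddressSanitizer's 8-to-1 shadow.
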
Supported architectures
=======================
HWASAN relies on `Address Tagging`_, which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

HWASAN's approach is not applicable to 32-bit architectures.


Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html