=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1 byte tag (using *shadow memory*),
uses *redzones* to find buffer-overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned by `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on tag mismatch.

For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf

Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes.
The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

Instrumentation
===============

Memory Accesses
---------------
In the majority of cases, memory accesses are prefixed with a call to
an outlined instruction sequence that verifies the tags. The code size
and performance overhead of the call is reduced by using a custom calling
convention that

* preserves most registers, and
* is specialized to the register containing the address, and the type and
  size of the memory access.

Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c
  [...]
  foo:
        str     x30, [sp, #-16]!
        adrp    x9, :got:__hwasan_shadow                // load shadow address from GOT into x9
        ldr     x9, [x9, :got_lo12:__hwasan_shadow]
        bl      __hwasan_check_x0_2_short               // call outlined tag check
                                                        // (arguments: x0 = address, x9 = shadow base;
                                                        // "2" encodes the access type and size)
        ldr     w0, [x0]                                // inline load
        ldr     x30, [sp], #16
        ret

  [...]
  __hwasan_check_x0_2_short:
        ubfx    x16, x0, #4, #52                        // shadow offset
        ldrb    w16, [x9, x16]                          // load shadow tag
        cmp     x16, x0, lsr #56                        // extract address tag, compare with shadow tag
        b.ne    .Ltmp0                                  // jump to short tag handler on mismatch
  .Ltmp1:
        ret
  .Ltmp0:
        cmp     w16, #15                                // is this a short tag?
        b.hi    .Ltmp2                                  // if not, error
        and     x17, x0, #0xf                           // find the address's position in the short granule
        add     x17, x17, #3                            // adjust to the position of the last byte loaded
        cmp     w16, w17                                // check that position is in bounds
        b.ls    .Ltmp2                                  // if not, error
        orr     x16, x0, #0xf                           // compute address of last byte of granule
        ldrb    w16, [x16]                              // load tag from it
        cmp     x16, x0, lsr #56                        // compare with pointer tag
        b.eq    .Ltmp1                                  // if matches, continue
  .Ltmp2:
        stp     x0, x1, [sp, #-256]!                    // save original x0, x1 on stack (they will be overwritten)
        stp     x29, x30, [sp, #232]                    // create frame record
        mov     x1, #2                                  // set x1 to a constant indicating the type of failure
        adrp    x16, :got:__hwasan_tag_mismatch_v2      // call runtime function to save remaining registers and report error
        ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
        br      x16

Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be TG-aligned.
`free` tags the memory with a different tag.

Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
by `TG` and tagging stack memory in function prologue and epilogue.
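
As a rough illustration of the alignment and tagging described above, the
following C sketch models a small stack region and its `TG=>1` shadow. All
names here (`model_shadow`, `align_to_granule`, `tag_slot`, `tags_match`) are
hypothetical and are not part of the HWASAN runtime. The model ignores short
granules, and it assumes the epilogue resets the granules to tag 0, which is
a choice made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TG = 16 };                   /* tagging granularity, as in the text */
enum { MODEL_STACK_BYTES = 256 };   /* hypothetical size of the modeled stack */

/* TG=>1 shadow: one tag byte per TG bytes of modeled stack. */
static uint8_t model_shadow[MODEL_STACK_BYTES / TG];

/* Round an alloca size up to a whole number of granules. */
static size_t align_to_granule(size_t size) {
    return (size + TG - 1) / TG * TG;
}

/* Tag every granule of the slot that starts at `offset` (a multiple of TG). */
static void tag_slot(size_t offset, size_t size, uint8_t tag) {
    assert(offset % TG == 0);
    assert(offset + align_to_granule(size) <= MODEL_STACK_BYTES);
    memset(&model_shadow[offset / TG], tag, align_to_granule(size) / TG);
}

/* Simplified access check: compare the pointer tag with the memory tag. */
static int tags_match(uint8_t pointer_tag, size_t offset) {
    assert(offset < MODEL_STACK_BYTES);
    return model_shadow[offset / TG] == pointer_tag;
}
```

In this model a "prologue" would call `tag_slot(offset, size, tag)` for each
alloca and an "epilogue" would call it again with the reset tag, after which
a stale pointer that still carries the old tag no longer passes `tags_match`.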

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead we generate a single base tag value in the prologue,
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
ReTag can be as simple as exclusive-or with constant `M`.

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

Globals
-------

Most globals in HWASAN instrumented code are tagged. This is accomplished
using the following mechanisms:

  * The address of each global has a static tag associated with it. The first
    defined global in a translation unit has a pseudorandom tag associated
    with it, based on the hash of the file path. Subsequent global tags are
    incremental from the previously-assigned tag.

  * The global's tag is added to its symbol address in the object file's symbol
    table. This causes the global's address to be tagged when its address is
    taken.

  * When the address of a global is taken directly (i.e. not via the GOT), a special
    instruction sequence needs to be used to add the tag to the address,
    because the tag would otherwise take the address outside of the small code
    model (4GB on AArch64). No changes are required when the address is taken
    via the GOT because the address stored in the GOT will contain the tag.

  * An associated ``hwasan_globals`` section is emitted for each tagged global,
    which indicates the address of the global, its size and its tag. These
    sections are concatenated by the linker into a single ``hwasan_globals``
    section that is enumerated by the runtime (via an ELF note) when a binary
    is loaded and the memory is tagged accordingly.

A complete example is given below:

.. code-block:: none

  // int x = 1; int *f() { return &x; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c

  [...]
  f:
        adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address
        movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag
        add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address
        ret

  [...]
        .data
  .Lx.hwasan:
        .word   1

        .globl  x
        .set    x, .Lx.hwasan+0x2d00000000000000

  [...]
        .section        .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
  .Lhwasan.note:
        .word   8                            // namesz
        .word   8                            // descsz
        .word   3                            // NT_LLVM_HWASAN_GLOBALS
        .asciz  "LLVM\000\000\000"
        .word   __start_hwasan_globals-.Lhwasan.note
        .word   __stop_hwasan_globals-.Lhwasan.note

  [...]
        .section        hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
  .Lx.hwasan.descriptor:
        .word   .Lx.hwasan-.Lx.hwasan.descriptor
        .word   0x2d000004                   // tag = 0x2d, size = 4

Error reporting
---------------

Errors are generated by the `HLT` instruction and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

  * Users may need to disable ASAN but not HWASAN, or vice versa,
    because the tools have different trade-offs and compatibility issues.
  * LLVM (ideally) does not use flags to decide which pass is being used;
    ASAN or HWASAN is applied based on the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.
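
As a minimal sketch of the C-level side, the example below opts a single
function out of HWASAN instrumentation with Clang's
`no_sanitize("hwaddress")` function attribute, while a second function stays
instrumented when the file is built with `-fsanitize=hwaddress`. The
`NO_HWASAN` macro and both function names are hypothetical and chosen only
for illustration; the macro expands to nothing on compilers that do not
provide the `no_sanitize` attribute.

```c
#include <assert.h>
#include <stddef.h>

/* NO_HWASAN is a hypothetical helper macro, not part of any toolchain. */
#if defined(__has_attribute)
#  if __has_attribute(no_sanitize)
#    define NO_HWASAN __attribute__((no_sanitize("hwaddress")))
#  endif
#endif
#ifndef NO_HWASAN
#  define NO_HWASAN
#endif

/* Opted out: the compiler is asked not to emit a tag check for this load. */
NO_HWASAN int read_unchecked(const int *p) {
    assert(p != NULL);
    return *p;
}

/* Instrumented when the file is built with -fsanitize=hwaddress. */
int read_checked(const int *p) {
    assert(p != NULL);
    return *p;
}
```

Code that already carries `no_sanitize("address")` for ASAN would need the
`"hwaddress"` variant added separately, which is the cost noted above.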


Comparison with AddressSanitizer
================================

HWASAN:
  * Is less portable than :doc:`AddressSanitizer`
    as it relies on hardware `Address Tagging`_ (AArch64).
    Address Tagging can be emulated with compiler instrumentation,
    but it will require the instrumentation to remove the tags before
    any load or store, which is infeasible in any realistic environment
    that contains non-instrumented code.
  * May have compatibility problems if the target code uses higher
    pointer bits for other purposes.
  * May require changes in the OS kernels (e.g. Linux seems to dislike
    tagged pointers passed from user space:
    https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
  * **Does not require redzones to detect buffer overflows**,
    but the buffer overflow detection is probabilistic, with roughly
    `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4 and 8-bit TS
    respectively).
  * **Does not require quarantine to detect heap-use-after-free,
    or stack-use-after-return**.
    The detection is similarly probabilistic.

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.

Supported architectures
=======================
HWASAN relies on `Address Tagging`_ which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

HWASAN's approach is not applicable to 32-bit architectures.


Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discussed a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html