===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64 and RISC-V, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prolog in
non-leaf functions and loading the return address from the shadow call stack
in the function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies; it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
consuming more memory for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.


Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18`` on AArch64,
``x3`` (``gp``) on RISC-V with software shadow stack and ``ssp`` on RISC-V with
hardware shadow stack, which needs `Zicfiss`_ and ``-fcf-protection=return``.
Users can choose between the software and hardware based shadow stack
implementation on the RISC-V backend by passing ``-fsanitize=shadow-call-stack``
or ``Zicfiss`` with ``-fcf-protection=return``.
For simplicity we will refer to this register as ``SCSReg``. On some platforms,
``SCSReg`` is reserved, and on others, it is designated as a scratch register.
This generally means that any code that may run on the same thread as code
compiled with ShadowCallStack must either target one of the platforms whose ABI
reserves ``SCSReg`` (currently Android, Darwin, Fuchsia and Windows) or be
compiled with a flag to reserve that register (e.g., ``-ffixed-x18``).
If absolutely necessary, code compiled without reserving the register may be
run on the same thread as code that uses ShadowCallStack by saving the register
value temporarily on the stack (`example in Android`_) but this should be done
with care since it risks leaking the shadow call stack address.

.. _`Zicfiss`: https://github.com/riscv/riscv-cfi/blob/main/cfi_backward.adoc
.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because it requires a dedicated register, the ShadowCallStack feature is
incompatible with any other feature that may use ``SCSReg``. However, there is
no inherent reason why ShadowCallStack needs to use a specific register; in
principle, a platform could choose to reserve and use another register for
ShadowCallStack, but this would be incompatible with the ABI standards
published in AAPCS64 and the RISC-V psABI.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects from non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``SCSReg`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory.
However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the address of the shadow call stack and mark only that region as
read/write. This also mitigates somewhat against processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``SCSReg`` reserved, this is trivial: the address can be derived from the
value stored in ``SCSReg`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself.
But if it is possible
for code compiled without reserving ``SCSReg`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``SCSReg`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49

The aarch64 call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack`` flag
to both compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``. No additional flags
need to be passed on RISC-V because the software based shadow stack uses
``x3`` (``gp``), which is always reserved, and the hardware based shadow call
stack uses a dedicated register, ``ssp``.
However, it is important to disable GP relaxation in the linker when using the
software based shadow call stack on RISC-V. This can be done with the
``--no-relax-gp`` flag in GNU ld; GP relaxation is already off by default in
LLD.

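
As a rough illustration of the aarch64 flags described above, the following
translation unit could be compiled to assembly with the command shown in its
leading comment. The file name ``scs_demo.c``, the target triple
``aarch64-linux-gnu`` and the function names ``leaf`` and ``non_leaf`` are
assumptions made for this sketch. Compilation stops at ``-S`` so that no
ShadowCallStack runtime is needed just to inspect the generated code.

.. code-block:: c

    /*
     * Build sketch (assumed file name and target triple):
     *
     *   clang --target=aarch64-linux-gnu -O2 -fsanitize=shadow-call-stack \
     *         -ffixed-x18 -S scs_demo.c -o scs_demo.s
     */

    /* noinline keeps the call below visible, so non_leaf() really is a
     * non-leaf function. leaf() itself makes no calls. */
    __attribute__((noinline)) int leaf(int x) { return x + 1; }

    /* Makes a call that is not a tail call, so its return address has to be
     * saved across the call. This is the kind of function that the
     * instrumentation targets. */
    int non_leaf(int x) { return leaf(x) * 2; }

In the resulting assembly, one would expect the shadow call stack store and
load to appear only in ``non_leaf``, since the instrumentation applies to
non-leaf functions.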

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.

Example
=======

The following example code:

.. code-block:: c++

    int foo() {
      return bar() + 1;
    }

Generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret

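
The two instructions added in the instrumented output can be modeled in C,
roughly, as a push and a pop on an array of return addresses. This is only an
illustrative sketch: the names ``scs_model``, ``scs_push`` and ``scs_pop`` are
hypothetical and not part of any runtime, and the ``scs_reg`` field merely
stands in for ``x18``.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical model of the shadow call stack pointer held in x18. */
    typedef struct {
        uintptr_t *scs_reg;
    } scs_model;

    /* Models `str x30, [x18], #8`: store the return address at the current
     * slot, then advance the pointer by one slot (post-index). */
    static void scs_push(scs_model *m, uintptr_t return_address) {
        *m->scs_reg++ = return_address;
    }

    /* Models `ldr x30, [x18, #-8]!`: step the pointer back by one slot, then
     * load the return address from it (pre-index with writeback). */
    static uintptr_t scs_pop(scs_model *m) {
        return *--m->scs_reg;
    }

Each non-leaf call pushes one entry in its prolog and pops it in its epilog, so
nested calls unwind in last-in, first-out order and the pointer returns to its
starting value once the outermost instrumented function returns.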