.. Source: /llvm-project/clang/docs/PointerAuthentication.rst (revision 7c814c13d0df6dbd0ef6a8b2be214d3f6edbb566)

Pointer Authentication
======================

.. contents::
   :local:

Introduction
------------

Pointer authentication is a technology which offers strong probabilistic
protection against exploiting a broad class of memory bugs to take control of
program execution.  When adopted consistently in a language ABI, it provides
a form of relatively fine-grained control flow integrity (CFI) check that
resists both return-oriented programming (ROP) and jump-oriented programming
(JOP) attacks.

While pointer authentication can be implemented purely in software, direct
hardware support (e.g. as provided by Armv8.3 PAuth) can dramatically improve
performance and code size.  Similarly, while pointer authentication
can be implemented on any architecture, taking advantage of the (typically)
excess addressing range of a target with 64-bit pointers minimizes the impact
on memory performance and can allow interoperation with existing code (by
disabling pointer authentication dynamically).  This document will generally
attempt to present the pointer authentication feature independent of any
hardware implementation or ABI.  Considerations that are
implementation-specific are clearly identified throughout.

Note that there are several different terms in use:

- **Pointer authentication** is a target-independent language technology.

- **PAuth** (sometimes referred to as **PAC**, for Pointer Authentication
  Codes) is an AArch64 architecture extension that provides hardware support
  for pointer authentication.  Additional extensions either modify some of the
  PAuth instruction behavior (notably FPAC), or provide new instruction
  variants (PAuth_LR).

- **Armv8.3** is an AArch64 architecture revision that makes PAuth mandatory.

- **arm64e** is a specific ABI (not yet fully stable) for implementing pointer
  authentication using PAuth on certain Apple operating systems.

This document serves four purposes:

- It describes the basic ideas of pointer authentication.

- It documents several language extensions that are useful on targets using
  pointer authentication.

- It will eventually present a theory of operation for the security mitigation,
  describing the basic requirements for correctness, various weaknesses in the
  mechanism, and ways in which programmers can strengthen its protections
  (including recommendations for language implementors).

- It will eventually document the language ABIs currently used for C, C++,
  Objective-C, and Swift on arm64e, although these are not yet stable on any
  target.

Basic Concepts
--------------

The simple address of an object or function is a **raw pointer**.  A raw
pointer can be **signed** to produce a **signed pointer**.  A signed pointer
can then be **authenticated** in order to verify that it was **validly signed**
and extract the original raw pointer.  These terms reflect the most likely
implementation technique: computing and storing a cryptographic signature along
with the pointer.

An **abstract signing key** is a name which refers to a secret key which is
used to sign and authenticate pointers.  The concrete key value for a
particular name is consistent throughout a process.

A **discriminator** is an arbitrary value used to **diversify** signed pointers
so that one validly-signed pointer cannot simply be copied over another.
A discriminator is simply opaque data of some implementation-defined size that
is included in the signature as a salt (see `Discriminators`_ for details).

Nearly all aspects of pointer authentication use just these two primary
operations:

- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given
  a raw pointer, an abstract signing key, and a discriminator.

- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given
  a signed pointer, an abstract signing key, and a discriminator.

``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must
succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was
ultimately produced in any other way is expected to fail, which halts the
program either:

- immediately, on implementations that enforce ``auth`` success (e.g., when
  using compiler-generated ``auth`` failure checks, or Armv8.3 with the FPAC
  extension), or

- when the resulting pointer value is used, on implementations that don't.

However, regardless of the implementation's handling of ``auth`` failures, it
is permitted for ``auth`` to fail to detect that a signed pointer was not
produced in this way, in which case it may return anything; this is what makes
pointer authentication a probabilistic mitigation rather than a perfect one.
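
As a rough illustration of this contract, the following toy model implements
``sign`` and ``auth`` purely in software.  Everything in it is an assumption
made for the sketch: the 48-bit address width, the 16-bit signature, the
non-cryptographic mixing function, and all ``toy_*`` names.  It is not how
Clang or any hardware implements these operations.

.. code-block:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Toy layout (assumed): low 48 bits hold the address, high 16 the
     signature. */
  #define TOY_ADDR_BITS 48
  #define TOY_ADDR_MASK ((UINT64_C(1) << TOY_ADDR_BITS) - 1)

  /* Non-cryptographic stand-in for the keyed hash; returns a 16-bit value. */
  static uint64_t toy_signature(uint64_t raw, uint64_t key, uint64_t disc) {
    uint64_t h = (raw ^ key) * UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h = (h ^ disc) * UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return h >> TOY_ADDR_BITS;
  }

  /* sign(raw_pointer, key, discriminator): store the signature in the
     high bits of the pointer. */
  uint64_t toy_sign(uint64_t raw, uint64_t key, uint64_t disc) {
    raw &= TOY_ADDR_MASK;
    return raw | (toy_signature(raw, key, disc) << TOY_ADDR_BITS);
  }

  /* auth(signed_pointer, key, discriminator): recompute and compare.
     Returns false where a real implementation would halt the program. */
  bool toy_auth(uint64_t signed_ptr, uint64_t key, uint64_t disc,
                uint64_t *raw_out) {
    uint64_t raw = signed_ptr & TOY_ADDR_MASK;
    if (toy_sign(raw, key, disc) != signed_ptr)
      return false;
    *raw_out = raw;
    return true;
  }

In this model, a pointer whose signature bits have been altered is always
rejected, while a pointer presented with the wrong key or discriminator is
rejected only with high probability: with 16 signature bits, roughly one
mismatch in 65536 would be expected to slip through, which is the kind of
undetected failure the text permits.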

There are two secondary operations which are required only to implement certain
intrinsics in ``<ptrauth.h>``:

- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer
  and a key without verifying its validity, unlike ``auth``.  This is useful
  for certain kinds of tooling, such as crash backtraces; it should generally
  not be used in the basic language ABI except in very careful ways.

- ``sign_generic(value)`` produces a cryptographic signature for arbitrary
  data, not necessarily a pointer.  This is useful for efficiently verifying
  that non-pointer data has not been tampered with.

Whenever any of these operations is called for, the key value must be known
statically.  This is because the layout of a signed pointer may vary according
to the signing key.  (For example, in Armv8.3, the layout of a signed pointer
depends on whether Top Byte Ignore (TBI) is enabled, which can be set
independently for I and D keys.)

.. admonition:: Note for API designers and language implementors

  These are the *primitive* operations of pointer authentication, provided for
  clarity of description.  They are not suitable either as high-level
  interfaces or as primitives in a compiler IR because they expose raw
  pointers.  Raw pointers require special attention in the language
  implementation to avoid the accidental creation of exploitable code
  sequences.

The following details are all implementation-defined:

- the nature of a signed pointer
- the size of a discriminator
- the number and nature of the signing keys
- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic``
  operations

While the use of the terms "sign" and "signed pointer" suggests the use of
a cryptographic signature, other implementations may be possible.  See
`Alternative implementations`_ for an exploration of implementation options.

.. admonition:: Implementation example: Armv8.3

  Readers may find it helpful to know how these terms map to Armv8.3 PAuth:

  - A signed pointer is a pointer with a signature stored in the
    otherwise-unused high bits.  The kernel configures the address width based
    on the system's addressing needs, and enables TBI for I or D keys as
    needed.  The bits above the address bits and below the TBI bits (if
    enabled) are unused.  The signature width then depends on this addressing
    configuration.

  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit
    integers.  Blending a constant discriminator into an address consists of
    replacing the top 16 bits of the pointer containing the address with the
    constant.  Pointers used for blending purposes should only have address
    bits, since higher bits will be at least partially overwritten with the
    constant discriminator.

  - There are five 128-bit signing-key registers, each of which can only be
    directly read or set by privileged code.  Of these, four are used for
    signing pointers, and the fifth is used only for ``sign_generic``.  The key
    data is simply a pepper added to the hash, not an encryption key, and so
    can be initialized using random data.

  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and
    signing key, and stores it in the high bits as the signature. ``auth``
    removes the signature, computes the same hash, and compares the result with
    the stored signature.  ``strip`` removes the signature without
    authenticating it.  While ``aut*`` instructions do not themselves trap on
    failure in Armv8.3 PAuth, they do with the later optional FPAC extension.
    An implementation can also choose to emulate this trapping behavior by
    emitting additional instructions around ``aut*``.

  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two
    64-bit values and produces a 64-bit cryptographic hash. Implementations of
    this instruction are not required to produce meaningful data in all bits of
    the result.

Discriminators
~~~~~~~~~~~~~~

A discriminator is arbitrary extra data which alters the signature calculated
for a pointer.  When two pointers are signed differently --- either with
different keys or with different discriminators --- an attacker cannot simply
replace one pointer with the other.

To use standard cryptographic terminology, a discriminator acts as a
`salt <https://en.wikipedia.org/wiki/Salt_(cryptography)>`_ in the signing of a
pointer, and the key data acts as a
`pepper <https://en.wikipedia.org/wiki/Pepper_(cryptography)>`_.  That is,
both the discriminator and key data are ultimately just added as inputs to the
signing algorithm along with the pointer, but they serve significantly
different roles.  The key data is a common secret added to every signature,
whereas the discriminator is a value that can be derived from
the context in which a specific pointer is signed.  However, unlike a password
salt, it's important that discriminators be *independently* derived from the
circumstances of the signing; they should never simply be stored alongside
a pointer.  Discriminators are then re-derived in authentication operations.

The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator
value to be provided, but can only be used when running normal code.  The
discriminators used by language ABIs must be restricted to make it feasible for
the loader to sign pointers stored in global memory without needing excessive
amounts of metadata.  Under these restrictions, a discriminator may consist of
either or both of the following:

- The address at which the pointer is stored in memory.  A pointer signed with
  a discriminator which incorporates its storage address is said to have
  **address diversity**.  In general, using address diversity means that
  a pointer cannot be reliably copied by an attacker to or from a different
  memory location.  However, an attacker may still be able to attack a larger
  call sequence if they can alter the address through which the pointer is
  accessed.  Furthermore, some situations cannot use address diversity because
  of language or other restrictions.

- A constant integer, called a **constant discriminator**. A pointer signed
  with a non-zero constant discriminator is said to have **constant
  diversity**.  If the discriminator is specific to a single declaration, it is
  said to have **declaration diversity**; if the discriminator is specific to
  a type of value, it is said to have **type diversity**.  For example, C++
  v-tables on arm64e sign their component functions using a hash of their
  method names and signatures, which provides declaration diversity; similarly,
  C++ member function pointers sign their invocation functions using a hash of
  the member pointer type, which provides type diversity.

The implementation may need to restrict constant discriminators to be
significantly smaller than the full size of a discriminator.  For example, on
arm64e, constant discriminators are only 16-bit values.  This is believed to
not significantly weaken the mitigation, since collisions remain uncommon.

The algorithm for blending a constant discriminator with a storage address is
implementation-defined.

.. _Signing schemas:

Signing Schemas
~~~~~~~~~~~~~~~

Correct use of pointer authentication requires the signing code and the
authenticating code to agree about the **signing schema** for the pointer:

- the abstract signing key with which the pointer should be signed and
- an algorithm for computing the discriminator.

As described in the section above on `Discriminators`_, in most situations, the
discriminator is produced by taking a constant discriminator and optionally
blending it with the storage address of the pointer.  In these situations, the
signing schema breaks down even more simply:

- the abstract signing key,
- a constant discriminator, and
- whether to use address diversity.

It is important that the signing schema be independently derived at all signing
and authentication sites.  Preferably, the schema should be hard-coded
everywhere it is needed, but at the very least, it must not be derived by
inspecting information stored along with the pointer.
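
The simple form of a signing schema can be pictured as a small record plus
a rule for deriving the discriminator.  The sketch below is hypothetical: the
``signing_schema`` type and ``schema_discriminator`` function are invented for
illustration, and the blend it uses (constant discriminator placed in the top
16 bits of the storage address) is just one possible implementation-defined
choice, modeled on the Armv8.3 description earlier in this document.

.. code-block:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical description of a signing schema in its simple form. */
  typedef struct {
    int key;                         /* abstract signing key, by number */
    uint16_t constant_discriminator; /* 0 means no constant diversity */
    bool address_diversity;
  } signing_schema;

  /* Derive the discriminator from the schema and, if requested, from the
     address at which the pointer is stored. */
  uint64_t schema_discriminator(const signing_schema *schema,
                                uint64_t storage_address) {
    uint64_t constant = schema->constant_discriminator;
    if (!schema->address_diversity)
      return constant;
    /* Assumed blend: overwrite the top 16 address bits with the constant. */
    return (storage_address & UINT64_C(0x0000ffffffffffff)) | (constant << 48);
  }

A schema described this way would normally be a compile-time constant at each
signing and authentication site, so that nothing stored next to the pointer
influences which key or discriminator is used.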

Language Features
-----------------

There is currently one main pointer authentication language feature:

- The language provides the ``<ptrauth.h>`` intrinsic interface for manually
  signing and authenticating pointers in code.  These can be used in
  circumstances where very specific behavior is required.


Language Extensions
~~~~~~~~~~~~~~~~~~~

Feature Testing
^^^^^^^^^^^^^^^

Whether the current target uses pointer authentication can be tested for with
a number of different tests.

- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides its
  normal interface.  This may be true even on targets where pointer
  authentication is not enabled by default.
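
A minimal sketch of such a test follows.  The ``HAVE_PTRAUTH_INTRINSICS``
macro and the ``have_ptrauth_intrinsics`` function are names invented for this
example; the ``#ifndef __has_feature`` guard is only there so that the file
also builds with compilers that lack ``__has_feature``.

.. code-block:: c

  /* Compilers without __has_feature get a definition that answers "no". */
  #ifndef __has_feature
  #define __has_feature(x) 0
  #endif

  #if __has_feature(ptrauth_intrinsics)
  #include <ptrauth.h>
  #define HAVE_PTRAUTH_INTRINSICS 1
  #else
  #define HAVE_PTRAUTH_INTRINSICS 0
  #endif

  /* Reports at run time what was decided at compile time. */
  int have_ptrauth_intrinsics(void) { return HAVE_PTRAUTH_INTRINSICS; }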

``<ptrauth.h>``
~~~~~~~~~~~~~~~

This header defines the following types and operations:

``ptrauth_key``
^^^^^^^^^^^^^^^

This ``enum`` is the type of abstract signing keys.  In addition to defining
the set of implementation-specific signing keys (for example, Armv8.3 defines
``ptrauth_key_asia``), it also defines some portable aliases for those keys.
For example, ``ptrauth_key_function_pointer`` is the key generally used for
C function pointers, which will generally be suitable for other
function-signing schemas.

In all the operation descriptions below, key values must be constant values
corresponding to one of the implementation-specific abstract signing keys from
this ``enum``.

``ptrauth_extra_data_t``
^^^^^^^^^^^^^^^^^^^^^^^^

This is a ``typedef`` of a standard integer type of the correct size to hold
a discriminator value.

In the signing and authentication operation descriptions below, discriminator
values must have either pointer type or integer type. If the discriminator is
an integer, it will be coerced to ``ptrauth_extra_data_t``.

``ptrauth_blend_discriminator``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_blend_discriminator(pointer, integer)

Produce a discriminator value which blends information from the given pointer
and the given integer.

Implementations may ignore some bits from each value, which is to say, the
blending algorithm may be chosen for speed and convenience over theoretical
strength as a hash-combining algorithm.  For example, arm64e simply overwrites
the high 16 bits of the pointer with the low 16 bits of the integer, which can
be done in a single instruction with an immediate integer.

``pointer`` must have pointer type, and ``integer`` must have integer type. The
result has type ``ptrauth_extra_data_t``.

``ptrauth_string_discriminator``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_string_discriminator(string)

Compute a constant discriminator from the given string.

``string`` must be a string literal of ``char`` character type.  The result has
type ``ptrauth_extra_data_t``.

The result value is never zero and always within range for both the
``__ptrauth`` qualifier and ``ptrauth_blend_discriminator``.

This can be used in constant expressions.

``ptrauth_strip``
^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_strip(signedPointer, key)

Given that ``signedPointer`` matches the layout for signed pointers signed with
the given key, extract the raw pointer from it.  This operation does not trap
and cannot fail, even if the pointer is not validly signed.
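
One plausible use is producing an address for diagnostics, where the value is
only displayed and never dereferenced.  In the sketch below,
``address_for_logging`` is a hypothetical helper, ``ptrauth_key_asda`` is
assumed to be the Armv8.3 data-key name defined by the header alongside
``ptrauth_key_asia``, and the fallback branch is an addition of this example
so that it also builds where the intrinsics are unavailable.

.. code-block:: c

  #include <stdint.h>

  #ifndef __has_feature
  #define __has_feature(x) 0
  #endif

  #if __has_feature(ptrauth_intrinsics)
  #include <ptrauth.h>

  /* Address of a possibly signed data pointer, for display only. */
  uintptr_t address_for_logging(const void *maybe_signed) {
    return (uintptr_t)ptrauth_strip(maybe_signed, ptrauth_key_asda);
  }
  #else
  /* Fallback (assumption of this sketch): pointers carry no signature. */
  uintptr_t address_for_logging(const void *maybe_signed) {
    return (uintptr_t)maybe_signed;
  }
  #endif

Because stripping performs no verification, the result should not be treated
as evidence that the pointer was validly signed.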

``ptrauth_sign_constant``
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_sign_constant(pointer, key, discriminator)

Return a signed pointer for a constant address in a manner which guarantees
a non-attackable sequence.

``pointer`` must be a constant expression of pointer type which evaluates to
a non-null pointer.
``key`` must be a constant expression of type ``ptrauth_key``.
``discriminator`` must be a constant expression of pointer or integer type;
if an integer, it will be coerced to ``ptrauth_extra_data_t``.
The result will have the same type as ``pointer``.

This can be used in constant expressions.
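
Since the result is usable in constant expressions, a signed pointer can be
placed in a global initializer.  The following is a sketch under stated
assumptions: ``config_value``, ``config_ptr``, ``read_config``, the
``CONFIG_*`` macros, and the discriminator string are all invented for the
example; ``ptrauth_key_asda`` is assumed to be the Armv8.3 data-key name
provided by the header; and the fallback branch exists only so the example
builds where the intrinsics are unavailable.

.. code-block:: c

  #ifndef __has_feature
  #define __has_feature(x) 0
  #endif

  #if __has_feature(ptrauth_intrinsics)
  #include <ptrauth.h>
  #endif

  static int config_value = 7;

  #if __has_feature(ptrauth_intrinsics)
  /* Hypothetical schema for this one global: a data key plus a constant
     discriminator derived from a string, without address diversity. */
  #define CONFIG_KEY ptrauth_key_asda
  #define CONFIG_DISC ptrauth_string_discriminator("example.config_ptr")

  /* Signed as part of the initializer rather than by run-time code. */
  static int *const config_ptr =
      ptrauth_sign_constant(&config_value, CONFIG_KEY, CONFIG_DISC);

  /* Authenticate with the same schema before dereferencing. */
  int read_config(void) {
    return *ptrauth_auth_data(config_ptr, CONFIG_KEY, CONFIG_DISC);
  }
  #else
  /* Fallback (assumption of this sketch): no signing at all. */
  static int *const config_ptr = &config_value;

  int read_config(void) { return *config_ptr; }
  #endif

The key and discriminator are spelled out at both the signing and the
authenticating site, in keeping with the advice that a signing schema be
hard-coded wherever it is needed.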

``ptrauth_sign_unauthenticated``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_sign_unauthenticated(pointer, key, discriminator)

Produce a signed pointer for the given raw pointer without applying any
authentication or extra treatment.  This operation is not required to have the
same behavior on a null pointer that the language implementation would.

This is a treacherous operation that can easily result in signing oracles.
Programs should use it seldom and carefully.

``ptrauth_auth_and_resign``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey, newDiscriminator)

Authenticate that ``pointer`` is signed with ``oldKey`` and
``oldDiscriminator`` and then resign the raw-pointer result of that
authentication with ``newKey`` and ``newDiscriminator``.

``pointer`` must have pointer type.  The result will have the same type as
``pointer``.  This operation is not required to have the same behavior on
a null pointer that the language implementation would.

The code sequence produced for this operation must not be directly attackable.
However, if the discriminator values are not constant integers, their
computations may still be attackable.  In the future, Clang should be enhanced
to guarantee non-attackability if these expressions are safely-derived.

``ptrauth_auth_data``
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_auth_data(pointer, key, discriminator)

Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and
remove the signature.

``pointer`` must have object pointer type.  The result will have the same type
as ``pointer``.  This operation is not required to have the same behavior on
a null pointer that the language implementation would.

In the future when Clang makes safe derivation guarantees, the result of
this operation should be considered safely-derived.
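
The three operations above can be combined to manage a pointer that is signed
with address diversity.  The sketch below is illustrative only: the ``slot_*``
functions, the ``SLOT_*`` macros, and the constant ``0x5a17`` are invented for
the example; ``ptrauth_key_asda`` is assumed to be the Armv8.3 data-key name
provided by the header; and the fallback branch is an addition so that the
example builds where the intrinsics are unavailable.

.. code-block:: c

  #ifndef __has_feature
  #define __has_feature(x) 0
  #endif

  #if __has_feature(ptrauth_intrinsics)
  #include <ptrauth.h>

  /* Hypothetical schema for these slots: a data key, the constant
     discriminator 0x5a17, and address diversity. */
  #define SLOT_KEY ptrauth_key_asda
  #define SLOT_DISC(slot) ptrauth_blend_discriminator((slot), 0x5a17)

  /* Sign a raw pointer for storage in *slot.  Callers must make sure `raw`
     is trustworthy; otherwise this function acts as a signing oracle. */
  void slot_store(void **slot, void *raw) {
    *slot = ptrauth_sign_unauthenticated(raw, SLOT_KEY, SLOT_DISC(slot));
  }

  /* Re-sign for the new storage address; a plain copy of the signed value
     would not authenticate at the destination. */
  void slot_move(void **dst, void **src) {
    *dst = ptrauth_auth_and_resign(*src, SLOT_KEY, SLOT_DISC(src),
                                   SLOT_KEY, SLOT_DISC(dst));
  }

  /* Authenticate the stored value and return the raw pointer. */
  void *slot_load(void **slot) {
    return ptrauth_auth_data(*slot, SLOT_KEY, SLOT_DISC(slot));
  }
  #else
  /* Fallback (assumption of this sketch): slots hold raw pointers. */
  void slot_store(void **slot, void *raw) { *slot = raw; }
  void slot_move(void **dst, void **src) { *dst = *src; }
  void *slot_load(void **slot) { return *slot; }
  #endif

None of these helpers gives null pointers special treatment, so a caller that
needs null to stay null would have to check for it explicitly.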

``ptrauth_sign_generic_data``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: c

  ptrauth_sign_generic_data(value1, value2)

Computes a signature for the given pair of values, incorporating a secret
signing key.

This operation can be used to verify that arbitrary data has not been tampered
with by computing a signature for the data, storing that signature, and then
repeating this process and verifying that it yields the same result.  This can
be reasonably done in any number of ways; for example, a library could compute
an ordinary checksum of the data and just sign the result in order to get the
tamper-resistance advantages of the secret signing key (since otherwise an
attacker could reliably overwrite both the data and the checksum).

``value1`` and ``value2`` must be either pointers or integers.  If the integers
are larger than ``uintptr_t`` then data not representable in ``uintptr_t`` may
be discarded.

The result will have type ``ptrauth_generic_signature_t``, which is an integer
type.  Implementations are not required to make all bits of the result equally
significant; in particular, some implementations are known to not leave
meaningful data in the low bits.
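
The checksum-then-sign approach described above might look like the following
sketch.  The names ``fnv1a``, ``sign_buffer``, ``buffer_is_intact``, and
``buffer_signature`` are invented for the example, the choice of FNV-1a as the
ordinary checksum is arbitrary, and the fallback branch (which offers no
tamper resistance) exists only so the example builds where the intrinsics are
unavailable.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #ifndef __has_feature
  #define __has_feature(x) 0
  #endif

  /* Ordinary, unkeyed 64-bit FNV-1a checksum of a buffer. */
  static uint64_t fnv1a(const unsigned char *data, size_t len) {
    uint64_t h = UINT64_C(0xcbf29ce484222325);
    for (size_t i = 0; i < len; ++i)
      h = (h ^ data[i]) * UINT64_C(0x100000001b3);
    return h;
  }

  #if __has_feature(ptrauth_intrinsics)
  #include <ptrauth.h>
  typedef ptrauth_generic_signature_t buffer_signature;

  /* Sign the checksum together with the length, using the secret key. */
  buffer_signature sign_buffer(const unsigned char *data, size_t len) {
    return ptrauth_sign_generic_data(fnv1a(data, len), len);
  }
  #else
  /* Fallback (assumption of this sketch): the bare checksum, which an
     attacker who can overwrite the data could simply recompute. */
  typedef uint64_t buffer_signature;

  buffer_signature sign_buffer(const unsigned char *data, size_t len) {
    return fnv1a(data, len) ^ len;
  }
  #endif

  /* Returns nonzero if the buffer still matches the stored signature. */
  int buffer_is_intact(const unsigned char *data, size_t len,
                       buffer_signature stored) {
    return sign_buffer(data, len) == stored;
  }

Comparing whole signature values, as done here, stays correct even when an
implementation leaves some bits of the result without meaningful data.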

Alternative Implementations
---------------------------

Signature Storage
~~~~~~~~~~~~~~~~~

It is not critical for the security of pointer authentication that the
signature be stored "together" with the pointer, as it is in Armv8.3. An
implementation could just as well store the signature in a separate word, so
that the ``sizeof`` a signed pointer would be larger than the ``sizeof`` a raw
pointer.

Storing the signature in the high bits, as Armv8.3 does, has several trade-offs:

- Disadvantage: there are substantially fewer bits available for the signature,
  weakening the mitigation by making it much easier for an attacker to simply
  guess the correct signature.

- Disadvantage: future growth of the address space will necessarily further
  weaken the mitigation.

- Advantage: memory layouts don't change, so it's possible for
  pointer-authentication-enabled code (for example, in a system library) to
  efficiently interoperate with existing code, as long as pointer
  authentication can be disabled dynamically.

- Advantage: the size of a signed pointer doesn't grow.  Growth might
  significantly increase memory requirements, code size, and register pressure.

- Advantage: the size of a signed pointer is the same as a raw pointer, so
  generic APIs which work in types like ``void *`` (such as ``dlsym``) can
  still return signed pointers.  This means that clients of these APIs will not
  require insecure code in order to correctly receive a function pointer.

Hashing vs. Encrypting Pointers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Armv8.3 implements ``sign`` by computing a cryptographic hash and storing that
in the spare bits of the pointer.  This means that there are relatively few
possible values for the valid signed pointer, since the bits corresponding to
the raw pointer are known.  Together with an ``auth`` oracle, this can make it
computationally feasible to discover the correct signature with brute force.
(The implementation should of course endeavor not to introduce ``auth``
oracles, but this can be difficult, and attackers can be devious.)
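
To make the size of that search concrete, the following toy model brute-forces
a signature through an ``auth`` oracle.  It is purely illustrative: the 48-bit
address and 16-bit signature layout, the non-cryptographic mixing function,
the secret value, and all ``toy_*`` names are assumptions of this sketch, and
real signature widths depend on the addressing configuration.

.. code-block:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define TOY_ADDR_BITS 48
  #define TOY_ADDR_MASK ((UINT64_C(1) << TOY_ADDR_BITS) - 1)

  /* Hypothetical secret; the search below never reads it directly. */
  static const uint64_t toy_secret_key = UINT64_C(0x0123456789abcdef);

  /* Toy auth oracle: reports whether `candidate` is validly signed. */
  static bool toy_auth_oracle(uint64_t candidate, uint64_t disc) {
    uint64_t raw = candidate & TOY_ADDR_MASK;
    uint64_t h = (raw ^ toy_secret_key) * UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h = (h ^ disc) * UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return (candidate >> TOY_ADDR_BITS) == (h >> TOY_ADDR_BITS);
  }

  /* Try every 16-bit signature for a known raw pointer.  Stores the forged
     pointer and returns the number of oracle queries used. */
  uint32_t toy_brute_force(uint64_t raw, uint64_t disc, uint64_t *forged) {
    raw &= TOY_ADDR_MASK;
    for (uint32_t guess = 0; guess < (UINT32_C(1) << 16); ++guess) {
      uint64_t candidate = raw | ((uint64_t)guess << TOY_ADDR_BITS);
      if (toy_auth_oracle(candidate, disc)) {
        *forged = candidate;
        return guess + 1;
      }
    }
    return 0; /* not reached: one of the 2^16 guesses must match */
  }

Because the address bits are known, the search space in this model is only the
65536 possible signatures, so the loop always terminates with a forged pointer
that the oracle accepts.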

If the implementation can instead *encrypt* the pointer during ``sign`` and
*decrypt* it during ``auth``, this brute-force attack becomes far less
feasible, even with an ``auth`` oracle.  However, there are several problems
with this idea:

- It's unclear whether this kind of encryption is even possible without
  increasing the storage size of a signed pointer.  If the storage size can be
  increased, brute-force attacks can be equally well mitigated by simply storing
  a larger signature.

- It would likely be impossible to implement a ``strip`` operation, which might
  make debuggers and other out-of-process tools far more difficult to write, as
  well as generally making primitive debugging more challenging.

- Implementations can benefit from being able to extract the raw pointer
  immediately from a signed pointer.  An Armv8.3 processor executing an
  ``auth``-and-load instruction can perform the load and ``auth`` in parallel;
  a processor which instead encrypted the pointer would be forced to perform
  these operations serially.
522