=============
TypeSanitizer
=============

.. contents::
   :local:

Introduction
============

The TypeSanitizer is a detector for strict type aliasing violations. It consists of a compiler
instrumentation module and a run-time library. C/C++ has type-based aliasing rules, and LLVM
can exploit these for optimizations given the TBAA metadata Clang emits. In general, a pointer
of a given type cannot access an object of a different type, with only a few exceptions.

These rules aren't always apparent to users, which leads to code that violates them
(e.g. for type punning). This can lead to optimization passes introducing bugs unless the
code is built with ``-fno-strict-aliasing``, sacrificing performance.

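As a sketch of how such a bug can arise, consider the hypothetical function below (the name
``store_both`` is an assumption made for illustration). Because ``int`` and ``float`` are not
allowed to alias, an optimizer may assume that the store through ``f`` leaves ``*i`` unchanged
and return the constant ``1`` without reloading it. If a caller passes two pointers to the same
object, the result can then differ between optimization levels.

.. code-block:: c

    /* Hypothetical illustration, not taken from any real code base. */
    int store_both(int *i, float *f) {
      *i = 1;
      *f = 0.0f;  /* assumed not to modify *i, since float cannot alias int */
      return *i;  /* may be folded to "return 1" under strict aliasing */
    }
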
TypeSanitizer is built to catch violations of these strict aliasing rules, helping
users find where such bugs originate in their code even when the code looks valid at first glance.

As TypeSanitizer is still experimental, it can currently have a large impact on runtime speed,
memory use, and code size. It also has a large compile-time overhead. Work is being done to
reduce these impacts.

The TypeSanitizer Algorithm
===========================

For each TBAA type-access descriptor, encoded in LLVM IR using TBAA Metadata, the instrumentation
pass generates descriptor tables. Thus there is a unique pointer to each type (and access descriptor).
These tables are comdat (except for anonymous-namespace types), so the pointer values are unique
across the program.

The descriptors refer to other descriptors to form a type aliasing tree, like LLVM's TBAA data
does.

The runtime uses 8 bytes of shadow memory, the size of the pointer to the type descriptor, for
every byte of accessed data in the program. The shadow memory for the first byte of a typed object
is set to the pointer to its type descriptor. Shadow memory can also hold the following values:

* 0 is used to represent an unknown type
* Negative numbers represent an interior byte: A byte inside a type that is not the first one. As an
  example, a value of -2 means you are in the third byte of a type.

The instrumentation first checks for an exact match between the type of the current access and the
type for that address in the shadow memory. This can quickly be done by comparing pointer values. If
it matches, it checks the remaining shadow memory of the type to ensure they are the correct negative
numbers. If this fails, it calls the "slow path" check. If the exact match fails, it checks whether
the value, and the remainder of the shadow bytes, are 0. If they are, it sets the shadow memory to
the correct type descriptor pointer for the first byte, and the correct negative numbers for the rest
of the type's shadow.

If the type in shadow memory is neither an exact match nor 0, the slower runtime check is called. It
uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to
alias.

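The fast path described above can be modelled in a few lines of C. This is a toy sketch under
stated assumptions, not the code that the instrumentation pass emits: the names ``shadow_t``,
``set_shadow`` and ``check_access`` are hypothetical, the shadow is a plain array with one slot
per application byte, and a type descriptor is represented only by its address.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Toy model only: names and layout are assumptions, not the TySan runtime. */
    enum check_result { CHECK_MATCH, CHECK_RECORDED, CHECK_SLOW_PATH };

    /* One pointer-sized shadow slot per application byte. */
    typedef intptr_t shadow_t;

    /* Record that `size` bytes starting at shadow[0] hold an object of type `desc`. */
    static void set_shadow(shadow_t *shadow, const void *desc, size_t size) {
      shadow[0] = (intptr_t)desc;   /* first byte: descriptor pointer */
      for (size_t i = 1; i < size; ++i)
        shadow[i] = -(intptr_t)i;   /* interior bytes: -1, -2, ... */
    }

    static enum check_result check_access(shadow_t *shadow, const void *desc,
                                          size_t size) {
      if (shadow[0] == (intptr_t)desc) {
        /* Exact match on the first byte: verify the interior markers. */
        for (size_t i = 1; i < size; ++i)
          if (shadow[i] != -(intptr_t)i)
            return CHECK_SLOW_PATH;
        return CHECK_MATCH;
      }
      /* No exact match: the access is still fine if every byte is unknown (0). */
      for (size_t i = 0; i < size; ++i)
        if (shadow[i] != 0)
          return CHECK_SLOW_PATH;
      set_shadow(shadow, desc, size);
      return CHECK_RECORDED;
    }

In this model, a 4-byte access to untyped memory records the descriptor followed by the interior
markers -1, -2 and -3. A second access with the same descriptor matches, while an access with a
different descriptor falls through to the slow path.
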
The instrumentation pass inserts calls to the memset intrinsic that reset the shadow memory for
memory updated by ``memset``, ``memcpy``, and ``memmove``, as well as for allocas/byval (and for
lifetime.start/end), to reflect that the type is now unknown. The runtime intercepts ``memset``,
``memcpy``, etc. to perform the same function for the library calls.

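Because a copy made with ``memcpy`` resets the destination's shadow to unknown, copying an
object's bytes is one way to reinterpret them without an aliasing violation. The sketch below
uses a hypothetical helper, ``float_bits``, and assumes a 32-bit ``float``. That it produces no
TypeSanitizer report is an inference from the description above, not a documented guarantee.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    /* Assumption of this sketch: float and uint32_t have the same size. */
    _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

    /* Hypothetical helper: returns the object representation of a float. */
    uint32_t float_bits(float value) {
      uint32_t bits;
      /* The bytes are copied, so no uint32_t lvalue ever reads a float object. */
      memcpy(&bits, &value, sizeof bits);
      return bits;
    }
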
How to build
============

Build LLVM/Clang with `CMake <https://llvm.org/docs/CMake.html>`_ and enable
the ``compiler-rt`` runtime. An example CMake configuration that will allow
for the use/testing of TypeSanitizer:

.. code-block:: console

   $ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="compiler-rt" <path to source>/llvm

Usage
=====

Compile and link your program with the ``-fsanitize=type`` flag. The
TypeSanitizer run-time library should be linked to the final executable, so
make sure to use ``clang`` (not ``ld``) for the final link step. To
get reasonable performance add ``-O1`` or higher.
TypeSanitizer by default doesn't print the full stack trace in error messages. Use ``TYSAN_OPTIONS=print_stacktrace=1``
to print the full trace. To get nicer stack traces in error messages add ``-fno-omit-frame-pointer`` and
``-g``.  To get perfect stack traces you may need to disable inlining (just use ``-O1``) and tail call elimination
(``-fno-optimize-sibling-calls``).

.. code-block:: console

    % cat example_AliasViolation.c
    int main(int argc, char **argv) {
      int x = 100;
      float *y = (float*)&x;
      *y += 2.0f;          // Strict aliasing violation
      return 0;
    }

    # Compile and link
    % clang -g -fsanitize=type example_AliasViolation.c

The program will print an error message to ``stderr`` each time a strict aliasing violation is detected.
The program won't terminate, which will allow you to detect many strict aliasing violations in one
run.

.. code-block:: console

    % ./a.out
    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1145ff41 bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
    READ of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
        #0 0x5b3b1145ff40 in main example_AliasViolation.c:4:10

    ==1375532==ERROR: TypeSanitizer: type-aliasing-violation on address 0x7ffeebf1a72c (pc 0x5b3b1146008a bp 0x7ffeebf1a660 sp 0x7ffeebf19e08 tid 1375532)
    WRITE of size 4 at 0x7ffeebf1a72c with type float accesses an existing object of type int
        #0 0x5b3b11460089 in main example_AliasViolation.c:4:10

Error terminology
-----------------

There are some terms that may appear in TypeSanitizer errors that are derived from
`TBAA Metadata <https://llvm.org/docs/LangRef.html#tbaa-metadata>`_. This section provides a
brief dictionary of these terms.

* ``omnipotent char``: This is a special type which can alias with anything. Its name comes from the C/C++
  type ``char``.
* ``type p[x]``: This signifies pointers to the type. ``x`` is the number of indirections to reach the final value.
  As an example, a pointer to a pointer to an integer would be ``type p2 int``.

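To make these terms concrete, the sketch below shows source-level constructs that correspond to
them; the function names are hypothetical. Reading an ``int`` object through an
``unsigned char *`` uses a character type, which is what ``omnipotent char`` stands for, so the
access may alias anything. The parameter of the second function has type ``int **``, a pointer
to a pointer to an integer, which is the kind of type the list above describes as ``p2 int``.

.. code-block:: c

    #include <stddef.h>

    /* Character-typed access: permitted to alias an object of any type. */
    unsigned sum_bytes(const int *value) {
      const unsigned char *bytes = (const unsigned char *)value;
      unsigned sum = 0;
      for (size_t i = 0; i < sizeof *value; ++i)
        sum += bytes[i];
      return sum;
    }

    /* pp is a pointer to a pointer to int (two levels of indirection). */
    int load_through(int **pp) {
      return **pp;
    }
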
TypeSanitizer is still experimental. User-facing error messages should be improved in the future to remove
references to LLVM IR specific terms.

Sanitizer features
==================

``__has_feature(type_sanitizer)``
------------------------------------

In some cases one may need to execute different code depending on whether
TypeSanitizer is enabled.
:ref:`\_\_has\_feature <langext-__has_feature-__has_extension>` can be used for
this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(type_sanitizer)
    // code that builds only under TypeSanitizer
    #  endif
    #endif

``__attribute__((no_sanitize("type")))``
-----------------------------------------------

You may not want some code to be instrumented by TypeSanitizer. Use the
function attribute ``no_sanitize("type")`` to disable instrumenting type aliasing.
It is possible, depending on what happens in non-instrumented code, that instrumented code
emits false positives or false negatives. This attribute may not be supported by other
compilers, so we suggest using it together with ``__has_feature(type_sanitizer)``.

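One way to follow this suggestion is a small wrapper macro. The sketch below is only an
illustration: the macro name ``NO_SANITIZE_TYPE`` and the function ``legacy_sum`` are
hypothetical, and the macro expands to nothing when TypeSanitizer is not enabled or when the
compiler does not provide ``__has_feature``.

.. code-block:: c

    /* Hypothetical macro: expands to the attribute only under TypeSanitizer. */
    #if defined(__has_feature)
    #  if __has_feature(type_sanitizer)
    #    define NO_SANITIZE_TYPE __attribute__((no_sanitize("type")))
    #  endif
    #endif
    #ifndef NO_SANITIZE_TYPE
    #  define NO_SANITIZE_TYPE /* expands to nothing elsewhere */
    #endif

    /* Hypothetical function excluded from TypeSanitizer instrumentation. */
    NO_SANITIZE_TYPE
    int legacy_sum(const int *values, int count) {
      int total = 0;
      for (int i = 0; i < count; ++i)
        total += values[i];
      return total;
    }
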
``__attribute__((disable_sanitizer_instrumentation))``
--------------------------------------------------------

The ``disable_sanitizer_instrumentation`` attribute can be applied to functions
to prevent all kinds of instrumentation. As a result, it may introduce false
positives and incorrect stack traces. Therefore, it should be used with care,
and only if absolutely required; for example for certain code that cannot
tolerate any instrumentation and resulting side-effects. This attribute
overrides ``no_sanitize("type")``.

Ignorelist
----------

TypeSanitizer supports ``src`` and ``fun`` entity types in
:doc:`SanitizerSpecialCaseList`, which can be used to suppress aliasing
violation reports in the specified source files or functions. As
with other methods of ignoring instrumentation, this can result in false
positives or false negatives.

Limitations
-----------

* TypeSanitizer uses more real memory than a native run. It uses 8 bytes of
  shadow memory for each byte of user memory.
* There are transformation passes which run before TypeSanitizer. If these
  passes optimize out an aliasing violation, TypeSanitizer cannot catch it.
* Currently, all instrumentation is inlined. This can result in a **15x**
  (on average) increase in generated file size, and **3x** to **7x** increase
  in compile time. In some documented cases this can cause the compiler to hang.
  There are plans to improve this in the future.
* Codebases that use unions and struct-initialized variables can see incorrect
  results, as TypeSanitizer doesn't yet instrument these reliably.
* Since Clang & LLVM's TBAA system is used to generate the checks used by the
  instrumentation, TypeSanitizer follows Clang & LLVM's rules for type aliasing.
  There may be situations where that disagrees with the standard. However this
  does at least mean that TypeSanitizer will catch any aliasing violations that
  would cause bugs when compiling with Clang & LLVM.
* TypeSanitizer cannot currently be run alongside other sanitizers such as
  AddressSanitizer, ThreadSanitizer or UndefinedBehaviorSanitizer.

Current Status
--------------

TypeSanitizer is brand new, and still in development. There are some known
issues, especially in areas where Clang's emitted TBAA data isn't extensive
enough for TypeSanitizer's runtime.

We are actively working on enhancing the tool, so stay tuned. Any help,
issues, pull requests, and ideas are more than welcome. You can find the
`issue tracker here <https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20TySan%20label%3Acompiler-rt%3Atysan>`_.