==========================
Pretokenized Headers (PTH)
==========================

This document first describes the low-level interface for using PTH and
then briefly elaborates on its design and implementation. If you are
interested in the end-user view, please see the :ref:`User's Manual
<usersmanual-precompiled-headers>`.

Using Pretokenized Headers with ``clang`` (Low-level Interface)
===============================================================

The Clang compiler frontend, ``clang -cc1``, supports three command line
options for generating and using PTH files.

To generate PTH files using ``clang -cc1``, use the option ``-emit-pth``:

.. code-block:: console

  $ clang -cc1 test.h -emit-pth -o test.h.pth

This option is transparently used by ``clang`` when generating PTH
files. Similarly, PTH files can be used as prefix headers using the
``-include-pth`` option:

.. code-block:: console

  $ clang -cc1 -include-pth test.h.pth test.c -o test.s

Alternatively, Clang's PTH files can be used as a raw "token-cache" (or
"content" cache) of the source included by the original header file.
This means that the contents of the PTH file are searched as substitutes
for *any* source files that are used by ``clang -cc1`` to process a
source file. This is done by specifying the ``-token-cache`` option:

.. code-block:: console

  $ cat test.h
  #include <stdio.h>
  $ clang -cc1 -emit-pth test.h -o test.h.pth
  $ cat test.c
  #include "test.h"
  $ clang -cc1 test.c -o test -token-cache test.h.pth

In this example the contents of ``stdio.h`` (and the files it includes)
will be retrieved from ``test.h.pth``, as the PTH file is being used in
this case as a raw cache of the contents of ``test.h``. This is a
low-level interface used both to implement the high-level PTH interface
and to provide an alternative means of using PTH-style caching.

PTH Design and Implementation
=============================

Unlike GCC's precompiled headers, which cache the full ASTs and
preprocessor state of a header file, Clang's pretokenized header files
mainly cache the raw lexer *tokens* that are needed to segment the
stream of characters in a source file into keywords, identifiers, and
operators. Consequently, PTH mainly serves to speed up the lexing and
preprocessing of a source file, while parsing and type-checking must be
completely redone every time a PTH file is used.

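To make the idea of a token cache concrete, the following is a minimal
Python sketch. It is not Clang's implementation: the toy lexer, the
``TokenCache`` class, and its ``lex_calls`` counter are hypothetical names
introduced only for illustration, and the cache lives in memory rather
than in a file. It shows the one property the paragraph above relies on:
once a file's tokens are cached, later uses skip lexing, while everything
after lexing still has to run.

.. code-block:: python

  import re

  # Toy keyword set; Clang's real lexer recognizes far more than this.
  KEYWORDS = {"int", "return", "if", "else", "while"}


  def lex(source):
      """Segment a string into (kind, spelling) tokens."""
      tokens = []
      for spelling in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
          if spelling in KEYWORDS:
              kind = "keyword"
          elif spelling[0].isalpha() or spelling[0] == "_":
              kind = "identifier"
          elif spelling.isdigit():
              kind = "number"
          else:
              kind = "punctuator"
          tokens.append((kind, spelling))
      return tokens


  class TokenCache:
      """Hypothetical in-memory stand-in for the token part of a PTH file."""

      def __init__(self):
          self._tokens = {}
          self.lex_calls = 0  # counts how often we had to lex from scratch

      def add(self, path, source):
          """Pre-lex a file once, as PTH generation would."""
          self._tokens[path] = lex(source)

      def tokens_for(self, path, read_source):
          """Return cached tokens, lexing only on a cache miss."""
          cached = self._tokens.get(path)
          if cached is not None:
              return cached
          self.lex_calls += 1
          return lex(read_source(path))

A file added with ``add`` is served from the cache on every later
``tokens_for`` call; any other path falls back to the ordinary lexer.
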
Basic Design Tradeoffs
----------------------

In the long term there are plans to provide an alternate PCH
implementation for Clang that also caches the work for parsing and type
checking the contents of header files. The current implementation of PCH
in Clang as pretokenized header files was motivated by the following
factors:

**Language independence**
   PTH files work with any language that
   Clang's lexer can handle, including C, Objective-C, and (in the early
   stages) C++. This means that development on language features at the
   parsing level or above (which covers almost all of the interesting
   pieces) does not require PTH to be modified.

**Simple design**
   Relatively speaking, PTH has a simple design and
   implementation, making it easy to test. Further, because the
   machinery for PTH resides at the lower levels of the Clang library
   stack, it is fairly straightforward to profile and optimize.

Further, compared to GCC's PCH implementation (which is the dominant
precompiled header file implementation that Clang can be directly
compared against), the PTH design in Clang yields several attractive
features:

**Architecture independence**
   In contrast to GCC's PCH files (and
   those of several other compilers), Clang's PTH files are architecture
   independent, requiring only a single PTH file when building a
   program for multiple architectures.

   For example, on Mac OS X one may wish to compile a "universal binary"
   that runs on PowerPC, 32-bit Intel (i386), and 64-bit Intel
   architectures. In contrast, GCC requires a PCH file for each
   architecture, as the definitions of types in the AST are
   architecture-specific. Since a Clang PTH file essentially represents
   a lexical cache of header files, a single PTH file can be safely used
   when compiling for multiple architectures. This can also reduce
   compile times because only a single PTH file needs to be generated
   during a build instead of several.

**Reduced memory pressure**
   Similar to GCC, Clang reads PTH files
   via the use of memory mapping (i.e., ``mmap``). Clang, however,
   memory maps PTH files as read-only, meaning that multiple invocations
   of ``clang -cc1`` can share the same pages in memory from a
   memory-mapped PTH file. In comparison, GCC also memory maps its PCH
   files but modifies those pages in memory, incurring
   copy-on-write costs. The read-only nature of PTH can greatly reduce
   memory pressure for builds involving multiple cores, thus improving
   overall scalability.

**Fast generation**
   PTH files can be generated in a small fraction
   of the time needed to generate GCC's PCH files. Since PTH/PCH
   generation is a serial operation that typically blocks progress
   during a build, faster generation time leads to improved processor
   utilization with parallel builds on multicore machines.

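The read-only mapping described under *Reduced memory pressure* can be
illustrated with Python's standard ``mmap`` module. This is only a sketch
of the general technique, not of Clang's code: the helper names
``map_read_only`` and ``demo`` and the temporary file standing in for a
PTH file are assumptions made for the example. A mapping opened with
``mmap.ACCESS_READ`` can be read but not written, so its pages never need
a private copy.

.. code-block:: python

  import mmap
  import os
  import tempfile


  def map_read_only(path):
      """Map a file read-only; such a mapping can never dirty its pages."""
      with open(path, "rb") as f:
          # mmap duplicates the descriptor, so closing f afterwards is fine.
          return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


  def demo():
      """Create a stand-in cache file, map it, and try to modify it."""
      fd, path = tempfile.mkstemp()
      try:
          with os.fdopen(fd, "wb") as f:
              f.write(b"cached tokens")
          view = map_read_only(path)
          try:
              data = view[:6]  # reading works as usual
              try:
                  view[0:1] = b"X"  # writing is rejected
                  writable = True
              except TypeError:
                  writable = False
          finally:
              view.close()
          return data, writable
      finally:
          os.remove(path)

Here ``demo()`` reads the first bytes of the mapping successfully while
the attempted write is refused, which is the property that lets several
processes share the same mapped pages.
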
Despite these strengths, PTH's simple design suffers some algorithmic
handicaps compared to other PCH strategies such as those used by GCC.
While PTH can greatly speed up the processing time of a header file, the
amount of work required to process a header file is still roughly linear
in the size of the header file. In contrast, the amount of work done by
GCC to process a precompiled header is (theoretically) constant (the
ASTs for the header are literally memory mapped into the compiler). This
means that the compiler needs to process only the pieces of the header
file that are referenced by the source file including the header. While
GCC's particular implementation of PCH gives up some of these
algorithmic strengths through its use of copy-on-write pages, the
approach itself can fundamentally dominate at an algorithmic level,
especially when one considers header files of arbitrary size.

There is also a PCH implementation for Clang based on the lazy
deserialization of ASTs. This approach theoretically has the same
constant-time algorithmic advantages just mentioned but also retains some
of the strengths of PTH such as reduced memory pressure (ideal for
multi-core builds).

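The general idea behind lazy deserialization can be sketched in a few
lines of Python. Nothing here reflects Clang's actual AST file format:
the JSON encoding, the ``build_pch`` function, and the ``LazyHeader``
class are hypothetical stand-ins. The sketch only shows why the work done
depends on what the including source file references rather than on the
size of the header.

.. code-block:: python

  import json


  def build_pch(decls):
      """Serialize each declaration separately and record where it lives."""
      blob = bytearray()
      index = {}
      for name, decl in decls.items():
          data = json.dumps(decl).encode("utf-8")
          index[name] = (len(blob), len(data))  # (offset, length)
          blob += data
      return bytes(blob), index


  class LazyHeader:
      """Decode a declaration only the first time it is looked up."""

      def __init__(self, blob, index):
          self._blob = blob
          self._index = index
          self._loaded = {}

      def lookup(self, name):
          if name not in self._loaded:
              offset, length = self._index[name]
              raw = self._blob[offset:offset + length]
              self._loaded[name] = json.loads(raw)
          return self._loaded[name]

      @property
      def loaded_count(self):
          """Number of declarations decoded so far."""
          return len(self._loaded)

Looking up a single name in a header with many declarations decodes just
that one entry; the rest of the blob is never touched.
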
Internal PTH Optimizations
--------------------------

While the main optimization employed by PTH is to reduce lexing time of
header files by caching pre-lexed tokens, PTH also employs several other
optimizations to speed up the processing of header files:

-  ``stat`` caching: PTH files cache information obtained via calls to
   ``stat`` that ``clang -cc1`` uses to resolve which files are included
   by ``#include`` directives. This greatly reduces the overhead
   involved in context-switching to the kernel to resolve included
   files.

-  Fast skipping of ``#ifdef`` ... ``#endif`` chains: PTH files
   record the basic structure of nested preprocessor blocks. When the
   condition of the preprocessor block is false, all of its tokens are
   immediately skipped instead of requiring them to be handled by
   Clang's preprocessor.

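The effect of ``stat`` caching can be sketched as follows. This is an
illustration under stated assumptions rather than Clang's mechanism: a
PTH file stores the ``stat`` information on disk, whereas the
hypothetical ``StatCache`` class below simply memoizes results in a
dictionary, and ``resolve_include`` is a simplified include-path search
invented for the example.

.. code-block:: python

  import os
  import tempfile


  class StatCache:
      """Hypothetical stand-in for the stat data recorded in a PTH file."""

      def __init__(self):
          self._results = {}
          self.syscalls = 0  # how many real os.stat calls were made

      def stat(self, path):
          if path not in self._results:
              self.syscalls += 1
              try:
                  self._results[path] = os.stat(path)
              except OSError:
                  self._results[path] = None  # remember misses as well
          return self._results[path]


  def resolve_include(name, search_dirs, cache):
      """Return the first existing candidate for name, or None."""
      for directory in search_dirs:
          candidate = os.path.join(directory, name)
          if cache.stat(candidate) is not None:
              return candidate
      return None


  def demo():
      """Resolve the same header twice and count the real stat calls."""
      with tempfile.TemporaryDirectory() as root:
          first = os.path.join(root, "first")
          second = os.path.join(root, "second")
          os.mkdir(first)
          os.mkdir(second)
          with open(os.path.join(second, "test.h"), "w") as f:
              f.write("/* empty */\n")
          cache = StatCache()
          dirs = [first, second]
          found_a = resolve_include("test.h", dirs, cache)
          found_b = resolve_include("test.h", dirs, cache)
          same = found_a == found_b == os.path.join(second, "test.h")
          return same, cache.syscalls

The second resolution is answered entirely from the cache, so the number
of real ``os.stat`` calls does not grow with repeated lookups.
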
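Fast skipping of conditional blocks can be sketched in the same spirit.
The token representation below (plain strings, with ``#ifdef`` followed
by the macro name) and the functions ``index_conditionals`` and
``preprocess`` are assumptions made for this example, and ``#else`` is
deliberately left out. The point is that the block structure is computed
once, ahead of time, so a false condition becomes a single jump.

.. code-block:: python

  def index_conditionals(tokens):
      """Map the index of each '#ifdef' to the index of its '#endif'."""
      skip_to = {}
      stack = []
      for i, tok in enumerate(tokens):
          if tok == "#ifdef":
              stack.append(i)
          elif tok == "#endif":
              skip_to[stack.pop()] = i
      return skip_to


  def preprocess(tokens, skip_to, defined):
      """Emit the tokens that survive, jumping over false blocks."""
      out = []
      i = 0
      while i < len(tokens):
          tok = tokens[i]
          if tok == "#ifdef":
              if tokens[i + 1] in defined:
                  i += 2  # enter the block, past the macro name
              else:
                  i = skip_to[i] + 1  # jump past the matching #endif
          elif tok == "#endif":
              i += 1
          else:
              out.append(tok)
              i += 1
      return out


  TOKENS = ["a", "#ifdef", "FOO", "b",
            "#ifdef", "BAR", "c", "#endif",
            "#endif", "d"]
  SKIP_TO = index_conditionals(TOKENS)

With ``FOO`` undefined, ``preprocess(TOKENS, SKIP_TO, set())`` never
visits the nested block at all; it moves from the outer ``#ifdef``
directly to the token after the outer ``#endif``.
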