========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules.  If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files.  The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis.  The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file).  Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file.  Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file.  Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place.  This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.
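
The two conditions above amount to a break-even calculation.  The following
Python sketch is illustrative only: the function name, its parameters, and any
numbers passed to it are hypothetical, and it models a purely sequential build.

.. code-block:: python

  def pch_saves_time(parse_cost, load_cost, generate_cost, num_sources):
      """Return True if generating and loading a PCH beats re-parsing.

      All costs are expressed in the same (arbitrary) unit of time.
      """
      # Without a PCH, every source file re-parses the bundled headers.
      without_pch = num_sources * parse_cost
      # With a PCH, the bundle is parsed once up front and then loaded
      # by every source file.
      with_pch = generate_cost + num_sources * load_cost
      return with_pch < without_pch

On a multi-core build the generation cost weighs more heavily than this model
suggests, because no other compilation can start until the PCH file exists.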

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and use those AST
modules.  From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers.  In particular, there can only be one precompiled header and it must
be included at the beginning of the translation unit.  The extensions to the
AST file format required for modules are discussed in the section on
:ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file.  The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk.  When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored.  The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times.  The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) is deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.
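
As a rough illustration of this behavior, the following Python sketch models
an AST file as a table of records that are decoded on first use, together with
whatever they depend on.  The class and method names are hypothetical and do
not correspond to Clang's implementation.

.. code-block:: python

  class LazyASTFile:
      """Toy model of an AST file whose entities are decoded on demand."""

      def __init__(self, records):
          # 'records' stands in for the on-disk data:
          # entity name -> (payload, names of entities it depends on).
          self._records = records
          self._loaded = {}

      def get(self, name):
          """Deserialize 'name' and its dependencies, at most once each."""
          if name not in self._loaded:
              payload, deps = self._records[name]
              # Record the entity before visiting dependencies so that
              # cyclic references terminate.
              self._loaded[name] = payload
              for dep in deps:
                  self.get(dep)
          return self._loaded[name]

      def stats(self):
          """Return (entities read, entities stored)."""
          return len(self._loaded), len(self._records)

Referencing a single entity touches only that entity and its dependencies, so
the work done grows with what the program uses rather than with the number of
records stored.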

When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk.  For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header.  These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained.  When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file.  For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file.  The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

Clang's AST files are organized into several different blocks, each of which
contains the serialized representation of a part of Clang's internal
representation.  Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

For a given AST file, the `llvm-bcanalyzer
<http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used
to examine the actual structure of the bitstream for the AST file.  This
information can be used both to help understand the structure of the AST file
and to isolate areas where AST files can still be optimized, e.g., through the
introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built.  This metadata is primarily used to validate the use of
an AST file.  For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target.  The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments).  The contents of this record correspond to
  the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format.  Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file.  The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options.  For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions.  The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on.  Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture, and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.
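
The validation described in this section can be pictured with the following
Python sketch.  It is deliberately simplified: the record layout, field names,
and messages are hypothetical, every language option is treated as significant,
and the predefines buffer check is omitted.

.. code-block:: python

  from dataclasses import dataclass


  @dataclass(frozen=True)
  class ASTMetadata:
      """Hypothetical stand-in for the records in the metadata block."""
      language_options: frozenset
      target_triple: str
      version_major: int
      version_minor: int


  def validate(ast_file, current):
      """Return the reasons 'ast_file' cannot be used by 'current'."""
      problems = []
      # A differing minor version is tolerated; a differing major
      # version is not.
      if ast_file.version_major != current.version_major:
          problems.append("AST major version mismatch")
      if ast_file.target_triple != current.target_triple:
          problems.append("target triple mismatch")
      if ast_file.language_options != current.language_options:
          problems.append("language options mismatch")
      return problems

An empty result means the AST file is acceptable under this model; a header
built for ``i386-apple-darwin9`` would be rejected by a compilation that
targets a different triple.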

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation.  The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer.  Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location.  Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored.  The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file.  This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header are dependent on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor.  Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro.  The macro definitions are only read
from the AST file when the name of the macro first occurs in the program.  This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit.  Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type.  Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered.  The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types.  When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits.  The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.
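
The encoding of a type reference can be sketched in Python as follows.  The
helper names are hypothetical, and the assignment of individual bits to
qualifiers is an assumption made for illustration; the text above only states
that the three low bits hold the qualifiers.

.. code-block:: python

  # Assumed bit assignment for the three qualifier bits.
  CONST, RESTRICT, VOLATILE = 0x1, 0x2, 0x4
  QUAL_BITS = 3
  QUAL_MASK = (1 << QUAL_BITS) - 1


  def encode_type_ref(type_id, quals=0):
      """Pack a type ID and its qualifier bits into one integer."""
      return (type_id << QUAL_BITS) | (quals & QUAL_MASK)


  def decode_type_ref(ref):
      """Split an encoded reference back into (type ID, qualifier bits)."""
      return ref >> QUAL_BITS, ref & QUAL_MASK

Under these assumptions, a ``const volatile`` reference to type ID 42 is stored
as ``(42 << 3) | 0x5``, and decoding it yields the ID and qualifier bits again.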

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit.  Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file.  When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node.  As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file.  In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically.  At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node.  Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on.  Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context.  Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit.  When required, the declarations
stored within a declaration context will be deserialized.  There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name.  The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized.  For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  It is common, though, for the code generator and semantic analysis to
  traverse declaration contexts for structs, classes, unions, and
  enumerations, although those contexts contain relatively few declarations in
  the common case.
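
The difference between the two access paths can be sketched in Python.  The
class below is a hypothetical model, not Clang's ``DeclContext``: a dictionary
plays the role of the on-disk hash table, and strings play the role of
serialized declarations.

.. code-block:: python

  class LazyDeclContext:
      """Toy declaration context backed by serialized declarations."""

      def __init__(self, lookup_table, stored_decls):
          # lookup_table: name -> list of declaration IDs.
          # stored_decls: declaration ID -> serialized declaration.
          self._lookup_table = lookup_table
          self._stored = stored_decls
          self.deserialized = {}

      def _load(self, decl_id):
          # Deserialize a declaration the first time it is requested.
          if decl_id not in self.deserialized:
              self.deserialized[decl_id] = self._stored[decl_id]
          return self.deserialized[decl_id]

      def lookup(self, name):
          """Name lookup: load only the declarations with this name."""
          return [self._load(i) for i in self._lookup_table.get(name, [])]

      def decls(self):
          """Iteration: load every declaration in the context."""
          return [self._load(i) for i in sorted(self._stored)]

Looking up ``x`` in a three-field structure deserializes one declaration, while
iterating over the same context deserializes all three.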

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration.  The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression.  For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression.  Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size).  Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those subexpressions, using a
form of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_.  For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
|       ``STOP``        |
+-----------------------+

When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression on to a stack.  When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack.  The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.
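
The stack discipline described above can be sketched in Python.  The record
shapes and the function name are hypothetical.  The order in which operands
are popped is inferred from the example: because subexpressions are stored in
reverse order, the first operand is the one on top of the stack.

.. code-block:: python

  def read_expression(records):
      """Rebuild one expression from records stored operands-first.

      Each record is ("int", value), ("binop", operator), or ("stop",).
      Returns the expression as a nested tuple together with the number
      of records consumed, including the STOP record.
      """
      stack = []
      for consumed, record in enumerate(records, start=1):
          kind = record[0]
          if kind == "stop":
              break
          if kind == "int":
              stack.append(record[1])
          elif kind == "binop":
              # Operands were written in reverse order, so the left-hand
              # side is popped first.
              lhs = stack.pop()
              rhs = stack.pop()
              stack.append((record[1], lhs, rhs))
          else:
              raise ValueError("unknown record kind: %r" % (kind,))
      else:
          raise ValueError("missing STOP record")
      if len(stack) != 1:
          raise ValueError("malformed expression")
      return stack[0], consumed

Feeding it the six records for ``3 - 4 + 5`` rebuilds the tree for
``(3 - 4) + 5`` and leaves any records after STOP untouched for the next
expression.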

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure).  The
serialized representation contains:

* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source.  Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table.  If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.

Within the AST file, the identifiers used to name declarations are represented
with an integral value.  A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored.  This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.
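
The external lookup source arrangement can be sketched in Python.  Both classes
are hypothetical models: a dictionary plays the role of the on-disk hash table,
and the dictionaries it holds play the role of ``IdentifierInfo`` contents.

.. code-block:: python

  class ASTFileIdentifiers:
      """Toy external lookup source backed by an AST file's table."""

      def __init__(self, on_disk):
          self._on_disk = on_disk
          self.lookups = 0

      def lookup(self, name):
          # Count how often the "on-disk" table is consulted.
          self.lookups += 1
          return self._on_disk.get(name)


  class IdentifierTable:
      """Toy identifier table that consults an external source once."""

      def __init__(self, external_source=None):
          self._entries = {}
          self._external = external_source

      def get(self, name):
          if name not in self._entries:
              # First sighting: ask the external source before creating
              # a fresh entry.
              info = None
              if self._external is not None:
                  info = self._external.lookup(name)
              if info is None:
                  info = {"name": name}
              self._entries[name] = info
          return self._entries[name]

In this model the AST file is consulted only the first time an identifier is
seen; later references are served from the in-memory table.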

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file.  The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file.  A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.
  
AST Reader Integration Points
-----------------------------

The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang.  For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules.  The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures.  ``ASTReader`` implements the
following abstract classes:
  
``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen.  In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever abstract syntax tree nodes need to be loaded from the AST
  file.  It provides the ability to de-serialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.
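The pattern of one reader implementing several abstract interfaces can be
sketched as follows.  This is a simplified illustration under stated
assumptions: ``ToyIdentifierLookup``, ``ToyMethodSource``, ``ToyReader``, and
``ToyIdentifierTable`` are hypothetical names, and their member functions do
not match the signatures of the real Clang classes listed above.

.. code-block:: c++

  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical interface used by an identifier table for unseen names.
  class ToyIdentifierLookup {
  public:
    virtual ~ToyIdentifierLookup() = default;
    virtual std::vector<std::string>
    lookupIdentifier(const std::string &Name) = 0;
  };

  // Hypothetical interface used by semantic analysis for the method pool.
  class ToyMethodSource {
  public:
    virtual ~ToyMethodSource() = default;
    virtual std::vector<std::string>
    readMethodPool(const std::string &Selector) = 0;
  };

  // A single reader inherits from every interface, so each client only
  // needs to know about the one interface it talks to.
  class ToyReader : public ToyIdentifierLookup, public ToyMethodSource {
  public:
    ToyReader() {
      // Stand-in for data that would be stored in an AST file.
      Identifiers["printf"] = {"int printf(const char *, ...)"};
      Methods["description"] = {"-[NSObject description]"};
    }
    std::vector<std::string>
    lookupIdentifier(const std::string &Name) override {
      return find(Identifiers, Name);
    }
    std::vector<std::string>
    readMethodPool(const std::string &Selector) override {
      return find(Methods, Selector);
    }

  private:
    using Table = std::map<std::string, std::vector<std::string>>;
    static std::vector<std::string> find(const Table &T,
                                         const std::string &Key) {
      auto It = T.find(Key);
      return It == T.end() ? std::vector<std::string>() : It->second;
    }
    Table Identifiers;
    Table Methods;
  };

  // Client side: consults the external source only for names it has not
  // seen before, then remembers the answer.
  class ToyIdentifierTable {
  public:
    explicit ToyIdentifierTable(ToyIdentifierLookup *Source)
        : External(Source) {}
    const std::vector<std::string> &get(const std::string &Name) {
      auto It = Seen.find(Name);
      if (It == Seen.end()) {
        std::vector<std::string> Decls;
        if (External)
          Decls = External->lookupIdentifier(Name);
        It = Seen.emplace(Name, std::move(Decls)).first;
      }
      return It->second;
    }

  private:
    ToyIdentifierLookup *External;
    std::map<std::string, std::vector<std::string>> Seen;
  };

The design choice illustrated here is that the client depends only on a narrow
abstract interface, so it works the same way whether or not an AST file is
attached.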
  
.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user.  To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
called a precompiled *preamble* is automatically generated by parsing all of
the headers in the source file, up to and including the last ``#include``.
When only the source file changes (and none of the headers it depends on),
reparsing of that source file can use the precompiled preamble and start
parsing after the ``#include``\ s, so parsing time is proportional to the size
of the source file (rather than all of its includes).  However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header.  This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.
  
Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies.  A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it).  This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.
  
There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities (identifiers, declarations, types, etc.)
  have ID numbers that start at 1 or some other predefined constant and
  grow upward.  Each precompiled header records the maximum ID number it has
  assigned in each category.  Then, when a new precompiled header is generated
  that depends on (chains to) another precompiled header, it will start
  counting at the next available ID number.  This way, one can determine, given
  an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based.  This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most-recent precompiled header.  If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header.  For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class.  Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.
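The ID numbering and name lookup rules above can be sketched with a small
model.  It is only a sketch under simplifying assumptions: ``ToyPCH``,
``appendToChain``, ``fileForID``, and ``lookupInChain`` are hypothetical names,
a single ID category is modeled, and identifier entries are plain strings.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical model of one precompiled header in a chain.
  struct ToyPCH {
    std::uint32_t FirstID = 0;  // first ID assigned by this file
    std::uint32_t NumIDs = 0;   // number of IDs assigned by this file
    // Only entries that are new or changed relative to earlier files.
    std::map<std::string, std::string> Identifiers;
  };

  // Chain[0] is the oldest file; Chain.back() is the most recent one.
  using ToyChain = std::vector<ToyPCH>;

  // A new file starts counting at the next available ID number.
  void appendToChain(ToyChain &Chain, std::uint32_t NumIDs,
                     std::map<std::string, std::string> Identifiers) {
    ToyPCH File;
    File.FirstID =
        Chain.empty() ? 1 : Chain.back().FirstID + Chain.back().NumIDs;
    File.NumIDs = NumIDs;
    File.Identifiers = std::move(Identifiers);
    Chain.push_back(std::move(File));
  }

  // Given an ID, find the index of the file that owns it, or -1 if none.
  int fileForID(const ToyChain &Chain, std::uint32_t ID) {
    for (std::size_t I = 0; I != Chain.size(); ++I)
      if (ID >= Chain[I].FirstID && ID - Chain[I].FirstID < Chain[I].NumIDs)
        return static_cast<int>(I);
    return -1;
  }

  // Search from the most recent file backwards; the first hit is definitive.
  const std::string *lookupInChain(const ToyChain &Chain,
                                   const std::string &Name) {
    for (auto File = Chain.rbegin(); File != Chain.rend(); ++File) {
      auto Found = File->Identifiers.find(Name);
      if (Found != File->Identifiers.end())
        return &Found->second;
    }
    return nullptr;
  }

In this model, an entry for a name in a later file hides the entry for the same
name in an earlier file, which is the overriding behavior described for name
lookup.  Update records are not modeled here.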
  
.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files.  All of the same techniques used to make chained
precompiled headers work (ID numbering, name lookup, and update records) are
shared with modules.  However, the DAG nature of modules introduces a number of
additional complications to the model:
  
Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules.  To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities.  The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a one-to-one
  mapping between entities (in whatever AST file they inhabit) and global ID
  numbers.  If that translation unit is then serialized into an AST file, this
  mapping will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places.  For example, two different
  headers can have the declaration of ``printf`` or could forward-declare
  ``struct stat``.  If each of those headers is included in a module, and some
  third party imports both of those modules, there is a potentially serious
  problem: name lookup for ``printf`` or ``struct stat`` will find both
  declarations, but the AST nodes are unrelated.  This would result in a
  compilation error, due to an ambiguity in name lookup.  Therefore, the AST
  reader performs declaration merging according to the appropriate language
  semantics, ensuring that the two disjoint declarations are merged into a
  single redeclaration chain (with a common canonical declaration), so that it
  is as if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients.  The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup.  When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.
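The mapping from "local" to "global" ID numbers can be illustrated with a
deliberately small model.  This is a sketch under stated assumptions, not the
AST reader's actual bookkeeping: ``ToyModuleIDMap`` is a hypothetical name,
each module numbers its own entities from 1, and every module receives one
contiguous block of global IDs the first time it is loaded.

.. code-block:: c++

  #include <cstdint>
  #include <map>
  #include <string>

  // Hypothetical local-to-global ID mapping for a DAG of module files.
  class ToyModuleIDMap {
  public:
    // Reserve a block of global IDs for a module.  A module reachable
    // through several importers is loaded only once; returns false if it
    // was already loaded.
    bool load(const std::string &Module, std::uint32_t NumEntities) {
      if (Ranges.count(Module))
        return false;
      Range R;
      R.Base = NextGlobalID;
      R.Count = NumEntities;
      Ranges[Module] = R;
      NextGlobalID += NumEntities;
      return true;
    }

    // Translate a module-local ID (1-based) into a global ID.
    // Returns 0 for an unknown module or an out-of-range local ID.
    std::uint32_t toGlobal(const std::string &Module,
                           std::uint32_t LocalID) const {
      auto It = Ranges.find(Module);
      if (It == Ranges.end() || LocalID == 0 || LocalID > It->second.Count)
        return 0;
      return It->second.Base + LocalID - 1;
    }

  private:
    struct Range {
      std::uint32_t Base = 0;   // global ID of the module's first entity
      std::uint32_t Count = 0;  // number of entities the module owns
    };
    std::map<std::string, Range> Ranges;
    std::uint32_t NextGlobalID = 1;
  };

Because a shared module is registered once regardless of how many importers
reach it, each of its entities ends up with exactly one global ID in this
model, which is the one-to-one property described under "Numbering of IDs".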