========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules. If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files. The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file. Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file. Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place. This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and use those AST
modules. From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers. In particular, there can only be one precompiled header and it must
be included at the beginning of the translation unit. The extensions to the
AST file format required for modules are discussed in the section on
:ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file. The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk. When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored.
The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times. The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) are deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.

When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk. For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header. These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained. When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file.
For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file. The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

Clang's AST files are organized into several different blocks, each of which
contains the serialized representation of a part of Clang's internal
representation. Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

For a given AST file, the `llvm-bcanalyzer
<http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used
to examine the actual structure of the bitstream for the AST file. This
information can be used both to help understand the structure of the AST file
and to isolate areas where AST files can still be optimized, e.g., through the
introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built. This metadata is primarily used to validate the use of
an AST file. For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target. The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments). The contents of this record correspond to
  the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format. Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice-versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file. The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options. For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions. The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on. Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture and
predefines buffer data is taken from the end of the chain, since they have to
match anyway.

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation. The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer. Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location. Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file. This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor. Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro. The macro definitions are only read
from the AST file when the name of the macro first occurs in the program. This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit. Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type. Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types. When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits. The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit. Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file. When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node.
As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file. In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically. At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node. Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on. Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context. Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit.
When required, the declarations
stored within a declaration context will be deserialized. There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name. The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  de-serialized. For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  However, it is common for the code generator and semantic analysis to
  traverse declaration contexts for structs, classes, unions, and
  enumerations, although those contexts contain relatively few declarations in
  the common case.
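
The difference between the two access paths can be sketched with a small
model. This is an illustrative Python sketch only, not Clang's implementation:
``LazyDeclContext`` and its fields are hypothetical names, a plain dictionary
stands in for the on-disk hash table, and strings stand in for serialized
declaration records.

.. code-block:: python

  class LazyDeclContext:
      """Toy model of a declaration context backed by an AST file."""

      def __init__(self, records, name_table):
          self.records = records        # decl ID -> serialized record (a string here)
          self.name_table = name_table  # name -> decl IDs (stand-in for the on-disk hash table)
          self.loaded = {}              # decl ID -> deserialized declaration

      def _deserialize(self, decl_id):
          # Deserialize each declaration at most once, on first use.
          if decl_id not in self.loaded:
              self.loaded[decl_id] = ("Decl", self.records[decl_id])
          return self.loaded[decl_id]

      def lookup(self, name):
          # Name lookup: only the declarations with this name are deserialized.
          return [self._deserialize(i) for i in self.name_table.get(name, [])]

      def decls(self):
          # Iteration: every declaration in the context is deserialized.
          return [self._deserialize(i) for i in sorted(self.records)]


  ctx = LazyDeclContext(
      records={1: "field x", 2: "field y", 3: "field z"},
      name_table={"x": [1], "y": [2], "z": [3]},
  )
  ctx.lookup("x")
  loaded_after_lookup = len(ctx.loaded)     # only "x" has been materialized
  ctx.decls()
  loaded_after_iteration = len(ctx.loaded)  # iteration materialized everything

In this model a lookup of ``x`` touches one record, while iterating the
context touches all three, which mirrors why iteration over a large context
such as the translation unit is the expensive case.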

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration. The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size). Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those expressions, using a form
of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_.
For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
| ``STOP``              |
+-----------------------+

When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression on to a stack. When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack. The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure). The
serialized representation contains:

* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source. Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table. If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.
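
The external-lookup arrangement described above can be modeled in a few lines.
The following Python sketch is an assumption-laden illustration, not Clang's
API: ``IdentifierTable``, ``ToyASTReader``, and the dictionary-based
"on-disk" table are hypothetical stand-ins, and the stored fields are reduced
to a macro flag and a list of declaration IDs.

.. code-block:: python

  class IdentifierTable:
      """Toy in-memory identifier table with an optional external source."""

      def __init__(self):
          self.entries = {}
          self.external = None

      def set_external_lookup(self, source):
          self.external = source

      def get(self, name):
          # Consult the external source only the first time a name is seen.
          if name not in self.entries:
              info = self.external.lookup(name) if self.external else None
              if info is None:
                  # Not in the AST file: create a fresh, empty entry.
                  info = {"name": name, "is_macro": False, "decl_ids": []}
              self.entries[name] = info
          return self.entries[name]


  class ToyASTReader:
      """Stand-in for the AST reader acting as the external lookup source."""

      def __init__(self, on_disk_table):
          self.on_disk_table = on_disk_table  # name -> stored identifier data
          self.lookups = 0

      def lookup(self, name):
          self.lookups += 1
          stored = self.on_disk_table.get(name)
          if stored is None:
              return None
          return {"name": name, **stored}


  reader = ToyASTReader({
      "printf": {"is_macro": False, "decl_ids": [42]},
      "NULL": {"is_macro": True, "decl_ids": []},
  })
  table = IdentifierTable()
  table.set_external_lookup(reader)
  first = table.get("printf")
  second = table.get("printf")  # served from the in-memory table

The second request for ``printf`` never reaches the reader, which reflects the
point that the on-disk table is consulted only for identifiers that have not
yet been seen.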

Within the AST file, the identifiers used to name declarations are represented
with an integral value. A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored. This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file. The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file. A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when de-serializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.

AST Reader Integration Points
-----------------------------

The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang. For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules. The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures.
``ASTReader`` implements the
following abstract classes:

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen. In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever the abstract syntax tree nodes need to be loaded from the AST
  file. It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.

.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user. To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
--- called a precompiled *preamble* --- is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``. When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes). However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header.
This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies. A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it). This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities --- identifiers, declarations, types, etc.
  --- have ID numbers that start at 1 or some other predefined constant and
  grow upward. Each precompiled header records the maximum ID number it has
  assigned in each category. Then, when a new precompiled header is generated
  that depends on (chains to) another precompiled header, it will start
  counting at the next available ID number.
  This way, one can determine, given
  an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based. This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most-recent precompiled header. If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header. For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class. Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.

.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files. All of the same techniques used to make chained
precompiled headers work --- ID numbering, name lookup, update records --- are
shared with modules. However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules. To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities. The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a 1-1 mapping
  between entities (in whatever AST file they inhabit) and global ID numbers.
  If that translation unit is then serialized into an AST file, this mapping
  will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places. For example, two different
  headers can have the declaration of ``printf`` or could forward-declare
  ``struct stat``. If each of those headers is included in a module, and some
  third party imports both of those modules, there is a potentially serious
  problem: name lookup for ``printf`` or ``struct stat`` will find both
  declarations, but the AST nodes are unrelated. This would result in a
  compilation error, due to an ambiguity in name lookup. Therefore, the AST
  reader performs declaration merging according to the appropriate language
  semantics, ensuring that the two disjoint declarations are merged into a
  single redeclaration chain (with a common canonical declaration), so that it
  is as if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients. The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.)
  to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup. When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.
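
The local-to-global ID mapping described under "Numbering of IDs" above can be
sketched as follows. This is a deliberately simplified model, not Clang's
actual data structures: ``ModuleFile``, ``GlobalIDSpace``, and their members
are hypothetical names, and the sketch assumes that each module file receives
one contiguous block of global IDs for its own entities. It does not model the
remapping of IDs that a module file assigned to entities of the modules it
imports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one loaded module file.
struct ModuleFile {
  std::string Name;
  std::uint32_t NumLocalIDs;  // local IDs 1..NumLocalIDs name its own entities
  std::uint32_t BaseGlobalID; // global ID assigned to local ID 1
};

// Toy global ID space for one translation unit (assumption: simplified).
class GlobalIDSpace {
public:
  // Registers a module file and reserves a contiguous block of global IDs.
  // Returns the index used to refer to the module in later calls.
  std::size_t addModule(const std::string &Name, std::uint32_t NumLocalIDs) {
    Modules.push_back(ModuleFile{Name, NumLocalIDs, NextGlobalID});
    BaseToModule[NextGlobalID] = Modules.size() - 1;
    NextGlobalID += NumLocalIDs;
    return Modules.size() - 1;
  }

  // Maps a module-local ID to its global ID.
  std::uint32_t toGlobal(std::size_t ModuleIndex, std::uint32_t LocalID) const {
    const ModuleFile &M = Modules.at(ModuleIndex);
    assert(LocalID >= 1 && LocalID <= M.NumLocalIDs && "local ID out of range");
    return M.BaseGlobalID + (LocalID - 1);
  }

  // Finds the module file whose block contains the given global ID.
  const ModuleFile &ownerOf(std::uint32_t GlobalID) const {
    assert(GlobalID >= 1 && GlobalID < NextGlobalID && "unknown global ID");
    auto It = BaseToModule.upper_bound(GlobalID); // first base > GlobalID
    --It;                                         // block that contains it
    return Modules[It->second];
  }

private:
  std::vector<ModuleFile> Modules;
  std::map<std::uint32_t, std::size_t> BaseToModule; // block base -> module
  std::uint32_t NextGlobalID = 1;                    // IDs start at 1
};
```

With two modules registered in order, the first holding three entities and the
second holding two, the second module's local ID 1 maps to global ID 4, and
looking up global ID 4 leads back to the second module. Because the blocks do
not overlap, the mapping between entities and global IDs stays 1-1.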