=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so a search for
"clang" plus the AST node's class name usually turns up the Doxygen
page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has a built-in AST-dump mode,
which can be enabled with the flag ``-ast-dump``.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``".
The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and gives access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_.
Many important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There are also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node - this information has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_).
Note that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
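
To make the shape of these hierarchies more concrete, here is a minimal,
self-contained sketch. It does not use Clang's headers: every ``Toy*`` type
and the helpers ``dumpStmt``, ``dumpFunction`` and ``makeExample`` are
hypothetical names invented for this illustration. It only mirrors three
points from the text: declarations and statements share no common base class,
expressions are statements, and the step from a declaration into its body has
to be written out for that specific node type.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Clang's classes.

// Root of the statement hierarchy.
struct ToyStmt {
  explicit ToyStmt(std::string K) : Kind(std::move(K)) {}
  virtual ~ToyStmt() = default;
  std::string Kind;
  std::vector<std::unique_ptr<ToyStmt>> Children;
};

// Expressions are statements too, so ToyExpr derives from ToyStmt.
struct ToyExpr : ToyStmt {
  using ToyStmt::ToyStmt;
};

// Root of the declaration hierarchy; it shares no base class with ToyStmt.
struct ToyDecl {
  ToyDecl(std::string K, std::string N)
      : Kind(std::move(K)), Name(std::move(N)) {}
  virtual ~ToyDecl() = default;
  std::string Kind;
  std::string Name;
};

// A function declaration owns a body that lives in the other hierarchy.
struct ToyFunctionDecl : ToyDecl {
  explicit ToyFunctionDecl(std::string N)
      : ToyDecl("FunctionDecl", std::move(N)) {}
  std::unique_ptr<ToyStmt> Body;
};

// Appends one line per statement, indented two spaces per nesting level.
void dumpStmt(const ToyStmt &S, int Depth, std::string &Out) {
  Out += std::string(2 * Depth, ' ') + S.Kind + "\n";
  for (const auto &Child : S.Children)
    dumpStmt(*Child, Depth + 1, Out);
}

// The Decl-to-Stmt step is spelled out by hand for this node type.
std::string dumpFunction(const ToyFunctionDecl &F) {
  std::string Out = F.Kind + " " + F.Name + "\n";
  if (F.Body)
    dumpStmt(*F.Body, 1, Out);
  return Out;
}

// Builds a toy tree shaped like: int f() { return (42); }
ToyFunctionDecl makeExample() {
  auto Literal = std::make_unique<ToyExpr>("IntegerLiteral");
  auto Paren = std::make_unique<ToyExpr>("ParenExpr");
  Paren->Children.push_back(std::move(Literal));
  auto Ret = std::make_unique<ToyStmt>("ReturnStmt");
  Ret->Children.push_back(std::move(Paren));
  auto Body = std::make_unique<ToyStmt>("CompoundStmt");
  Body->Children.push_back(std::move(Ret));
  ToyFunctionDecl F("f");
  F.Body = std::move(Body);
  return F;
}
```

Calling ``dumpFunction(makeExample())`` returns an indented listing with one
node per line, loosely similar in spirit to the ``-ast-dump`` output shown
earlier. In real Clang code the per-node traversal knowledge is provided by
``RecursiveASTVisitor`` rather than written by hand like this.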