=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang or use tools built on Clang's AST, such as the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesis expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so searching for
clang plus the AST node's class name usually turns up the Doxygen page
of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has a built-in AST-dump mode,
which can be enabled with the flag ``-ast-dump``.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our ``result`` variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversing the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
or accessing Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. What can be reached has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
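
To make the traversal idea concrete, here is a minimal sketch in standard
C++ only. It does not use Clang's headers or the real ``RecursiveASTVisitor``:
``ToyDecl``, ``ToyStmt``, ``traverseDecl``, ``traverseStmt`` and
``makeExample`` are hypothetical names made up for this illustration, and the
tree is a reduced version of the ``-ast-dump`` output shown earlier. It only
illustrates why the knowledge of "what is reachable from this node" has to be
written down once per node family when there is no common base class.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, NOT Clang's classes: two unrelated hierarchies, mirroring
// the fact that declarations and statements share no common ancestor.
struct ToyStmt {
  std::string kind;
  std::vector<std::unique_ptr<ToyStmt>> children;
};

struct ToyDecl {
  std::string kind;
  std::vector<std::unique_ptr<ToyDecl>> decls;  // nested declarations
  std::unique_ptr<ToyStmt> body;                // null if there is no body
};

// Each hierarchy needs its own function that knows which members lead to
// further nodes. Node kinds are recorded in pre-order.
void traverseStmt(const ToyStmt &s, std::vector<std::string> &visited) {
  visited.push_back(s.kind);
  for (const auto &child : s.children) {
    assert(child && "child statements are never null in this sketch");
    traverseStmt(*child, visited);
  }
}

void traverseDecl(const ToyDecl &d, std::vector<std::string> &visited) {
  visited.push_back(d.kind);
  for (const auto &nested : d.decls) {
    assert(nested && "nested declarations are never null in this sketch");
    traverseDecl(*nested, visited);
  }
  if (d.body)  // crossing from the declaration family into the statement family
    traverseStmt(*d.body, visited);
}

std::unique_ptr<ToyStmt> makeStmt(std::string kind) {
  auto s = std::make_unique<ToyStmt>();
  s->kind = std::move(kind);
  return s;
}

std::unique_ptr<ToyDecl> makeDecl(std::string kind) {
  auto d = std::make_unique<ToyDecl>();
  d->kind = std::move(kind);
  return d;
}

// Builds a reduced tree for: int f(int x) { int result = ...; return result; }
std::unique_ptr<ToyDecl> makeExample() {
  auto fn = makeDecl("FunctionDecl");
  fn->decls.push_back(makeDecl("ParmVarDecl"));
  fn->body = makeStmt("CompoundStmt");
  fn->body->children.push_back(makeStmt("DeclStmt"));
  fn->body->children.push_back(makeStmt("ReturnStmt"));

  auto tu = makeDecl("TranslationUnitDecl");
  tu->decls.push_back(std::move(fn));
  return tu;
}
```

Calling ``traverseDecl(*makeExample(), visited)`` on an empty vector fills it
with ``TranslationUnitDecl``, ``FunctionDecl``, ``ParmVarDecl``,
``CompoundStmt``, ``DeclStmt`` and ``ReturnStmt``, in that order. The real
AST has many more edges than this toy (a ``DeclStmt`` contains declarations,
for instance), which is the bookkeeping that ``RecursiveASTVisitor`` takes
care of.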

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
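
The relationship between these node families can be sketched with a few toy
classes in standard C++. The ``Mini*`` types and ``describe`` below are
hypothetical names for illustration, not Clang's declarations. They only
mirror the shape described above: expressions derive from statements, while
declarations live in a separate hierarchy.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not Clang's classes.
struct MiniDecl {
  virtual ~MiniDecl() = default;
};

struct MiniStmt {
  virtual ~MiniStmt() = default;
  virtual std::string kindName() const = 0;
};

// An expression is a statement that additionally carries a type.
struct MiniExpr : MiniStmt {
  explicit MiniExpr(std::string type) : type(std::move(type)) {
    assert(!this->type.empty() && "every expression has a type in this sketch");
  }
  std::string type;
};

struct MiniIntegerLiteral : MiniExpr {
  explicit MiniIntegerLiteral(long value) : MiniExpr("int"), value(value) {}
  std::string kindName() const override { return "IntegerLiteral"; }
  long value;
};

struct MiniReturnStmt : MiniStmt {
  std::string kindName() const override { return "ReturnStmt"; }
};

// Expressions are statements; declarations and statements are unrelated.
static_assert(std::is_base_of<MiniStmt, MiniExpr>::value,
              "every expression is a statement");
static_assert(!std::is_base_of<MiniDecl, MiniStmt>::value &&
                  !std::is_base_of<MiniStmt, MiniDecl>::value,
              "declarations and statements share no common ancestor");

// Code written against the statement base class accepts expressions too.
// This sketch uses dynamic_cast for brevity; Clang itself relies on LLVM's
// isa<>/dyn_cast<> helpers instead of C++ RTTI.
std::string describe(const MiniStmt &s) {
  if (const auto *e = dynamic_cast<const MiniExpr *>(&s))
    return s.kindName() + " (expression of type '" + e->type + "')";
  return s.kindName() + " (statement)";
}
```

With these definitions, ``describe(MiniIntegerLiteral(42))`` yields
``IntegerLiteral (expression of type 'int')`` and
``describe(MiniReturnStmt())`` yields ``ReturnStmt (statement)``.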