xref: /netbsd-src/external/apache2/llvm/dist/clang/docs/LibASTMatchersTutorial.rst (revision e038c9c4676b0f19b1b7dd08a940c6ed64a6d5ae)
1===============================================================
2Tutorial for building tools using LibTooling and LibASTMatchers
3===============================================================
4
5This document is intended to show how to build a useful source-to-source
6translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
7explicitly aimed at people who are new to Clang, so all you should need
8is a working knowledge of C++ and the command line.
9
10In order to work on the compiler, you need some basic knowledge of the
11abstract syntax tree (AST). To this end, the reader is encouraged to
12skim the :doc:`Introduction to the Clang
13AST <IntroductionToTheClangAST>`
14
15Step 0: Obtaining Clang
16=======================
17
18As Clang is part of the LLVM project, you'll need to download LLVM's
19source code first. Both Clang and LLVM are in the same git repository,
20under different directories. For further information, see the `getting
21started guide <https://llvm.org/docs/GettingStarted.html>`_.
22
23.. code-block:: console
24
25      cd ~/clang-llvm
26      git clone https://github.com/llvm/llvm-project.git
27
28Next you need to obtain the CMake build system and Ninja build tool.
29
30.. code-block:: console
31
32      cd ~/clang-llvm
33      git clone https://github.com/martine/ninja.git
34      cd ninja
35      git checkout release
36      ./bootstrap.py
37      sudo cp ninja /usr/bin/
38
39      cd ~/clang-llvm
40      git clone git://cmake.org/stage/cmake.git
41      cd cmake
42      git checkout next
43      ./bootstrap
44      make
45      sudo make install
46
47Okay. Now we'll build Clang!
48
49.. code-block:: console
50
51      cd ~/clang-llvm
52      mkdir build && cd build
53      cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
54      ninja
55      ninja check       # Test LLVM only.
56      ninja clang-test  # Test Clang only.
57      ninja install
58
59And we're live.
60
61All of the tests should pass.
62
63Finally, we want to set Clang as its own compiler.
64
65.. code-block:: console
66
67      cd ~/clang-llvm/build
68      ccmake ../llvm
69
70The second command will bring up a GUI for configuring Clang. You need
71to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
72advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
73``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
74configure, then ``'g'`` to generate CMake's files.
75
76Finally, run ninja one last time, and you're done.
77
78Step 1: Create a ClangTool
79==========================
80
81Now that we have enough background knowledge, it's time to create the
82simplest productive ClangTool in existence: a syntax checker. While this
83already exists as ``clang-check``, it's important to understand what's
84going on.
85
86First, we'll need to create a new directory for our tool and tell CMake
87that it exists. As this is not going to be a core clang tool, it will
88live in the ``clang-tools-extra`` repository.
89
90.. code-block:: console
91
92      cd ~/clang-llvm
93      mkdir clang-tools-extra/loop-convert
94      echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt
95      vim clang-tools-extra/loop-convert/CMakeLists.txt
96
97CMakeLists.txt should have the following contents:
98
99::
100
101      set(LLVM_LINK_COMPONENTS support)
102
103      add_clang_executable(loop-convert
104        LoopConvert.cpp
105        )
106      target_link_libraries(loop-convert
107        PRIVATE
108        clangTooling
109        clangBasic
110        clangASTMatchers
111        )
112
113With that done, Ninja will be able to compile our tool. Let's give it
114something to compile! Put the following into
115``clang-tools-extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
116why the different parts are needed can be found in the `LibTooling
117documentation <LibTooling.html>`_.
118
119.. code-block:: c++
120
121      // Declares clang::SyntaxOnlyAction.
122      #include "clang/Frontend/FrontendActions.h"
123      #include "clang/Tooling/CommonOptionsParser.h"
124      #include "clang/Tooling/Tooling.h"
125      // Declares llvm::cl::extrahelp.
126      #include "llvm/Support/CommandLine.h"
127
128      using namespace clang::tooling;
129      using namespace llvm;
130
131      // Apply a custom category to all command-line options so that they are the
132      // only ones displayed.
133      static llvm::cl::OptionCategory MyToolCategory("my-tool options");
134
135      // CommonOptionsParser declares HelpMessage with a description of the common
136      // command-line options related to the compilation database and input files.
137      // It's nice to have this help message in all tools.
138      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
139
140      // A help message for this specific tool can be added afterwards.
141      static cl::extrahelp MoreHelp("\nMore help text...\n");
142
143      int main(int argc, const char **argv) {
144        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
145        if (!ExpectedParser) {
146          // Fail gracefully for unsupported options.
147          llvm::errs() << ExpectedParser.takeError();
148          return 1;
149        }
150        CommonOptionsParser& OptionsParser = ExpectedParser.get();
151        ClangTool Tool(OptionsParser.getCompilations(),
152                       OptionsParser.getSourcePathList());
153        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
154      }
155
156And that's it! You can compile our new tool by running ninja from the
157``build`` directory.
158
159.. code-block:: console
160
161      cd ~/clang-llvm/build
162      ninja
163
164You should now be able to run the syntax checker, which is located in
165``~/clang-llvm/build/bin``, on any source file. Try it!
166
167.. code-block:: console
168
169      echo "int main() { return 0; }" > test.cpp
170      bin/loop-convert test.cpp --
171
172Note the two dashes after we specify the source file. The additional
173options for the compiler are passed after the dashes rather than loading
174them from a compilation database - there just aren't any options needed
175right now.
176
177Intermezzo: Learn AST matcher basics
178====================================
179
180Clang recently introduced the :doc:`ASTMatcher
181library <LibASTMatchers>` to provide a simple, powerful, and
182concise way to describe specific patterns in the AST. Implemented as a
183DSL powered by macros and templates (see
184`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
185curious), matchers offer the feel of algebraic data types common to
186functional programming languages.
187
188For example, suppose you wanted to examine only binary operators. There
189is a matcher to do exactly that, conveniently named ``binaryOperator``.
190I'll give you one guess what this matcher does:
191
192.. code-block:: c++
193
194      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
195
196Shockingly, it will match against addition expressions whose left hand
197side is exactly the literal 0. It will not match against other forms of
1980, such as ``'\0'`` or ``NULL``, but it will match against macros that
199expand to 0. The matcher will also not match against calls to the
200overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
201matcher to handle overloaded operators.
202
203There are AST matchers to match all the different nodes of the AST,
204narrowing matchers to only match AST nodes fulfilling specific criteria,
205and traversal matchers to get from one kind of AST node to another. For
206a complete list of AST matchers, take a look at the `AST Matcher
207References <LibASTMatchersReference.html>`_
208
209All matcher that are nouns describe entities in the AST and can be
210bound, so that they can be referred to whenever a match is found. To do
211so, simply call the method ``bind`` on these matchers, e.g.:
212
213.. code-block:: c++
214
215      variable(hasType(isInteger())).bind("intvar")
216
217Step 2: Using AST matchers
218==========================
219
220Okay, on to using matchers for real. Let's start by defining a matcher
221which will capture all ``for`` statements that define a new variable
222initialized to zero. Let's start with matching all ``for`` loops:
223
224.. code-block:: c++
225
226      forStmt()
227
228Next, we want to specify that a single variable is declared in the first
229portion of the loop, so we can extend the matcher to
230
231.. code-block:: c++
232
233      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
234
235Finally, we can add the condition that the variable is initialized to
236zero.
237
238.. code-block:: c++
239
240      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
241        hasInitializer(integerLiteral(equals(0))))))))
242
243It is fairly easy to read and understand the matcher definition ("match
244loops whose init portion declares a single variable which is initialized
245to the integer literal 0"), but deciding that every piece is necessary
246is more difficult. Note that this matcher will not match loops whose
247variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
248zero besides the integer 0.
249
250The last step is giving the matcher a name and binding the ``ForStmt``
251as we will want to do something with it:
252
253.. code-block:: c++
254
255      StatementMatcher LoopMatcher =
256        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
257          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
258
259Once you have defined your matchers, you will need to add a little more
260scaffolding in order to run them. Matchers are paired with a
261``MatchCallback`` and registered with a ``MatchFinder`` object, then run
262from a ``ClangTool``. More code!
263
264Add the following to ``LoopConvert.cpp``:
265
266.. code-block:: c++
267
268      #include "clang/ASTMatchers/ASTMatchers.h"
269      #include "clang/ASTMatchers/ASTMatchFinder.h"
270
271      using namespace clang;
272      using namespace clang::ast_matchers;
273
274      StatementMatcher LoopMatcher =
275        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
276          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
277
278      class LoopPrinter : public MatchFinder::MatchCallback {
279      public :
280        virtual void run(const MatchFinder::MatchResult &Result) {
281          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
282            FS->dump();
283        }
284      };
285
286And change ``main()`` to:
287
288.. code-block:: c++
289
290      int main(int argc, const char **argv) {
291        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
292        if (!ExpectedParser) {
293          // Fail gracefully for unsupported options.
294          llvm::errs() << ExpectedParser.takeError();
295          return 1;
296        }
297        CommonOptionsParser& OptionsParser = ExpectedParser.get();
298        ClangTool Tool(OptionsParser.getCompilations(),
299                       OptionsParser.getSourcePathList());
300
301        LoopPrinter Printer;
302        MatchFinder Finder;
303        Finder.addMatcher(LoopMatcher, &Printer);
304
305        return Tool.run(newFrontendActionFactory(&Finder).get());
306      }
307
308Now, you should be able to recompile and run the code to discover for
309loops. Create a new file with a few examples, and test out our new
310handiwork:
311
312.. code-block:: console
313
314      cd ~/clang-llvm/llvm/llvm_build/
315      ninja loop-convert
316      vim ~/test-files/simple-loops.cc
317      bin/loop-convert ~/test-files/simple-loops.cc
318
319Step 3.5: More Complicated Matchers
320===================================
321
322Our simple matcher is capable of discovering for loops, but we would
323still need to filter out many more ourselves. We can do a good portion
324of the remaining work with some cleverly chosen matchers, but first we
325need to decide exactly which properties we want to allow.
326
327How can we characterize for loops over arrays which would be eligible
328for translation to range-based syntax? Range based loops over arrays of
329size ``N`` that:
330
331-  start at index ``0``
332-  iterate consecutively
333-  end at index ``N-1``
334
335We already check for (1), so all we need to add is a check to the loop's
336condition to ensure that the loop's index variable is compared against
337``N`` and another check to ensure that the increment step just
338increments this same variable. The matcher for (2) is straightforward:
339require a pre- or post-increment of the same variable declared in the
340init portion.
341
342Unfortunately, such a matcher is impossible to write. Matchers contain
343no logic for comparing two arbitrary AST nodes and determining whether
344or not they are equal, so the best we can do is matching more than we
345would like to allow, and punting extra comparisons to the callback.
346
347In any case, we can start building this sub-matcher. We can require that
348the increment step be a unary increment like this:
349
350.. code-block:: c++
351
352      hasIncrement(unaryOperator(hasOperatorName("++")))
353
354Specifying what is incremented introduces another quirk of Clang's AST:
355Usages of variables are represented as ``DeclRefExpr``'s ("declaration
356reference expressions") because they are expressions which refer to
357variable declarations. To find a ``unaryOperator`` that refers to a
358specific declaration, we can simply add a second condition to it:
359
360.. code-block:: c++
361
362      hasIncrement(unaryOperator(
363        hasOperatorName("++"),
364        hasUnaryOperand(declRefExpr())))
365
366Furthermore, we can restrict our matcher to only match if the
367incremented variable is an integer:
368
369.. code-block:: c++
370
371      hasIncrement(unaryOperator(
372        hasOperatorName("++"),
373        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
374
375And the last step will be to attach an identifier to this variable, so
376that we can retrieve it in the callback:
377
378.. code-block:: c++
379
380      hasIncrement(unaryOperator(
381        hasOperatorName("++"),
382        hasUnaryOperand(declRefExpr(to(
383          varDecl(hasType(isInteger())).bind("incrementVariable"))))))
384
385We can add this code to the definition of ``LoopMatcher`` and make sure
386that our program, outfitted with the new matcher, only prints out loops
387that declare a single variable initialized to zero and have an increment
388step consisting of a unary increment of some variable.
389
390Now, we just need to add a matcher to check if the condition part of the
391``for`` loop compares a variable against the size of the array. There is
392only one problem - we don't know which array we're iterating over
393without looking at the body of the loop! We are again restricted to
394approximating the result we want with matchers, filling in the details
395in the callback. So we start with:
396
397.. code-block:: c++
398
399      hasCondition(binaryOperator(hasOperatorName("<"))
400
401It makes sense to ensure that the left-hand side is a reference to a
402variable, and that the right-hand side has integer type.
403
404.. code-block:: c++
405
406      hasCondition(binaryOperator(
407        hasOperatorName("<"),
408        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
409        hasRHS(expr(hasType(isInteger())))))
410
411Why? Because it doesn't work. Of the three loops provided in
412``test-files/simple.cpp``, zero of them have a matching condition. A
413quick look at the AST dump of the first for loop, produced by the
414previous iteration of loop-convert, shows us the answer:
415
416::
417
418      (ForStmt 0x173b240
419        (DeclStmt 0x173afc8
420          0x173af50 "int i =
421            (IntegerLiteral 0x173afa8 'int' 0)")
422        <<>>
423        (BinaryOperator 0x173b060 '_Bool' '<'
424          (ImplicitCastExpr 0x173b030 'int'
425            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
426          (ImplicitCastExpr 0x173b048 'int'
427            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
428        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
429          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
430        (CompoundStatement ...
431
432We already know that the declaration and increments both match, or this
433loop wouldn't have been dumped. The culprit lies in the implicit cast
434applied to the first operand (i.e. the LHS) of the less-than operator,
435an L-value to R-value conversion applied to the expression referencing
436``i``. Thankfully, the matcher library offers a solution to this problem
437in the form of ``ignoringParenImpCasts``, which instructs the matcher to
438ignore implicit casts and parentheses before continuing to match.
439Adjusting the condition operator will restore the desired match.
440
441.. code-block:: c++
442
443      hasCondition(binaryOperator(
444        hasOperatorName("<"),
445        hasLHS(ignoringParenImpCasts(declRefExpr(
446          to(varDecl(hasType(isInteger())))))),
447        hasRHS(expr(hasType(isInteger())))))
448
449After adding binds to the expressions we wished to capture and
450extracting the identifier strings into variables, we have array-step-2
451completed.
452
453Step 4: Retrieving Matched Nodes
454================================
455
456So far, the matcher callback isn't very interesting: it just dumps the
457loop's AST. At some point, we will need to make changes to the input
458source code. Next, we'll work on using the nodes we bound in the
459previous step.
460
461The ``MatchFinder::run()`` callback takes a
462``MatchFinder::MatchResult&`` as its parameter. We're most interested in
463its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
464class to represent contextual information about the AST, as the name
465implies, though the most functionally important detail is that several
466operations require an ``ASTContext*`` parameter. More immediately useful
467is the set of matched nodes, and how we retrieve them.
468
469Since we bind three variables (identified by ConditionVarName,
470InitVarName, and IncrementVarName), we can obtain the matched nodes by
471using the ``getNodeAs()`` member function.
472
473In ``LoopConvert.cpp`` add
474
475.. code-block:: c++
476
477      #include "clang/AST/ASTContext.h"
478
479Change ``LoopMatcher`` to
480
481.. code-block:: c++
482
483      StatementMatcher LoopMatcher =
484          forStmt(hasLoopInit(declStmt(
485                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
486                                        .bind("initVarName")))),
487                  hasIncrement(unaryOperator(
488                      hasOperatorName("++"),
489                      hasUnaryOperand(declRefExpr(
490                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
491                  hasCondition(binaryOperator(
492                      hasOperatorName("<"),
493                      hasLHS(ignoringParenImpCasts(declRefExpr(
494                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
495                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");
496
497And change ``LoopPrinter::run`` to
498
499.. code-block:: c++
500
501      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
502        ASTContext *Context = Result.Context;
503        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
504        // We do not want to convert header files!
505        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
506          return;
507        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
508        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
509        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");
510
511        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
512          return;
513        llvm::outs() << "Potential array-based loop discovered.\n";
514      }
515
516Clang associates a ``VarDecl`` with each variable to represent the variable's
517declaration. Since the "canonical" form of each declaration is unique by
518address, all we need to do is make sure neither ``ValueDecl`` (base class of
519``VarDecl``) is ``NULL`` and compare the canonical Decls.
520
521.. code-block:: c++
522
523      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
524        return First && Second &&
525               First->getCanonicalDecl() == Second->getCanonicalDecl();
526      }
527
528If execution reaches the end of ``LoopPrinter::run()``, we know that the
529loop shell that looks like
530
531.. code-block:: c++
532
533      for (int i= 0; i < expr(); ++i) { ... }
534
535For now, we will just print a message explaining that we found a loop.
536The next section will deal with recursively traversing the AST to
537discover all changes needed.
538
539As a side note, it's not as trivial to test if two expressions are the same,
540though Clang has already done the hard work for us by providing a way to
541canonicalize expressions:
542
543.. code-block:: c++
544
545      static bool areSameExpr(ASTContext *Context, const Expr *First,
546                              const Expr *Second) {
547        if (!First || !Second)
548          return false;
549        llvm::FoldingSetNodeID FirstID, SecondID;
550        First->Profile(FirstID, *Context, true);
551        Second->Profile(SecondID, *Context, true);
552        return FirstID == SecondID;
553      }
554
555This code relies on the comparison between two
556``llvm::FoldingSetNodeIDs``. As the documentation for
557``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
558a description of a node in the AST, based on its properties, along with
559those of its children. ``FoldingSetNodeID`` then serves as a hash we can
560use to compare expressions. We will need ``areSameExpr`` later. Before
561you run the new code on the additional loops added to
562test-files/simple.cpp, try to figure out which ones will be considered
563potentially convertible.
564