xref: /minix3/external/bsd/llvm/dist/clang/docs/LibASTMatchersTutorial.rst (revision f4a2713ac843a11c696ec80c0a5e3e5d80b4d338)
1*f4a2713aSLionel Sambuc===============================================================
2*f4a2713aSLionel SambucTutorial for building tools using LibTooling and LibASTMatchers
3*f4a2713aSLionel Sambuc===============================================================
4*f4a2713aSLionel Sambuc
5*f4a2713aSLionel SambucThis document is intended to show how to build a useful source-to-source
6*f4a2713aSLionel Sambuctranslation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
7*f4a2713aSLionel Sambucexplicitly aimed at people who are new to Clang, so all you should need
8*f4a2713aSLionel Sambucis a working knowledge of C++ and the command line.
9*f4a2713aSLionel Sambuc
10*f4a2713aSLionel SambucIn order to work on the compiler, you need some basic knowledge of the
11*f4a2713aSLionel Sambucabstract syntax tree (AST). To this end, the reader is incouraged to
12*f4a2713aSLionel Sambucskim the :doc:`Introduction to the Clang
13*f4a2713aSLionel SambucAST <IntroductionToTheClangAST>`
14*f4a2713aSLionel Sambuc
15*f4a2713aSLionel SambucStep 0: Obtaining Clang
16*f4a2713aSLionel Sambuc=======================
17*f4a2713aSLionel Sambuc
18*f4a2713aSLionel SambucAs Clang is part of the LLVM project, you'll need to download LLVM's
19*f4a2713aSLionel Sambucsource code first. Both Clang and LLVM are maintained as Subversion
20*f4a2713aSLionel Sambucrepositories, but we'll be accessing them through the git mirror. For
21*f4a2713aSLionel Sambucfurther information, see the `getting started
22*f4a2713aSLionel Sambucguide <http://llvm.org/docs/GettingStarted.html>`_.
23*f4a2713aSLionel Sambuc
24*f4a2713aSLionel Sambuc.. code-block:: console
25*f4a2713aSLionel Sambuc
26*f4a2713aSLionel Sambuc      mkdir ~/clang-llvm && cd ~/clang-llvm
27*f4a2713aSLionel Sambuc      git clone http://llvm.org/git/llvm.git
28*f4a2713aSLionel Sambuc      cd llvm/tools
29*f4a2713aSLionel Sambuc      git clone http://llvm.org/git/clang.git
30*f4a2713aSLionel Sambuc      cd clang/tools
31*f4a2713aSLionel Sambuc      git clone http://llvm.org/git/clang-tools-extra.git extra
32*f4a2713aSLionel Sambuc
33*f4a2713aSLionel SambucNext you need to obtain the CMake build system and Ninja build tool. You
34*f4a2713aSLionel Sambucmay already have CMake installed, but current binary versions of CMake
35*f4a2713aSLionel Sambucaren't built with Ninja support.
36*f4a2713aSLionel Sambuc
37*f4a2713aSLionel Sambuc.. code-block:: console
38*f4a2713aSLionel Sambuc
39*f4a2713aSLionel Sambuc      cd ~/clang-llvm
40*f4a2713aSLionel Sambuc      git clone https://github.com/martine/ninja.git
41*f4a2713aSLionel Sambuc      cd ninja
42*f4a2713aSLionel Sambuc      git checkout release
43*f4a2713aSLionel Sambuc      ./bootstrap.py
44*f4a2713aSLionel Sambuc      sudo cp ninja /usr/bin/
45*f4a2713aSLionel Sambuc
46*f4a2713aSLionel Sambuc      cd ~/clang-llvm
47*f4a2713aSLionel Sambuc      git clone git://cmake.org/stage/cmake.git
48*f4a2713aSLionel Sambuc      cd cmake
49*f4a2713aSLionel Sambuc      git checkout next
50*f4a2713aSLionel Sambuc      ./bootstrap
51*f4a2713aSLionel Sambuc      make
52*f4a2713aSLionel Sambuc      sudo make install
53*f4a2713aSLionel Sambuc
54*f4a2713aSLionel SambucOkay. Now we'll build Clang!
55*f4a2713aSLionel Sambuc
56*f4a2713aSLionel Sambuc.. code-block:: console
57*f4a2713aSLionel Sambuc
58*f4a2713aSLionel Sambuc      cd ~/clang-llvm
59*f4a2713aSLionel Sambuc      mkdir build && cd build
60*f4a2713aSLionel Sambuc      cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
61*f4a2713aSLionel Sambuc      ninja
62*f4a2713aSLionel Sambuc      ninja check       # Test LLVM only.
63*f4a2713aSLionel Sambuc      ninja clang-test  # Test Clang only.
64*f4a2713aSLionel Sambuc      ninja install
65*f4a2713aSLionel Sambuc
66*f4a2713aSLionel SambucAnd we're live.
67*f4a2713aSLionel Sambuc
68*f4a2713aSLionel SambucAll of the tests should pass, though there is a (very) small chance that
69*f4a2713aSLionel Sambucyou can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
70*f4a2713aSLionel Sambucin both the llvm and clang directories should fix any problems.
71*f4a2713aSLionel Sambuc
72*f4a2713aSLionel SambucFinally, we want to set Clang as its own compiler.
73*f4a2713aSLionel Sambuc
74*f4a2713aSLionel Sambuc.. code-block:: console
75*f4a2713aSLionel Sambuc
76*f4a2713aSLionel Sambuc      cd ~/clang-llvm/build
77*f4a2713aSLionel Sambuc      ccmake ../llvm
78*f4a2713aSLionel Sambuc
79*f4a2713aSLionel SambucThe second command will bring up a GUI for configuring Clang. You need
80*f4a2713aSLionel Sambucto set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
81*f4a2713aSLionel Sambucadvanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
82*f4a2713aSLionel Sambuc``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
83*f4a2713aSLionel Sambucconfigure, then ``'g'`` to generate CMake's files.
84*f4a2713aSLionel Sambuc
85*f4a2713aSLionel SambucFinally, run ninja one last time, and you're done.
86*f4a2713aSLionel Sambuc
87*f4a2713aSLionel SambucStep 1: Create a ClangTool
88*f4a2713aSLionel Sambuc==========================
89*f4a2713aSLionel Sambuc
90*f4a2713aSLionel SambucNow that we have enough background knowledge, it's time to create the
91*f4a2713aSLionel Sambucsimplest productive ClangTool in existence: a syntax checker. While this
92*f4a2713aSLionel Sambucalready exists as ``clang-check``, it's important to understand what's
93*f4a2713aSLionel Sambucgoing on.
94*f4a2713aSLionel Sambuc
95*f4a2713aSLionel SambucFirst, we'll need to create a new directory for our tool and tell CMake
96*f4a2713aSLionel Sambucthat it exists. As this is not going to be a core clang tool, it will
97*f4a2713aSLionel Sambuclive in the ``tools/extra`` repository.
98*f4a2713aSLionel Sambuc
99*f4a2713aSLionel Sambuc.. code-block:: console
100*f4a2713aSLionel Sambuc
101*f4a2713aSLionel Sambuc      cd ~/clang-llvm/llvm/tools/clang
102*f4a2713aSLionel Sambuc      mkdir tools/extra/loop-convert
103*f4a2713aSLionel Sambuc      echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
104*f4a2713aSLionel Sambuc      vim tools/extra/loop-convert/CMakeLists.txt
105*f4a2713aSLionel Sambuc
106*f4a2713aSLionel SambucCMakeLists.txt should have the following contents:
107*f4a2713aSLionel Sambuc
108*f4a2713aSLionel Sambuc::
109*f4a2713aSLionel Sambuc
110*f4a2713aSLionel Sambuc      set(LLVM_LINK_COMPONENTS support)
111*f4a2713aSLionel Sambuc      set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
112*f4a2713aSLionel Sambuc
113*f4a2713aSLionel Sambuc      add_clang_executable(loop-convert
114*f4a2713aSLionel Sambuc        LoopConvert.cpp
115*f4a2713aSLionel Sambuc        )
116*f4a2713aSLionel Sambuc      target_link_libraries(loop-convert
117*f4a2713aSLionel Sambuc        clangTooling
118*f4a2713aSLionel Sambuc        clangBasic
119*f4a2713aSLionel Sambuc        clangASTMatchers
120*f4a2713aSLionel Sambuc        )
121*f4a2713aSLionel Sambuc
122*f4a2713aSLionel SambucWith that done, Ninja will be able to compile our tool. Let's give it
123*f4a2713aSLionel Sambucsomething to compile! Put the following into
124*f4a2713aSLionel Sambuc``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
125*f4a2713aSLionel Sambucwhy the different parts are needed can be found in the `LibTooling
126*f4a2713aSLionel Sambucdocumentation <LibTooling.html>`_.
127*f4a2713aSLionel Sambuc
128*f4a2713aSLionel Sambuc.. code-block:: c++
129*f4a2713aSLionel Sambuc
130*f4a2713aSLionel Sambuc      // Declares clang::SyntaxOnlyAction.
131*f4a2713aSLionel Sambuc      #include "clang/Frontend/FrontendActions.h"
132*f4a2713aSLionel Sambuc      #include "clang/Tooling/CommonOptionsParser.h"
133*f4a2713aSLionel Sambuc      #include "clang/Tooling/Tooling.h"
134*f4a2713aSLionel Sambuc      // Declares llvm::cl::extrahelp.
135*f4a2713aSLionel Sambuc      #include "llvm/Support/CommandLine.h"
136*f4a2713aSLionel Sambuc
137*f4a2713aSLionel Sambuc      using namespace clang::tooling;
138*f4a2713aSLionel Sambuc      using namespace llvm;
139*f4a2713aSLionel Sambuc
140*f4a2713aSLionel Sambuc      // CommonOptionsParser declares HelpMessage with a description of the common
141*f4a2713aSLionel Sambuc      // command-line options related to the compilation database and input files.
142*f4a2713aSLionel Sambuc      // It's nice to have this help message in all tools.
143*f4a2713aSLionel Sambuc      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
144*f4a2713aSLionel Sambuc
145*f4a2713aSLionel Sambuc      // A help message for this specific tool can be added afterwards.
146*f4a2713aSLionel Sambuc      static cl::extrahelp MoreHelp("\nMore help text...");
147*f4a2713aSLionel Sambuc
148*f4a2713aSLionel Sambuc      int main(int argc, const char **argv) {
149*f4a2713aSLionel Sambuc        CommonOptionsParser OptionsParser(argc, argv);
150*f4a2713aSLionel Sambuc        ClangTool Tool(OptionsParser.getCompilations(),
151*f4a2713aSLionel Sambuc                       OptionsParser.getSourcePathList());
152*f4a2713aSLionel Sambuc        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
153*f4a2713aSLionel Sambuc      }
154*f4a2713aSLionel Sambuc
155*f4a2713aSLionel SambucAnd that's it! You can compile our new tool by running ninja from the
156*f4a2713aSLionel Sambuc``build`` directory.
157*f4a2713aSLionel Sambuc
158*f4a2713aSLionel Sambuc.. code-block:: console
159*f4a2713aSLionel Sambuc
160*f4a2713aSLionel Sambuc      cd ~/clang-llvm/build
161*f4a2713aSLionel Sambuc      ninja
162*f4a2713aSLionel Sambuc
163*f4a2713aSLionel SambucYou should now be able to run the syntax checker, which is located in
164*f4a2713aSLionel Sambuc``~/clang-llvm/build/bin``, on any source file. Try it!
165*f4a2713aSLionel Sambuc
166*f4a2713aSLionel Sambuc.. code-block:: console
167*f4a2713aSLionel Sambuc
168*f4a2713aSLionel Sambuc      cat "int main() { return 0; }" > test.cpp
169*f4a2713aSLionel Sambuc      bin/loop-convert test.cpp --
170*f4a2713aSLionel Sambuc
171*f4a2713aSLionel SambucNote the two dashes after we specify the source file. The additional
172*f4a2713aSLionel Sambucoptions for the compiler are passed after the dashes rather than loading
173*f4a2713aSLionel Sambucthem from a compilation database - there just aren't any options needed
174*f4a2713aSLionel Sambucright now.
175*f4a2713aSLionel Sambuc
176*f4a2713aSLionel SambucIntermezzo: Learn AST matcher basics
177*f4a2713aSLionel Sambuc====================================
178*f4a2713aSLionel Sambuc
179*f4a2713aSLionel SambucClang recently introduced the :doc:`ASTMatcher
180*f4a2713aSLionel Sambuclibrary <LibASTMatchers>` to provide a simple, powerful, and
181*f4a2713aSLionel Sambucconcise way to describe specific patterns in the AST. Implemented as a
182*f4a2713aSLionel SambucDSL powered by macros and templates (see
183*f4a2713aSLionel Sambuc`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
184*f4a2713aSLionel Sambuccurious), matchers offer the feel of algebraic data types common to
185*f4a2713aSLionel Sambucfunctional programming languages.
186*f4a2713aSLionel Sambuc
187*f4a2713aSLionel SambucFor example, suppose you wanted to examine only binary operators. There
188*f4a2713aSLionel Sambucis a matcher to do exactly that, conveniently named ``binaryOperator``.
189*f4a2713aSLionel SambucI'll give you one guess what this matcher does:
190*f4a2713aSLionel Sambuc
191*f4a2713aSLionel Sambuc.. code-block:: c++
192*f4a2713aSLionel Sambuc
193*f4a2713aSLionel Sambuc      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
194*f4a2713aSLionel Sambuc
195*f4a2713aSLionel SambucShockingly, it will match against addition expressions whose left hand
196*f4a2713aSLionel Sambucside is exactly the literal 0. It will not match against other forms of
197*f4a2713aSLionel Sambuc0, such as ``'\0'`` or ``NULL``, but it will match against macros that
198*f4a2713aSLionel Sambucexpand to 0. The matcher will also not match against calls to the
199*f4a2713aSLionel Sambucoverloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
200*f4a2713aSLionel Sambucmatcher to handle overloaded operators.
201*f4a2713aSLionel Sambuc
202*f4a2713aSLionel SambucThere are AST matchers to match all the different nodes of the AST,
203*f4a2713aSLionel Sambucnarrowing matchers to only match AST nodes fulfilling specific criteria,
204*f4a2713aSLionel Sambucand traversal matchers to get from one kind of AST node to another. For
205*f4a2713aSLionel Sambuca complete list of AST matchers, take a look at the `AST Matcher
206*f4a2713aSLionel SambucReferences <LibASTMatchersReference.html>`_
207*f4a2713aSLionel Sambuc
208*f4a2713aSLionel SambucAll matcher that are nouns describe entities in the AST and can be
209*f4a2713aSLionel Sambucbound, so that they can be referred to whenever a match is found. To do
210*f4a2713aSLionel Sambucso, simply call the method ``bind`` on these matchers, e.g.:
211*f4a2713aSLionel Sambuc
212*f4a2713aSLionel Sambuc.. code-block:: c++
213*f4a2713aSLionel Sambuc
214*f4a2713aSLionel Sambuc      variable(hasType(isInteger())).bind("intvar")
215*f4a2713aSLionel Sambuc
216*f4a2713aSLionel SambucStep 2: Using AST matchers
217*f4a2713aSLionel Sambuc==========================
218*f4a2713aSLionel Sambuc
219*f4a2713aSLionel SambucOkay, on to using matchers for real. Let's start by defining a matcher
220*f4a2713aSLionel Sambucwhich will capture all ``for`` statements that define a new variable
221*f4a2713aSLionel Sambucinitialized to zero. Let's start with matching all ``for`` loops:
222*f4a2713aSLionel Sambuc
223*f4a2713aSLionel Sambuc.. code-block:: c++
224*f4a2713aSLionel Sambuc
225*f4a2713aSLionel Sambuc      forStmt()
226*f4a2713aSLionel Sambuc
227*f4a2713aSLionel SambucNext, we want to specify that a single variable is declared in the first
228*f4a2713aSLionel Sambucportion of the loop, so we can extend the matcher to
229*f4a2713aSLionel Sambuc
230*f4a2713aSLionel Sambuc.. code-block:: c++
231*f4a2713aSLionel Sambuc
232*f4a2713aSLionel Sambuc      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
233*f4a2713aSLionel Sambuc
234*f4a2713aSLionel SambucFinally, we can add the condition that the variable is initialized to
235*f4a2713aSLionel Sambuczero.
236*f4a2713aSLionel Sambuc
237*f4a2713aSLionel Sambuc.. code-block:: c++
238*f4a2713aSLionel Sambuc
239*f4a2713aSLionel Sambuc      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
240*f4a2713aSLionel Sambuc        hasInitializer(integerLiteral(equals(0))))))))
241*f4a2713aSLionel Sambuc
242*f4a2713aSLionel SambucIt is fairly easy to read and understand the matcher definition ("match
243*f4a2713aSLionel Sambucloops whose init portion declares a single variable which is initialized
244*f4a2713aSLionel Sambucto the integer literal 0"), but deciding that every piece is necessary
245*f4a2713aSLionel Sambucis more difficult. Note that this matcher will not match loops whose
246*f4a2713aSLionel Sambucvariables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
247*f4a2713aSLionel Sambuczero besides the integer 0.
248*f4a2713aSLionel Sambuc
249*f4a2713aSLionel SambucThe last step is giving the matcher a name and binding the ``ForStmt``
250*f4a2713aSLionel Sambucas we will want to do something with it:
251*f4a2713aSLionel Sambuc
252*f4a2713aSLionel Sambuc.. code-block:: c++
253*f4a2713aSLionel Sambuc
254*f4a2713aSLionel Sambuc      StatementMatcher LoopMatcher =
255*f4a2713aSLionel Sambuc        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
256*f4a2713aSLionel Sambuc          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
257*f4a2713aSLionel Sambuc
258*f4a2713aSLionel SambucOnce you have defined your matchers, you will need to add a little more
259*f4a2713aSLionel Sambucscaffolding in order to run them. Matchers are paired with a
260*f4a2713aSLionel Sambuc``MatchCallback`` and registered with a ``MatchFinder`` object, then run
261*f4a2713aSLionel Sambucfrom a ``ClangTool``. More code!
262*f4a2713aSLionel Sambuc
263*f4a2713aSLionel SambucAdd the following to ``LoopConvert.cpp``:
264*f4a2713aSLionel Sambuc
265*f4a2713aSLionel Sambuc.. code-block:: c++
266*f4a2713aSLionel Sambuc
267*f4a2713aSLionel Sambuc      #include "clang/ASTMatchers/ASTMatchers.h"
268*f4a2713aSLionel Sambuc      #include "clang/ASTMatchers/ASTMatchFinder.h"
269*f4a2713aSLionel Sambuc
270*f4a2713aSLionel Sambuc      using namespace clang;
271*f4a2713aSLionel Sambuc      using namespace clang::ast_matchers;
272*f4a2713aSLionel Sambuc
273*f4a2713aSLionel Sambuc      StatementMatcher LoopMatcher =
274*f4a2713aSLionel Sambuc        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
275*f4a2713aSLionel Sambuc          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
276*f4a2713aSLionel Sambuc
277*f4a2713aSLionel Sambuc      class LoopPrinter : public MatchFinder::MatchCallback {
278*f4a2713aSLionel Sambuc      public :
279*f4a2713aSLionel Sambuc        virtual void run(const MatchFinder::MatchResult &Result) {
280*f4a2713aSLionel Sambuc          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
281*f4a2713aSLionel Sambuc            FS->dump();
282*f4a2713aSLionel Sambuc        }
283*f4a2713aSLionel Sambuc      };
284*f4a2713aSLionel Sambuc
285*f4a2713aSLionel SambucAnd change ``main()`` to:
286*f4a2713aSLionel Sambuc
287*f4a2713aSLionel Sambuc.. code-block:: c++
288*f4a2713aSLionel Sambuc
289*f4a2713aSLionel Sambuc      int main(int argc, const char **argv) {
290*f4a2713aSLionel Sambuc        CommonOptionsParser OptionsParser(argc, argv);
291*f4a2713aSLionel Sambuc        ClangTool Tool(OptionsParser.getCompilations(),
292*f4a2713aSLionel Sambuc                       OptionsParser.getSourcePathList());
293*f4a2713aSLionel Sambuc
294*f4a2713aSLionel Sambuc        LoopPrinter Printer;
295*f4a2713aSLionel Sambuc        MatchFinder Finder;
296*f4a2713aSLionel Sambuc        Finder.addMatcher(LoopMatcher, &Printer);
297*f4a2713aSLionel Sambuc
298*f4a2713aSLionel Sambuc        return Tool.run(newFrontendActionFactory(&Finder));
299*f4a2713aSLionel Sambuc      }
300*f4a2713aSLionel Sambuc
301*f4a2713aSLionel SambucNow, you should be able to recompile and run the code to discover for
302*f4a2713aSLionel Sambucloops. Create a new file with a few examples, and test out our new
303*f4a2713aSLionel Sambuchandiwork:
304*f4a2713aSLionel Sambuc
305*f4a2713aSLionel Sambuc.. code-block:: console
306*f4a2713aSLionel Sambuc
307*f4a2713aSLionel Sambuc      cd ~/clang-llvm/llvm/llvm_build/
308*f4a2713aSLionel Sambuc      ninja loop-convert
309*f4a2713aSLionel Sambuc      vim ~/test-files/simple-loops.cc
310*f4a2713aSLionel Sambuc      bin/loop-convert ~/test-files/simple-loops.cc
311*f4a2713aSLionel Sambuc
312*f4a2713aSLionel SambucStep 3.5: More Complicated Matchers
313*f4a2713aSLionel Sambuc===================================
314*f4a2713aSLionel Sambuc
315*f4a2713aSLionel SambucOur simple matcher is capable of discovering for loops, but we would
316*f4a2713aSLionel Sambucstill need to filter out many more ourselves. We can do a good portion
317*f4a2713aSLionel Sambucof the remaining work with some cleverly chosen matchers, but first we
318*f4a2713aSLionel Sambucneed to decide exactly which properties we want to allow.
319*f4a2713aSLionel Sambuc
320*f4a2713aSLionel SambucHow can we characterize for loops over arrays which would be eligible
321*f4a2713aSLionel Sambucfor translation to range-based syntax? Range based loops over arrays of
322*f4a2713aSLionel Sambucsize ``N`` that:
323*f4a2713aSLionel Sambuc
324*f4a2713aSLionel Sambuc-  start at index ``0``
325*f4a2713aSLionel Sambuc-  iterate consecutively
326*f4a2713aSLionel Sambuc-  end at index ``N-1``
327*f4a2713aSLionel Sambuc
328*f4a2713aSLionel SambucWe already check for (1), so all we need to add is a check to the loop's
329*f4a2713aSLionel Sambuccondition to ensure that the loop's index variable is compared against
330*f4a2713aSLionel Sambuc``N`` and another check to ensure that the increment step just
331*f4a2713aSLionel Sambucincrements this same variable. The matcher for (2) is straightforward:
332*f4a2713aSLionel Sambucrequire a pre- or post-increment of the same variable declared in the
333*f4a2713aSLionel Sambucinit portion.
334*f4a2713aSLionel Sambuc
335*f4a2713aSLionel SambucUnfortunately, such a matcher is impossible to write. Matchers contain
336*f4a2713aSLionel Sambucno logic for comparing two arbitrary AST nodes and determining whether
337*f4a2713aSLionel Sambucor not they are equal, so the best we can do is matching more than we
338*f4a2713aSLionel Sambucwould like to allow, and punting extra comparisons to the callback.
339*f4a2713aSLionel Sambuc
340*f4a2713aSLionel SambucIn any case, we can start building this sub-matcher. We can require that
341*f4a2713aSLionel Sambucthe increment step be a unary increment like this:
342*f4a2713aSLionel Sambuc
343*f4a2713aSLionel Sambuc.. code-block:: c++
344*f4a2713aSLionel Sambuc
345*f4a2713aSLionel Sambuc      hasIncrement(unaryOperator(hasOperatorName("++")))
346*f4a2713aSLionel Sambuc
347*f4a2713aSLionel SambucSpecifying what is incremented introduces another quirk of Clang's AST:
348*f4a2713aSLionel SambucUsages of variables are represented as ``DeclRefExpr``'s ("declaration
349*f4a2713aSLionel Sambucreference expressions") because they are expressions which refer to
350*f4a2713aSLionel Sambucvariable declarations. To find a ``unaryOperator`` that refers to a
351*f4a2713aSLionel Sambucspecific declaration, we can simply add a second condition to it:
352*f4a2713aSLionel Sambuc
353*f4a2713aSLionel Sambuc.. code-block:: c++
354*f4a2713aSLionel Sambuc
355*f4a2713aSLionel Sambuc      hasIncrement(unaryOperator(
356*f4a2713aSLionel Sambuc        hasOperatorName("++"),
357*f4a2713aSLionel Sambuc        hasUnaryOperand(declRefExpr())))
358*f4a2713aSLionel Sambuc
359*f4a2713aSLionel SambucFurthermore, we can restrict our matcher to only match if the
360*f4a2713aSLionel Sambucincremented variable is an integer:
361*f4a2713aSLionel Sambuc
362*f4a2713aSLionel Sambuc.. code-block:: c++
363*f4a2713aSLionel Sambuc
364*f4a2713aSLionel Sambuc      hasIncrement(unaryOperator(
365*f4a2713aSLionel Sambuc        hasOperatorName("++"),
366*f4a2713aSLionel Sambuc        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
367*f4a2713aSLionel Sambuc
368*f4a2713aSLionel SambucAnd the last step will be to attach an identifier to this variable, so
369*f4a2713aSLionel Sambucthat we can retrieve it in the callback:
370*f4a2713aSLionel Sambuc
371*f4a2713aSLionel Sambuc.. code-block:: c++
372*f4a2713aSLionel Sambuc
373*f4a2713aSLionel Sambuc      hasIncrement(unaryOperator(
374*f4a2713aSLionel Sambuc        hasOperatorName("++"),
375*f4a2713aSLionel Sambuc        hasUnaryOperand(declRefExpr(to(
376*f4a2713aSLionel Sambuc          varDecl(hasType(isInteger())).bind("incrementVariable"))))))
377*f4a2713aSLionel Sambuc
378*f4a2713aSLionel SambucWe can add this code to the definition of ``LoopMatcher`` and make sure
379*f4a2713aSLionel Sambucthat our program, outfitted with the new matcher, only prints out loops
380*f4a2713aSLionel Sambucthat declare a single variable initialized to zero and have an increment
381*f4a2713aSLionel Sambucstep consisting of a unary increment of some variable.
382*f4a2713aSLionel Sambuc
383*f4a2713aSLionel SambucNow, we just need to add a matcher to check if the condition part of the
384*f4a2713aSLionel Sambuc``for`` loop compares a variable against the size of the array. There is
385*f4a2713aSLionel Sambuconly one problem - we don't know which array we're iterating over
386*f4a2713aSLionel Sambucwithout looking at the body of the loop! We are again restricted to
387*f4a2713aSLionel Sambucapproximating the result we want with matchers, filling in the details
388*f4a2713aSLionel Sambucin the callback. So we start with:
389*f4a2713aSLionel Sambuc
390*f4a2713aSLionel Sambuc.. code-block:: c++
391*f4a2713aSLionel Sambuc
392*f4a2713aSLionel Sambuc      hasCondition(binaryOperator(hasOperatorName("<"))
393*f4a2713aSLionel Sambuc
394*f4a2713aSLionel SambucIt makes sense to ensure that the left-hand side is a reference to a
395*f4a2713aSLionel Sambucvariable, and that the right-hand side has integer type.
396*f4a2713aSLionel Sambuc
397*f4a2713aSLionel Sambuc.. code-block:: c++
398*f4a2713aSLionel Sambuc
399*f4a2713aSLionel Sambuc      hasCondition(binaryOperator(
400*f4a2713aSLionel Sambuc        hasOperatorName("<"),
401*f4a2713aSLionel Sambuc        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
402*f4a2713aSLionel Sambuc        hasRHS(expr(hasType(isInteger())))))
403*f4a2713aSLionel Sambuc
404*f4a2713aSLionel SambucWhy? Because it doesn't work. Of the three loops provided in
405*f4a2713aSLionel Sambuc``test-files/simple.cpp``, zero of them have a matching condition. A
406*f4a2713aSLionel Sambucquick look at the AST dump of the first for loop, produced by the
407*f4a2713aSLionel Sambucprevious iteration of loop-convert, shows us the answer:
408*f4a2713aSLionel Sambuc
409*f4a2713aSLionel Sambuc::
410*f4a2713aSLionel Sambuc
411*f4a2713aSLionel Sambuc      (ForStmt 0x173b240
412*f4a2713aSLionel Sambuc        (DeclStmt 0x173afc8
413*f4a2713aSLionel Sambuc          0x173af50 "int i =
414*f4a2713aSLionel Sambuc            (IntegerLiteral 0x173afa8 'int' 0)")
415*f4a2713aSLionel Sambuc        <<>>
416*f4a2713aSLionel Sambuc        (BinaryOperator 0x173b060 '_Bool' '<'
417*f4a2713aSLionel Sambuc          (ImplicitCastExpr 0x173b030 'int'
418*f4a2713aSLionel Sambuc            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
419*f4a2713aSLionel Sambuc          (ImplicitCastExpr 0x173b048 'int'
420*f4a2713aSLionel Sambuc            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
421*f4a2713aSLionel Sambuc        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
422*f4a2713aSLionel Sambuc          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
423*f4a2713aSLionel Sambuc        (CompoundStatement ...
424*f4a2713aSLionel Sambuc
425*f4a2713aSLionel SambucWe already know that the declaration and increments both match, or this
426*f4a2713aSLionel Sambucloop wouldn't have been dumped. The culprit lies in the implicit cast
427*f4a2713aSLionel Sambucapplied to the first operand (i.e. the LHS) of the less-than operator,
428*f4a2713aSLionel Sambucan L-value to R-value conversion applied to the expression referencing
429*f4a2713aSLionel Sambuc``i``. Thankfully, the matcher library offers a solution to this problem
430*f4a2713aSLionel Sambucin the form of ``ignoringParenImpCasts``, which instructs the matcher to
431*f4a2713aSLionel Sambucignore implicit casts and parentheses before continuing to match.
432*f4a2713aSLionel SambucAdjusting the condition operator will restore the desired match.
433*f4a2713aSLionel Sambuc
434*f4a2713aSLionel Sambuc.. code-block:: c++
435*f4a2713aSLionel Sambuc
436*f4a2713aSLionel Sambuc      hasCondition(binaryOperator(
437*f4a2713aSLionel Sambuc        hasOperatorName("<"),
438*f4a2713aSLionel Sambuc        hasLHS(ignoringParenImpCasts(declRefExpr(
439*f4a2713aSLionel Sambuc          to(varDecl(hasType(isInteger())))))),
440*f4a2713aSLionel Sambuc        hasRHS(expr(hasType(isInteger())))))
441*f4a2713aSLionel Sambuc
442*f4a2713aSLionel SambucAfter adding binds to the expressions we wished to capture and
443*f4a2713aSLionel Sambucextracting the identifier strings into variables, we have array-step-2
444*f4a2713aSLionel Sambuccompleted.
445*f4a2713aSLionel Sambuc
446*f4a2713aSLionel SambucStep 4: Retrieving Matched Nodes
447*f4a2713aSLionel Sambuc================================
448*f4a2713aSLionel Sambuc
449*f4a2713aSLionel SambucSo far, the matcher callback isn't very interesting: it just dumps the
450*f4a2713aSLionel Sambucloop's AST. At some point, we will need to make changes to the input
451*f4a2713aSLionel Sambucsource code. Next, we'll work on using the nodes we bound in the
452*f4a2713aSLionel Sambucprevious step.
453*f4a2713aSLionel Sambuc
454*f4a2713aSLionel SambucThe ``MatchFinder::run()`` callback takes a
455*f4a2713aSLionel Sambuc``MatchFinder::MatchResult&`` as its parameter. We're most interested in
456*f4a2713aSLionel Sambucits ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
457*f4a2713aSLionel Sambucclass to represent contextual information about the AST, as the name
458*f4a2713aSLionel Sambucimplies, though the most functionally important detail is that several
459*f4a2713aSLionel Sambucoperations require an ``ASTContext*`` parameter. More immediately useful
460*f4a2713aSLionel Sambucis the set of matched nodes, and how we retrieve them.
461*f4a2713aSLionel Sambuc
462*f4a2713aSLionel SambucSince we bind three variables (identified by ConditionVarName,
463*f4a2713aSLionel SambucInitVarName, and IncrementVarName), we can obtain the matched nodes by
464*f4a2713aSLionel Sambucusing the ``getNodeAs()`` member function.
465*f4a2713aSLionel Sambuc
466*f4a2713aSLionel SambucIn ``LoopConvert.cpp`` add
467*f4a2713aSLionel Sambuc
468*f4a2713aSLionel Sambuc.. code-block:: c++
469*f4a2713aSLionel Sambuc
470*f4a2713aSLionel Sambuc      #include "clang/AST/ASTContext.h"
471*f4a2713aSLionel Sambuc
472*f4a2713aSLionel SambucChange ``LoopMatcher`` to
473*f4a2713aSLionel Sambuc
474*f4a2713aSLionel Sambuc.. code-block:: c++
475*f4a2713aSLionel Sambuc
476*f4a2713aSLionel Sambuc      StatementMatcher LoopMatcher =
477*f4a2713aSLionel Sambuc          forStmt(hasLoopInit(declStmt(
478*f4a2713aSLionel Sambuc                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
479*f4a2713aSLionel Sambuc                                        .bind("initVarName")))),
480*f4a2713aSLionel Sambuc                  hasIncrement(unaryOperator(
481*f4a2713aSLionel Sambuc                      hasOperatorName("++"),
482*f4a2713aSLionel Sambuc                      hasUnaryOperand(declRefExpr(
483*f4a2713aSLionel Sambuc                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
484*f4a2713aSLionel Sambuc                  hasCondition(binaryOperator(
485*f4a2713aSLionel Sambuc                      hasOperatorName("<"),
486*f4a2713aSLionel Sambuc                      hasLHS(ignoringParenImpCasts(declRefExpr(
487*f4a2713aSLionel Sambuc                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
488*f4a2713aSLionel Sambuc                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");
489*f4a2713aSLionel Sambuc
490*f4a2713aSLionel SambucAnd change ``LoopPrinter::run`` to
491*f4a2713aSLionel Sambuc
492*f4a2713aSLionel Sambuc.. code-block:: c++
493*f4a2713aSLionel Sambuc
494*f4a2713aSLionel Sambuc      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
495*f4a2713aSLionel Sambuc        ASTContext *Context = Result.Context;
496*f4a2713aSLionel Sambuc        const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop");
497*f4a2713aSLionel Sambuc        // We do not want to convert header files!
498*f4a2713aSLionel Sambuc        if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
499*f4a2713aSLionel Sambuc          return;
500*f4a2713aSLionel Sambuc        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
501*f4a2713aSLionel Sambuc        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
502*f4a2713aSLionel Sambuc        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");
503*f4a2713aSLionel Sambuc
504*f4a2713aSLionel Sambuc        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
505*f4a2713aSLionel Sambuc          return;
506*f4a2713aSLionel Sambuc        llvm::outs() << "Potential array-based loop discovered.\n";
507*f4a2713aSLionel Sambuc      }
508*f4a2713aSLionel Sambuc
509*f4a2713aSLionel SambucClang associates a ``VarDecl`` with each variable to represent the variable's
510*f4a2713aSLionel Sambucdeclaration. Since the "canonical" form of each declaration is unique by
511*f4a2713aSLionel Sambucaddress, all we need to do is make sure neither ``ValueDecl`` (base class of
512*f4a2713aSLionel Sambuc``VarDecl``) is ``NULL`` and compare the canonical Decls.
513*f4a2713aSLionel Sambuc
514*f4a2713aSLionel Sambuc.. code-block:: c++
515*f4a2713aSLionel Sambuc
516*f4a2713aSLionel Sambuc      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
517*f4a2713aSLionel Sambuc        return First && Second &&
518*f4a2713aSLionel Sambuc               First->getCanonicalDecl() == Second->getCanonicalDecl();
519*f4a2713aSLionel Sambuc      }
520*f4a2713aSLionel Sambuc
521*f4a2713aSLionel SambucIf execution reaches the end of ``LoopPrinter::run()``, we know that the
522*f4a2713aSLionel Sambucloop shell that looks like
523*f4a2713aSLionel Sambuc
524*f4a2713aSLionel Sambuc.. code-block:: c++
525*f4a2713aSLionel Sambuc
526*f4a2713aSLionel Sambuc      for (int i= 0; i < expr(); ++i) { ... }
527*f4a2713aSLionel Sambuc
528*f4a2713aSLionel SambucFor now, we will just print a message explaining that we found a loop.
529*f4a2713aSLionel SambucThe next section will deal with recursively traversing the AST to
530*f4a2713aSLionel Sambucdiscover all changes needed.
531*f4a2713aSLionel Sambuc
532*f4a2713aSLionel SambucAs a side note, it's not as trivial to test if two expressions are the same,
533*f4a2713aSLionel Sambucthough Clang has already done the hard work for us by providing a way to
534*f4a2713aSLionel Sambuccanonicalize expressions:
535*f4a2713aSLionel Sambuc
536*f4a2713aSLionel Sambuc.. code-block:: c++
537*f4a2713aSLionel Sambuc
538*f4a2713aSLionel Sambuc      static bool areSameExpr(ASTContext *Context, const Expr *First,
539*f4a2713aSLionel Sambuc                              const Expr *Second) {
540*f4a2713aSLionel Sambuc        if (!First || !Second)
541*f4a2713aSLionel Sambuc          return false;
542*f4a2713aSLionel Sambuc        llvm::FoldingSetNodeID FirstID, SecondID;
543*f4a2713aSLionel Sambuc        First->Profile(FirstID, *Context, true);
544*f4a2713aSLionel Sambuc        Second->Profile(SecondID, *Context, true);
545*f4a2713aSLionel Sambuc        return FirstID == SecondID;
546*f4a2713aSLionel Sambuc      }
547*f4a2713aSLionel Sambuc
548*f4a2713aSLionel SambucThis code relies on the comparison between two
549*f4a2713aSLionel Sambuc``llvm::FoldingSetNodeIDs``. As the documentation for
550*f4a2713aSLionel Sambuc``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
551*f4a2713aSLionel Sambuca description of a node in the AST, based on its properties, along with
552*f4a2713aSLionel Sambucthose of its children. ``FoldingSetNodeID`` then serves as a hash we can
553*f4a2713aSLionel Sambucuse to compare expressions. We will need ``areSameExpr`` later. Before
554*f4a2713aSLionel Sambucyou run the new code on the additional loops added to
555*f4a2713aSLionel Sambuctest-files/simple.cpp, try to figure out which ones will be considered
556*f4a2713aSLionel Sambucpotentially convertible.
557