===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are maintained as Subversion
repositories, but we'll be accessing them through the git mirror. For
further information, see the `getting started
guide <http://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

      mkdir ~/clang-llvm && cd ~/clang-llvm
      git clone http://llvm.org/git/llvm.git
      cd llvm/tools
      git clone http://llvm.org/git/clang.git
      cd clang/tools
      git clone http://llvm.org/git/clang-tools-extra.git extra

Next you need to obtain the CMake build system and Ninja build tool. You
may already have CMake installed, but current binary versions of CMake
aren't built with Ninja support.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass, though there is a (very) small chance that
you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
in both the llvm and clang directories should fix any problems.

Next, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core clang tool, it will
live in the ``tools/extra`` repository.

.. code-block:: console

      cd ~/clang-llvm/llvm/tools/clang
      mkdir tools/extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
      vim tools/extra/loop-convert/CMakeLists.txt

CMakeLists.txt should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)
      set(LLVM_USED_LIBS clangTooling clangBasic clangAST)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        clangTooling
        clangBasic
        clangASTMatchers
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...");

      int main(int argc, const char **argv) {
        CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loading
them from a compilation database - there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST. Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
matcher to handle overloaded operators.

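To make these rules concrete, here is a small translation unit one might
feed to a tool that uses this matcher. The names ``ZERO``, ``Wrapper``, and
the function names are invented for illustration, and the comments restate
the behavior described above rather than output captured from a real run.

```cpp
// Hypothetical input for:
//   binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

#define ZERO 0  // Illustrative macro that expands to the literal 0.

struct Wrapper {
  int Value;
};

// Overloaded '+': uses of it are operator calls, not built-in binary operators.
inline Wrapper operator+(int LHS, Wrapper RHS) {
  return Wrapper{LHS + RHS.Value};
}

inline int literalZero(int X) { return 0 + X; }     // Expected to match.
inline int macroZero(int X) { return ZERO + X; }    // Expected to match.
inline int charZero(int X) { return '\0' + X; }     // No match: char literal.
inline int otherLiteral(int X) { return 1 + X; }    // No match: LHS is 1.
inline Wrapper overloaded(Wrapper W) { return 0 + W; }  // No match: overload.
```

The operand ``X`` is deliberately an ``int``: with a wider type such as
``long``, the literal would be wrapped in an implicit cast, a complication
that comes up again later in this tutorial.
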
There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
References <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Our goal is a matcher that
captures all ``for`` statements that define a new variable
initialized to zero. Let's start by matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        virtual void run(const MatchFinder::MatchResult &Result) {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:

.. code-block:: c++

      int main(int argc, const char **argv) {
        CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build/
      ninja loop-convert
      vim ~/test-files/simple-loops.cc
      bin/loop-convert ~/test-files/simple-loops.cc

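As a starting point, a test file along the following lines exercises the
matcher. The contents are only a sketch (the file used when this tutorial
was written is not shown), the function names are invented, and the comments
state what the Step 2 matcher is expected to do with each loop based on its
definition.

```cpp
// Hypothetical contents for ~/test-files/simple-loops.cc.

const int N = 4;

// Expected to be dumped: the init statement declares one variable set to the
// integer literal 0.
inline int sumArray(const int (&Values)[N]) {
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += Values[I];
  return Sum;
}

// Not expected to be dumped: the variable is initialized to 1, not 0.
inline int sumFromOne(const int (&Values)[N]) {
  int Sum = 0;
  for (int I = 1; I < N; ++I)
    Sum += Values[I];
  return Sum;
}

// Not expected to be dumped: the init statement declares two variables, so
// hasSingleDecl() fails.
inline int sumTwoVars(const int (&Values)[N]) {
  int Sum = 0;
  for (int I = 0, End = N; I < End; ++I)
    Sum += Values[I];
  return Sum;
}

// Not expected to be dumped: the init statement is an assignment expression,
// not a declaration.
inline int sumAssigned(const int (&Values)[N]) {
  int Sum = 0;
  int I;
  for (I = 0; I < N; ++I)
    Sum += Values[I];
  return Sum;
}
```

Keeping one loop per function makes it easier to tell which loop a given
AST dump belongs to.
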
Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering ``for`` loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize ``for`` loops over arrays which would be eligible
for translation to range-based syntax? They are loops over arrays of
size ``N`` that:

1. start at index ``0``
2. iterate consecutively
3. end at index ``N-1``

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. The matcher for (2) is straightforward:
require a pre- or post-increment of the same variable declared in the
init portion.

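For orientation, this is the kind of rewrite the finished tool is aiming
for. The sketch below shows an index-based loop that satisfies all three
properties next to its range-based equivalent. The function names are made
up for illustration, and the tool built in this tutorial does not yet
perform the rewrite.

```cpp
const int N = 5;

// Index-based form: starts at 0, increments by one, stops before N.
inline int sumIndexed(const int (&Arr)[N]) {
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += Arr[I];
  return Sum;
}

// Equivalent range-based form (C++11), visiting elements 0 through N-1.
inline int sumRangeBased(const int (&Arr)[N]) {
  int Sum = 0;
  for (int Elem : Arr)
    Sum += Elem;
  return Sum;
}
```

The two functions compute the same result only because the indexed loop
touches every element exactly once and in order, which is what the three
properties guarantee.
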
Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
Usages of variables are represented as ``DeclRefExpr``'s ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem - we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Unfortunately, this matcher does not work as written. Of the three loops
provided in ``test-files/simple.cpp``, none has a matching condition. A
quick look at the AST dump of the first for loop, produced by the
previous iteration of loop-convert, shows us the answer:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<<NULL>>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))

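To see why skipping wrappers helps, the following toy model mimics the idea
in plain C++. It is not Clang's implementation: ``Node``, ``NodeKind``,
``makeNode``, and ``ignoreParenImpCasts`` are invented here, and the model
captures only the notion of peeling ``ParenExpr`` and ``ImplicitCastExpr``
layers off an expression before testing its kind.

```cpp
#include <memory>
#include <utility>

// Toy stand-ins for a few Clang expression kinds (not the real classes).
enum class NodeKind { DeclRefExpr, ImplicitCastExpr, ParenExpr, IntegerLiteral };

struct Node {
  NodeKind Kind;
  std::unique_ptr<Node> Child; // Sub-expression for wrappers, null otherwise.
};

inline std::unique_ptr<Node> makeNode(NodeKind Kind,
                                      std::unique_ptr<Node> Child = nullptr) {
  std::unique_ptr<Node> N(new Node);
  N->Kind = Kind;
  N->Child = std::move(Child);
  return N;
}

// Peel off any number of parentheses and implicit casts.
inline const Node *ignoreParenImpCasts(const Node *E) {
  while (E && E->Child &&
         (E->Kind == NodeKind::ImplicitCastExpr ||
          E->Kind == NodeKind::ParenExpr))
    E = E->Child.get();
  return E;
}

// Direct test, analogous to hasLHS(declRefExpr(...)) without the wrapper.
inline bool isDeclRef(const Node *E) {
  return E && E->Kind == NodeKind::DeclRefExpr;
}

// Shape of the LHS in the dump: an ImplicitCastExpr wrapping a DeclRefExpr.
inline std::unique_ptr<Node> makeCastOfDeclRef() {
  return makeNode(NodeKind::ImplicitCastExpr, makeNode(NodeKind::DeclRefExpr));
}
```

In this model, ``isDeclRef`` fails on the node returned by
``makeCastOfDeclRef()`` but succeeds once the node has been passed through
``ignoreParenImpCasts``, which mirrors the before and after behavior of the
two condition matchers.
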
After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchCallback::run()`` member function takes a
``const MatchFinder::MatchResult &`` as its parameter. We're most interested in
its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
class to represent contextual information about the AST, as the name
implies, though the most functionally important detail is that several
operations require an ``ASTContext*`` parameter. More immediately useful
is the set of matched nodes, and how we retrieve them.

Since we bind three variables (identified by ``"condVarName"``,
``"initVarName"``, and ``"incVarName"``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls.

.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }

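The canonical-declaration idea can be pictured with a toy model. The sketch
below is not Clang's data structure: ``ToyDecl`` and ``areSameToyVariable``
are hypothetical, and the model assumes only that every redeclaration links
back to an earlier one and that the first declaration in the chain is taken
as the canonical one.

```cpp
// Toy model of a redeclaration chain (hypothetical, not clang::Decl).
struct ToyDecl {
  const char *Name;
  const ToyDecl *Previous; // Earlier declaration of the same entity, or null.

  // Walk back to the first declaration, which this model treats as canonical.
  const ToyDecl *getCanonicalDecl() const {
    const ToyDecl *D = this;
    while (D->Previous)
      D = D->Previous;
    return D;
  }
};

// Same shape as areSameVariable(), applied to the toy type: both pointers must
// be non-null and share one canonical declaration.
inline bool areSameToyVariable(const ToyDecl *First, const ToyDecl *Second) {
  return First && Second &&
         First->getCanonicalDecl() == Second->getCanonicalDecl();
}
```

Comparing names would not be enough here: two unrelated variables can both
be called ``i``, whereas two declarations in the same chain always reach the
same canonical address.
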
If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.

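As a rough illustration of what the callback now filters, consider the
hypothetical loops below. Both satisfy the matcher from this step, but only
the first passes the ``areSameVariable`` checks. The function names are
invented, and the comments describe behavior derived from the code above,
not captured output.

```cpp
const int N = 3;

// Init, condition, and increment all refer to I: the callback is expected to
// report this loop.
inline int sameVariable(const int (&Arr)[N]) {
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += Arr[I];
  return Sum;
}

// The matcher accepts this loop (a zero-initialized declaration, a '<'
// comparison, and a '++'), but the condition and increment refer to K rather
// than to the declared variable I, so the callback is expected to return early.
inline int differentVariables(const int (&Arr)[N]) {
  int Sum = 0;
  int K = 0;
  for (int I = 0; K < N; ++K)
    Sum += Arr[K] + I;
  return Sum;
}
```

The second function is the over-matching case discussed earlier: the matcher
alone cannot express that all three parts of the loop name the same
variable, so the comparison is left to the callback.
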
As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.
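
The profiling idea can be sketched without Clang. In the toy model below,
``ToyExpr``, ``ToyID``, ``profile``, and ``areSameToyExpr`` are hypothetical
stand-ins: each node appends its kind, its payload, and then its children's
profiles to a flat sequence of integers, and two expressions are treated as
the same when the sequences are equal. The real ``FoldingSetNodeID`` and
``Stmt::Profile()`` record considerably more detail.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::FoldingSetNodeID: a flat list of integers.
typedef std::vector<unsigned> ToyID;

// Hypothetical expression node: a kind tag, a payload, and child expressions.
struct ToyExpr {
  unsigned Kind;   // E.g. 0 = variable reference, 1 = literal, 2 = addition.
  unsigned Value;  // Variable id or literal value; unused for operators.
  std::vector<ToyExpr> Children;
};

// Record this node's properties, then recurse into its children in order.
inline void profile(const ToyExpr &E, ToyID &ID) {
  ID.push_back(E.Kind);
  ID.push_back(E.Value);
  ID.push_back(static_cast<unsigned>(E.Children.size()));
  for (const ToyExpr &Child : E.Children)
    profile(Child, ID);
}

// Mirrors the shape of areSameExpr(): profile both sides, compare the IDs.
inline bool areSameToyExpr(const ToyExpr &First, const ToyExpr &Second) {
  ToyID FirstID, SecondID;
  profile(First, FirstID);
  profile(Second, SecondID);
  return FirstID == SecondID;
}
```

Because the comparison is structural, two separately built trees for
``x + 2`` compare equal even though they are distinct objects, while
``x + 2`` and ``x + 3`` do not.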