===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are in the same git repository,
under different directories. For further information, see the `getting
started guide <https://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/llvm/llvm-project.git

Next you need to obtain the CMake build system and Ninja build tool.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm-project/llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass.

Next, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm-project/llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core Clang tool, it will
live in the ``clang-tools-extra`` directory of the repository.

.. code-block:: console

      cd ~/clang-llvm/llvm-project
      mkdir clang-tools-extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt
      vim clang-tools-extra/loop-convert/CMakeLists.txt

``CMakeLists.txt`` should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        PRIVATE
        clangTooling
        clangBasic
        clangASTMatchers
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``clang-tools-extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...\n");

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser& OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. Any additional
options for the compiler are passed after the dashes rather than loaded
from a compilation database; there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST. Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left-hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``cxxOperatorCallExpr``
matcher to handle overloaded operators.
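
To make the nesting idea concrete, here is a minimal, self-contained sketch
of how such matchers compose. It is a toy model and not Clang's
implementation: the ``Node`` struct, the ``kindIs`` helper, the ``makeLeaf``
and ``makeBinary`` builders, and ``matchesZeroOnLeft`` are hypothetical names
invented for this illustration. Only the matcher names mirror the real ones.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy AST node (hypothetical; Clang's real nodes are far richer).
struct Node {
  std::string Kind;            // e.g. "BinaryOperator", "IntegerLiteral"
  std::string Text;            // operator spelling or identifier name
  long Value;                  // only meaningful for literal nodes
  std::vector<Node> Children;  // operands, left to right
};

// In this sketch a matcher is just a predicate on a node.
using Matcher = std::function<bool(const Node &)>;

// Node matcher helper: the kind must agree and every inner matcher must hold.
Matcher kindIs(std::string Kind, std::vector<Matcher> Inner) {
  return [Kind, Inner](const Node &N) {
    if (N.Kind != Kind)
      return false;
    for (const Matcher &M : Inner)
      if (!M(N))
        return false;
    return true;
  };
}

Matcher binaryOperator(std::vector<Matcher> Inner) {
  return kindIs("BinaryOperator", std::move(Inner));
}
Matcher integerLiteral(std::vector<Matcher> Inner) {
  return kindIs("IntegerLiteral", std::move(Inner));
}

// Narrowing matchers: restrict the node currently being examined.
Matcher hasOperatorName(std::string Name) {
  return [Name](const Node &N) { return N.Text == Name; };
}
Matcher equals(long Value) {
  return [Value](const Node &N) { return N.Value == Value; };
}

// Traversal matcher: step from a binary operator to its first operand.
Matcher hasLHS(Matcher Inner) {
  return [Inner](const Node &N) {
    return !N.Children.empty() && Inner(N.Children.front());
  };
}

// Builders for small test trees (hypothetical helpers).
Node makeLeaf(std::string Kind, std::string Text, long Value) {
  return Node{std::move(Kind), std::move(Text), Value, {}};
}
Node makeBinary(std::string Op, Node LHS, Node RHS) {
  return Node{"BinaryOperator", std::move(Op), 0,
              {std::move(LHS), std::move(RHS)}};
}

// Same shape as the matcher above: a "+" whose left operand is the literal 0.
bool matchesZeroOnLeft(const Node &N) {
  static const Matcher M = binaryOperator(
      {hasOperatorName("+"), hasLHS(integerLiteral({equals(0)}))});
  return M(N);
}
```

In this model a character literal with value 0 on the left-hand side is
rejected because its kind is not ``IntegerLiteral``, which is the same reason
the real matcher ignores ``'\0'``.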

There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
References <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")
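
What binding buys us can be sketched with a small stand-alone model. This is
not how Clang stores bound nodes internally; ``Node``, ``BoundNodes``,
``intVarDecl``, ``bindAs``, and ``findIntVar`` are hypothetical names used
only for illustration. The point is simply that a successful match records
the matched node under the chosen identifier so that it can be looked up
afterwards.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy declaration node (hypothetical).
struct Node {
  std::string Kind;  // e.g. "VarDecl"
  std::string Name;  // declared name
  std::string Type;  // spelled type, e.g. "int"
};

// Bound nodes are looked up by the identifier given when binding.
using BoundNodes = std::map<std::string, const Node *>;
using Matcher = std::function<bool(const Node &, BoundNodes &)>;

// Matches variable declarations whose type is spelled "int"
// (a crude stand-in for varDecl(hasType(isInteger()))).
Matcher intVarDecl() {
  return [](const Node &N, BoundNodes &) {
    return N.Kind == "VarDecl" && N.Type == "int";
  };
}

// Wraps a matcher so that a successful match is recorded under Id.
Matcher bindAs(Matcher Inner, std::string Id) {
  return [Inner, Id](const Node &N, BoundNodes &Bound) {
    if (!Inner(N, Bound))
      return false;
    Bound[Id] = &N;
    return true;
  };
}

// Runs the bound matcher and returns the recorded node, or nullptr.
const Node *findIntVar(const Node &N) {
  BoundNodes Bound;
  Matcher M = bindAs(intVarDecl(), "intvar");
  if (!M(N, Bound))
    return nullptr;
  return Bound.at("intvar");
}
```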

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Let's define a matcher which will
capture all ``for`` statements that define a new variable initialized to
zero. We begin by matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        // Called once for every node that LoopMatcher matches.
        void run(const MatchFinder::MatchResult &Result) override {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:

.. code-block:: c++

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser& OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }
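
The division of labour between matcher, callback, and finder can be pictured
with a small stand-alone model. It is only a sketch under simplifying
assumptions: ``Node``, ``ToyFinder``, ``matchAll``, and ``makeNode`` are
hypothetical names, and the real ``MatchFinder`` is considerably more
involved. The idea it shows is that the finder owns the traversal, while each
registered matcher decides whether a node is interesting and its paired
callback decides what to do about it.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy statement node (hypothetical).
struct Node {
  std::string Kind;            // e.g. "ForStmt"
  std::vector<Node> Children;  // nested statements
};

using Matcher = std::function<bool(const Node &)>;
using Callback = std::function<void(const Node &)>;

// Pairs matchers with callbacks and runs them over a whole tree.
class ToyFinder {
public:
  void addMatcher(Matcher M, Callback C) {
    Entries.emplace_back(std::move(M), std::move(C));
  }

  // Visits Root and all of its descendants in pre-order, invoking the
  // callback of every registered matcher that accepts the visited node.
  void matchAll(const Node &Root) const {
    for (const auto &Entry : Entries)
      if (Entry.first(Root))
        Entry.second(Root);
    for (const Node &Child : Root.Children)
      matchAll(Child);
  }

private:
  std::vector<std::pair<Matcher, Callback>> Entries;
};

// Builder for small test trees (hypothetical helper).
Node makeNode(std::string Kind, std::vector<Node> Children = {}) {
  return Node{std::move(Kind), std::move(Children)};
}
```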

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build
      ninja loop-convert
      vim ~/test-files/simple-loops.cc
      bin/loop-convert ~/test-files/simple-loops.cc --

Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering ``for`` loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize ``for`` loops over arrays which would be eligible
for translation to range-based syntax? They are loops over arrays of
size ``N`` that:

1. start at index ``0``
2. iterate consecutively
3. end at index ``N-1``
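
As a concrete picture of the target, the following stand-alone snippet shows
an index-based loop that satisfies all three properties, next to the
range-based loop it could be translated into. The array size ``N`` and the
function names ``sumIndexed`` and ``sumRangeBased`` are made up for this
example.

```cpp
const int N = 5;

// Index-based form: starts at index 0, steps by exactly one, and stops
// after index N-1.
int sumIndexed(const int (&Values)[N]) {
  int Sum = 0;
  for (int i = 0; i < N; ++i)
    Sum += Values[i];
  return Sum;
}

// Equivalent range-based form: visits the same elements in the same order.
int sumRangeBased(const int (&Values)[N]) {
  int Sum = 0;
  for (int Value : Values)
    Sum += Value;
  return Sum;
}
```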

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. The matcher for (2) is straightforward:
require a pre- or post-increment of the same variable declared in the
init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is to match more than we
would like to allow, and punt the extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
Usages of variables are represented as ``DeclRefExpr``'s ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem: we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Unfortunately, this version doesn't work. Of the three loops provided in
``test-files/simple.cpp``, none has a matching condition. A
quick look at the AST dump of the first ``for`` loop, produced by the
previous iteration of loop-convert, shows us why:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<<NULL>>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStmt ...

We already know that the declaration and increment both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))
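
The effect of skipping implicit casts and parentheses can be sketched on a
toy expression tree. This is a simplified model, not the library's code:
``Expr``, ``ignoreParenImpCasts``, ``isDeclRef``, ``makeDeclRef``, and
``wrapIn`` are hypothetical names chosen for this illustration. The wrapper
nodes are peeled off one by one until something else is reached, and only
then is the inner check applied.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical).
struct Expr {
  std::string Kind;            // "DeclRefExpr", "ImplicitCastExpr", "ParenExpr", ...
  std::string Name;            // referenced variable, for DeclRefExpr only
  std::vector<Expr> Children;  // the wrapped sub-expression, if any
};

// Peels off implicit casts and parentheses and returns the first
// expression that is neither.
const Expr &ignoreParenImpCasts(const Expr &E) {
  const Expr *Current = &E;
  while ((Current->Kind == "ImplicitCastExpr" || Current->Kind == "ParenExpr") &&
         !Current->Children.empty())
    Current = &Current->Children.front();
  return *Current;
}

// The check that failed in the text: is this node itself a DeclRefExpr?
bool isDeclRef(const Expr &E) { return E.Kind == "DeclRefExpr"; }

// Builders for small test expressions (hypothetical helpers).
Expr makeDeclRef(std::string Name) {
  return Expr{"DeclRefExpr", std::move(Name), {}};
}
Expr wrapIn(std::string Kind, Expr Inner) {
  return Expr{std::move(Kind), "", {std::move(Inner)}};
}
```

Applied to the dump above, the left operand of ``<`` is an
``ImplicitCastExpr``, so a direct ``isDeclRef`` check fails, while the same
check after ``ignoreParenImpCasts`` reaches the reference to ``i``.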

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchFinder::MatchCallback::run()`` callback takes a
``const MatchFinder::MatchResult&`` as its parameter. We're most interested in
its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
class to represent contextual information about the AST, as the name
implies, though the most functionally important detail is that several
operations require an ``ASTContext*`` parameter. More immediately useful
is the set of matched nodes, and how we retrieve them.

Since we bind three variables (identified by ``"condVarName"``,
``"initVarName"``, and ``"incVarName"``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls. The helper below
must be declared before ``LoopPrinter::run`` so that the calls above compile.

.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }
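
Why comparing canonical declarations works can be seen in a small
stand-alone model. It is a sketch under a simplifying assumption: here the
first declaration in a chain of redeclarations plays the role of the
canonical one, and the ``Decl`` struct with its ``Previous`` link is
hypothetical rather than Clang's data structure.

```cpp
// Toy declaration (hypothetical): each redeclaration links to the
// declaration that precedes it.
struct Decl {
  const char *Name;
  const Decl *Previous;  // nullptr for the first declaration

  // Walks back to the first declaration, which stands in for the
  // canonical declaration in this model.
  const Decl *getCanonicalDecl() const {
    const Decl *D = this;
    while (D->Previous)
      D = D->Previous;
    return D;
  }
};

// Same logic as in the text: both pointers must be non-null and must
// resolve to the same canonical declaration.
bool areSameVariable(const Decl *First, const Decl *Second) {
  return First && Second &&
         First->getCanonicalDecl() == Second->getCanonicalDecl();
}
```

Two distinct variables that merely share a name compare as different in this
model, because their chains end at different addresses.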

If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.
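
The profiling idea can be sketched without Clang at all. The model below is
only an illustration: ``Expr``, ``ProfileID``, ``profile``, ``makeLeaf``, and
``makeBinary`` are hypothetical, and the real ``FoldingSetNodeID`` stores
compact integer data rather than strings. Each node contributes its own
properties followed by those of its children, so two trees produce the same
description exactly when they have the same structure.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical).
struct Expr {
  std::string Kind;            // e.g. "BinaryOperator", "DeclRefExpr"
  std::string Text;            // operator, identifier, or literal spelling
  std::vector<Expr> Children;  // operands, left to right
};

// Flattened structural description of an expression tree.
using ProfileID = std::vector<std::string>;

// Appends this node's properties, then recursively those of its children.
void profile(const Expr &E, ProfileID &ID) {
  ID.push_back(E.Kind);
  ID.push_back(E.Text);
  ID.push_back(std::to_string(E.Children.size()));
  for (const Expr &Child : E.Children)
    profile(Child, ID);
}

// Mirrors the shape of areSameExpr in the text: profile both sides and
// compare the resulting descriptions.
bool areSameExpr(const Expr *First, const Expr *Second) {
  if (!First || !Second)
    return false;
  ProfileID FirstID, SecondID;
  profile(*First, FirstID);
  profile(*Second, SecondID);
  return FirstID == SecondID;
}

// Builders for small test expressions (hypothetical helpers).
Expr makeLeaf(std::string Kind, std::string Text) {
  return Expr{std::move(Kind), std::move(Text), {}};
}
Expr makeBinary(std::string Op, Expr LHS, Expr RHS) {
  return Expr{"BinaryOperator", std::move(Op),
              {std::move(LHS), std::move(RHS)}};
}
```

Note that this comparison is purely structural: in the model, ``n + 1`` and
``1 + n`` are different expressions even though they compute the same value.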