===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are in the same git repository,
under different directories. For further information, see the `getting
started guide <https://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

      mkdir -p ~/clang-llvm && cd ~/clang-llvm
      git clone https://github.com/llvm/llvm-project.git

Next you need to obtain the CMake build system and Ninja build tool.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm-project/llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass.

Finally, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm-project/llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core clang tool, it will
live in the ``clang-tools-extra`` directory.

.. code-block:: console

      cd ~/clang-llvm/llvm-project
      mkdir clang-tools-extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt
      vim clang-tools-extra/loop-convert/CMakeLists.txt

CMakeLists.txt should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        PRIVATE
        clangTooling
        clangBasic
        clangASTMatchers
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``clang-tools-extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...\n");

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser& OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loading
them from a compilation database - there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST.
Implemented as a DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``cxxOperatorCallExpr``
matcher to handle overloaded operators.

There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
Reference <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Let's start by defining a matcher
which will capture all ``for`` statements that define a new variable
initialized to zero. Let's start with matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        void run(const MatchFinder::MatchResult &Result) override {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:

.. code-block:: c++

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser& OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build
      ninja loop-convert
      vim ~/test-files/simple-loops.cc
      bin/loop-convert ~/test-files/simple-loops.cc

Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering ``for`` loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize ``for`` loops over arrays which would be eligible
for translation to range-based syntax?
Those are loops over arrays of size ``N`` that:

1. start at index ``0``
2. iterate consecutively
3. end at index ``N-1``

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. In principle, the matcher for (2) is
straightforward: require a pre- or post-increment of the same variable
declared in the init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
usages of variables are represented as ``DeclRefExpr`` nodes ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem - we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Unfortunately, this matcher does not work. Of the three loops provided in
``test-files/simple.cpp``, none has a matching condition. A
quick look at the AST dump of the first ``for`` loop, produced by the
previous iteration of loop-convert, shows us the answer:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<<NULL>>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchFinder::MatchCallback::run()`` callback takes a
``MatchFinder::MatchResult&`` as its parameter. We're most interested in
its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
class to represent contextual information about the AST, as the name
implies, though the most functionally important detail is that several
operations require an ``ASTContext*`` parameter.
More immediately useful
is the set of matched nodes, and how we retrieve them.

Since we bind three variables (identified by ``"condVarName"``,
``"initVarName"``, and ``"incVarName"``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls. The helper has to
be declared before ``LoopPrinter::run()`` uses it.

.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }

If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.
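
The canonical-declaration comparison used by ``areSameVariable`` can be
illustrated with a toy model that does not depend on Clang at all. Everything
in the sketch below (``ToyDecl``, its ``Previous`` link, and
``areSameToyVariable``) is hypothetical and exists only for illustration; it
assumes that every redeclaration links back to an earlier one and that the
first declaration in the chain acts as the canonical one.

.. code-block:: c++

      // Toy model only: these are NOT Clang's classes.
      struct ToyDecl {
        const char *Name;
        const ToyDecl *Previous; // Earlier declaration of the same entity, or nullptr.

        // Walk back to the first declaration in the chain, which plays the
        // role of the canonical declaration in this sketch.
        const ToyDecl *getCanonicalDecl() const {
          const ToyDecl *D = this;
          while (D->Previous)
            D = D->Previous;
          return D;
        }
      };

      // Same shape as areSameVariable(): both pointers must be non-null and
      // must share one canonical declaration.
      static bool areSameToyVariable(const ToyDecl *First, const ToyDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }

In this model, two declarations of the same entity compare equal even though
they are distinct objects, while two unrelated declarations that merely share
a name do not. A null pointer never compares equal to anything.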

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.
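
If you do not have a set of test loops at hand, a file along the following
lines could serve as input. It is a hypothetical sketch and not the
tutorial's actual ``test-files/simple.cpp``; all function and variable names
are made up. The comments state what the matcher and callback from this step
would be expected to do with each loop.

.. code-block:: c++

      // Hypothetical test input for loop-convert.
      const int N = 5;
      int Arr[N] = {1, 2, 3, 4, 5};

      // Starts at 0, compares with "<", and pre-increments the same variable:
      // expected to be reported as a potential array-based loop.
      int sumPreIncrement() {
        int Sum = 0;
        for (int i = 0; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // Post-increment is also a unary "++", so this should be reported too.
      int sumPostIncrement() {
        int Sum = 0;
        for (int i = 0; i < N; i++)
          Sum += Arr[i];
        return Sum;
      }

      // Starts at 1, so integerLiteral(equals(0)) should reject it.
      int sumFromOne() {
        int Sum = 0;
        for (int i = 1; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // The matcher should accept this loop, but the callback should drop it
      // because the incremented variable is not the one declared in the init.
      int countWithOtherIncrement() {
        int j = 0;
        for (int i = 0; i < N; ++j)
          if (j == N - 1)
            break;
        return j;
      }

Running ``bin/loop-convert`` on such a file with a trailing ``--`` should then
print the "Potential array-based loop discovered." message once for each loop
that survives both the matcher and the callback checks.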