1*f4a2713aSLionel Sambuc=============================================================== 2*f4a2713aSLionel SambucTutorial for building tools using LibTooling and LibASTMatchers 3*f4a2713aSLionel Sambuc=============================================================== 4*f4a2713aSLionel Sambuc 5*f4a2713aSLionel SambucThis document is intended to show how to build a useful source-to-source 6*f4a2713aSLionel Sambuctranslation tool based on Clang's `LibTooling <LibTooling.html>`_. It is 7*f4a2713aSLionel Sambucexplicitly aimed at people who are new to Clang, so all you should need 8*f4a2713aSLionel Sambucis a working knowledge of C++ and the command line. 9*f4a2713aSLionel Sambuc 10*f4a2713aSLionel SambucIn order to work on the compiler, you need some basic knowledge of the 11*f4a2713aSLionel Sambucabstract syntax tree (AST). To this end, the reader is incouraged to 12*f4a2713aSLionel Sambucskim the :doc:`Introduction to the Clang 13*f4a2713aSLionel SambucAST <IntroductionToTheClangAST>` 14*f4a2713aSLionel Sambuc 15*f4a2713aSLionel SambucStep 0: Obtaining Clang 16*f4a2713aSLionel Sambuc======================= 17*f4a2713aSLionel Sambuc 18*f4a2713aSLionel SambucAs Clang is part of the LLVM project, you'll need to download LLVM's 19*f4a2713aSLionel Sambucsource code first. Both Clang and LLVM are maintained as Subversion 20*f4a2713aSLionel Sambucrepositories, but we'll be accessing them through the git mirror. For 21*f4a2713aSLionel Sambucfurther information, see the `getting started 22*f4a2713aSLionel Sambucguide <http://llvm.org/docs/GettingStarted.html>`_. 23*f4a2713aSLionel Sambuc 24*f4a2713aSLionel Sambuc.. code-block:: console 25*f4a2713aSLionel Sambuc 26*f4a2713aSLionel Sambuc mkdir ~/clang-llvm && cd ~/clang-llvm 27*f4a2713aSLionel Sambuc git clone http://llvm.org/git/llvm.git 28*f4a2713aSLionel Sambuc cd llvm/tools 29*f4a2713aSLionel Sambuc git clone http://llvm.org/git/clang.git 30*f4a2713aSLionel Sambuc cd clang/tools 31*f4a2713aSLionel Sambuc git clone http://llvm.org/git/clang-tools-extra.git extra 32*f4a2713aSLionel Sambuc 33*f4a2713aSLionel SambucNext you need to obtain the CMake build system and Ninja build tool. You 34*f4a2713aSLionel Sambucmay already have CMake installed, but current binary versions of CMake 35*f4a2713aSLionel Sambucaren't built with Ninja support. 36*f4a2713aSLionel Sambuc 37*f4a2713aSLionel Sambuc.. code-block:: console 38*f4a2713aSLionel Sambuc 39*f4a2713aSLionel Sambuc cd ~/clang-llvm 40*f4a2713aSLionel Sambuc git clone https://github.com/martine/ninja.git 41*f4a2713aSLionel Sambuc cd ninja 42*f4a2713aSLionel Sambuc git checkout release 43*f4a2713aSLionel Sambuc ./bootstrap.py 44*f4a2713aSLionel Sambuc sudo cp ninja /usr/bin/ 45*f4a2713aSLionel Sambuc 46*f4a2713aSLionel Sambuc cd ~/clang-llvm 47*f4a2713aSLionel Sambuc git clone git://cmake.org/stage/cmake.git 48*f4a2713aSLionel Sambuc cd cmake 49*f4a2713aSLionel Sambuc git checkout next 50*f4a2713aSLionel Sambuc ./bootstrap 51*f4a2713aSLionel Sambuc make 52*f4a2713aSLionel Sambuc sudo make install 53*f4a2713aSLionel Sambuc 54*f4a2713aSLionel SambucOkay. Now we'll build Clang! 55*f4a2713aSLionel Sambuc 56*f4a2713aSLionel Sambuc.. code-block:: console 57*f4a2713aSLionel Sambuc 58*f4a2713aSLionel Sambuc cd ~/clang-llvm 59*f4a2713aSLionel Sambuc mkdir build && cd build 60*f4a2713aSLionel Sambuc cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. 61*f4a2713aSLionel Sambuc ninja 62*f4a2713aSLionel Sambuc ninja check # Test LLVM only. 63*f4a2713aSLionel Sambuc ninja clang-test # Test Clang only. 64*f4a2713aSLionel Sambuc ninja install 65*f4a2713aSLionel Sambuc 66*f4a2713aSLionel SambucAnd we're live. 67*f4a2713aSLionel Sambuc 68*f4a2713aSLionel SambucAll of the tests should pass, though there is a (very) small chance that 69*f4a2713aSLionel Sambucyou can catch LLVM and Clang out of sync. Running ``'git svn rebase'`` 70*f4a2713aSLionel Sambucin both the llvm and clang directories should fix any problems. 71*f4a2713aSLionel Sambuc 72*f4a2713aSLionel SambucFinally, we want to set Clang as its own compiler. 73*f4a2713aSLionel Sambuc 74*f4a2713aSLionel Sambuc.. code-block:: console 75*f4a2713aSLionel Sambuc 76*f4a2713aSLionel Sambuc cd ~/clang-llvm/build 77*f4a2713aSLionel Sambuc ccmake ../llvm 78*f4a2713aSLionel Sambuc 79*f4a2713aSLionel SambucThe second command will bring up a GUI for configuring Clang. You need 80*f4a2713aSLionel Sambucto set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on 81*f4a2713aSLionel Sambucadvanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to 82*f4a2713aSLionel Sambuc``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to 83*f4a2713aSLionel Sambucconfigure, then ``'g'`` to generate CMake's files. 84*f4a2713aSLionel Sambuc 85*f4a2713aSLionel SambucFinally, run ninja one last time, and you're done. 86*f4a2713aSLionel Sambuc 87*f4a2713aSLionel SambucStep 1: Create a ClangTool 88*f4a2713aSLionel Sambuc========================== 89*f4a2713aSLionel Sambuc 90*f4a2713aSLionel SambucNow that we have enough background knowledge, it's time to create the 91*f4a2713aSLionel Sambucsimplest productive ClangTool in existence: a syntax checker. While this 92*f4a2713aSLionel Sambucalready exists as ``clang-check``, it's important to understand what's 93*f4a2713aSLionel Sambucgoing on. 94*f4a2713aSLionel Sambuc 95*f4a2713aSLionel SambucFirst, we'll need to create a new directory for our tool and tell CMake 96*f4a2713aSLionel Sambucthat it exists. As this is not going to be a core clang tool, it will 97*f4a2713aSLionel Sambuclive in the ``tools/extra`` repository. 98*f4a2713aSLionel Sambuc 99*f4a2713aSLionel Sambuc.. code-block:: console 100*f4a2713aSLionel Sambuc 101*f4a2713aSLionel Sambuc cd ~/clang-llvm/llvm/tools/clang 102*f4a2713aSLionel Sambuc mkdir tools/extra/loop-convert 103*f4a2713aSLionel Sambuc echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt 104*f4a2713aSLionel Sambuc vim tools/extra/loop-convert/CMakeLists.txt 105*f4a2713aSLionel Sambuc 106*f4a2713aSLionel SambucCMakeLists.txt should have the following contents: 107*f4a2713aSLionel Sambuc 108*f4a2713aSLionel Sambuc:: 109*f4a2713aSLionel Sambuc 110*f4a2713aSLionel Sambuc set(LLVM_LINK_COMPONENTS support) 111*f4a2713aSLionel Sambuc set(LLVM_USED_LIBS clangTooling clangBasic clangAST) 112*f4a2713aSLionel Sambuc 113*f4a2713aSLionel Sambuc add_clang_executable(loop-convert 114*f4a2713aSLionel Sambuc LoopConvert.cpp 115*f4a2713aSLionel Sambuc ) 116*f4a2713aSLionel Sambuc target_link_libraries(loop-convert 117*f4a2713aSLionel Sambuc clangTooling 118*f4a2713aSLionel Sambuc clangBasic 119*f4a2713aSLionel Sambuc clangASTMatchers 120*f4a2713aSLionel Sambuc ) 121*f4a2713aSLionel Sambuc 122*f4a2713aSLionel SambucWith that done, Ninja will be able to compile our tool. Let's give it 123*f4a2713aSLionel Sambucsomething to compile! Put the following into 124*f4a2713aSLionel Sambuc``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of 125*f4a2713aSLionel Sambucwhy the different parts are needed can be found in the `LibTooling 126*f4a2713aSLionel Sambucdocumentation <LibTooling.html>`_. 127*f4a2713aSLionel Sambuc 128*f4a2713aSLionel Sambuc.. code-block:: c++ 129*f4a2713aSLionel Sambuc 130*f4a2713aSLionel Sambuc // Declares clang::SyntaxOnlyAction. 131*f4a2713aSLionel Sambuc #include "clang/Frontend/FrontendActions.h" 132*f4a2713aSLionel Sambuc #include "clang/Tooling/CommonOptionsParser.h" 133*f4a2713aSLionel Sambuc #include "clang/Tooling/Tooling.h" 134*f4a2713aSLionel Sambuc // Declares llvm::cl::extrahelp. 135*f4a2713aSLionel Sambuc #include "llvm/Support/CommandLine.h" 136*f4a2713aSLionel Sambuc 137*f4a2713aSLionel Sambuc using namespace clang::tooling; 138*f4a2713aSLionel Sambuc using namespace llvm; 139*f4a2713aSLionel Sambuc 140*f4a2713aSLionel Sambuc // CommonOptionsParser declares HelpMessage with a description of the common 141*f4a2713aSLionel Sambuc // command-line options related to the compilation database and input files. 142*f4a2713aSLionel Sambuc // It's nice to have this help message in all tools. 143*f4a2713aSLionel Sambuc static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); 144*f4a2713aSLionel Sambuc 145*f4a2713aSLionel Sambuc // A help message for this specific tool can be added afterwards. 146*f4a2713aSLionel Sambuc static cl::extrahelp MoreHelp("\nMore help text..."); 147*f4a2713aSLionel Sambuc 148*f4a2713aSLionel Sambuc int main(int argc, const char **argv) { 149*f4a2713aSLionel Sambuc CommonOptionsParser OptionsParser(argc, argv); 150*f4a2713aSLionel Sambuc ClangTool Tool(OptionsParser.getCompilations(), 151*f4a2713aSLionel Sambuc OptionsParser.getSourcePathList()); 152*f4a2713aSLionel Sambuc return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>()); 153*f4a2713aSLionel Sambuc } 154*f4a2713aSLionel Sambuc 155*f4a2713aSLionel SambucAnd that's it! You can compile our new tool by running ninja from the 156*f4a2713aSLionel Sambuc``build`` directory. 157*f4a2713aSLionel Sambuc 158*f4a2713aSLionel Sambuc.. code-block:: console 159*f4a2713aSLionel Sambuc 160*f4a2713aSLionel Sambuc cd ~/clang-llvm/build 161*f4a2713aSLionel Sambuc ninja 162*f4a2713aSLionel Sambuc 163*f4a2713aSLionel SambucYou should now be able to run the syntax checker, which is located in 164*f4a2713aSLionel Sambuc``~/clang-llvm/build/bin``, on any source file. Try it! 165*f4a2713aSLionel Sambuc 166*f4a2713aSLionel Sambuc.. code-block:: console 167*f4a2713aSLionel Sambuc 168*f4a2713aSLionel Sambuc cat "int main() { return 0; }" > test.cpp 169*f4a2713aSLionel Sambuc bin/loop-convert test.cpp -- 170*f4a2713aSLionel Sambuc 171*f4a2713aSLionel SambucNote the two dashes after we specify the source file. The additional 172*f4a2713aSLionel Sambucoptions for the compiler are passed after the dashes rather than loading 173*f4a2713aSLionel Sambucthem from a compilation database - there just aren't any options needed 174*f4a2713aSLionel Sambucright now. 175*f4a2713aSLionel Sambuc 176*f4a2713aSLionel SambucIntermezzo: Learn AST matcher basics 177*f4a2713aSLionel Sambuc==================================== 178*f4a2713aSLionel Sambuc 179*f4a2713aSLionel SambucClang recently introduced the :doc:`ASTMatcher 180*f4a2713aSLionel Sambuclibrary <LibASTMatchers>` to provide a simple, powerful, and 181*f4a2713aSLionel Sambucconcise way to describe specific patterns in the AST. Implemented as a 182*f4a2713aSLionel SambucDSL powered by macros and templates (see 183*f4a2713aSLionel Sambuc`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're 184*f4a2713aSLionel Sambuccurious), matchers offer the feel of algebraic data types common to 185*f4a2713aSLionel Sambucfunctional programming languages. 186*f4a2713aSLionel Sambuc 187*f4a2713aSLionel SambucFor example, suppose you wanted to examine only binary operators. There 188*f4a2713aSLionel Sambucis a matcher to do exactly that, conveniently named ``binaryOperator``. 189*f4a2713aSLionel SambucI'll give you one guess what this matcher does: 190*f4a2713aSLionel Sambuc 191*f4a2713aSLionel Sambuc.. code-block:: c++ 192*f4a2713aSLionel Sambuc 193*f4a2713aSLionel Sambuc binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) 194*f4a2713aSLionel Sambuc 195*f4a2713aSLionel SambucShockingly, it will match against addition expressions whose left hand 196*f4a2713aSLionel Sambucside is exactly the literal 0. It will not match against other forms of 197*f4a2713aSLionel Sambuc0, such as ``'\0'`` or ``NULL``, but it will match against macros that 198*f4a2713aSLionel Sambucexpand to 0. The matcher will also not match against calls to the 199*f4a2713aSLionel Sambucoverloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` 200*f4a2713aSLionel Sambucmatcher to handle overloaded operators. 201*f4a2713aSLionel Sambuc 202*f4a2713aSLionel SambucThere are AST matchers to match all the different nodes of the AST, 203*f4a2713aSLionel Sambucnarrowing matchers to only match AST nodes fulfilling specific criteria, 204*f4a2713aSLionel Sambucand traversal matchers to get from one kind of AST node to another. For 205*f4a2713aSLionel Sambuca complete list of AST matchers, take a look at the `AST Matcher 206*f4a2713aSLionel SambucReferences <LibASTMatchersReference.html>`_ 207*f4a2713aSLionel Sambuc 208*f4a2713aSLionel SambucAll matcher that are nouns describe entities in the AST and can be 209*f4a2713aSLionel Sambucbound, so that they can be referred to whenever a match is found. To do 210*f4a2713aSLionel Sambucso, simply call the method ``bind`` on these matchers, e.g.: 211*f4a2713aSLionel Sambuc 212*f4a2713aSLionel Sambuc.. code-block:: c++ 213*f4a2713aSLionel Sambuc 214*f4a2713aSLionel Sambuc variable(hasType(isInteger())).bind("intvar") 215*f4a2713aSLionel Sambuc 216*f4a2713aSLionel SambucStep 2: Using AST matchers 217*f4a2713aSLionel Sambuc========================== 218*f4a2713aSLionel Sambuc 219*f4a2713aSLionel SambucOkay, on to using matchers for real. Let's start by defining a matcher 220*f4a2713aSLionel Sambucwhich will capture all ``for`` statements that define a new variable 221*f4a2713aSLionel Sambucinitialized to zero. Let's start with matching all ``for`` loops: 222*f4a2713aSLionel Sambuc 223*f4a2713aSLionel Sambuc.. code-block:: c++ 224*f4a2713aSLionel Sambuc 225*f4a2713aSLionel Sambuc forStmt() 226*f4a2713aSLionel Sambuc 227*f4a2713aSLionel SambucNext, we want to specify that a single variable is declared in the first 228*f4a2713aSLionel Sambucportion of the loop, so we can extend the matcher to 229*f4a2713aSLionel Sambuc 230*f4a2713aSLionel Sambuc.. code-block:: c++ 231*f4a2713aSLionel Sambuc 232*f4a2713aSLionel Sambuc forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) 233*f4a2713aSLionel Sambuc 234*f4a2713aSLionel SambucFinally, we can add the condition that the variable is initialized to 235*f4a2713aSLionel Sambuczero. 236*f4a2713aSLionel Sambuc 237*f4a2713aSLionel Sambuc.. code-block:: c++ 238*f4a2713aSLionel Sambuc 239*f4a2713aSLionel Sambuc forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( 240*f4a2713aSLionel Sambuc hasInitializer(integerLiteral(equals(0)))))))) 241*f4a2713aSLionel Sambuc 242*f4a2713aSLionel SambucIt is fairly easy to read and understand the matcher definition ("match 243*f4a2713aSLionel Sambucloops whose init portion declares a single variable which is initialized 244*f4a2713aSLionel Sambucto the integer literal 0"), but deciding that every piece is necessary 245*f4a2713aSLionel Sambucis more difficult. Note that this matcher will not match loops whose 246*f4a2713aSLionel Sambucvariables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of 247*f4a2713aSLionel Sambuczero besides the integer 0. 248*f4a2713aSLionel Sambuc 249*f4a2713aSLionel SambucThe last step is giving the matcher a name and binding the ``ForStmt`` 250*f4a2713aSLionel Sambucas we will want to do something with it: 251*f4a2713aSLionel Sambuc 252*f4a2713aSLionel Sambuc.. code-block:: c++ 253*f4a2713aSLionel Sambuc 254*f4a2713aSLionel Sambuc StatementMatcher LoopMatcher = 255*f4a2713aSLionel Sambuc forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( 256*f4a2713aSLionel Sambuc hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); 257*f4a2713aSLionel Sambuc 258*f4a2713aSLionel SambucOnce you have defined your matchers, you will need to add a little more 259*f4a2713aSLionel Sambucscaffolding in order to run them. Matchers are paired with a 260*f4a2713aSLionel Sambuc``MatchCallback`` and registered with a ``MatchFinder`` object, then run 261*f4a2713aSLionel Sambucfrom a ``ClangTool``. More code! 262*f4a2713aSLionel Sambuc 263*f4a2713aSLionel SambucAdd the following to ``LoopConvert.cpp``: 264*f4a2713aSLionel Sambuc 265*f4a2713aSLionel Sambuc.. code-block:: c++ 266*f4a2713aSLionel Sambuc 267*f4a2713aSLionel Sambuc #include "clang/ASTMatchers/ASTMatchers.h" 268*f4a2713aSLionel Sambuc #include "clang/ASTMatchers/ASTMatchFinder.h" 269*f4a2713aSLionel Sambuc 270*f4a2713aSLionel Sambuc using namespace clang; 271*f4a2713aSLionel Sambuc using namespace clang::ast_matchers; 272*f4a2713aSLionel Sambuc 273*f4a2713aSLionel Sambuc StatementMatcher LoopMatcher = 274*f4a2713aSLionel Sambuc forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( 275*f4a2713aSLionel Sambuc hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); 276*f4a2713aSLionel Sambuc 277*f4a2713aSLionel Sambuc class LoopPrinter : public MatchFinder::MatchCallback { 278*f4a2713aSLionel Sambuc public : 279*f4a2713aSLionel Sambuc virtual void run(const MatchFinder::MatchResult &Result) { 280*f4a2713aSLionel Sambuc if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) 281*f4a2713aSLionel Sambuc FS->dump(); 282*f4a2713aSLionel Sambuc } 283*f4a2713aSLionel Sambuc }; 284*f4a2713aSLionel Sambuc 285*f4a2713aSLionel SambucAnd change ``main()`` to: 286*f4a2713aSLionel Sambuc 287*f4a2713aSLionel Sambuc.. code-block:: c++ 288*f4a2713aSLionel Sambuc 289*f4a2713aSLionel Sambuc int main(int argc, const char **argv) { 290*f4a2713aSLionel Sambuc CommonOptionsParser OptionsParser(argc, argv); 291*f4a2713aSLionel Sambuc ClangTool Tool(OptionsParser.getCompilations(), 292*f4a2713aSLionel Sambuc OptionsParser.getSourcePathList()); 293*f4a2713aSLionel Sambuc 294*f4a2713aSLionel Sambuc LoopPrinter Printer; 295*f4a2713aSLionel Sambuc MatchFinder Finder; 296*f4a2713aSLionel Sambuc Finder.addMatcher(LoopMatcher, &Printer); 297*f4a2713aSLionel Sambuc 298*f4a2713aSLionel Sambuc return Tool.run(newFrontendActionFactory(&Finder)); 299*f4a2713aSLionel Sambuc } 300*f4a2713aSLionel Sambuc 301*f4a2713aSLionel SambucNow, you should be able to recompile and run the code to discover for 302*f4a2713aSLionel Sambucloops. Create a new file with a few examples, and test out our new 303*f4a2713aSLionel Sambuchandiwork: 304*f4a2713aSLionel Sambuc 305*f4a2713aSLionel Sambuc.. code-block:: console 306*f4a2713aSLionel Sambuc 307*f4a2713aSLionel Sambuc cd ~/clang-llvm/llvm/llvm_build/ 308*f4a2713aSLionel Sambuc ninja loop-convert 309*f4a2713aSLionel Sambuc vim ~/test-files/simple-loops.cc 310*f4a2713aSLionel Sambuc bin/loop-convert ~/test-files/simple-loops.cc 311*f4a2713aSLionel Sambuc 312*f4a2713aSLionel SambucStep 3.5: More Complicated Matchers 313*f4a2713aSLionel Sambuc=================================== 314*f4a2713aSLionel Sambuc 315*f4a2713aSLionel SambucOur simple matcher is capable of discovering for loops, but we would 316*f4a2713aSLionel Sambucstill need to filter out many more ourselves. We can do a good portion 317*f4a2713aSLionel Sambucof the remaining work with some cleverly chosen matchers, but first we 318*f4a2713aSLionel Sambucneed to decide exactly which properties we want to allow. 319*f4a2713aSLionel Sambuc 320*f4a2713aSLionel SambucHow can we characterize for loops over arrays which would be eligible 321*f4a2713aSLionel Sambucfor translation to range-based syntax? Range based loops over arrays of 322*f4a2713aSLionel Sambucsize ``N`` that: 323*f4a2713aSLionel Sambuc 324*f4a2713aSLionel Sambuc- start at index ``0`` 325*f4a2713aSLionel Sambuc- iterate consecutively 326*f4a2713aSLionel Sambuc- end at index ``N-1`` 327*f4a2713aSLionel Sambuc 328*f4a2713aSLionel SambucWe already check for (1), so all we need to add is a check to the loop's 329*f4a2713aSLionel Sambuccondition to ensure that the loop's index variable is compared against 330*f4a2713aSLionel Sambuc``N`` and another check to ensure that the increment step just 331*f4a2713aSLionel Sambucincrements this same variable. The matcher for (2) is straightforward: 332*f4a2713aSLionel Sambucrequire a pre- or post-increment of the same variable declared in the 333*f4a2713aSLionel Sambucinit portion. 334*f4a2713aSLionel Sambuc 335*f4a2713aSLionel SambucUnfortunately, such a matcher is impossible to write. Matchers contain 336*f4a2713aSLionel Sambucno logic for comparing two arbitrary AST nodes and determining whether 337*f4a2713aSLionel Sambucor not they are equal, so the best we can do is matching more than we 338*f4a2713aSLionel Sambucwould like to allow, and punting extra comparisons to the callback. 339*f4a2713aSLionel Sambuc 340*f4a2713aSLionel SambucIn any case, we can start building this sub-matcher. We can require that 341*f4a2713aSLionel Sambucthe increment step be a unary increment like this: 342*f4a2713aSLionel Sambuc 343*f4a2713aSLionel Sambuc.. code-block:: c++ 344*f4a2713aSLionel Sambuc 345*f4a2713aSLionel Sambuc hasIncrement(unaryOperator(hasOperatorName("++"))) 346*f4a2713aSLionel Sambuc 347*f4a2713aSLionel SambucSpecifying what is incremented introduces another quirk of Clang's AST: 348*f4a2713aSLionel SambucUsages of variables are represented as ``DeclRefExpr``'s ("declaration 349*f4a2713aSLionel Sambucreference expressions") because they are expressions which refer to 350*f4a2713aSLionel Sambucvariable declarations. To find a ``unaryOperator`` that refers to a 351*f4a2713aSLionel Sambucspecific declaration, we can simply add a second condition to it: 352*f4a2713aSLionel Sambuc 353*f4a2713aSLionel Sambuc.. code-block:: c++ 354*f4a2713aSLionel Sambuc 355*f4a2713aSLionel Sambuc hasIncrement(unaryOperator( 356*f4a2713aSLionel Sambuc hasOperatorName("++"), 357*f4a2713aSLionel Sambuc hasUnaryOperand(declRefExpr()))) 358*f4a2713aSLionel Sambuc 359*f4a2713aSLionel SambucFurthermore, we can restrict our matcher to only match if the 360*f4a2713aSLionel Sambucincremented variable is an integer: 361*f4a2713aSLionel Sambuc 362*f4a2713aSLionel Sambuc.. code-block:: c++ 363*f4a2713aSLionel Sambuc 364*f4a2713aSLionel Sambuc hasIncrement(unaryOperator( 365*f4a2713aSLionel Sambuc hasOperatorName("++"), 366*f4a2713aSLionel Sambuc hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) 367*f4a2713aSLionel Sambuc 368*f4a2713aSLionel SambucAnd the last step will be to attach an identifier to this variable, so 369*f4a2713aSLionel Sambucthat we can retrieve it in the callback: 370*f4a2713aSLionel Sambuc 371*f4a2713aSLionel Sambuc.. code-block:: c++ 372*f4a2713aSLionel Sambuc 373*f4a2713aSLionel Sambuc hasIncrement(unaryOperator( 374*f4a2713aSLionel Sambuc hasOperatorName("++"), 375*f4a2713aSLionel Sambuc hasUnaryOperand(declRefExpr(to( 376*f4a2713aSLionel Sambuc varDecl(hasType(isInteger())).bind("incrementVariable")))))) 377*f4a2713aSLionel Sambuc 378*f4a2713aSLionel SambucWe can add this code to the definition of ``LoopMatcher`` and make sure 379*f4a2713aSLionel Sambucthat our program, outfitted with the new matcher, only prints out loops 380*f4a2713aSLionel Sambucthat declare a single variable initialized to zero and have an increment 381*f4a2713aSLionel Sambucstep consisting of a unary increment of some variable. 382*f4a2713aSLionel Sambuc 383*f4a2713aSLionel SambucNow, we just need to add a matcher to check if the condition part of the 384*f4a2713aSLionel Sambuc``for`` loop compares a variable against the size of the array. There is 385*f4a2713aSLionel Sambuconly one problem - we don't know which array we're iterating over 386*f4a2713aSLionel Sambucwithout looking at the body of the loop! We are again restricted to 387*f4a2713aSLionel Sambucapproximating the result we want with matchers, filling in the details 388*f4a2713aSLionel Sambucin the callback. So we start with: 389*f4a2713aSLionel Sambuc 390*f4a2713aSLionel Sambuc.. code-block:: c++ 391*f4a2713aSLionel Sambuc 392*f4a2713aSLionel Sambuc hasCondition(binaryOperator(hasOperatorName("<")) 393*f4a2713aSLionel Sambuc 394*f4a2713aSLionel SambucIt makes sense to ensure that the left-hand side is a reference to a 395*f4a2713aSLionel Sambucvariable, and that the right-hand side has integer type. 396*f4a2713aSLionel Sambuc 397*f4a2713aSLionel Sambuc.. code-block:: c++ 398*f4a2713aSLionel Sambuc 399*f4a2713aSLionel Sambuc hasCondition(binaryOperator( 400*f4a2713aSLionel Sambuc hasOperatorName("<"), 401*f4a2713aSLionel Sambuc hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), 402*f4a2713aSLionel Sambuc hasRHS(expr(hasType(isInteger()))))) 403*f4a2713aSLionel Sambuc 404*f4a2713aSLionel SambucWhy? Because it doesn't work. Of the three loops provided in 405*f4a2713aSLionel Sambuc``test-files/simple.cpp``, zero of them have a matching condition. A 406*f4a2713aSLionel Sambucquick look at the AST dump of the first for loop, produced by the 407*f4a2713aSLionel Sambucprevious iteration of loop-convert, shows us the answer: 408*f4a2713aSLionel Sambuc 409*f4a2713aSLionel Sambuc:: 410*f4a2713aSLionel Sambuc 411*f4a2713aSLionel Sambuc (ForStmt 0x173b240 412*f4a2713aSLionel Sambuc (DeclStmt 0x173afc8 413*f4a2713aSLionel Sambuc 0x173af50 "int i = 414*f4a2713aSLionel Sambuc (IntegerLiteral 0x173afa8 'int' 0)") 415*f4a2713aSLionel Sambuc <<>> 416*f4a2713aSLionel Sambuc (BinaryOperator 0x173b060 '_Bool' '<' 417*f4a2713aSLionel Sambuc (ImplicitCastExpr 0x173b030 'int' 418*f4a2713aSLionel Sambuc (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) 419*f4a2713aSLionel Sambuc (ImplicitCastExpr 0x173b048 'int' 420*f4a2713aSLionel Sambuc (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) 421*f4a2713aSLionel Sambuc (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' 422*f4a2713aSLionel Sambuc (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) 423*f4a2713aSLionel Sambuc (CompoundStatement ... 424*f4a2713aSLionel Sambuc 425*f4a2713aSLionel SambucWe already know that the declaration and increments both match, or this 426*f4a2713aSLionel Sambucloop wouldn't have been dumped. The culprit lies in the implicit cast 427*f4a2713aSLionel Sambucapplied to the first operand (i.e. the LHS) of the less-than operator, 428*f4a2713aSLionel Sambucan L-value to R-value conversion applied to the expression referencing 429*f4a2713aSLionel Sambuc``i``. Thankfully, the matcher library offers a solution to this problem 430*f4a2713aSLionel Sambucin the form of ``ignoringParenImpCasts``, which instructs the matcher to 431*f4a2713aSLionel Sambucignore implicit casts and parentheses before continuing to match. 432*f4a2713aSLionel SambucAdjusting the condition operator will restore the desired match. 433*f4a2713aSLionel Sambuc 434*f4a2713aSLionel Sambuc.. code-block:: c++ 435*f4a2713aSLionel Sambuc 436*f4a2713aSLionel Sambuc hasCondition(binaryOperator( 437*f4a2713aSLionel Sambuc hasOperatorName("<"), 438*f4a2713aSLionel Sambuc hasLHS(ignoringParenImpCasts(declRefExpr( 439*f4a2713aSLionel Sambuc to(varDecl(hasType(isInteger())))))), 440*f4a2713aSLionel Sambuc hasRHS(expr(hasType(isInteger()))))) 441*f4a2713aSLionel Sambuc 442*f4a2713aSLionel SambucAfter adding binds to the expressions we wished to capture and 443*f4a2713aSLionel Sambucextracting the identifier strings into variables, we have array-step-2 444*f4a2713aSLionel Sambuccompleted. 445*f4a2713aSLionel Sambuc 446*f4a2713aSLionel SambucStep 4: Retrieving Matched Nodes 447*f4a2713aSLionel Sambuc================================ 448*f4a2713aSLionel Sambuc 449*f4a2713aSLionel SambucSo far, the matcher callback isn't very interesting: it just dumps the 450*f4a2713aSLionel Sambucloop's AST. At some point, we will need to make changes to the input 451*f4a2713aSLionel Sambucsource code. Next, we'll work on using the nodes we bound in the 452*f4a2713aSLionel Sambucprevious step. 453*f4a2713aSLionel Sambuc 454*f4a2713aSLionel SambucThe ``MatchFinder::run()`` callback takes a 455*f4a2713aSLionel Sambuc``MatchFinder::MatchResult&`` as its parameter. We're most interested in 456*f4a2713aSLionel Sambucits ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` 457*f4a2713aSLionel Sambucclass to represent contextual information about the AST, as the name 458*f4a2713aSLionel Sambucimplies, though the most functionally important detail is that several 459*f4a2713aSLionel Sambucoperations require an ``ASTContext*`` parameter. More immediately useful 460*f4a2713aSLionel Sambucis the set of matched nodes, and how we retrieve them. 461*f4a2713aSLionel Sambuc 462*f4a2713aSLionel SambucSince we bind three variables (identified by ConditionVarName, 463*f4a2713aSLionel SambucInitVarName, and IncrementVarName), we can obtain the matched nodes by 464*f4a2713aSLionel Sambucusing the ``getNodeAs()`` member function. 465*f4a2713aSLionel Sambuc 466*f4a2713aSLionel SambucIn ``LoopConvert.cpp`` add 467*f4a2713aSLionel Sambuc 468*f4a2713aSLionel Sambuc.. code-block:: c++ 469*f4a2713aSLionel Sambuc 470*f4a2713aSLionel Sambuc #include "clang/AST/ASTContext.h" 471*f4a2713aSLionel Sambuc 472*f4a2713aSLionel SambucChange ``LoopMatcher`` to 473*f4a2713aSLionel Sambuc 474*f4a2713aSLionel Sambuc.. code-block:: c++ 475*f4a2713aSLionel Sambuc 476*f4a2713aSLionel Sambuc StatementMatcher LoopMatcher = 477*f4a2713aSLionel Sambuc forStmt(hasLoopInit(declStmt( 478*f4a2713aSLionel Sambuc hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) 479*f4a2713aSLionel Sambuc .bind("initVarName")))), 480*f4a2713aSLionel Sambuc hasIncrement(unaryOperator( 481*f4a2713aSLionel Sambuc hasOperatorName("++"), 482*f4a2713aSLionel Sambuc hasUnaryOperand(declRefExpr( 483*f4a2713aSLionel Sambuc to(varDecl(hasType(isInteger())).bind("incVarName")))))), 484*f4a2713aSLionel Sambuc hasCondition(binaryOperator( 485*f4a2713aSLionel Sambuc hasOperatorName("<"), 486*f4a2713aSLionel Sambuc hasLHS(ignoringParenImpCasts(declRefExpr( 487*f4a2713aSLionel Sambuc to(varDecl(hasType(isInteger())).bind("condVarName"))))), 488*f4a2713aSLionel Sambuc hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); 489*f4a2713aSLionel Sambuc 490*f4a2713aSLionel SambucAnd change ``LoopPrinter::run`` to 491*f4a2713aSLionel Sambuc 492*f4a2713aSLionel Sambuc.. code-block:: c++ 493*f4a2713aSLionel Sambuc 494*f4a2713aSLionel Sambuc void LoopPrinter::run(const MatchFinder::MatchResult &Result) { 495*f4a2713aSLionel Sambuc ASTContext *Context = Result.Context; 496*f4a2713aSLionel Sambuc const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop"); 497*f4a2713aSLionel Sambuc // We do not want to convert header files! 498*f4a2713aSLionel Sambuc if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc())) 499*f4a2713aSLionel Sambuc return; 500*f4a2713aSLionel Sambuc const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); 501*f4a2713aSLionel Sambuc const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); 502*f4a2713aSLionel Sambuc const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); 503*f4a2713aSLionel Sambuc 504*f4a2713aSLionel Sambuc if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) 505*f4a2713aSLionel Sambuc return; 506*f4a2713aSLionel Sambuc llvm::outs() << "Potential array-based loop discovered.\n"; 507*f4a2713aSLionel Sambuc } 508*f4a2713aSLionel Sambuc 509*f4a2713aSLionel SambucClang associates a ``VarDecl`` with each variable to represent the variable's 510*f4a2713aSLionel Sambucdeclaration. Since the "canonical" form of each declaration is unique by 511*f4a2713aSLionel Sambucaddress, all we need to do is make sure neither ``ValueDecl`` (base class of 512*f4a2713aSLionel Sambuc``VarDecl``) is ``NULL`` and compare the canonical Decls. 513*f4a2713aSLionel Sambuc 514*f4a2713aSLionel Sambuc.. code-block:: c++ 515*f4a2713aSLionel Sambuc 516*f4a2713aSLionel Sambuc static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { 517*f4a2713aSLionel Sambuc return First && Second && 518*f4a2713aSLionel Sambuc First->getCanonicalDecl() == Second->getCanonicalDecl(); 519*f4a2713aSLionel Sambuc } 520*f4a2713aSLionel Sambuc 521*f4a2713aSLionel SambucIf execution reaches the end of ``LoopPrinter::run()``, we know that the 522*f4a2713aSLionel Sambucloop shell that looks like 523*f4a2713aSLionel Sambuc 524*f4a2713aSLionel Sambuc.. code-block:: c++ 525*f4a2713aSLionel Sambuc 526*f4a2713aSLionel Sambuc for (int i= 0; i < expr(); ++i) { ... } 527*f4a2713aSLionel Sambuc 528*f4a2713aSLionel SambucFor now, we will just print a message explaining that we found a loop. 529*f4a2713aSLionel SambucThe next section will deal with recursively traversing the AST to 530*f4a2713aSLionel Sambucdiscover all changes needed. 531*f4a2713aSLionel Sambuc 532*f4a2713aSLionel SambucAs a side note, it's not as trivial to test if two expressions are the same, 533*f4a2713aSLionel Sambucthough Clang has already done the hard work for us by providing a way to 534*f4a2713aSLionel Sambuccanonicalize expressions: 535*f4a2713aSLionel Sambuc 536*f4a2713aSLionel Sambuc.. code-block:: c++ 537*f4a2713aSLionel Sambuc 538*f4a2713aSLionel Sambuc static bool areSameExpr(ASTContext *Context, const Expr *First, 539*f4a2713aSLionel Sambuc const Expr *Second) { 540*f4a2713aSLionel Sambuc if (!First || !Second) 541*f4a2713aSLionel Sambuc return false; 542*f4a2713aSLionel Sambuc llvm::FoldingSetNodeID FirstID, SecondID; 543*f4a2713aSLionel Sambuc First->Profile(FirstID, *Context, true); 544*f4a2713aSLionel Sambuc Second->Profile(SecondID, *Context, true); 545*f4a2713aSLionel Sambuc return FirstID == SecondID; 546*f4a2713aSLionel Sambuc } 547*f4a2713aSLionel Sambuc 548*f4a2713aSLionel SambucThis code relies on the comparison between two 549*f4a2713aSLionel Sambuc``llvm::FoldingSetNodeIDs``. As the documentation for 550*f4a2713aSLionel Sambuc``Stmt::Profile()`` indicates, the ``Profile()`` member function builds 551*f4a2713aSLionel Sambuca description of a node in the AST, based on its properties, along with 552*f4a2713aSLionel Sambucthose of its children. ``FoldingSetNodeID`` then serves as a hash we can 553*f4a2713aSLionel Sambucuse to compare expressions. We will need ``areSameExpr`` later. Before 554*f4a2713aSLionel Sambucyou run the new code on the additional loops added to 555*f4a2713aSLionel Sambuctest-files/simple.cpp, try to figure out which ones will be considered 556*f4a2713aSLionel Sambucpotentially convertible. 557