===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are maintained as Subversion
repositories, but we'll be accessing them through the git mirror. For
further information, see the `getting started
guide <http://llvm.org/docs/GettingStarted.html>`_.


.. code-block:: console

      mkdir ~/clang-llvm && cd ~/clang-llvm
      git clone http://llvm.org/git/llvm.git
      cd llvm/tools
      git clone http://llvm.org/git/clang.git
      cd clang/tools
      git clone http://llvm.org/git/clang-tools-extra.git extra

Next you need to obtain the CMake build system and Ninja build tool. You
may already have CMake installed, but current binary versions of CMake
aren't built with Ninja support.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass, though there is a (very) small chance that
you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
in both the llvm and clang directories should fix any problems.

Finally, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core clang tool, it will
live in the ``tools/extra`` repository.


.. code-block:: console

      cd ~/clang-llvm/llvm/tools/clang
      mkdir tools/extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
      vim tools/extra/loop-convert/CMakeLists.txt

CMakeLists.txt should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)
      set(LLVM_USED_LIBS clangTooling clangBasic clangAST)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        clangTooling
        clangBasic
        clangASTMatchers
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...");

      int main(int argc, const char **argv) {
        CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.


.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loading
them from a compilation database - there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST. Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:


.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
matcher to handle overloaded operators.

There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
References <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real.
Let's start by defining a matcher
which will capture all ``for`` statements that define a new variable
initialized to zero. We begin by matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:


.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        virtual void run(const MatchFinder::MatchResult &Result) {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:


.. code-block:: c++

      int main(int argc, const char **argv) {
        CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }

Now, you should be able to recompile and run the code to discover for
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build
      ninja loop-convert
      vim ~/test-files/simple-loops.cc
      bin/loop-convert ~/test-files/simple-loops.cc --

Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering for loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize for loops over arrays which would be eligible
for translation to range-based syntax?
They are loops over arrays of
size ``N`` that:

1. start at index ``0``
2. iterate consecutively
3. end at index ``N-1``

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. The matcher for (2) is straightforward:
require a pre- or post-increment of the same variable declared in the
init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
Usages of variables are represented as ``DeclRefExpr``'s ("declaration
reference expressions") because they are expressions which refer to
variable declarations.
To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.
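
For reference, the combined definition might look like the sketch below. It only joins the Step 2 matcher with the increment sub-matcher from this section; the layout is an assumption, and it needs Clang's headers to compile, so it is meant to be pasted into ``LoopConvert.cpp`` in place of the earlier ``LoopMatcher`` rather than built on its own.

```cpp
// Sketch only: Step 2's LoopMatcher extended with the increment sub-matcher.
// Requires the Clang headers and libraries already used by LoopConvert.cpp.
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang::ast_matchers;

StatementMatcher LoopMatcher =
    forStmt(
        // Init portion: a single variable initialized to the literal 0.
        hasLoopInit(declStmt(hasSingleDecl(
            varDecl(hasInitializer(integerLiteral(equals(0))))))),
        // Increment portion: "++" applied to some integer variable.
        hasIncrement(unaryOperator(
            hasOperatorName("++"),
            hasUnaryOperand(declRefExpr(to(
                varDecl(hasType(isInteger())).bind("incrementVariable")))))))
        .bind("forLoop");
```

Nothing here ties the incremented variable to the one declared in the init portion; as noted above, that comparison has to happen in the callback.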

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem - we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Why? Because it doesn't work. Of the three loops provided in
``test-files/simple.cpp``, zero of them have a matching condition.
A
quick look at the AST dump of the first for loop, produced by the
previous iteration of loop-convert, shows us the answer:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<<NULL>>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.


.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchFinder::MatchCallback::run()`` callback takes a
``const MatchFinder::MatchResult &`` as its parameter. We're most interested in
its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
class to represent contextual information about the AST, as the name
implies, though the most functionally important detail is that several
operations require an ``ASTContext*`` parameter. More immediately useful
is the set of matched nodes, and how we retrieve them.

Since we bind three variables (identified by ``initVarName``,
``incVarName``, and ``condVarName``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to


.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls. The helper must be
declared before ``LoopPrinter::run``, which calls it.


.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }

If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``.
As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.
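
The tutorial does not list the contents of its test file, so the following is a hypothetical stand-in (the function names and the constant ``N`` are made up for illustration). It is a small sketch of loops that exercise each part of the Step 4 matcher and callback; the comments state what one would expect ``loop-convert`` to do with each loop, based on the matcher as written above.

```cpp
// Hypothetical test input; not the tutorial's actual test-files/simple.cpp.
const int N = 5;

// Expected to be reported: the loop declares a single variable initialized
// to the literal 0, compares it with '<', and increments that same variable.
int sumAll(const int (&Arr)[N]) {
  int Sum = 0;
  for (int i = 0; i < N; ++i)
    Sum += Arr[i];
  return Sum;
}

// Expected to be skipped by the matcher: the initializer is 1, not 0.
int sumTail(const int (&Arr)[N]) {
  int Sum = 0;
  for (int i = 1; i < N; ++i)
    Sum += Arr[i];
  return Sum;
}

// Expected to be skipped by the matcher: "i += 2" is a compound assignment,
// not a unary "++".
int sumEven(const int (&Arr)[N]) {
  int Sum = 0;
  for (int i = 0; i < N; i += 2)
    Sum += Arr[i];
  return Sum;
}

// Expected to match, but to be rejected in the callback: the incremented
// variable (j) is not the variable that is declared and compared (i).
int countSteps() {
  int j = 0;
  for (int i = 0; i < N; ++j) {
    if (j >= N)
      break;
  }
  return j;
}
```

Saving this as a ``.cc`` file and running ``bin/loop-convert`` on it with a trailing ``--`` should, under these assumptions, print the "Potential array-based loop discovered." message once, for ``sumAll``.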