==========================
Clang Transformer Tutorial
==========================

A tutorial on how to write a source-to-source translation tool using Clang Transformer.

.. contents::
   :local:

What is Clang Transformer?
--------------------------

Clang Transformer is a framework for writing C++ diagnostics and program
transformations. It is built on the clang toolchain and the LibTooling library,
but aims to hide much of the complexity of clang's native, low-level libraries.

The core abstraction of Transformer is the *rewrite rule*, which specifies how
to change a given program pattern into a new form. Here are some examples of
tasks you can achieve with Transformer:

* warn against using the name ``MkX`` for a declared function,
* change ``MkX`` to ``MakeX``, where ``MkX`` is the name of a declared function,
* change ``s.size()`` to ``Size(s)``, where ``s`` is a ``string``,
* collapse ``e.child().m()`` to ``e.m()``, for any expression ``e`` and method named
  ``m``.

All of the examples have a common form: they identify a pattern that is the
target of the transformation, they specify an *edit* to the code identified by
the pattern, and their pattern and edit refer to common variables, like ``s``,
``e``, and ``m``, that range over code fragments. The first three examples also
specify constraints on the pattern that aren't apparent from the syntax alone,
like "``s`` is a ``string``." Even the first example ("warn ...") shares this form,
even though it doesn't change any of the code -- its "edit" is simply a no-op.

Transformer helps users succinctly specify rules of this sort and easily execute
them locally over a collection of files, apply them to selected portions of
a codebase, or even bundle them as a clang-tidy check for ongoing application.

Who is Clang Transformer for?
-----------------------------

Clang Transformer is for developers who want to write clang-tidy checks or write
tools to modify a large number of C++ files in (roughly) the same way. What
qualifies as "large" really depends on the nature of the change and your
patience for repetitive editing. In our experience, automated solutions become
worthwhile somewhere between 100 and 500 files.

Getting Started
---------------

Patterns in Transformer are expressed with :doc:`clang's AST matchers <LibASTMatchers>`.
Matchers are a language of combinators for describing portions of a clang
Abstract Syntax Tree (AST). Since clang's AST includes complete type information
(within the limits of a single `Translation Unit (TU)`_),
these patterns can even encode rich constraints on the type properties of AST
nodes.

.. _`Translation Unit (TU)`: https://en.wikipedia.org/wiki/Translation_unit_\(programming\)

We assume a familiarity with the clang AST and the corresponding AST matchers
for the purpose of this tutorial. Users who are unfamiliar with either are
encouraged to start with the recommended references in `Related Reading`_.

Example: style-checking names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assume you have a style-guide rule which forbids functions from being named
"MkX" and you want to write a check that catches any violations of this rule. We
can express this as a Transformer rewrite rule:

.. code-block:: c++

  makeRule(functionDecl(hasName("MkX")).bind("fun"),
           noopEdit(node("fun")),
           cat("The name ``MkX`` is not allowed for functions; please rename"));

``makeRule`` is our go-to function for generating rewrite rules. It takes three
arguments: the pattern, the edit, and (optionally) an explanatory note. In our
example, the pattern (``functionDecl(...)``) identifies the declaration of the
function ``MkX``. Since we're just diagnosing the problem, but not suggesting a
fix, our edit is a no-op.
However, it contains an *anchor* for the diagnostic
message: ``node("fun")`` says to associate the message with the source range of
the AST node bound to "fun"; in this case, the ill-named function declaration.
Finally, we use ``cat`` to build a message that explains the problem. Regarding the
name ``cat`` -- we'll discuss it in more detail below, but suffice it to say that
it can also take multiple arguments and concatenate their results.

Note that the result of ``makeRule`` is a value of type
``clang::transformer::RewriteRule``, but most users don't need to care about the
details of this type.

Example: renaming a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, let's extend this example to a *transformation*; specifically, the second
example above:

.. code-block:: c++

  makeRule(declRefExpr(to(functionDecl(hasName("MkX")))),
           changeTo(cat("MakeX")),
           cat("MkX has been renamed MakeX"));

In this example, the pattern (``declRefExpr(...)``) identifies any *reference* to
the function ``MkX``, rather than the declaration itself, as in our previous
example. Our edit (``changeTo(...)``) says to *change* the code matched by the
pattern *to* the text "MakeX". Finally, we use ``cat`` again to build a message
that explains the change.
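
To make the shape of this edit concrete, here is a minimal model of it that uses
only the C++ standard library. It is a sketch under loud assumptions, not
Transformer code: ``Range``, ``findIdentifier``, ``changeAllTo`` and
``renameMkX`` are hypothetical helpers introduced for illustration, and the
stand-in "matcher" works on identifier tokens rather than on the AST, so (unlike
the real rule) it cannot tell a reference to the function ``MkX`` from any other
entity with that name. What it does share with the rule is the form of the edit:
each match is a source range, and that range is replaced with fixed text.

.. code-block:: c++

  #include <cassert>
  #include <cctype>
  #include <cstddef>
  #include <string>
  #include <vector>

  // A source range as a half-open [begin, end) pair of character offsets.
  struct Range {
    std::size_t begin;
    std::size_t end;
  };

  // Hypothetical stand-in for the AST matcher: finds every whole identifier
  // equal to `name`. The real rule matches declRefExpr nodes, not tokens.
  std::vector<Range> findIdentifier(const std::string &code,
                                    const std::string &name) {
    auto isIdentChar = [](char c) {
      return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    std::vector<Range> ranges;
    std::size_t i = 0;
    while (i < code.size()) {
      if (!isIdentChar(code[i])) {
        ++i;
        continue;
      }
      std::size_t start = i;
      while (i < code.size() && isIdentChar(code[i]))
        ++i;
      // Only a token that equals `name` exactly counts as a match.
      if (code.compare(start, i - start, name) == 0)
        ranges.push_back({start, i});
    }
    return ranges;
  }

  // Models the effect of changeTo(cat(text)): replaces each matched range
  // with `text`. Ranges are applied back to front so that the offsets of
  // earlier ranges stay valid while the string changes length.
  std::string changeAllTo(std::string code, const std::vector<Range> &ranges,
                          const std::string &text) {
    for (auto it = ranges.rbegin(); it != ranges.rend(); ++it) {
      assert(it->begin <= it->end && it->end <= code.size());
      code.replace(it->begin, it->end - it->begin, text);
    }
    return code;
  }

  // The renaming from the example, expressed with the toy helpers above.
  std::string renameMkX(const std::string &code) {
    return changeAllTo(code, findIdentifier(code, "MkX"), "MakeX");
  }

With these helpers, ``renameMkX("X x = MkX(3);")`` yields ``X x = MakeX(3);``,
while an identifier such as ``MkXtra`` is left alone because only whole
identifiers are considered.
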

Here are some example changes that this rule would make:

+--------------------------+----------------------------+
| Original                 | Result                     |
+==========================+============================+
| ``X x = MkX(3);``        | ``X x = MakeX(3);``        |
+--------------------------+----------------------------+
| ``CallFactory(MkX, 3);`` | ``CallFactory(MakeX, 3);`` |
+--------------------------+----------------------------+
| ``auto f = MkX;``        | ``auto f = MakeX;``        |
+--------------------------+----------------------------+

Example: method to function
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, let's write a rule to replace a method call with a (free) function call,
applied to the original method call's target object. Specifically, "change
``s.size()`` to ``Size(s)``, where ``s`` is a ``string``." We start with a simpler
change that ignores the type of ``s``. That is, it will modify *any* method call
where the method is named "size":

.. code-block:: c++

  llvm::StringRef s = "str";
  makeRule(
    cxxMemberCallExpr(
      on(expr().bind(s)),
      callee(cxxMethodDecl(hasName("size")))),
    changeTo(cat("Size(", node(s), ")")),
    cat("Method ``size`` is deprecated in favor of free function ``Size``"));

We express the pattern with the given AST matcher, which binds the method call's
target to ``s`` [#f1]_. For the edit, we again use ``changeTo``, but this
time we construct the term from multiple parts, which we compose with ``cat``. The
second part of our term is ``node(s)``, which selects the source code
corresponding to the AST node ``s`` that was bound when a match was found in the
AST for our rule's pattern. ``node(s)`` constructs a ``RangeSelector``, which, when
used in ``cat``, indicates that the selected source should be inserted in the
output at that point.

Now, we probably don't want to rewrite *all* invocations of "size" methods, just
those on ``std::string``\ s.
We can achieve this change simply by refining our
matcher. The rest of the rule remains unchanged:

.. code-block:: c++

  llvm::StringRef s = "str";
  makeRule(
    cxxMemberCallExpr(
      on(expr(hasType(namedDecl(hasName("std::string"))))
        .bind(s)),
      callee(cxxMethodDecl(hasName("size")))),
    changeTo(cat("Size(", node(s), ")")),
    cat("Method ``size`` is deprecated in favor of free function ``Size``"));

Example: rewriting method calls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we delete an "intermediary" method call in a string of
invocations. This scenario can arise, for example, if you want to collapse a
substructure into its parent.

.. code-block:: c++

  llvm::StringRef e = "expr", m = "member";
  auto child_call = cxxMemberCallExpr(on(expr().bind(e)),
                                      callee(cxxMethodDecl(hasName("child"))));
  // node(e) copies the source of the bound expression; a bare `e` passed to
  // `cat` would be emitted as the literal text "expr".
  makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
           changeTo(cat(node(e), ".", member(m), "()")),
           cat("``child`` accessor is being removed; call ",
               member(m), " directly on parent"));

This rule isn't quite what we want: it will rewrite ``my_object.child().foo()`` to
``my_object.foo()``, but it will also rewrite ``my_ptr->child().foo()`` to
``my_ptr.foo()``, which is not what we intend. We could fix this by restricting
the pattern with ``unless(isArrow())`` in the definition of ``child_call``. Yet, we
*want* to rewrite calls through pointers.

To capture this idiom, we provide the ``access`` combinator to intelligently
construct a field/method access. In our example, the member access is expressed
as:

.. code-block:: c++

  access(e, cat(member(m)))

The first argument specifies the object being accessed and the second, a
description of the field/method name. In this case, we specify that the method
name should be copied from the source -- specifically, the source range of ``m``'s
member.
To construct the method call, we would use this expression in ``cat``:

.. code-block:: c++

  cat(access(e, cat(member(m))), "()")

Reference: ranges, stencils, edits, rules
-----------------------------------------

The above examples demonstrate just the basics of rewrite rules. Every element
we touched on has more available constructors: range selectors, stencils, edits
and rules. In this section, we'll briefly review each in turn, with references
to the source headers for up-to-date information. First, though, we clarify what
rewrite rules are actually rewriting.

Rewriting ASTs to... Text?
^^^^^^^^^^^^^^^^^^^^^^^^^^

The astute reader may have noticed that we've been somewhat vague in our
explanation of what the rewrite rules are actually rewriting. We've referred to
"code", but code can be represented both as raw source text and as an abstract
syntax tree. So, which one is it?

Ideally, we'd be rewriting the input AST to a new AST, but clang's AST is not
terribly amenable to this kind of transformation. So, we compromise: we express
our patterns and the names that they bind in terms of the AST, but our changes
in terms of source code text. We've designed Transformer's language to bridge
the gap between the two representations, in an attempt to minimize the user's
need to reason about source code locations and other, low-level syntactic
details.

Range Selectors
^^^^^^^^^^^^^^^

Transformer provides a small API for describing source ranges: the
``RangeSelector`` combinators. These ranges are most commonly used to specify the
source code affected by an edit and to extract source code in constructing new
text.

Roughly, there are two kinds of range combinators: ones that select a source
range based on the AST, and others that combine existing ranges into new ranges.
For example, ``node`` selects the range of source spanned by a particular AST
node, as we've seen, while ``after`` selects the (empty) range located immediately
after its argument range. So, ``after(node("id"))`` is the empty range immediately
following the AST node bound to ``id``.

For the full collection of ``RangeSelector``\ s, see the header,
`clang/Tooling/Transformer/RangeSelector.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RangeSelector.h>`_.

Stencils
^^^^^^^^

Transformer offers a large and growing collection of combinators for
constructing output. Above, we demonstrated ``cat``, the core function for
constructing stencils. It takes a series of arguments, of three possible kinds:

#. Raw text, to be copied directly to the output.
#. Selector: specified with a ``RangeSelector``, indicates a range of source text
   to copy to the output.
#. Builder: an operation that constructs a code snippet from its arguments. For
   example, the ``access`` function we saw above.

Data of these different types are all represented (generically) by a ``Stencil``.
``cat`` takes text and ``RangeSelector``\ s directly as arguments, rather than
requiring that they be constructed with a builder; other builders are
constructed explicitly.

In general, ``Stencil``\ s produce text from a match result. So, they are not
limited to generating source code, but can also be used to generate diagnostic
messages that reference (named) elements of the matched code, like we saw in the
example of rewriting method calls.

Further details of the ``Stencil`` type are documented in the header file
`clang/Tooling/Transformer/Stencil.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/Stencil.h>`_.

Edits
^^^^^

Transformer supports additional forms of edits.
First, in a ``changeTo``, we can
specify the particular portion of code to be replaced, using the same
``RangeSelector`` we saw earlier. For example, we could change the function name
in a function declaration with:

.. code-block:: c++

  makeRule(functionDecl(hasName("bad")).bind(f),
           changeTo(name(f), cat("good")),
           cat("bad is now good"));

We also provide simpler editing primitives for insertion and deletion:
``insertBefore``, ``insertAfter`` and ``remove``. These can all be found in the header
file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

We are not limited to one edit per match found. Some situations require making
multiple edits for each match. For example, suppose we wanted to swap two
arguments of a function call.

For this, we provide an overload of ``makeRule`` that takes a list of edits,
rather than just a single one. Our example might look like:

.. code-block:: c++

  makeRule(callExpr(...),
          {changeTo(node(arg0), cat(node(arg2))),
           changeTo(node(arg2), cat(node(arg0)))},
          cat("swap the first and third arguments of the call"));

``EditGenerator``\ s (Advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The particular edits we've seen so far are all instances of the ``ASTEdit`` class,
or a list of such. But, not all edits can be expressed as ``ASTEdit``\ s. So, we
also support a very general signature for edit generators:

.. code-block:: c++

  using EditGenerator = MatchConsumer<llvm::SmallVector<Edit, 1>>;

That is, an ``EditGenerator`` is a function that maps a ``MatchResult`` to a set
of edits, or fails. This signature supports a very general form of computation
over match results.
Transformer provides a number of functions for working with
``EditGenerator``\ s, most notably
`flatten <https://github.com/llvm/llvm-project/blob/1fabe6e51917bcd7a1242294069c682fe6dffa45/clang/include/clang/Tooling/Transformer/RewriteRule.h#L165-L167>`_,
which combines several ``EditGenerator``\ s into one, much like list flattening.
For the full list, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

Rules
^^^^^

We can also compose multiple *rules*, rather than just edits within a rule,
using ``applyFirst``: it composes a list of rules as an ordered choice, where
Transformer applies the first rule whose pattern matches, ignoring others in the
list that follow. If the matchers are independent then order doesn't matter. In
that case, ``applyFirst`` is simply joining the set of rules into one.

The benefit of ``applyFirst`` is that, for some problems, it allows the user to
more concisely formulate later rules in the list, since their patterns need not
explicitly exclude the earlier patterns of the list. For example, consider a set
of rules that rewrite compound statements, where one rule handles the case of an
empty compound statement and the other handles non-empty compound statements.
With ``applyFirst``, these rules can be expressed compactly as:

.. code-block:: c++

  applyFirst({
    makeRule(compoundStmt(statementCountIs(0)).bind("empty"), ...),
    makeRule(compoundStmt().bind("non-empty"),...)
  })

The second rule does not need to explicitly specify that the compound statement
is non-empty -- it follows from the rule's position in ``applyFirst``. For more
complicated examples, this can lead to substantially more readable code.

Sometimes, a modification to the code might require the inclusion of a
particular header file.
To this end, users can modify rules to specify include
directives with ``addInclude``.

For additional documentation on these functions, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

Using a RewriteRule as a clang-tidy check
-----------------------------------------

Transformer supports executing a rewrite rule as a
`clang-tidy <https://clang.llvm.org/extra/clang-tidy/>`_ check, with the class
``clang::tidy::utils::TransformerClangTidyCheck``. It is designed to require
minimal code in the definition. For example, given a rule
``MyCheckAsRewriteRule``, one can define a tidy check as follows:

.. code-block:: c++

  class MyCheck : public TransformerClangTidyCheck {
   public:
    MyCheck(StringRef Name, ClangTidyContext *Context)
        : TransformerClangTidyCheck(MyCheckAsRewriteRule, Name, Context) {}
  };

``TransformerClangTidyCheck`` implements the virtual ``registerMatchers`` and
``check`` methods based on your rule specification, so you don't need to implement
them yourself. If the rule needs to be configured based on the language options
and/or the clang-tidy configuration, it can be expressed as a function taking
these as parameters and returning an (optional) ``RewriteRule``. This would be
useful, for example, for our function-renaming rule, which is parameterized by the
original name and the target.
For details, see
`clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h <https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h>`_.

Related Reading
---------------

A good place to start understanding the clang AST and its matchers is with the
introductions on clang's site:

* :doc:`Introduction to the Clang AST <IntroductionToTheClangAST>`
* :doc:`Matching the Clang AST <LibASTMatchers>`
* `AST Matcher Reference <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

.. rubric:: Footnotes

.. [#f1] Technically, it binds it to the string "str", to which our
    variable ``s`` is bound. But, the choice of that id string is
    irrelevant, so we elide the difference.