==========================
Clang Transformer Tutorial
==========================

A tutorial on how to write a source-to-source translation tool using Clang Transformer.

.. contents::
   :local:

What is Clang Transformer?
--------------------------

Clang Transformer is a framework for writing C++ diagnostics and program
transformations. It is built on the clang toolchain and the LibTooling library,
but aims to hide much of the complexity of clang's native, low-level libraries.

The core abstraction of Transformer is the *rewrite rule*, which specifies how
to change a given program pattern into a new form. Here are some examples of
tasks you can achieve with Transformer:

*   warn against using the name ``MkX`` for a declared function,
*   change ``MkX`` to ``MakeX``, where ``MkX`` is the name of a declared function,
*   change ``s.size()`` to ``Size(s)``, where ``s`` is a ``string``,
*   collapse ``e.child().m()`` to ``e.m()``, for any expression ``e`` and method named
    ``m``.

All of the examples have a common form: they identify a pattern that is the
target of the transformation, they specify an *edit* to the code identified by
the pattern, and their pattern and edit refer to common variables, like ``s``,
``e``, and ``m``, that range over code fragments. Several of the examples also
specify constraints on the pattern that aren't apparent from the syntax alone,
like "``MkX`` is the name of a declared function" or "``s`` is a ``string``."
Even the first example ("warn ...") shares this form, even though it doesn't
change any of the code -- its "edit" is simply a no-op.

Transformer helps users succinctly specify rules of this sort and easily execute
them locally over a collection of files, apply them to selected portions of
a codebase, or even bundle them as a clang-tidy check for ongoing application.

Who is Clang Transformer for?
-----------------------------

Clang Transformer is for developers who want to write clang-tidy checks or write
tools to modify a large number of C++ files in (roughly) the same way. What
qualifies as "large" really depends on the nature of the change and your
patience for repetitive editing. In our experience, automated solutions become
worthwhile somewhere between 100 and 500 files.

Getting Started
---------------

Patterns in Transformer are expressed with :doc:`clang's AST matchers <LibASTMatchers>`.
Matchers are a language of combinators for describing portions of a clang
Abstract Syntax Tree (AST). Since clang's AST includes complete type information
(within the limits of a single `Translation Unit (TU)`_),
these patterns can even encode rich constraints on the type properties of AST
nodes.

.. _`Translation Unit (TU)`: https://en.wikipedia.org/wiki/Translation_unit_\(programming\)

We assume a familiarity with the clang AST and the corresponding AST matchers
for the purpose of this tutorial. Users who are unfamiliar with either are
encouraged to start with the recommended references in `Related Reading`_.

Example: style-checking names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assume you have a style-guide rule which forbids functions from being named
"MkX" and you want to write a check that catches any violations of this rule. We
can express this as a Transformer rewrite rule:

.. code-block:: c++

   makeRule(functionDecl(hasName("MkX")).bind("fun"),
            noopEdit(node("fun")),
            cat("The name ``MkX`` is not allowed for functions; please rename"));

``makeRule`` is our go-to function for generating rewrite rules. It takes three
arguments: the pattern, the edit, and (optionally) an explanatory note. In our
example, the pattern (``functionDecl(...)``) identifies the declaration of the
function ``MkX``. Since we're just diagnosing the problem, but not suggesting a
fix, our edit is a no-op. However, it contains an *anchor* for the diagnostic
message: ``node("fun")`` says to associate the message with the source range of
the AST node bound to "fun"; in this case, the ill-named function declaration.
Finally, we use ``cat`` to build a message that explains the change. Regarding the
name ``cat`` -- we'll discuss it in more detail below, but suffice it to say that
it can also take multiple arguments and concatenate their results.

Note that the result of ``makeRule`` is a value of type
``clang::transformer::RewriteRule``, but most users don't need to care about the
details of this type.

Example: renaming a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, let's extend this example to a *transformation*; specifically, the second
example above:

.. code-block:: c++

   makeRule(declRefExpr(to(functionDecl(hasName("MkX")))),
            changeTo(cat("MakeX")),
            cat("MkX has been renamed MakeX"));

In this example, the pattern (``declRefExpr(...)``) identifies any *reference* to
the function ``MkX``, rather than the declaration itself, as in our previous
example. Our edit (``changeTo(...)``) says to *change* the code matched by the
pattern *to* the text "MakeX". Finally, we use ``cat`` again to build a message
that explains the change.

Here are some example changes that this rule would make:

+--------------------------+----------------------------+
| Original                 | Result                     |
+==========================+============================+
| ``X x = MkX(3);``        | ``X x = MakeX(3);``        |
+--------------------------+----------------------------+
| ``CallFactory(MkX, 3);`` | ``CallFactory(MakeX, 3);`` |
+--------------------------+----------------------------+
| ``auto f = MkX;``        | ``auto f = MakeX;``        |
+--------------------------+----------------------------+

Example: method to function
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, let's write a rule to replace a method call with a (free) function call,
applied to the original method call's target object. Specifically, "change
``s.size()`` to ``Size(s)``, where ``s`` is a ``string``." We start with a simpler
change that ignores the type of ``s``. That is, it will modify *any* method call
where the method is named "size":

.. code-block:: c++

   llvm::StringRef s = "str";
   makeRule(
     cxxMemberCallExpr(
       on(expr().bind(s)),
       callee(cxxMethodDecl(hasName("size")))),
     changeTo(cat("Size(", node(s), ")")),
     cat("Method ``size`` is deprecated in favor of free function ``Size``"));

We express the pattern with the given AST matcher, which binds the method call's
target to ``s`` [#f1]_. For the edit, we again use ``changeTo``, but this
time we construct the term from multiple parts, which we compose with ``cat``. The
second part of our term is ``node(s)``, which selects the source code
corresponding to the AST node ``s`` that was bound when a match was found in the
AST for our rule's pattern. ``node(s)`` constructs a ``RangeSelector``, which, when
used in ``cat``, indicates that the selected source should be inserted in the
output at that point.

Now, we probably don't want to rewrite *all* invocations of "size" methods, just
those on ``std::string``\ s. We can achieve this change simply by refining our
matcher. The rest of the rule remains unchanged:

.. code-block:: c++

   llvm::StringRef s = "str";
   makeRule(
     cxxMemberCallExpr(
       on(expr(hasType(namedDecl(hasName("std::string"))))
         .bind(s)),
       callee(cxxMethodDecl(hasName("size")))),
     changeTo(cat("Size(", node(s), ")")),
     cat("Method ``size`` is deprecated in favor of free function ``Size``"));

Example: rewriting method calls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we delete an "intermediary" method call in a chain of
invocations. This scenario can arise, for example, if you want to collapse a
substructure into its parent.

.. code-block:: c++

   llvm::StringRef e = "expr", m = "member";
   auto child_call = cxxMemberCallExpr(on(expr().bind(e)),
                                       callee(cxxMethodDecl(hasName("child"))));
   makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
            changeTo(cat(node(e), ".", member(m), "()")),
            cat("``child`` accessor is being removed; call ",
                member(m), " directly on parent"));

This rule isn't quite what we want: it will rewrite ``my_object.child().foo()`` to
``my_object.foo()``, but it will also rewrite ``my_ptr->child().foo()`` to
``my_ptr.foo()``, which is not what we intend. We could fix this by restricting
the pattern with ``unless(isArrow())`` in the definition of ``child_call``. Yet, we
*want* to rewrite calls through pointers.

To capture this idiom, we provide the ``access`` combinator to intelligently
construct a field/method access. In our example, the member access is expressed
as:

.. code-block:: c++

   access(e, cat(member(m)))

The first argument specifies the object being accessed and the second, a
description of the field/method name. In this case, we specify that the method
name should be copied from the source -- specifically, the source range of ``m``'s
member. To construct the method call, we would use this expression in ``cat``:

.. code-block:: c++

   cat(access(e, cat(member(m))), "()")

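To make the intent of ``access`` concrete, here is a deliberately simplified,
stand-alone sketch of the decision it has to make. It is not Transformer's
implementation: ``buildAccess`` is a hypothetical name invented here, and the
pointer-ness of the base is passed in as a flag, whereas the real combinator
works from the AST and may handle further cases.

.. code-block:: c++

   #include <cassert>
   #include <string>

   // Hypothetical model (not a Clang API): joins a base expression and a
   // member name, using "->" when the base has pointer type and "." otherwise.
   std::string buildAccess(const std::string &Base, bool BaseIsPointer,
                           const std::string &Member) {
     assert(!Base.empty() && !Member.empty() && "both parts are required");
     return Base + (BaseIsPointer ? "->" : ".") + Member;
   }

Under this model, a base of ``my_object`` with member ``foo`` yields
``my_object.foo``, while a pointer base ``my_ptr`` yields ``my_ptr->foo``;
appending ``"()"`` then completes the call, as in the ``cat`` expression above.
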
Reference: ranges, stencils, edits, rules
-----------------------------------------

The above examples demonstrate just the basics of rewrite rules. Every element
we touched on has more available constructors: range selectors, stencils, edits
and rules. In this section, we'll briefly review each in turn, with references
to the source headers for up-to-date information. First, though, we clarify what
rewrite rules are actually rewriting.

Rewriting ASTs to... Text?
^^^^^^^^^^^^^^^^^^^^^^^^^^

The astute reader may have noticed that we've been somewhat vague in our
explanation of what the rewrite rules are actually rewriting. We've referred to
"code", but code can be represented both as raw source text and as an abstract
syntax tree. So, which one is it?

Ideally, we'd be rewriting the input AST to a new AST, but clang's AST is not
terribly amenable to this kind of transformation. So, we compromise: we express
our patterns and the names that they bind in terms of the AST, but our changes
in terms of source code text. We've designed Transformer's language to bridge
the gap between the two representations, in an attempt to minimize the user's
need to reason about source code locations and other, low-level syntactic
details.

Range Selectors
^^^^^^^^^^^^^^^

Transformer provides a small API for describing source ranges: the
``RangeSelector`` combinators. These ranges are most commonly used to specify the
source code affected by an edit and to extract source code in constructing new
text.

Roughly, there are two kinds of range combinators: ones that select a source
range based on the AST, and others that combine existing ranges into new ranges.
For example, ``node`` selects the range of source spanned by a particular AST
node, as we've seen, while ``after`` selects the (empty) range located immediately
after its argument range. So, ``after(node("id"))`` is the empty range immediately
following the AST node bound to ``id``.

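As a rough illustration of how these two kinds of selector relate, the
following stand-alone sketch models a source range as a pair of character
offsets. It is not Transformer's implementation: ``Range``, ``afterRange`` and
``textOf`` are hypothetical names invented here, and real ``RangeSelector``\ s
are evaluated against a match result rather than raw offsets.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <string>

   // Hypothetical model of a source range: the half-open interval [Begin, End).
   struct Range {
     std::size_t Begin;
     std::size_t End;
   };

   // Models `after`: the empty range located immediately after R.
   Range afterRange(Range R) { return Range{R.End, R.End}; }

   // Returns the source text covered by R.
   std::string textOf(const std::string &Source, Range R) {
     assert(R.Begin <= R.End && R.End <= Source.size() && "range out of bounds");
     return Source.substr(R.Begin, R.End - R.Begin);
   }

With the source ``int x = 1;`` and a node spanning offsets 4 to 5 (the ``x``),
``afterRange`` yields the empty range at offset 5, which is where an insertion
after the node would land.
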
For the full collection of ``RangeSelector``\ s, see the header,
`clang/Tooling/Transformer/RangeSelector.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RangeSelector.h>`_.

Stencils
^^^^^^^^

Transformer offers a large and growing collection of combinators for
constructing output. Above, we demonstrated ``cat``, the core function for
constructing stencils. It takes a series of arguments, of three possible kinds:

#.  Raw text, to be copied directly to the output.
#.  Selector: specified with a ``RangeSelector``, indicates a range of source text
    to copy to the output.
#.  Builder: an operation that constructs a code snippet from its arguments. For
    example, the ``access`` function we saw above.

Data of these different types are all represented (generically) by a ``Stencil``.
``cat`` takes text and ``RangeSelector``\ s directly as arguments, rather than
requiring that they be constructed with a builder; other builders are
constructed explicitly.

In general, ``Stencil``\ s produce text from a match result. So, they are not
limited to generating source code, but can also be used to generate diagnostic
messages that reference (named) elements of the matched code, like we saw in the
example of rewriting method calls.

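The idea that a stencil "produces text from a match result" can be sketched
with a small stand-alone model. This is only an illustration under simplifying
assumptions, not the library's code: the names ``Part``, ``text``, ``node``,
``Bindings`` and ``evalCat`` are invented here, and a match result is reduced
to a map from bound ids to the source text they matched.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical model of one stencil part (not a Clang type).
   struct Part {
     bool IsNode;        // true: Value is a bound id; false: Value is raw text
     std::string Value;
   };
   Part text(const std::string &S) { return Part{false, S}; }
   Part node(const std::string &Id) { return Part{true, Id}; }

   // Simplified "match result": bound id -> source text of the matched node.
   typedef std::map<std::string, std::string> Bindings;

   // Evaluates the parts left to right and concatenates the results.
   std::string evalCat(const std::vector<Part> &Parts, const Bindings &Match) {
     std::string Out;
     for (const Part &P : Parts) {
       if (P.IsNode) {
         Bindings::const_iterator It = Match.find(P.Value);
         assert(It != Match.end() && "stencil refers to an unbound id");
         Out += It->second;  // copy the selected source text
       } else {
         Out += P.Value;     // copy raw text verbatim
       }
     }
     return Out;
   }

In this model, evaluating ``{text("Size("), node("str"), text(")")}`` against a
match that binds ``str`` to the source ``name`` produces ``Size(name)``. The
same evaluation serves equally well for replacement code and for diagnostic
messages.
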
Further details of the ``Stencil`` type are documented in the header file
`clang/Tooling/Transformer/Stencil.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/Stencil.h>`_.

Edits
^^^^^

Transformer supports additional forms of edits. First, in a ``changeTo``, we can
specify the particular portion of code to be replaced, using the same
``RangeSelector`` we saw earlier. For example, we could change the function name
in a function declaration with:

.. code-block:: c++

   llvm::StringRef f = "fun";
   makeRule(functionDecl(hasName("bad")).bind(f),
            changeTo(name(f), cat("good")),
            cat("bad is now good"));

We also provide simpler editing primitives for insertion and deletion:
``insertBefore``, ``insertAfter`` and ``remove``. These can all be found in the header
file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

We are not limited to one edit per match found. Some situations require making
multiple edits for each match. For example, suppose we wanted to swap two
arguments of a function call.

For this, we provide an overload of ``makeRule`` that takes a list of edits,
rather than just a single one. Our example might look like:

.. code-block:: c++

   makeRule(callExpr(...),
            {changeTo(node(arg0), cat(node(arg2))),
             changeTo(node(arg2), cat(node(arg0)))},
            cat("swap the first and third arguments of the call"));

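Because edits are expressed against source text, several edits for one match
must be applied without invalidating each other's positions. The sketch below
shows one common way to do that on a plain string, applying non-overlapping
replacements from the back of the text to the front. It is a stand-alone model
under that assumption, not the mechanism Transformer itself uses, and
``TextEdit`` and ``applyEdits`` are hypothetical names.

.. code-block:: c++

   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <string>
   #include <vector>

   // Hypothetical model of a textual edit: replace Length characters starting
   // at Offset (both relative to the original source) with Replacement.
   struct TextEdit {
     std::size_t Offset;
     std::size_t Length;
     std::string Replacement;
   };

   // Applies non-overlapping edits to Source and returns the result.
   std::string applyEdits(std::string Source, std::vector<TextEdit> Edits) {
     // Work from the highest offset down, so that a replacement never shifts
     // the position of an edit that has not been applied yet.
     std::sort(Edits.begin(), Edits.end(),
               [](const TextEdit &A, const TextEdit &B) {
                 return A.Offset > B.Offset;
               });
     std::size_t Limit = Source.size();
     for (const TextEdit &E : Edits) {
       assert(E.Offset + E.Length <= Limit && "edits overlap or exceed source");
       Source.replace(E.Offset, E.Length, E.Replacement);
       Limit = E.Offset;
     }
     return Source;
   }

For the source ``f(a, b, c)``, the two edits ``{2, 1, "c"}`` and
``{8, 1, "a"}`` swap the first and third arguments and give ``f(c, b, a)``.
Both replacement strings are computed from the original text before either
edit is applied, which mirrors how the two ``changeTo`` edits above each refer
to the other argument's original source.
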
``EditGenerator``\ s (Advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The particular edits we've seen so far are all instances of the ``ASTEdit`` class,
or a list of such. But, not all edits can be expressed as ``ASTEdit``\ s. So, we
also support a very general signature for edit generators:

.. code-block:: c++

   using EditGenerator = MatchConsumer<llvm::SmallVector<Edit, 1>>;

That is, an ``EditGenerator`` is a function that maps a ``MatchResult`` to a set
of edits, or fails. This signature supports a very general form of computation
over match results. Transformer provides a number of functions for working with
``EditGenerator``\ s, most notably
`flatten <https://github.com/llvm/llvm-project/blob/1fabe6e51917bcd7a1242294069c682fe6dffa45/clang/include/clang/Tooling/Transformer/RewriteRule.h#L165-L167>`_,
which combines several ``EditGenerator``\ s into one, much like list flattening.
For the full list, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

Rules
^^^^^

We can also compose multiple *rules*, rather than just edits within a rule,
using ``applyFirst``: it composes a list of rules as an ordered choice, where
Transformer applies the first rule whose pattern matches, ignoring others in the
list that follow. If the matchers are independent then order doesn't matter. In
that case, ``applyFirst`` is simply joining the set of rules into one.

The benefit of ``applyFirst`` is that, for some problems, it allows the user to
more concisely formulate later rules in the list, since their patterns need not
explicitly exclude the earlier patterns of the list. For example, consider a set
of rules that rewrite compound statements, where one rule handles the case of an
empty compound statement and the other handles non-empty compound statements.
With ``applyFirst``, these rules can be expressed compactly as:

.. code-block:: c++

   applyFirst({
     makeRule(compoundStmt(statementCountIs(0)).bind("empty"), ...),
     makeRule(compoundStmt().bind("non-empty"),...)
   })

The second rule does not need to explicitly specify that the compound statement
is non-empty -- it follows from the rule's position in ``applyFirst``. For more
complicated examples, this can lead to substantially more readable code.

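The ordered-choice behavior can be modeled in a few lines of stand-alone C++.
The sketch below is only an analogy, not how ``applyFirst`` is implemented:
``ToyRule``, ``applyFirstToy`` and ``compoundRules`` are hypothetical names, and
a "pattern" is reduced to a predicate over the number of statements in a
compound statement.

.. code-block:: c++

   #include <cassert>
   #include <functional>
   #include <string>
   #include <vector>

   // Hypothetical model of a rule: a predicate standing in for the pattern,
   // and a name standing in for the edit the rule would perform.
   struct ToyRule {
     std::function<bool(int)> Matches;  // argument: statement count
     std::string Name;
   };

   // Returns the name of the first rule whose pattern matches, or "" if none.
   std::string applyFirstToy(const std::vector<ToyRule> &Rules,
                             int StatementCount) {
     for (const ToyRule &R : Rules) {
       assert(R.Matches && "every rule needs a pattern");
       if (R.Matches(StatementCount))
         return R.Name;  // later rules are ignored once one matches
     }
     return "";
   }

   // The second pattern accepts any compound statement; it only ever sees
   // non-empty ones because the first rule is tried before it.
   std::vector<ToyRule> compoundRules() {
     return {
         {[](int N) { return N == 0; }, "empty"},
         {[](int) { return true; }, "non-empty"},
     };
   }

Here a statement count of ``0`` selects the ``empty`` rule and any other count
falls through to ``non-empty``, even though the second predicate never checks
the count itself.
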
Sometimes, a modification to the code might require the inclusion of a
particular header file. To this end, users can modify rules to specify include
directives with ``addInclude``.

For additional documentation on these functions, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

Using a RewriteRule as a clang-tidy check
-----------------------------------------

Transformer supports executing a rewrite rule as a
`clang-tidy <https://clang.llvm.org/extra/clang-tidy/>`_ check, with the class
``clang::tidy::utils::TransformerClangTidyCheck``. It is designed to require
minimal code in the definition. For example, given a rule
``MyCheckAsRewriteRule``, one can define a tidy check as follows:

.. code-block:: c++

   class MyCheck : public TransformerClangTidyCheck {
    public:
     MyCheck(StringRef Name, ClangTidyContext *Context)
         : TransformerClangTidyCheck(MyCheckAsRewriteRule, Name, Context) {}
   };

``TransformerClangTidyCheck`` implements the virtual ``registerMatchers`` and
``check`` methods based on your rule specification, so you don't need to implement
them yourself. If the rule needs to be configured based on the language options
and/or the clang-tidy configuration, it can be expressed as a function taking
these as parameters and (optionally) returning a ``RewriteRule``. This would be
useful, for example, for our function-renaming rule, which is parameterized by
the original name and the target. For details, see
`clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h <https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h>`_.

Related Reading
---------------

A good place to start understanding the clang AST and its matchers is with the
introductions on clang's site:

*   :doc:`Introduction to the Clang AST <IntroductionToTheClangAST>`
*   :doc:`Matching the Clang AST <LibASTMatchers>`
*   `AST Matcher Reference <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

.. rubric:: Footnotes

.. [#f1] Technically, it binds it to the string "str", to which our
    variable ``s`` is bound. But, the choice of that id string is
    irrelevant, so we elide the difference.
