# Chapter 1: Combining Existing Transformations

## Introduction

The Transform dialect allows one to precisely target transformations at specific operations in the IR and to chain them, that is, to apply a transformation to operations produced by the previous transformation. To achieve this, transformations are expressed as other operations in the IR. We call the IR containing these operations transform IR, and we call the IR that is being transformed payload IR.

Transform IR operations operate on values that may be associated with payload IR operations, values or attributes. We call the first two kinds of values operation handles and value handles, respectively. We call the last kind of values parameters.

The application of transform IR always starts from one top-level operation. In the C++ API, this operation is passed to the `applyTransforms` function. This top-level operation specifies if other transformations should be performed and how. The most common top-level operation, `transform.named_sequence`, merely applies other transform operations listed in its body one after the other, similarly to a function or a macro.

Let us illustrate this with a simple sequence of transformations on the common “fully connected + bias + ReLU” ML layer, which boils down to performing a matrix multiplication, followed by an (elementwise) matrix addition and taking an elementwise maximum with 0. This can be expressed using the following IR:

```mlir
func.func @fc_relu(%lhs: tensor<512x512xf32>, %rhs: tensor<512x512xf32>,
                   %bias: tensor<512x512xf32>, %output: tensor<512x512xf32>)
                   -> tensor<512x512xf32> {
  // Matrix-matrix multiplication.
  %matmul = linalg.matmul ins(%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32>)
                          outs(%output: tensor<512x512xf32>) -> tensor<512x512xf32>

  // Elementwise addition.
  %biased = linalg.elemwise_binary { fun = #linalg.binary_fn<add> }
    ins(%matmul, %bias : tensor<512x512xf32>, tensor<512x512xf32>)
    outs(%output : tensor<512x512xf32>) -> tensor<512x512xf32>

  // Elementwise max with 0 (ReLU).
  %c0f = arith.constant 0.0 : f32
  %relued = linalg.elemwise_binary { fun = #linalg.binary_fn<max_signed> }
    ins(%biased, %c0f : tensor<512x512xf32>, f32)
    outs(%output : tensor<512x512xf32>) -> tensor<512x512xf32>
  func.return %relued : tensor<512x512xf32>
}
```

## Top-Level Sequence Operation

For performance reasons, we would like to tile and fuse these operations to exploit cache locality. This is a sequence of transformations that need to be performed one after another, so we naturally start with the corresponding top-level transform operation.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %arg0: !transform.any_op,
      %arg1: !transform.op<"linalg.matmul">,
      %arg2: !transform.op<"linalg.elemwise_binary">) {
    transform.yield
  }
}
```

There are several aspects worth noticing in this operation.

Its special name, `@__transform_main`, and the first argument are mandated by the interpreter pass, similarly to how the entry point of C programs needs to be called `main` and may have the `int (int argc, char** argv)` signature. This argument will be associated with the top-level payload operation, most often the operation that the pass is applied to. Note that none of this is required when applying the transformation _programmatically_ via `applyTransforms` or `applyNamedSequence`.

The remaining entry block arguments are optional and can be associated with payload attributes, operations or values that are useful in the sequence. These are also specified when calling `applyTransforms`. In our case, we are interested in the matrix multiplication and elementwise operations that we are going to tile and fuse.

All value handles have Transform dialect types. These types specify certain properties of the payload IR entities associated with them. In this example, `!transform.any_op` indicates that the handle is associated with arbitrary payload operations. By contrast, `!transform.op<"X">` indicates that the handle is associated _only_ with payload operations of kind `X`. These constraints are verified when the handle/payload association is created. For entry block arguments of top-level transform operations, this happens early in the `applyTransforms` function. If the constraints are not satisfied, the transform application fails and produces diagnostics for the user.

Finally, the operation is wrapped in a module with the `transform.with_named_sequence` attribute that triggers all necessary verifications if multiple named sequences exist.

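
The type constraints on handles described above can be pictured with a small model. The following Python sketch is purely illustrative and is not the MLIR implementation: `PayloadOp` and `satisfies_handle_type` are hypothetical names, and only the two handle types used in this chapter are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayloadOp:
    """Hypothetical stand-in for a payload operation; only its name matters here."""
    name: str


def satisfies_handle_type(handle_type, payload_ops):
    """Toy check of whether payload ops may be associated with a handle type."""
    if handle_type == "!transform.any_op":
        # Arbitrary payload operations are accepted.
        return True
    prefix, suffix = '!transform.op<"', '">'
    if handle_type.startswith(prefix) and handle_type.endswith(suffix):
        expected = handle_type[len(prefix):-len(suffix)]
        # Only payload operations of exactly this kind are accepted.
        return all(op.name == expected for op in payload_ops)
    raise ValueError(f"handle type not modeled in this sketch: {handle_type}")


matmul = PayloadOp("linalg.matmul")
add = PayloadOp("linalg.elemwise_binary")
print(satisfies_handle_type("!transform.any_op", [matmul, add]))
print(satisfies_handle_type('!transform.op<"linalg.matmul">', [matmul, add]))
```

In this model, the second check fails because one of the operations is not a `linalg.matmul`, which corresponds to the case where the transform application fails with a diagnostic.
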
## Failure Propagation

The Transform dialect infrastructure has a particular mechanism for handling diagnostics that supports recoverable errors. It is best understood by considering the (unnamed) sequence operation that has a mandatory attribute specifying the failure propagation mode. There are two options:

*   “propagate” makes the sequence transformation fail if any of the nested transformations fails;
*   “suppress” makes the sequence succeed even if one of the nested transformations fails, but without attempting to perform the transformations following the failed one in the sequence.

The latter allows the transformation script surrounding the sequence to continue despite errors within the sequence, assuming they are recoverable. As we are only building the transformation script, it is preferable to propagate failures so we know when something did not apply.

To check or debug a transform sequence, it is possible to print various entities associated with the transform IR values. For example, we can print the operations associated with the handles:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op,
     %arg1: !transform.op<"linalg.matmul">,
     %arg2: !transform.op<"linalg.elemwise_binary">):
  transform.debug.emit_remark_at %arg1, "matmul"
      : !transform.op<"linalg.matmul">
  transform.debug.emit_remark_at %arg2, "elemwise_binaries"
      : !transform.op<"linalg.elemwise_binary">
  transform.yield
}
```
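
The difference between the two failure propagation modes can be sketched with a toy model. The Python below is an illustration under simplifying assumptions, not the interpreter's implementation: `run_sequence` is a hypothetical helper, each transform is a callable returning `True` on success and `False` on a recoverable failure, and unrecoverable errors are not modeled.

```python
def run_sequence(transforms, failures):
    """Toy model of a sequence with a failure propagation mode.

    `transforms` is a list of (name, callable) pairs; each callable returns
    True on success and False on a recoverable failure.
    Returns (sequence_succeeded, names_of_attempted_transforms).
    """
    if failures not in ("propagate", "suppress"):
        raise ValueError("failures must be 'propagate' or 'suppress'")
    attempted = []
    for name, transform in transforms:
        attempted.append(name)
        if not transform():
            # In both modes, the remaining transforms are not attempted.
            return failures == "suppress", attempted
    return True, attempted


steps = [
    ("tile", lambda: True),
    ("fuse", lambda: False),   # pretend this step fails recoverably
    ("outline", lambda: True),
]
print(run_sequence(steps, "propagate"))
print(run_sequence(steps, "suppress"))
```

In both runs the hypothetical "outline" step is never attempted; only the success status reported to the surrounding script differs.
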

## Transform Dialect Interpreter

Since we don’t want to recompile the compiler every time we change a transformation, we can use a Transform dialect interpreter pass to apply this transformation sequence to the payload IR. As we will see in the next chapter, it is possible to define custom passes or even integrate the transform interpreter into a larger pass. For now, we can use the existing test pass:

```sh
$ mlir-opt sequence.mlir --pass-pipeline="
    builtin.module(transform-interpreter{
        debug-bind-trailing-args=linalg.matmul,linalg.elemwise_binary})"
```

The `sequence.mlir` file contains _both_ the payload IR function _and_ the transform IR sequence nested in the same module. The transform interpreter pass will apply the `@__transform_main` named sequence to the anchor operation of the pass. In our case, we also asked the interpreter pass to associate the two extra arguments of the top-level sequence with all `linalg.matmul` and `linalg.elemwise_binary` payload operations through the respective pass options. Running this pass results in the expected remarks:

```sh
sequence.mlir:7:13: remark: matmul
  %matmul = linalg.matmul ins(%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32>)
            ^
sequence.mlir:7:13: note: see current operation: %0 = linalg.matmul ins(%arg0, %arg1 : tensor<512x512xf32>, tensor<512x512xf32>) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
sequence.mlir:10:13: remark: elemwise_binaries
  %biased = linalg.elemwise_binary { fun = #linalg.binary_fn<add> }
            ^
sequence.mlir:10:13: note: see current operation: %1 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>} ins(%0, %arg2 : tensor<512x512xf32>, tensor<512x512xf32>) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
sequence.mlir:14:13: remark: elemwise_binaries
  %relued = linalg.elemwise_binary { fun = #linalg.binary_fn<max_signed> }
            ^
sequence.mlir:14:13: note: see current operation: %2 = linalg.elemwise_binary {fun = #linalg.binary_fn<max_signed>} ins(%1, %cst : tensor<512x512xf32>, f32) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
```
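
The association made through the `debug-bind-trailing-args` option can be pictured as collecting payload operations by name, one list per trailing argument. The Python sketch below is a toy model only: `bind_trailing_args` is a hypothetical helper, and the payload is reduced to the result names and operation kinds of the result-producing operations in `@fc_relu`.

```python
# Result names and operation kinds of @fc_relu, in order of appearance.
payload_ops = [
    ("%matmul", "linalg.matmul"),
    ("%biased", "linalg.elemwise_binary"),
    ("%c0f", "arith.constant"),
    ("%relued", "linalg.elemwise_binary"),
]


def bind_trailing_args(ops, op_names):
    """Toy model: one handle per requested name, each a list of matching ops."""
    return [[result for result, kind in ops if kind == name] for name in op_names]


arg1, arg2 = bind_trailing_args(
    payload_ops, ["linalg.matmul", "linalg.elemwise_binary"])
print(arg1)
print(arg2)
```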

Note that `%arg2` is associated with both elementwise payload operations. Any handle is associated with a list of entities. Individual transformations may or may not care about the order of elements in that list.

## Specifying Transformations

Now that we have handles to the operations we want to transform, we are ready to apply the transformations. Let us first try tiling the matmul operation itself.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // The actual tiling transformation takes tile sizes as attributes.
    %tiled, %loop = transform.structured.tile_using_forall %arg1
                    tile_sizes [4, 32]
      : (!transform.op<"linalg.matmul">)
     -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The transformation returns two handles, as indicated in its [documentation](https://mlir.llvm.org/docs/Dialects/Transform/#transformstructuredtile_using_forall-transformtileusingforallop):

*   A handle to the tiled operation (here, the `linalg.matmul` inside the loop) operating on a subset of the original data.
*   A handle to the `scf.forall` “multi-for” loop around tensors.

Running this transformation with the same command as above produces the tiled code, as expected.

```mlir
func.func @fc_relu(%arg0: tensor<512x512xf32>,
                   %arg1: tensor<512x512xf32>,
                   %arg2: tensor<512x512xf32>,
                   %arg3: tensor<512x512xf32>) -> tensor<512x512xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %0 = scf.forall (%arg4, %arg5) in (128, 16) shared_outs(%arg6 = %arg3) -> (tensor<512x512xf32>) {
    %3 = affine.apply affine_map<(d0) -> (d0 * 4)>(%arg4)
    %4 = affine.apply affine_map<(d0) -> (d0 * 32)>(%arg5)
    %extracted_slice = tensor.extract_slice %arg0[%3, 0] [4, 512] [1, 1]
                     : tensor<512x512xf32> to tensor<4x512xf32>
    %extracted_slice_0 = tensor.extract_slice %arg1[0, %4] [512, 32] [1, 1]
                       : tensor<512x512xf32> to tensor<512x32xf32>
    %extracted_slice_1 = tensor.extract_slice %arg6[%3, %4] [4, 32] [1, 1]
                      : tensor<512x512xf32> to tensor<4x32xf32>
    %5 = linalg.matmul
         ins(%extracted_slice, %extracted_slice_0
             : tensor<4x512xf32>, tensor<512x32xf32>)
         outs(%extracted_slice_1 : tensor<4x32xf32>) -> tensor<4x32xf32>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %5 into %arg6[%3, %4] [4, 32] [1, 1]
          : tensor<4x32xf32> into tensor<512x512xf32>
    }
  }
  %1 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
    ins(%0, %arg2 : tensor<512x512xf32>, tensor<512x512xf32>)
    outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
  %2 = linalg.elemwise_binary {fun = #linalg.binary_fn<max_signed>}
    ins(%1, %cst : tensor<512x512xf32>, f32)
    outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
  return %2 : tensor<512x512xf32>
}
```
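
The loop bounds and slice shapes in the tiled code follow from simple arithmetic on the 512x512 operands and the tile sizes `[4, 32]`: 512 / 4 = 128 and 512 / 32 = 16 give the `(128, 16)` bounds of `scf.forall`, and the reduction dimension is left untiled. The Python sketch below only reproduces this arithmetic; `forall_trip_counts` is a hypothetical helper name, not part of MLIR.

```python
def forall_trip_counts(shape, tile_sizes):
    """Number of tiles along each tiled dimension (ceiling division)."""
    return tuple(-(-dim // tile) for dim, tile in zip(shape, tile_sizes))


M = N = K = 512           # matmul dimensions from the example
tile_m, tile_n = 4, 32    # tile_sizes [4, 32]

trip_counts = forall_trip_counts((M, N), (tile_m, tile_n))
lhs_slice = (tile_m, K)       # rows of the left operand
rhs_slice = (K, tile_n)       # columns of the right operand
out_slice = (tile_m, tile_n)  # tile of the output

print(trip_counts, lhs_slice, rhs_slice, out_slice)
```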

Besides producing new handles, the tiling transform operation _consumes_ the operand handle. This means that the handle is _invalidated_ after this operation, and is no longer supposed to be used. Transform operations are required to mark all their operands as either consumed or readonly. Transform operations usually consume the operand if the associated payload operations are erased or recreated (which means erased and created anew with similar structure). As handles are essentially references to payload operations, they would become dangling if the payload no longer exists.

## Handle Invalidation and Expensive Checks Mode

Undefined behavior is difficult to grapple with when it does happen, so the Transform dialect interpreter by default performs a set of additional, potentially expensive, checks that detect most undefined behavior in the transform IR. For example, using the `%arg1` handle after it is consumed would, without these checks, cause undefined behavior that manifests as an assertion in a debug build, and likely as a segmentation fault in a release build.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // The actual tiling transformation takes tile sizes as attributes.
    %tiled, %loop = transform.structured.tile_using_forall %arg1 tile_sizes [4, 32]
        : (!transform.op<"linalg.matmul">) -> (!transform.any_op, !transform.any_op)

    // This is trying to use an invalidated handle leading to undefined behavior.
    transform.debug.emit_remark_at %arg1, "remark" : !transform.op<"linalg.matmul">
    transform.yield
  }
}
```

However, with the expensive checks enabled in the interpreter, a nice diagnostic is produced:

```sh
sequence.mlir:28:3: error: op uses a handle invalidated by a previously executed transform op
  transform.debug.emit_remark_at %mm, "elemwise_binaries" : !transform.any_op
  ^
sequence.mlir:26:9: note: handle to invalidated ops
  %mm = transform.cast %matmul : !transform.op<"linalg.matmul"> to !transform.any_op
        ^
sequence.mlir:27:19: note: invalidated by this transform op that consumes its operand #0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them
  %loop, %tiled = transform.structured.tile_using_forall %mm tile_sizes [4, 32]
```

When compile-time performance is a concern, and the transformation sequence is sufficiently stable, it is possible to disable expensive checks in the interpreter for improved performance by providing the `disable-expensive-checks` option to the pass or by setting the corresponding flag in the `TransformOptions` passed into `applyTransforms`.

One may observe that some operations such as `transform.cast` do not consume the operand (because they don’t erase the corresponding operation). So what would happen if we tried to use that operand instead?

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // We can cast one type to another as long as operations are compatible
    // with both types. This creates "aliasing" handles.
    %casted = transform.cast %arg1 : !transform.op<"linalg.matmul">
        to !transform.any_op

    // The actual tiling transformation takes tile sizes as attributes.
    %tiled, %loop = transform.structured.tile_using_forall %arg1
                    tile_sizes [4, 32]
      : (!transform.op<"linalg.matmul">)
     -> (!transform.any_op, !transform.any_op)

    // Consuming an operand invalidates the consumed handle and any other handle
    // that is associated with the same payload operations, or payload
    // operations nested in them.
    transform.debug.emit_remark_at %casted, "remark"
      : !transform.any_op
    transform.yield
  }
}
```

Both `%arg1` and `%casted` reference the same payload operation. Extending the reference analogy, these references alias. Naturally, when the payload operation is erased, all references to it become dangling. This is also the case for handles. In fact, consuming an operand invalidates the operand handle as well as any other handle that is associated with any of the same payload operations. The payload IR consideration is recursive: a handle associated with a payload operation _nested_ in the erased one is also invalidated (because erasing the operation also erases its regions and all contained operations). The expensive-checks mode can also handle this case.

```sh
sequence.mlir:28:3: error: op uses a handle invalidated by a previously executed transform op
  transform.debug.emit_remark_at %matmul, "elemwise_binaries" : !transform.op<"linalg.matmul">
  ^
sequence.mlir:21:29: note: handle to invalidated ops
^bb0(%root: !transform.any_op, %matmul: !transform.op<"linalg.matmul">, %elemwise: !transform.op<"linalg.elemwise_binary">):
                            ^
sequence.mlir:27:19: note: invalidated by this transform op that consumes its operand #0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them
  %loop, %tiled = transform.structured.tile_using_forall %mm tile_sizes [4, 32]
```
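
The invalidation rule described above (the consumed handle, its aliases, and handles to nested payload operations all become invalid) can be modeled in a few lines. The Python below is a toy sketch and not how the interpreter is implemented: `PayloadOp`, `TransformState` and their methods are hypothetical names chosen for illustration.

```python
class PayloadOp:
    """Hypothetical payload operation with an optional parent operation."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def is_same_or_nested_in(self, other):
        op = self
        while op is not None:
            if op is other:
                return True
            op = op.parent
        return False


class TransformState:
    """Toy mapping from handle names to lists of payload operations."""

    def __init__(self):
        self.handles = {}
        self.invalidated = set()

    def bind(self, handle, ops):
        self.handles[handle] = list(ops)

    def use(self, handle):
        if handle in self.invalidated:
            raise RuntimeError(f"{handle}: use of an invalidated handle")

    def consume(self, handle):
        self.use(handle)
        erased = self.handles[handle]
        # Invalidate every handle pointing to an erased op or to an op nested in one.
        for other, ops in self.handles.items():
            if any(op.is_same_or_nested_in(e) for op in ops for e in erased):
                self.invalidated.add(other)


loop = PayloadOp("scf.forall")
inner = PayloadOp("linalg.matmul", parent=loop)
unrelated = PayloadOp("linalg.elemwise_binary")

state = TransformState()
state.bind("%loop", [loop])
state.bind("%alias", [loop])       # aliasing handle, as produced by a cast
state.bind("%inner", [inner])      # handle to an op nested in the loop
state.bind("%other", [unrelated])
state.consume("%loop")
print(sorted(state.invalidated))
```

After the call to `consume`, only `%other` remains usable in this model; using `%alias` or `%inner` raises an error, mirroring the diagnostic produced in expensive-checks mode.
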

## Chaining Transformations with Handles

Going back to the transformation sequence, we have tiled the matrix multiplication, but we also want to tile and fuse the elementwise operations. The typical way of doing this in the structured operations paradigm is to tile the last operation in some acyclic dataflow graph, and then progressively fuse the operations that produce its operands. This removes the need to explicitly tile all operations as fusion can adapt their sizes and inject recomputation if desired. So instead of tiling the matmul operation, we are going to tile the last operation in the chain, and then fuse the preceding operations into the loops produced by tiling.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // Since the %arg2 handle is associated with both elementwise operations,
    // we need to split it into two handles so we can target only the second
    // elementwise operation.
    %add, %max = transform.split_handle %arg2
        : (!transform.op<"linalg.elemwise_binary">)
        -> (!transform.any_op, !transform.any_op)

    // The actual tiling transformation takes tile sizes as attributes. It
    // produces a handle to the loop generated during tiling.
    %tiled_max, %loop =
        transform.structured.tile_using_forall %max tile_sizes [8, 32]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

    // We can now fuse the other operations into the loop. Here, we fuse
    // operations one by one. This requires the operation that is being fused to
    // define the value used within the loop, so the order of such fusions is
    // important. We could also use "transform.merge_handles" to obtain a single
    // handle to all operations and give it to `fuse_into_containing_op` that
    // would take care of the ordering in this case.
    %add_fused, %loop_0 =
        transform.structured.fuse_into_containing_op %add into %loop
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    %matmul_fused, %loop_1 =
        transform.structured.fuse_into_containing_op %arg1 into %loop_0
          : (!transform.op<"linalg.matmul">, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)

    transform.yield
  }
}
```
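
The role of `transform.split_handle` in the sequence above can be pictured as distributing the list of payload operations behind one handle over several single-operation handles. The Python below is a simplified model under the assumption that the number of payload operations equals the number of results; `split_handle` here is a hypothetical Python function, and any further options of the real operation are not modeled.

```python
def split_handle(payload_ops, num_results):
    """Toy model: distribute payload ops over results, one op per result."""
    if len(payload_ops) != num_results:
        raise ValueError(
            f"expected {num_results} payload ops, got {len(payload_ops)}")
    return tuple([op] for op in payload_ops)


# %arg2 is associated with both elementwise operations, in payload order.
arg2 = ["%biased", "%relued"]
add, max_ = split_handle(arg2, 2)
print(add, max_)
```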

This achieves the desired tiling and fusion.

## More Handle Invalidation

Finally, let us assume there exists an efficient microkernel, or a hardware instruction expressed as an intrinsic function, for a 4x4 matrix multiplication. For this purpose, we need to tile the fused operation to the desired size, and then outline it. The resulting function call can then be replaced with a call to the microkernel.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // Since the %arg2 handle is associated with both elementwise operations,
    // we need to split it into two handles so we can target only the second
    // elementwise operation.
    %add, %max = transform.split_handle %arg2
        : (!transform.op<"linalg.elemwise_binary">)
          -> (!transform.any_op, !transform.any_op)

    // The actual tiling transformation takes tile sizes as attributes. It
    // produces a handle to the loop generated during tiling.
    %tiled, %loop = transform.structured.tile_using_forall %max
                    tile_sizes [8, 32]
        : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

    // We can now fuse the other operations into the loop. Here, we fuse
    // operations one by one. This requires the operation that is being fused to
    // define the value used within the loop, so the order of such fusions is
    // important. We could also use "transform.merge_handles" to obtain a single
    // handle to all operations and give it to `fuse_into_containing_op` that
    // would take care of the ordering in this case.
    %add_fused, %loop_0 =
        transform.structured.fuse_into_containing_op %add into %loop
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    %matmul_fused, %loop_1 =
        transform.structured.fuse_into_containing_op %arg1 into %loop_0
          : (!transform.op<"linalg.matmul">, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)

    // Tile again to get the desired size. Note that this time this tiles the
    // "add" operation and fuses matmul into the loop, but doesn't affect the
    // "max" operation. This illustrates the precise targeting with the
    // transform dialect. Otherwise, it is difficult to differentiate "add" and
    // "max", both of which have the same kind.
    %tiled_2, %loop_2 =
        transform.structured.tile_using_forall %add_fused tile_sizes [4, 4]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    %matmul_fused_2, %loop_3 =
        transform.structured.fuse_into_containing_op %matmul_fused into %loop_2
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)

    // Since outlining is currently only implemented for region-holding
    // operations such as loops, use tiling to size 1 to materialize the outer
    // loop that is going to be outlined.
    %_, %outline_target =
        transform.structured.tile_using_forall %tiled_2 tile_sizes [1]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.structured.fuse_into_containing_op %matmul_fused_2
        into %outline_target
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    %func, %call = transform.loop.outline %outline_target
                   {func_name = "outlined"}
        : (!transform.any_op) -> (!transform.any_op, !transform.op<"func.call">)

    transform.yield
  }
}
```

This additional transformation also illustrates handle invalidation for nested operations. The `transform.loop.outline` operation consumes the handle to the loop, which invalidates it along with every handle to an operation nested in the loop, such as the handles to the tiled and fused operations produced inside it. Attempting to use any of these handles results in undefined behavior. (Strictly speaking, this specific form of outlining does not need to consume its operand, since the implementation only _moves_ the region without recreating the operations, but the author of the transformation chose to invalidate the handle anyway.)

Attempting to use such an invalidated handle after outlining, here `%outline_target`, produces the following error:

```sh
test/Examples/transform/Ch1/invalidation-2.mlir:109:3: error: op uses a handle invalidated by a previously executed transform op
  transform.debug.emit_remark_at %outline_target, "outlined loop" : !transform.any_op
  ^
test/Examples/transform/Ch1/invalidation-2.mlir:102:25: note: handle to invalidated ops
  %outline_target, %_ = transform.structured.tile_using_forall %tiled_2 tile_sizes [1]
                        ^
test/Examples/transform/Ch1/invalidation-2.mlir:106:18: note: invalidated by this transform op that consumes its operand #0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them
  %func, %call = transform.loop.outline %outline_target {func_name = "outlined"}
                 ^
test/Examples/transform/Ch1/invalidation-2.mlir:24:13: note: ancestor payload op
  %biased = linalg.elemwise_binary { fun = #linalg.binary_fn<add> }
            ^
test/Examples/transform/Ch1/invalidation-2.mlir:24:13: note: nested payload op
  %matmul = linalg.matmul ins(%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32>)
```

Note that the “add” elementwise operation is reported as the ancestor payload op because the tile loop was produced from it, and the loop therefore carries its location.
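
For reference, a use that triggers this kind of diagnostic can be sketched as follows. This is an excerpt rather than a complete file: it is meant to sit at the end of the sequence body shown earlier, so `%tiled_2` comes from that listing, and the remark op with its message is copied from the diagnostic above. The order of the tiling results follows the listing, not the quoted test output.

```mlir
// Excerpt from the sequence body; %tiled_2 is defined earlier in the listing.
%_, %outline_target =
    transform.structured.tile_using_forall %tiled_2 tile_sizes [1]
      : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

// Outlining consumes %outline_target and thereby invalidates it, together
// with all handles to payload operations nested in the outlined loop.
%func, %call = transform.loop.outline %outline_target
               {func_name = "outlined"}
    : (!transform.any_op) -> (!transform.any_op, !transform.op<"func.call">)

// Invalid: this reads a handle that was consumed by the op above.
transform.debug.emit_remark_at %outline_target, "outlined loop"
    : !transform.any_op
```

If the loop needs to be referenced after outlining, the handles returned by `transform.loop.outline` (`%func` and `%call` here) are the ones that remain valid.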

Finally, we would like to replace the call to the outlined function with a call to the microkernel. Unfortunately, the Transform dialect doesn’t have support for this transformation (and cannot have it if the call is rewritten to a custom, out-of-tree operation). Therefore, we need to define new transform operations. The next chapters will describe how this can be done.

## Tracking IR Modifications

The Transform dialect automatically tracks all IR changes that are made as part
of transform ops. (Implementations must use the provided rewriter to modify IR.)
If a payload op is erased, it is automatically removed from all handles that it
is currently associated with. If a payload op is replaced, the Transform dialect
tries to find the replacement op and updates all handles accordingly. If a
multi-result op is replaced with values that are defined by multiple ops, or if
an op is replaced with an op of a different type, an error is produced. This is
because it is unclear whether the direct replacements actually represent the
computation of the original op. There are ways to customize this behavior. More
details can be found in the documentation of `transform::TrackingListener`.
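
As a rough illustration of tracking, the sketch below takes a handle to a matmul, runs canonicalization patterns on the enclosing function, and then uses the handle again. The sequence name `@__transform_main` and the value names `%root`, `%func` and `%matmul` are assumptions chosen for this example and do not come from the listings above. Whether the matmul survives, is replaced, or is erased depends on the payload IR.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    // Collect handles to the function and to the matmul ops nested in it.
    %func = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    %matmul = transform.structured.match ops{["linalg.matmul"]} in %func
        : (!transform.any_op) -> !transform.any_op

    // The patterns modify the payload through the provided rewriter, so
    // erasures and replacements of tracked ops are reflected in %matmul.
    transform.apply_patterns to %func {
      transform.apply_patterns.canonicalization
    } : !transform.any_op

    // %matmul now refers to whatever the tracked ops became. If they were
    // erased, the handle is empty and no remark is emitted.
    transform.debug.emit_remark_at %matmul, "still tracked"
        : !transform.any_op
    transform.yield
  }
}
```

Note that `transform.apply_patterns` only reads its target handle, so `%func` stays valid as well. This differs from the consuming `transform.loop.outline` discussed earlier.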