# Chapter 5: Partial Lowering to Lower-Level Dialects for Optimization

[TOC]

At this point, we are eager to generate actual code and see our Toy language
come to life. We will use LLVM to generate code, but just showing the LLVM
builder interface here wouldn't be very exciting. Instead, we will show how to
perform progressive lowering through a mix of dialects coexisting in the same
function.

To make things more interesting, in this chapter we will reuse existing
optimizations implemented in a dialect designed for affine transformations:
`Affine`. This dialect is tailored to the computation-heavy part of the program
and is limited: it doesn't support representing our `toy.print` builtin, for
instance, nor should it! Instead, we can target `Affine` for the
computation-heavy part of Toy, and in the [next chapter](Ch-6.md) directly
target the `LLVM IR` dialect for lowering `print`. As part of this lowering, we
will be lowering from the
[TensorType](../../Dialects/Builtin.md/#rankedtensortype) that `Toy` operates on
to the [MemRefType](../../Dialects/Builtin.md/#memreftype) that is indexed via
an affine loop-nest. Tensors represent an abstract value-typed sequence of data,
meaning that they don't live in any memory. MemRefs, on the other hand,
represent lower-level buffer access, as they are concrete references to a region
of memory.

# Dialect Conversions

MLIR has many different dialects, so it is important to have a unified framework
for [converting](../../../getting_started/Glossary.md/#conversion) between them.
This is where the `DialectConversion` framework comes into play. This framework
allows for transforming a set of *illegal* operations to a set of *legal* ones.
To use this framework, we need to provide two things (and an optional third):

*   A [Conversion Target](../../DialectConversion.md/#conversion-target)

    -   This is the formal specification of what operations or dialects are
        legal for the conversion. Operations that aren't legal will require
        rewrite patterns to perform
        [legalization](../../../getting_started/Glossary.md/#legalization).

*   A set of
    [Rewrite Patterns](../../DialectConversion.md/#rewrite-pattern-specification)

    -   This is the set of [patterns](../QuickstartRewrites.md) used to convert
        *illegal* operations into a set of zero or more *legal* ones.

*   Optionally, a [Type Converter](../../DialectConversion.md/#type-conversion).

    -   If provided, this is used to convert the types of block arguments. We
        won't be needing this for our conversion.

## Conversion Target

For our purposes, we want to convert the compute-intensive `Toy` operations into
a combination of operations from the `Affine`, `Arith`, `Func`, and `MemRef`
dialects for further optimization. To start off the lowering, we first define
our conversion target:

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  // The first thing to define is the conversion target. This will define the
  // final target for this lowering.
  mlir::ConversionTarget target(getContext());

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
  // `Affine`, `Arith`, `Func`, and `MemRef` dialects.
  target.addLegalDialect<affine::AffineDialect, arith::ArithDialect,
                         func::FuncDialect, memref::MemRefDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
  // a partial lowering, we explicitly mark the Toy operations that we don't
  // want to lower, `toy.print`, as *legal*. `toy.print` will still need its
  // operands to be updated though (as we convert from TensorType to
  // MemRefType), so we only treat it as *legal* if its operands are legal.
  target.addIllegalDialect<ToyDialect>();
  target.addDynamicallyLegalOp<toy::PrintOp>([](toy::PrintOp op) {
    return llvm::none_of(op->getOperandTypes(),
                         [](Type type) { return llvm::isa<TensorType>(type); });
  });
  ...
}
```

Above, we first set the Toy dialect to illegal, and then marked the print
operation as dynamically legal. We could have done this the other way around.
Individual operations always take precedence over the (more generic) dialect
definitions, so the order doesn't matter. See `ConversionTarget::getOpInfo` for
the details.

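The precedence rule can be illustrated with a small standalone model. The sketch
below is not MLIR code: `Op`, `TargetModel`, `makeTarget`, and
`checkOrderIndependence` are hypothetical names invented for illustration, and
the real lookup in `ConversionTarget::getOpInfo` handles more cases. It only
shows why consulting per-operation entries before dialect-wide entries makes the
registration order irrelevant.

```c++
#include <cassert>
#include <functional>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for an operation: its dialect, its full name, and
// whether any operand still has a tensor type.
struct Op {
  std::string dialect;
  std::string name;
  bool hasTensorOperand;
};

// Simplified model of a conversion target (an assumption made for this
// sketch, not the mlir::ConversionTarget class).
class TargetModel {
public:
  void addLegalDialect(const std::string &dialect) { dialects[dialect] = true; }
  void addIllegalDialect(const std::string &dialect) {
    dialects[dialect] = false;
  }
  void addDynamicallyLegalOp(const std::string &opName,
                             std::function<bool(const Op &)> callback) {
    ops[opName] = std::move(callback);
  }

  // Returns the legality of `op`, or std::nullopt if nothing was registered.
  std::optional<bool> isLegal(const Op &op) const {
    // Per-operation entries are consulted first ...
    if (auto it = ops.find(op.name); it != ops.end())
      return it->second(op);
    // ... and the dialect-wide entry is only a fallback.
    if (auto it = dialects.find(op.dialect); it != dialects.end())
      return it->second;
    return std::nullopt;
  }

private:
  std::map<std::string, bool> dialects;
  std::map<std::string, std::function<bool(const Op &)>> ops;
};

// Registers the same two rules in either order.
TargetModel makeTarget(bool dialectFirst) {
  TargetModel target;
  auto printIsLegal = [](const Op &op) { return !op.hasTensorOperand; };
  if (dialectFirst) {
    target.addIllegalDialect("toy");
    target.addDynamicallyLegalOp("toy.print", printIsLegal);
  } else {
    target.addDynamicallyLegalOp("toy.print", printIsLegal);
    target.addIllegalDialect("toy");
  }
  return target;
}

// Both registration orders classify every operation identically.
void checkOrderIndependence() {
  const Op printOnMemRef{"toy", "toy.print", false};
  const Op printOnTensor{"toy", "toy.print", true};
  const Op mul{"toy", "toy.mul", true};
  for (bool dialectFirst : {true, false}) {
    TargetModel target = makeTarget(dialectFirst);
    assert(target.isLegal(printOnMemRef) == true);
    assert(target.isLegal(printOnTensor) == false);
    assert(target.isLegal(mul) == false);
  }
}
```

Calling `checkOrderIndependence()` exercises both orders; compile with C++17 or
later.
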
## Conversion Patterns

After the conversion target has been defined, we can define how to convert the
*illegal* operations into *legal* ones. Similarly to the canonicalization
framework introduced in [chapter 3](Ch-3.md), the
[`DialectConversion` framework](../../DialectConversion.md) also uses
[RewritePatterns](../QuickstartRewrites.md) to perform the conversion logic.
These patterns may be the `RewritePatterns` seen before or a new type of pattern
specific to the conversion framework: `ConversionPattern`. `ConversionPatterns`
are different from traditional `RewritePatterns` in that they accept an
additional `operands` parameter containing operands that have been
remapped/replaced. This is used when dealing with type conversions, as the
pattern will want to operate on values of the new type but match against the
old. For our lowering, this invariant will be useful as it translates from the
[TensorType](../../Dialects/Builtin.md/#rankedtensortype) currently being
operated on to the [MemRefType](../../Dialects/Builtin.md/#memreftype). Let's
look at a snippet of lowering the `toy.transpose` operation:

```c++
/// Lower the `toy.transpose` operation to an affine loop nest.
struct TransposeOpLowering : public mlir::ConversionPattern {
  TransposeOpLowering(mlir::MLIRContext *ctx)
      : mlir::ConversionPattern(TransposeOp::getOperationName(), 1, ctx) {}

  /// Match and rewrite the given `toy.transpose` operation, with the given
  /// operands that have been remapped from `tensor<...>` to `memref<...>`.
  llvm::LogicalResult
  matchAndRewrite(mlir::Operation *op, ArrayRef<mlir::Value> operands,
                  mlir::ConversionPatternRewriter &rewriter) const final {
    auto loc = op->getLoc();

    // Call to a helper function that will lower the current operation to a set
    // of affine loops. We provide a functor that operates on the remapped
    // operands, as well as the loop induction variables for the innermost
    // loop body.
    lowerOpToLoops(
        op, operands, rewriter,
        [loc](mlir::PatternRewriter &rewriter,
              ArrayRef<mlir::Value> memRefOperands,
              ArrayRef<mlir::Value> loopIvs) {
          // Generate an adaptor for the remapped operands of the TransposeOp.
          // This allows for using the nice named accessors that are generated
          // by the ODS. This adaptor is automatically provided by the ODS
          // framework.
          TransposeOpAdaptor transposeAdaptor(memRefOperands);
          mlir::Value input = transposeAdaptor.getInput();

          // Transpose the elements by generating a load from the reverse
          // indices.
          SmallVector<mlir::Value, 2> reverseIvs(llvm::reverse(loopIvs));
          return rewriter.create<mlir::affine::AffineLoadOp>(loc, input,
                                                             reverseIvs);
        });
    return success();
  }
};
```

Now we can prepare the list of patterns to use during the lowering process:

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  ...

  // Now that the conversion target has been defined, we just need to provide
  // the set of patterns that will lower the Toy operations.
  mlir::RewritePatternSet patterns(&getContext());
  patterns.add<..., TransposeOpLowering>(&getContext());

  ...
```

## Partial Lowering

Once the patterns have been defined, we can perform the actual lowering. The
`DialectConversion` framework provides several different modes of lowering, but,
for our purposes, we will perform a partial lowering, as we will not convert
`toy.print` at this time.

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  ...

  // With the target and rewrite patterns defined, we can now attempt the
  // conversion. The conversion will signal failure if any of our *illegal*
  // operations were not converted successfully.
  if (mlir::failed(mlir::applyPartialConversion(getOperation(), target,
                                                std::move(patterns))))
    signalPassFailure();
}
```

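To make the difference between conversion modes concrete, here is a minimal
standalone sketch. It is not MLIR code: `Legality`,
`partialConversionSucceeds`, `fullConversionSucceeds`, and `checkModes` are
hypothetical names, and the model assumes every operation left in the IR has
already been classified against the target. It captures the documented behavior
that a partial conversion tolerates operations that were never marked illegal,
whereas a full conversion requires every remaining operation to be legal.

```c++
#include <algorithm>
#include <cassert>
#include <vector>

// How the target classifies an operation that remains after the patterns ran.
// `Unknown` means the target says nothing about the operation.
enum class Legality { Legal, Illegal, Unknown };

// Partial conversion: succeeds unless an explicitly illegal operation remains.
bool partialConversionSucceeds(const std::vector<Legality> &remaining) {
  return std::none_of(remaining.begin(), remaining.end(), [](Legality l) {
    return l == Legality::Illegal;
  });
}

// Full conversion: succeeds only if every remaining operation is legal.
bool fullConversionSucceeds(const std::vector<Legality> &remaining) {
  return std::all_of(remaining.begin(), remaining.end(), [](Legality l) {
    return l == Legality::Legal;
  });
}

void checkModes() {
  // An operation the target knows nothing about is left untouched by a
  // partial conversion but rejected by a full conversion.
  const std::vector<Legality> withUnknown = {Legality::Legal,
                                             Legality::Unknown};
  assert(partialConversionSucceeds(withUnknown));
  assert(!fullConversionSucceeds(withUnknown));

  // An operation marked illegal that no pattern converted fails both modes.
  const std::vector<Legality> withIllegal = {Legality::Legal,
                                             Legality::Illegal};
  assert(!partialConversionSucceeds(withIllegal));
  assert(!fullConversionSucceeds(withIllegal));
}
```
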
### Design Considerations With Partial Lowering

Before diving into the result of our lowering, this is a good time to discuss
potential design considerations when it comes to partial lowering. In our
lowering, we transform from a value-type, TensorType, to an allocated
(buffer-like) type, MemRefType. However, given that we do not lower the
`toy.print` operation, we need to temporarily bridge these two worlds. There are
many ways to go about this, each with their own tradeoffs:

*   Generate `load` operations from the buffer

    One option is to generate `load` operations from the buffer type to
    materialize an instance of the value type. This allows for the definition of
    the `toy.print` operation to remain unchanged. The downside to this approach
    is that the optimizations on the `affine` dialect are limited, because the
    `load` will actually involve a full copy that is only visible *after* our
    optimizations have been performed.

*   Generate a new version of `toy.print` that operates on the lowered type

    Another option would be to have another, lowered, variant of `toy.print`
    that operates on the lowered type. The benefit of this option is that no
    unnecessary copy is hidden from the optimizer. The downside is that
    another operation definition is needed that may duplicate many aspects of
    the first. Defining a base class in
    [ODS](../../DefiningDialects/Operations.md) may simplify this, but you still
    need to treat these operations separately.

*   Update `toy.print` to allow for operating on the lowered type

    A third option is to update the current definition of `toy.print` to allow
    for operating on the lowered type. The benefit of this approach is that
    it is simple, does not introduce an additional hidden copy, and does not
    require another operation definition. The downside to this option is that it
    requires mixing abstraction levels in the `Toy` dialect.

For the sake of simplicity, we will use the third option for this lowering. This
involves updating the type constraints on the PrintOp in the operation
definition file:

```tablegen
def PrintOp : Toy_Op<"print"> {
  ...

  // The print operation takes an input tensor to print.
  // We also allow a F64MemRef to enable interop during partial lowering.
  let arguments = (ins AnyTypeOf<[F64Tensor, F64MemRef]>:$input);
}
```

## Complete Toy Example

Let's take a concrete example:

```mlir
toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %2 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
  %3 = toy.mul %2, %2 : tensor<3x2xf64>
  toy.print %3 : tensor<3x2xf64>
  toy.return
}
```

With affine lowering added to our pipeline, we can now generate:

```mlir
func.func @main() {
  %cst = arith.constant 1.000000e+00 : f64
  %cst_0 = arith.constant 2.000000e+00 : f64
  %cst_1 = arith.constant 3.000000e+00 : f64
  %cst_2 = arith.constant 4.000000e+00 : f64
  %cst_3 = arith.constant 5.000000e+00 : f64
  %cst_4 = arith.constant 6.000000e+00 : f64

  // Allocating buffers for the inputs and outputs.
  %0 = memref.alloc() : memref<3x2xf64>
  %1 = memref.alloc() : memref<3x2xf64>
  %2 = memref.alloc() : memref<2x3xf64>

  // Initialize the input buffer with the constant values.
  affine.store %cst, %2[0, 0] : memref<2x3xf64>
  affine.store %cst_0, %2[0, 1] : memref<2x3xf64>
  affine.store %cst_1, %2[0, 2] : memref<2x3xf64>
  affine.store %cst_2, %2[1, 0] : memref<2x3xf64>
  affine.store %cst_3, %2[1, 1] : memref<2x3xf64>
  affine.store %cst_4, %2[1, 2] : memref<2x3xf64>

  // Load the transpose value from the input buffer and store it into the
  // next input buffer.
  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      %3 = affine.load %2[%arg1, %arg0] : memref<2x3xf64>
      affine.store %3, %1[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Multiply and store into the output buffer.
  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      %3 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
      %4 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
      %5 = arith.mulf %3, %4 : f64
      affine.store %5, %0[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Print the value held by the buffer.
  toy.print %0 : memref<3x2xf64>
  memref.dealloc %2 : memref<2x3xf64>
  memref.dealloc %1 : memref<3x2xf64>
  memref.dealloc %0 : memref<3x2xf64>
  return
}
```

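For readers less familiar with affine IR, the lowered function above can be read
as ordinary nested loops. The following is a hand-written C++ analogue, not
output of `toyc`: the names `Buf2x3`, `Buf3x2`, `runNaiveLowering`, and
`checkNaiveLowering` are made up for this sketch, and `std::array` stands in for
the `memref` buffers.

```c++
#include <array>
#include <cassert>

// Fixed-size buffers standing in for memref<2x3xf64> and memref<3x2xf64>.
using Buf2x3 = std::array<std::array<double, 3>, 2>;
using Buf3x2 = std::array<std::array<double, 2>, 3>;

Buf3x2 runNaiveLowering() {
  // Initialize the input buffer with the constant values.
  const Buf2x3 input = {{{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}}};
  Buf3x2 transposed{};
  Buf3x2 output{};

  // First loop nest: load with reversed indices and store the transpose.
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 2; ++j)
      transposed[i][j] = input[j][i];

  // Second loop nest: two loads of the same element, multiply, store.
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 2; ++j) {
      const double lhs = transposed[i][j];
      const double rhs = transposed[i][j];
      output[i][j] = lhs * rhs;
    }
  return output;
}

void checkNaiveLowering() {
  const Buf3x2 result = runNaiveLowering();
  // The transpose of the input is [[1, 4], [2, 5], [3, 6]]; squaring each
  // element gives the values below.
  assert(result[0][0] == 1.0 && result[0][1] == 16.0);
  assert(result[1][0] == 4.0 && result[1][1] == 25.0);
  assert(result[2][0] == 9.0 && result[2][1] == 36.0);
}
```

The intermediate `transposed` buffer and the duplicated load in the second loop
nest mirror the IR one-to-one.
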
## Taking Advantage of Affine Optimization

Our naive lowering is correct, but it leaves a lot to be desired with regard to
efficiency. For example, the lowering of `toy.mul` has generated some redundant
loads. Let's look at how adding a few existing optimizations to the pipeline can
help clean this up. Adding the `LoopFusion` and `AffineScalarReplacement` passes
to the pipeline gives the following result:

```mlir
func.func @main() {
  %cst = arith.constant 1.000000e+00 : f64
  %cst_0 = arith.constant 2.000000e+00 : f64
  %cst_1 = arith.constant 3.000000e+00 : f64
  %cst_2 = arith.constant 4.000000e+00 : f64
  %cst_3 = arith.constant 5.000000e+00 : f64
  %cst_4 = arith.constant 6.000000e+00 : f64

  // Allocating buffers for the inputs and outputs.
  %0 = memref.alloc() : memref<3x2xf64>
  %1 = memref.alloc() : memref<2x3xf64>

  // Initialize the input buffer with the constant values.
  affine.store %cst, %1[0, 0] : memref<2x3xf64>
  affine.store %cst_0, %1[0, 1] : memref<2x3xf64>
  affine.store %cst_1, %1[0, 2] : memref<2x3xf64>
  affine.store %cst_2, %1[1, 0] : memref<2x3xf64>
  affine.store %cst_3, %1[1, 1] : memref<2x3xf64>
  affine.store %cst_4, %1[1, 2] : memref<2x3xf64>

  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      // Load the transpose value from the input buffer.
      %2 = affine.load %1[%arg1, %arg0] : memref<2x3xf64>

      // Multiply and store into the output buffer.
      %3 = arith.mulf %2, %2 : f64
      affine.store %3, %0[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Print the value held by the buffer.
  toy.print %0 : memref<3x2xf64>
  memref.dealloc %1 : memref<2x3xf64>
  memref.dealloc %0 : memref<3x2xf64>
  return
}
```

Here, we can see that a redundant allocation was removed, the two loop nests
were fused, and some unnecessary `load`s were removed. You can build `toyc-ch5`
and try it yourself: `toyc-ch5 test/Examples/Toy/Ch5/affine-lowering.mlir
-emit=mlir-affine`. We can also check our optimizations by adding `-opt`.

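Read as plain C++, the optimized IR corresponds to a single loop nest. This is a
hand-written analogue with hypothetical names (`Buf2x3`, `Buf3x2`,
`runFusedLowering`), using `std::array` in place of `memref`; it only
illustrates that the fused form needs one buffer fewer and a single load per
element while producing the squared transpose of the input.

```c++
#include <array>
#include <cassert>

// Fixed-size buffers standing in for memref<2x3xf64> and memref<3x2xf64>.
using Buf2x3 = std::array<std::array<double, 3>, 2>;
using Buf3x2 = std::array<std::array<double, 2>, 3>;

Buf3x2 runFusedLowering() {
  // Initialize the input buffer with the constant values.
  const Buf2x3 input = {{{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}}};
  Buf3x2 output{};

  // One loop nest: a single transposed load feeds both multiply operands, so
  // no intermediate buffer is needed.
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 2; ++j) {
      const double value = input[j][i];
      output[i][j] = value * value;
    }
  return output;
}
```
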
In this chapter we explored some aspects of partial lowering, with the intent to
optimize. In the [next chapter](Ch-6.md) we will continue the discussion about
dialect conversion by targeting LLVM for code generation.