# Chapter 5: Partial Lowering to Lower-Level Dialects for Optimization

[TOC]

At this point, we are eager to generate actual code and see our Toy language
come to life. We will use LLVM to generate code, but just showing the LLVM builder
interface here wouldn't be very exciting. Instead, we will show how to perform
progressive lowering through a mix of dialects coexisting in the same function.

To make it more interesting, in this chapter we will consider that we want to
reuse existing optimizations implemented in a dialect optimizing affine
transformations: `Affine`. This dialect is tailored to the computation-heavy
part of the program and is limited: it doesn't support representing our
`toy.print` builtin, for instance, nor should it! Instead, we can target
`Affine` for the computation-heavy part of Toy, and in the
[next chapter](Ch-6.md) directly target the `LLVM IR` dialect for lowering
`print`. As part of this lowering, we will be lowering from the
[TensorType](../../Dialects/Builtin.md/#rankedtensortype) that `Toy` operates on
to the [MemRefType](../../Dialects/Builtin.md/#memreftype) that is indexed via
an affine loop-nest. Tensors represent an abstract value-typed sequence of data,
meaning that they don't live in any memory. MemRefs, on the other hand,
represent lower-level buffer access, as they are concrete references to a region
of memory.

# Dialect Conversions

MLIR has many different dialects, so it is important to have a unified framework
for [converting](../../../getting_started/Glossary.md/#conversion) between them.
This is where the `DialectConversion` framework comes into play. This framework
allows for transforming a set of *illegal* operations to a set of *legal* ones.
To use this framework, we need to provide two things (and an optional third):

*   A [Conversion Target](../../DialectConversion.md/#conversion-target)

    -   This is the formal specification of what operations or dialects are
        legal for the conversion. Operations that aren't legal will require
        rewrite patterns to perform
        [legalization](../../../getting_started/Glossary.md/#legalization).

*   A set of
    [Rewrite Patterns](../../DialectConversion.md/#rewrite-pattern-specification)

    -   This is the set of [patterns](../QuickstartRewrites.md) used to convert
        *illegal* operations into a set of zero or more *legal* ones.

*   Optionally, a [Type Converter](../../DialectConversion.md/#type-conversion).

    -   If provided, this is used to convert the types of block arguments. We
        won't be needing this for our conversion.

## Conversion Target

For our purposes, we want to convert the compute-intensive `Toy` operations into
a combination of operations from the `Affine`, `Arith`, `Func`, and `MemRef` dialects
for further optimization. To start off the lowering, we first define our
conversion target:

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  // The first thing to define is the conversion target. This will define the
  // final target for this lowering.
  mlir::ConversionTarget target(getContext());

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
  // `Affine`, `Arith`, `Func`, and `MemRef` dialects.
  target.addLegalDialect<affine::AffineDialect, arith::ArithDialect,
                         func::FuncDialect, memref::MemRefDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
  // a partial lowering, we explicitly mark the Toy operations that we don't
  // want to lower, `toy.print`, as *legal*. `toy.print` will still need its
  // operands to be updated though (as we convert from TensorType to
  // MemRefType), so we only treat it as `legal` if its operands are legal.
  target.addIllegalDialect<ToyDialect>();
  target.addDynamicallyLegalOp<toy::PrintOp>([](toy::PrintOp op) {
    return llvm::none_of(op->getOperandTypes(),
                         [](Type type) { return llvm::isa<TensorType>(type); });
  });
  ...
}
```

Above, we first set the toy dialect to illegal, and then the print operation as
legal. We could have done this the other way around. Individual operations
always take precedence over the (more generic) dialect definitions, so the order
doesn't matter. See `ConversionTarget::getOpInfo` for the details.
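
The precedence rule can be pictured with a small standalone model. This is only
a sketch in plain C++, not MLIR's implementation: `ToyLegalityTable`,
`setDialect`, `setOp`, and `lookup` are hypothetical names invented for
illustration, and the model ignores dynamic legality callbacks.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a conversion target; not MLIR's ConversionTarget.
enum class Legality { Unknown, Legal, Illegal };

class ToyLegalityTable {
public:
  void setDialect(const std::string &dialect, Legality legality) {
    dialectRules[dialect] = legality;
  }
  void setOp(const std::string &opName, Legality legality) {
    opRules[opName] = legality;
  }

  // Operation names are assumed to have the form "dialect.op".
  Legality lookup(const std::string &opName) const {
    // An operation-specific rule always wins, whatever the registration order.
    auto opIt = opRules.find(opName);
    if (opIt != opRules.end())
      return opIt->second;
    // Otherwise fall back to the rule for the enclosing dialect, if any.
    std::string dialect = opName.substr(0, opName.find('.'));
    auto dialectIt = dialectRules.find(dialect);
    if (dialectIt != dialectRules.end())
      return dialectIt->second;
    return Legality::Unknown;
  }

private:
  std::map<std::string, Legality> dialectRules;
  std::map<std::string, Legality> opRules;
};
```

Because the two kinds of rules live in separate tables and the per-operation
table is consulted first, registering `toy.print` before or after the `toy`
dialect rule gives the same answer in this model.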

## Conversion Patterns

After the conversion target has been defined, we can define how to convert the
*illegal* operations into *legal* ones. Similarly to the canonicalization
framework introduced in [chapter 3](Ch-3.md), the
[`DialectConversion` framework](../../DialectConversion.md) also uses
[RewritePatterns](../QuickstartRewrites.md) to perform the conversion logic.
These patterns may be the `RewritePatterns` seen before or a new type of pattern
specific to the conversion framework: `ConversionPattern`. `ConversionPatterns`
are different from traditional `RewritePatterns` in that they accept an
additional `operands` parameter containing operands that have been
remapped/replaced. This is used when dealing with type conversions, as the
pattern will want to operate on values of the new type but match against the
old. For our lowering, this invariant will be useful as it translates from the
[TensorType](../../Dialects/Builtin.md/#rankedtensortype) currently being
operated on to the [MemRefType](../../Dialects/Builtin.md/#memreftype). Let's
look at a snippet of lowering the `toy.transpose` operation:

```c++
/// Lower the `toy.transpose` operation to an affine loop nest.
struct TransposeOpLowering : public mlir::ConversionPattern {
  TransposeOpLowering(mlir::MLIRContext *ctx)
      : mlir::ConversionPattern(TransposeOp::getOperationName(), 1, ctx) {}

  /// Match and rewrite the given `toy.transpose` operation, with the given
  /// operands that have been remapped from `tensor<...>` to `memref<...>`.
  llvm::LogicalResult
  matchAndRewrite(mlir::Operation *op, ArrayRef<mlir::Value> operands,
                  mlir::ConversionPatternRewriter &rewriter) const final {
    auto loc = op->getLoc();

    // Call to a helper function that will lower the current operation to a set
    // of affine loops. We provide a functor that operates on the remapped
    // operands, as well as the loop induction variables for the inner most
    // loop body.
    lowerOpToLoops(
        op, operands, rewriter,
        [loc](mlir::PatternRewriter &rewriter,
              ArrayRef<mlir::Value> memRefOperands,
              ArrayRef<mlir::Value> loopIvs) {
          // Generate an adaptor for the remapped operands of the TransposeOp.
          // This allows for using the nice named accessors that are generated
          // by the ODS. This adaptor is automatically provided by the ODS
          // framework.
          TransposeOpAdaptor transposeAdaptor(memRefOperands);
          mlir::Value input = transposeAdaptor.getInput();

          // Transpose the elements by generating a load from the reverse
          // indices.
          SmallVector<mlir::Value, 2> reverseIvs(llvm::reverse(loopIvs));
          return rewriter.create<mlir::affine::AffineLoadOp>(loc, input,
                                                             reverseIvs);
        });
    return success();
  }
};
```
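
To make the "load from the reverse indices" step concrete, here is a minimal
sketch of the loop nest this pattern is meant to produce, written as plain C++
over fixed-size arrays rather than MLIR operations. The `Buffer` alias and
`transposeByReversedLoad` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Row-major 2-D buffer standing in for a statically shaped memref.
template <std::size_t Rows, std::size_t Cols>
using Buffer = std::array<std::array<double, Cols>, Rows>;

// Iterate over the *result* shape and load each element from the input at the
// reversed induction variables, as the lowering above does.
template <std::size_t Rows, std::size_t Cols>
Buffer<Cols, Rows> transposeByReversedLoad(const Buffer<Rows, Cols> &input) {
  Buffer<Cols, Rows> result{};
  for (std::size_t i = 0; i < Cols; ++i) {   // outer induction variable
    for (std::size_t j = 0; j < Rows; ++j) { // inner induction variable
      result[i][j] = input[j][i];            // load at reversed indices (j, i)
    }
  }
  return result;
}
```

The loops run over the shape of the output, so the store always uses the
induction variables in order and only the load indices are reversed.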

Now we can prepare the list of patterns to use during the lowering process:

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  ...

  // Now that the conversion target has been defined, we just need to provide
  // the set of patterns that will lower the Toy operations.
  mlir::RewritePatternSet patterns(&getContext());
  patterns.add<..., TransposeOpLowering>(&getContext());

  ...
```

## Partial Lowering

Once the patterns have been defined, we can perform the actual lowering. The
`DialectConversion` framework provides several different modes of lowering, but,
for our purposes, we will perform a partial lowering, as we will not convert
`toy.print` at this time.

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  ...

  // With the target and rewrite patterns defined, we can now attempt the
  // conversion. The conversion will signal failure if any of our *illegal*
  // operations were not converted successfully.
  if (mlir::failed(mlir::applyPartialConversion(getOperation(), target, patterns)))
    signalPassFailure();
}
```
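
The difference between a partial and a full conversion can be sketched with a
deliberately simplified model. Everything here is an assumption made for
illustration: operations are plain strings, a "pattern" maps one op name to
another, and `ToyTarget`, `Mode`, and `convertOps` are hypothetical names, not
part of MLIR.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified conversion target: two sets of op names.
struct ToyTarget {
  std::set<std::string> legalOps;
  std::set<std::string> illegalOps;
};

enum class Mode { Partial, Full };

// Tries to legalize every op. Returns false, leaving `ops` untouched, when
// the conversion fails.
bool convertOps(std::vector<std::string> &ops, const ToyTarget &target,
                const std::map<std::string, std::string> &patterns,
                Mode mode) {
  std::vector<std::string> result;
  for (const std::string &op : ops) {
    // Legal ops are kept as they are.
    if (target.legalOps.count(op)) {
      result.push_back(op);
      continue;
    }
    // Otherwise apply a pattern, provided it produces a legal op.
    auto it = patterns.find(op);
    if (it != patterns.end() && target.legalOps.count(it->second)) {
      result.push_back(it->second);
      continue;
    }
    // No usable pattern: explicitly illegal ops always cause failure; ops of
    // unknown legality are tolerated only in partial mode.
    bool explicitlyIllegal = target.illegalOps.count(op) != 0;
    if (explicitlyIllegal || mode == Mode::Full)
      return false;
    result.push_back(op);
  }
  ops = result;
  return true;
}
```

In this model, an op that is neither marked legal nor illegal survives a
partial conversion but makes a full conversion fail, which is why the partial
mode suits a lowering that intentionally leaves some operations for a later
stage.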

### Design Considerations With Partial Lowering

Before diving into the result of our lowering, this is a good time to discuss
potential design considerations when it comes to partial lowering. In our
lowering, we transform from a value-type, TensorType, to an allocated
(buffer-like) type, MemRefType. However, given that we do not lower the
`toy.print` operation, we need to temporarily bridge these two worlds. There are
many ways to go about this, each with its own tradeoffs:

*   Generate `load` operations from the buffer

    One option is to generate `load` operations from the buffer type to
    materialize an instance of the value type. This allows for the definition of
    the `toy.print` operation to remain unchanged. The downside to this approach
    is that the optimizations on the `affine` dialect are limited, because the
    `load` will actually involve a full copy that is only visible *after* our
    optimizations have been performed.

*   Generate a new version of `toy.print` that operates on the lowered type

    Another option would be to have another, lowered, variant of `toy.print`
    that operates on the lowered type. The benefit of this option is that no
    unnecessary copy is hidden from the optimizer. The downside is that
    another operation definition is needed that may duplicate many aspects of
    the first. Defining a base class in [ODS](../../DefiningDialects/Operations.md) may
    simplify this, but you still need to treat these operations separately.

*   Update `toy.print` to allow for operating on the lowered type

    A third option is to update the current definition of `toy.print` to allow
    for operating on the lowered type. The benefit of this approach is that
    it is simple, does not introduce an additional hidden copy, and does not
    require another operation definition. The downside to this option is that it
    requires mixing abstraction levels in the `Toy` dialect.

For the sake of simplicity, we will use the third option for this lowering. This
involves updating the type constraints on the PrintOp in the operation
definition file:

```tablegen
def PrintOp : Toy_Op<"print"> {
  ...

  // The print operation takes an input tensor to print.
  // We also allow an F64MemRef to enable interop during partial lowering.
  let arguments = (ins AnyTypeOf<[F64Tensor, F64MemRef]>:$input);
}
```

## Complete Toy Example

Let's take a concrete example:

```mlir
toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %2 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
  %3 = toy.mul %2, %2 : tensor<3x2xf64>
  toy.print %3 : tensor<3x2xf64>
  toy.return
}
```

With affine lowering added to our pipeline, we can now generate:

```mlir
func.func @main() {
  %cst = arith.constant 1.000000e+00 : f64
  %cst_0 = arith.constant 2.000000e+00 : f64
  %cst_1 = arith.constant 3.000000e+00 : f64
  %cst_2 = arith.constant 4.000000e+00 : f64
  %cst_3 = arith.constant 5.000000e+00 : f64
  %cst_4 = arith.constant 6.000000e+00 : f64

  // Allocating buffers for the inputs and outputs.
  %0 = memref.alloc() : memref<3x2xf64>
  %1 = memref.alloc() : memref<3x2xf64>
  %2 = memref.alloc() : memref<2x3xf64>

  // Initialize the input buffer with the constant values.
  affine.store %cst, %2[0, 0] : memref<2x3xf64>
  affine.store %cst_0, %2[0, 1] : memref<2x3xf64>
  affine.store %cst_1, %2[0, 2] : memref<2x3xf64>
  affine.store %cst_2, %2[1, 0] : memref<2x3xf64>
  affine.store %cst_3, %2[1, 1] : memref<2x3xf64>
  affine.store %cst_4, %2[1, 2] : memref<2x3xf64>

  // Load the transpose value from the input buffer and store it into the
  // next input buffer.
  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      %3 = affine.load %2[%arg1, %arg0] : memref<2x3xf64>
      affine.store %3, %1[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Multiply and store into the output buffer.
  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      %3 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
      %4 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
      %5 = arith.mulf %3, %4 : f64
      affine.store %5, %0[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Print the value held by the buffer.
  toy.print %0 : memref<3x2xf64>
  memref.dealloc %2 : memref<2x3xf64>
  memref.dealloc %1 : memref<3x2xf64>
  memref.dealloc %0 : memref<3x2xf64>
  return
}
```

## Taking Advantage of Affine Optimization

Our naive lowering is correct, but it leaves a lot to be desired with regard to
efficiency. For example, the lowering of `toy.mul` has generated some redundant
loads. Let's look at how adding a few existing optimizations to the pipeline can
help clean this up. Adding the `LoopFusion` and `AffineScalarReplacement` passes
to the pipeline gives the following result:

```mlir
func.func @main() {
  %cst = arith.constant 1.000000e+00 : f64
  %cst_0 = arith.constant 2.000000e+00 : f64
  %cst_1 = arith.constant 3.000000e+00 : f64
  %cst_2 = arith.constant 4.000000e+00 : f64
  %cst_3 = arith.constant 5.000000e+00 : f64
  %cst_4 = arith.constant 6.000000e+00 : f64

  // Allocating buffers for the inputs and outputs.
  %0 = memref.alloc() : memref<3x2xf64>
  %1 = memref.alloc() : memref<2x3xf64>

  // Initialize the input buffer with the constant values.
  affine.store %cst, %1[0, 0] : memref<2x3xf64>
  affine.store %cst_0, %1[0, 1] : memref<2x3xf64>
  affine.store %cst_1, %1[0, 2] : memref<2x3xf64>
  affine.store %cst_2, %1[1, 0] : memref<2x3xf64>
  affine.store %cst_3, %1[1, 1] : memref<2x3xf64>
  affine.store %cst_4, %1[1, 2] : memref<2x3xf64>

  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      // Load the transpose value from the input buffer.
      %2 = affine.load %1[%arg1, %arg0] : memref<2x3xf64>

      // Multiply and store into the output buffer.
      %3 = arith.mulf %2, %2 : f64
      affine.store %3, %0[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Print the value held by the buffer.
  toy.print %0 : memref<3x2xf64>
  memref.dealloc %1 : memref<2x3xf64>
  memref.dealloc %0 : memref<3x2xf64>
  return
}
```

Here, we can see that a redundant allocation was removed, the two loop nests
were fused, and some unnecessary `load`s were removed. You can build `toyc-ch5`
and try it yourself: `toyc-ch5 test/Examples/Toy/Ch5/affine-lowering.mlir
-emit=mlir-affine`. We can also check our optimizations by adding `-opt`.
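
As a rough analogy for what these passes achieve, the two versions of `@main`
can be written as ordinary C++ functions over fixed-size arrays. This is only a
sketch of the before and after shapes of the computation, not of how the passes
work internally; `Input`, `Output`, `squareTransposeNaive`, and
`squareTransposeFused` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Input = std::array<std::array<double, 3>, 2>;  // like memref<2x3xf64>
using Output = std::array<std::array<double, 2>, 3>; // like memref<3x2xf64>

// Shape of the naive lowering: transpose into a temporary buffer, then run a
// second loop nest that loads both multiplication operands separately.
Output squareTransposeNaive(const Input &in) {
  Output tmp{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      tmp[i][j] = in[j][i];

  Output out{};
  for (std::size_t i = 0; i < 3; ++i) {
    for (std::size_t j = 0; j < 2; ++j) {
      double lhs = tmp[i][j];
      double rhs = tmp[i][j]; // redundant second load of the same element
      out[i][j] = lhs * rhs;
    }
  }
  return out;
}

// Shape of the optimized result: a single loop nest, a single load per
// element, and no temporary buffer.
Output squareTransposeFused(const Input &in) {
  Output out{};
  for (std::size_t i = 0; i < 3; ++i) {
    for (std::size_t j = 0; j < 2; ++j) {
      double value = in[j][i];
      out[i][j] = value * value;
    }
  }
  return out;
}
```

Both functions compute the same result; the fused version simply does it with
one fewer buffer and half as many loads inside the multiplication loop.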

In this chapter we explored some aspects of partial lowering, with the intent to
optimize. In the [next chapter](Ch-6.md) we will continue the discussion about
dialect conversion by targeting LLVM for code generation.