# Chapter 5: Partial Lowering to Lower-Level Dialects for Optimization

[TOC]

At this point, we are eager to generate actual code and see our Toy language
come to life. We will use LLVM to generate code, but just showing the LLVM
builder interface here wouldn't be very exciting. Instead, we will show how to
perform progressive lowering through a mix of dialects coexisting in the same
function.

To make it more interesting, in this chapter we will consider that we want to
reuse existing optimizations implemented in a dialect optimizing affine
transformations: `Affine`. This dialect is tailored to the computation-heavy
part of the program and is limited: it doesn't support representing our
`toy.print` builtin, for instance, nor should it! Instead, we can target
`Affine` for the computation-heavy part of Toy, and in the
[next chapter](Ch-6.md) directly target the `LLVM IR` dialect for lowering
`print`. As part of this lowering, we will be lowering from the
[TensorType](../../Dialects/Builtin.md/#rankedtensortype) that `Toy` operates on
to the [MemRefType](../../Dialects/Builtin.md/#memreftype) that is indexed via
an affine loop-nest. Tensors represent an abstract value-typed sequence of data,
meaning that they don't live in any memory. MemRefs, on the other hand,
represent lower-level buffer access, as they are concrete references to a region
of memory.

# Dialect Conversions

MLIR has many different dialects, so it is important to have a unified framework
for [converting](../../../getting_started/Glossary.md/#conversion) between them.
This is where the `DialectConversion` framework comes into play. This framework
allows for transforming a set of *illegal* operations to a set of *legal* ones.
To use this framework, we need to provide two things (and an optional third):

*   A [Conversion Target](../../DialectConversion.md/#conversion-target)

    -   This is the formal specification of what operations or dialects are
        legal for the conversion. Operations that aren't legal will require
        rewrite patterns to perform
        [legalization](../../../getting_started/Glossary.md/#legalization).

*   A set of
    [Rewrite Patterns](../../DialectConversion.md/#rewrite-pattern-specification)

    -   This is the set of [patterns](../QuickstartRewrites.md) used to convert
        *illegal* operations into a set of zero or more *legal* ones.

*   Optionally, a [Type Converter](../../DialectConversion.md/#type-conversion).

    -   If provided, this is used to convert the types of block arguments. We
        won't be needing this for our conversion.
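The roles of the conversion target and the rewrite patterns listed above can be illustrated with a small standalone model. This is only a sketch in plain C++ and deliberately does not use MLIR: `Op`, `Legality`, `Target`, `Patterns`, and `applyPartialConversionModel` are hypothetical names invented for this illustration, operations are reduced to their names, and replacements are not rewritten again as the real framework would do.

```c++
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model for illustration only; none of these names are MLIR APIs.
// An operation is represented by its full name, e.g. "toy.transpose".
using Op = std::string;

enum class Legality { Legal, Illegal, Unknown };

// Returns the dialect prefix of an operation name ("toy.print" -> "toy").
std::string dialectOf(const Op &op) {
  assert(!op.empty() && "operation name must not be empty");
  return op.substr(0, op.find('.'));
}

// Models a conversion target: dialect-level legality plus per-op overrides.
struct Target {
  std::set<std::string> legalDialects;
  std::set<std::string> illegalDialects;
  std::set<Op> legalOps;

  // Per-operation entries are consulted first, so they take precedence over
  // the dialect-level entries regardless of registration order.
  Legality classify(const Op &op) const {
    if (legalOps.count(op))
      return Legality::Legal;
    if (legalDialects.count(dialectOf(op)))
      return Legality::Legal;
    if (illegalDialects.count(dialectOf(op)))
      return Legality::Illegal;
    return Legality::Unknown;
  }
};

// Each pattern maps one operation name to zero or more replacement operations.
using Patterns = std::map<Op, std::vector<Op>>;

// Models a partial conversion: explicitly illegal operations must be rewritten
// by a pattern, while legal and unknown operations are kept as they are.
// On failure `ops` is left unchanged. Simplification: replacements are not
// rewritten again, they only have to be non-illegal.
bool applyPartialConversionModel(std::vector<Op> &ops, const Target &target,
                                 const Patterns &patterns) {
  std::vector<Op> result;
  for (const Op &op : ops) {
    if (target.classify(op) != Legality::Illegal) {
      result.push_back(op);
      continue;
    }
    auto it = patterns.find(op);
    if (it == patterns.end())
      return false; // an illegal operation without a pattern
    for (const Op &replacement : it->second) {
      if (target.classify(replacement) == Legality::Illegal)
        return false;
      result.push_back(replacement);
    }
  }
  ops = result;
  return true;
}
```

In this model, marking the `toy` dialect illegal while listing `toy.print` as a legal operation lets a sequence such as `toy.transpose`, `toy.print` convert as long as a pattern for `toy.transpose` exists, whereas a lone `toy.mul` without a pattern makes the conversion fail.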

## Conversion Target

For our purposes, we want to convert the compute-intensive `Toy` operations into
a combination of operations from the `Affine`, `Arith`, `Func`, and `MemRef` dialects
for further optimization. To start off the lowering, we first define our
conversion target:

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  // The first thing to define is the conversion target. This will define the
  // final target for this lowering.
  mlir::ConversionTarget target(getContext());

  // We define the specific operations, or dialects, that are legal targets for
  // this lowering. In our case, we are lowering to a combination of the
  // `Affine`, `Arith`, `Func`, and `MemRef` dialects.
  target.addLegalDialect<affine::AffineDialect, arith::ArithDialect,
                         func::FuncDialect, memref::MemRefDialect>();

  // We also define the Toy dialect as Illegal so that the conversion will fail
  // if any of these operations are *not* converted. Given that we actually want
  // a partial lowering, we explicitly mark the Toy operations that we don't
  // want to lower, `toy.print`, as *legal*. `toy.print` will still need its
  // operands to be updated though (as we convert from TensorType to
  // MemRefType), so we only treat it as `legal` if its operands are legal.
  target.addIllegalDialect<ToyDialect>();
  target.addDynamicallyLegalOp<toy::PrintOp>([](toy::PrintOp op) {
    return llvm::none_of(op->getOperandTypes(),
                         [](Type type) { return type.isa<TensorType>(); });
  });
  ...
}
```

Above, we first set the toy dialect to illegal, and then the print operation as
legal. We could have done this the other way around. Individual operations
always take precedence over the (more generic) dialect definitions, so the order
doesn't matter. See `ConversionTarget::getOpInfo` for the details.

## Conversion Patterns

After the conversion target has been defined, we can define how to convert the
*illegal* operations into *legal* ones. Similarly to the canonicalization
framework introduced in [chapter 3](Ch-3.md), the
[`DialectConversion` framework](../../DialectConversion.md) also uses
[RewritePatterns](../QuickstartRewrites.md) to perform the conversion logic.
These patterns may be the `RewritePatterns` seen before or a new type of pattern
specific to the conversion framework: `ConversionPattern`. `ConversionPatterns`
are different from traditional `RewritePatterns` in that they accept an
additional `operands` parameter containing operands that have been
remapped/replaced. This is used when dealing with type conversions, as the
pattern will want to operate on values of the new type but match against the
old.
For our lowering, this invariant will be useful as it translates from the
[TensorType](../../Dialects/Builtin.md/#rankedtensortype) currently being
operated on to the [MemRefType](../../Dialects/Builtin.md/#memreftype). Let's
look at a snippet of lowering the `toy.transpose` operation:

```c++
/// Lower the `toy.transpose` operation to an affine loop nest.
struct TransposeOpLowering : public mlir::ConversionPattern {
  TransposeOpLowering(mlir::MLIRContext *ctx)
      : mlir::ConversionPattern(TransposeOp::getOperationName(), 1, ctx) {}

  /// Match and rewrite the given `toy.transpose` operation, with the given
  /// operands that have been remapped from `tensor<...>` to `memref<...>`.
  llvm::LogicalResult
  matchAndRewrite(mlir::Operation *op, ArrayRef<mlir::Value> operands,
                  mlir::ConversionPatternRewriter &rewriter) const final {
    auto loc = op->getLoc();

    // Call to a helper function that will lower the current operation to a set
    // of affine loops. We provide a functor that operates on the remapped
    // operands, as well as the loop induction variables for the inner most
    // loop body.
    lowerOpToLoops(
        op, operands, rewriter,
        [loc](mlir::PatternRewriter &rewriter,
              ArrayRef<mlir::Value> memRefOperands,
              ArrayRef<mlir::Value> loopIvs) {
          // Generate an adaptor for the remapped operands of the TransposeOp.
          // This allows for using the nice named accessors that are generated
          // by the ODS. This adaptor is automatically provided by the ODS
          // framework.
          TransposeOpAdaptor transposeAdaptor(memRefOperands);
          mlir::Value input = transposeAdaptor.input();

          // Transpose the elements by generating a load from the reverse
          // indices.
          SmallVector<mlir::Value, 2> reverseIvs(llvm::reverse(loopIvs));
          return rewriter.create<mlir::AffineLoadOp>(loc, input, reverseIvs);
        });
    return success();
  }
};
```

Now we can prepare the list of patterns to use during the lowering process:

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  ...

  // Now that the conversion target has been defined, we just need to provide
  // the set of patterns that will lower the Toy operations.
  mlir::RewritePatternSet patterns(&getContext());
  patterns.add<..., TransposeOpLowering>(&getContext());

  ...
```

## Partial Lowering

Once the patterns have been defined, we can perform the actual lowering.
The `DialectConversion` framework provides several different modes of lowering,
but, for our purposes, we will perform a partial lowering, as we will not convert
`toy.print` at this time.

```c++
void ToyToAffineLoweringPass::runOnOperation() {
  ...

  // With the target and rewrite patterns defined, we can now attempt the
  // conversion. The conversion will signal failure if any of our *illegal*
  // operations were not converted successfully.
  if (mlir::failed(mlir::applyPartialConversion(getOperation(), target, patterns)))
    signalPassFailure();
}
```

### Design Considerations With Partial Lowering

Before diving into the result of our lowering, this is a good time to discuss
potential design considerations when it comes to partial lowering. In our
lowering, we transform from a value-type, TensorType, to an allocated
(buffer-like) type, MemRefType. However, given that we do not lower the
`toy.print` operation, we need to temporarily bridge these two worlds. There are
many ways to go about this, each with its own tradeoffs:

*   Generate `load` operations from the buffer

    One option is to generate `load` operations from the buffer type to
    materialize an instance of the value type. This allows for the definition of
    the `toy.print` operation to remain unchanged.
    The downside to this approach
    is that the optimizations on the `affine` dialect are limited, because the
    `load` will actually involve a full copy that is only visible *after* our
    optimizations have been performed.

*   Generate a new version of `toy.print` that operates on the lowered type

    Another option would be to have another, lowered, variant of `toy.print`
    that operates on the lowered type. The benefit of this option is that there
    is no hidden, unnecessary copy to the optimizer. The downside is that
    another operation definition is needed that may duplicate many aspects of
    the first. Defining a base class in [ODS](../../DefiningDialects/Operations.md) may
    simplify this, but you still need to treat these operations separately.

*   Update `toy.print` to allow for operating on the lowered type

    A third option is to update the current definition of `toy.print` to allow
    for operating on the lowered type. The benefit of this approach is that
    it is simple, does not introduce an additional hidden copy, and does not
    require another operation definition. The downside to this option is that it
    requires mixing abstraction levels in the `Toy` dialect.

For the sake of simplicity, we will use the third option for this lowering. This
involves updating the type constraints on the PrintOp in the operation
definition file:

```tablegen
def PrintOp : Toy_Op<"print"> {
  ...

  // The print operation takes an input tensor to print.
  // We also allow a F64MemRef to enable interop during partial lowering.
  let arguments = (ins AnyTypeOf<[F64Tensor, F64MemRef]>:$input);
}
```

## Complete Toy Example

Let's take a concrete example:

```mlir
toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %2 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
  %3 = toy.mul %2, %2 : tensor<3x2xf64>
  toy.print %3 : tensor<3x2xf64>
  toy.return
}
```

With affine lowering added to our pipeline, we can now generate:

```mlir
func.func @main() {
  %cst = arith.constant 1.000000e+00 : f64
  %cst_0 = arith.constant 2.000000e+00 : f64
  %cst_1 = arith.constant 3.000000e+00 : f64
  %cst_2 = arith.constant 4.000000e+00 : f64
  %cst_3 = arith.constant 5.000000e+00 : f64
  %cst_4 = arith.constant 6.000000e+00 : f64

  // Allocating buffers for the inputs and outputs.
  %0 = memref.alloc() : memref<3x2xf64>
  %1 = memref.alloc() : memref<3x2xf64>
  %2 = memref.alloc() : memref<2x3xf64>

  // Initialize the input buffer with the constant values.
  affine.store %cst, %2[0, 0] : memref<2x3xf64>
  affine.store %cst_0, %2[0, 1] : memref<2x3xf64>
  affine.store %cst_1, %2[0, 2] : memref<2x3xf64>
  affine.store %cst_2, %2[1, 0] : memref<2x3xf64>
  affine.store %cst_3, %2[1, 1] : memref<2x3xf64>
  affine.store %cst_4, %2[1, 2] : memref<2x3xf64>

  // Load the transpose value from the input buffer and store it into the
  // next input buffer.
  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      %3 = affine.load %2[%arg1, %arg0] : memref<2x3xf64>
      affine.store %3, %1[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Multiply and store into the output buffer.
  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      %3 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
      %4 = affine.load %1[%arg0, %arg1] : memref<3x2xf64>
      %5 = arith.mulf %3, %4 : f64
      affine.store %5, %0[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Print the value held by the buffer.
  toy.print %0 : memref<3x2xf64>
  memref.dealloc %2 : memref<2x3xf64>
  memref.dealloc %1 : memref<3x2xf64>
  memref.dealloc %0 : memref<3x2xf64>
  return
}
```

## Taking Advantage of Affine Optimization

Our naive lowering is correct, but it leaves a lot to be desired with regard to
efficiency. For example, the lowering of `toy.mul` has generated some redundant
loads. Let's look at how adding a few existing optimizations to the pipeline can
help clean this up. Adding the `LoopFusion` and `AffineScalarReplacement` passes
to the pipeline gives the following result:

```mlir
func.func @main() {
  %cst = arith.constant 1.000000e+00 : f64
  %cst_0 = arith.constant 2.000000e+00 : f64
  %cst_1 = arith.constant 3.000000e+00 : f64
  %cst_2 = arith.constant 4.000000e+00 : f64
  %cst_3 = arith.constant 5.000000e+00 : f64
  %cst_4 = arith.constant 6.000000e+00 : f64

  // Allocating buffers for the inputs and outputs.
  %0 = memref.alloc() : memref<3x2xf64>
  %1 = memref.alloc() : memref<2x3xf64>

  // Initialize the input buffer with the constant values.
  affine.store %cst, %1[0, 0] : memref<2x3xf64>
  affine.store %cst_0, %1[0, 1] : memref<2x3xf64>
  affine.store %cst_1, %1[0, 2] : memref<2x3xf64>
  affine.store %cst_2, %1[1, 0] : memref<2x3xf64>
  affine.store %cst_3, %1[1, 1] : memref<2x3xf64>
  affine.store %cst_4, %1[1, 2] : memref<2x3xf64>

  affine.for %arg0 = 0 to 3 {
    affine.for %arg1 = 0 to 2 {
      // Load the transpose value from the input buffer.
      %2 = affine.load %1[%arg1, %arg0] : memref<2x3xf64>

      // Multiply and store into the output buffer.
      %3 = arith.mulf %2, %2 : f64
      affine.store %3, %0[%arg0, %arg1] : memref<3x2xf64>
    }
  }

  // Print the value held by the buffer.
  toy.print %0 : memref<3x2xf64>
  memref.dealloc %1 : memref<2x3xf64>
  memref.dealloc %0 : memref<3x2xf64>
  return
}
```

Here, we can see that a redundant allocation was removed, the two loop nests
were fused, and some unnecessary `load`s were removed. You can build `toyc-ch5`
and try it yourself: `toyc-ch5 test/Examples/Toy/Ch5/affine-lowering.mlir
-emit=mlir-affine`. We can also check our optimizations by adding `-opt`.

In this chapter we explored some aspects of partial lowering, with the intent to
optimize.
In the [next chapter](Ch-6.md) we will continue the discussion about
dialect conversion by targeting LLVM for code generation.
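As a closing aside, the effect of the fusion and scalar replacement seen in this chapter can be mimicked in plain C++. The sketch below is an analogy only and does not involve MLIR: the type aliases `In` and `Out` and the two function names are hypothetical, chosen to mirror the `memref<2x3xf64>` input and `memref<3x2xf64>` output of the example. The first function follows the shape of the unoptimized lowering (a temporary buffer, two loop nests, two loads per multiply), and the second follows the shape of the optimized one.

```c++
#include <array>
#include <cassert>

// Hypothetical stand-ins for memref<2x3xf64> and memref<3x2xf64>.
using In = std::array<std::array<double, 3>, 2>;
using Out = std::array<std::array<double, 2>, 3>;

// Shape of the unoptimized lowering: the first loop nest writes the transpose
// into a temporary buffer, the second loads every element twice and multiplies.
Out transposeThenSquareNaive(const In &in) {
  Out tmp{};
  Out out{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 2; ++j)
      tmp[i][j] = in[j][i];
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 2; ++j) {
      double lhs = tmp[i][j];
      double rhs = tmp[i][j];
      out[i][j] = lhs * rhs;
    }
  }
  return out;
}

// Shape of the optimized result: a single loop nest, no temporary buffer, and
// one load per element that is reused for both multiplication operands.
Out transposeThenSquareFused(const In &in) {
  Out out{};
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 2; ++j) {
      double value = in[j][i];
      out[i][j] = value * value;
    }
  }
  return out;
}
```

Both functions compute the same values for any input; the fused variant simply avoids the intermediate buffer and the duplicate load, which is the kind of cleanup the affine passes perform on the IR.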