# Using `mlir-opt`

`mlir-opt` is a command-line entry point for running passes and lowerings on MLIR code.
This tutorial will explain how to use `mlir-opt`, show some examples of its usage,
and mention some useful tips for working with it.

Prerequisites:

- [Building MLIR from source](/getting_started/)
- [MLIR Language Reference](/docs/LangRef/)

[TOC]

## `mlir-opt` basics

The `mlir-opt` tool loads a textual IR or bytecode into an in-memory structure,
and optionally executes a sequence of passes
before serializing back the IR (textual form by default).
It is intended as a testing and debugging utility.

After building the MLIR project,
the `mlir-opt` binary (located in `build/bin`)
is the entry point for running passes and lowerings,
as well as emitting debug and diagnostic data.

Running `mlir-opt` with no flags will consume textual or bytecode IR
from the standard input, parse and run verifiers on it,
and write the textual format back to the standard output.
This is a good way to test if an input MLIR is well-formed.

`mlir-opt --help` shows a complete list of flags
(there are nearly 1000).
Each pass has its own flag,
though it is recommended to use `--pass-pipeline`
to run passes rather than bare flags.
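
As a quick illustration of the no-flag behavior described above, the following sketch pipes a small function into `mlir-opt` on standard input and reports whether it parsed and verified. The `check_mlir` helper and the `MLIR_OPT` variable are hypothetical names introduced here for illustration, and the default path assumes the `build/bin` location mentioned earlier.

```shell
# Assumption: mlir-opt was built at build/bin/mlir-opt.
# Set MLIR_OPT to point somewhere else if your build directory differs.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# check_mlir (hypothetical helper) reads IR from standard input, runs
# mlir-opt with no flags, and reports whether the tool exited successfully.
check_mlir() {
  if "$MLIR_OPT" > /dev/null; then
    echo "well-formed"
  else
    echo "rejected"
  fi
}

check_mlir <<'EOF'
func.func @identity(%arg0: i32) -> i32 {
  func.return %arg0 : i32
}
EOF
```

Any parser or verifier diagnostics are written to standard error, so they remain visible even though the round-tripped IR is discarded here.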

## Running a pass

Next we run [`convert-math-to-llvm`](/docs/Passes/#-convert-math-to-llvm),
which converts ops from the `math` dialect to the `llvm` dialect,
on the following IR:

```mlir
// mlir/test/Examples/mlir-opt/ctlz.mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = math.ctlz %arg0 : i32
    func.return %0 : i32
  }
}
```

After building MLIR, and from the `llvm-project` base directory, run

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(convert-math-to-llvm)" mlir/test/Examples/mlir-opt/ctlz.mlir
```

which produces

```mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = "llvm.intr.ctlz"(%arg0) <{is_zero_poison = false}> : (i32) -> i32
    return %0 : i32
  }
}
```

Note that `llvm` here is MLIR's `llvm` dialect,
which would still need to be processed through `mlir-translate`
to generate LLVM-IR.

## Running a pass with options

Next we will show how to run a pass that takes configuration options.
Consider the following IR containing loops with poor cache locality.

```mlir
// mlir/test/Examples/mlir-opt/loop_fusion.mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %0 = memref.alloc() : memref<10xf32>
    %1 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %0[%arg2] : memref<10xf32>
      affine.store %cst, %1[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %0[%arg2] : memref<10xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %1[%arg2] : memref<10xf32>
      %3 = arith.mulf %2, %2 : f32
      affine.store %3, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Running this with the [`affine-loop-fusion`](/docs/Passes/#-affine-loop-fusion) pass
produces a fused loop.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion)" mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<1xf32>
    %alloc_0 = memref.alloc() : memref<1xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[0] : memref<1xf32>
      affine.store %cst, %alloc_0[0] : memref<1xf32>
      %0 = affine.load %alloc_0[0] : memref<1xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
      %2 = affine.load %alloc[0] : memref<1xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    return
  }
}
```

This pass has options that allow the user to configure its behavior.
For example, the `fusion-compute-tolerance` option
is described as the "fractional increase in additional computation tolerated while fusing."
If this value is set to zero on the command line,
the pass will not fuse the loops.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion{fusion-compute-tolerance=0})" \
mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<10xf32>
    %alloc_0 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[%arg2] : memref<10xf32>
      affine.store %cst, %alloc_0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc[%arg2] : memref<10xf32>
      %1 = arith.addf %0, %0 : f32
      affine.store %1, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc_0[%arg2] : memref<10xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Options passed to a pass
are specified via the syntax `{option1=value1 option2=value2 ...}`,
i.e., use space-separated `key=value` pairs for each option.
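
To make the `{key=value ...}` syntax concrete, here is a minimal shell sketch that assembles a pass entry with several options. The `pass_with_options` helper is a hypothetical name introduced for illustration. `fusion-compute-tolerance` is the option discussed above, while `fusion-maximal=false` is included only as an assumed second option; check `mlir-opt --help` for the options your build actually supports.

```shell
# pass_with_options NAME [KEY=VALUE ...]   (hypothetical helper)
# Prints NAME alone, or NAME{KEY=VALUE ...} when options are given.
pass_with_options() {
  name="$1"
  shift
  if [ "$#" -eq 0 ]; then
    printf '%s\n' "$name"
  else
    # "$*" joins the remaining arguments with single spaces (default IFS),
    # which matches the space-separated option syntax.
    printf '%s{%s}\n' "$name" "$*"
  fi
}

# fusion-maximal=false is an assumed option used only to show two pairs.
pass="$(pass_with_options affine-loop-fusion fusion-compute-tolerance=0 fusion-maximal=false)"
echo "--pass-pipeline=builtin.module($pass)"
```

Because the options are separated by spaces, the whole `--pass-pipeline` value has to be quoted when it is passed to `mlir-opt` from a shell.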

## Building a pass pipeline on the command line

The `--pass-pipeline` flag supports combining multiple passes into a pipeline.
So far we have used the trivial pipeline with a single pass
that is "anchored" on the top-level `builtin.module` op.
[Pass anchoring](/docs/PassManagement/#oppassmanager)
is a way for passes to specify
that they only run on particular ops.
While many passes are anchored on `builtin.module`,
if you try to run a pass that is anchored on some other op
inside `--pass-pipeline="builtin.module(pass-name)"`,
it will not run.

Multiple passes can be chained together
by providing the pass names in a comma-separated list
in the `--pass-pipeline` string,
e.g.,
`--pass-pipeline="builtin.module(pass1,pass2)"`.
The passes will be run sequentially.

To use passes that have nontrivial anchoring,
the appropriate level of nesting must be specified
in the pass pipeline.
For example, consider the following IR which has the same redundant code,
but in two different levels of nesting.

```mlir
module {
  module {
    func.func @func1(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      func.return %2 : i32
    }
  }

  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

The following pipeline runs `cse` (common subexpression elimination) and `canonicalize`,
but only on the `func.func` nested inside the two `builtin.module` ops,
and then runs `convert-to-llvm` on the inner `builtin.module`.

```bash
build/bin/mlir-opt mlir/test/Examples/mlir-opt/ctlz.mlir --pass-pipeline='
    builtin.module(
        builtin.module(
            func.func(cse,canonicalize),
            convert-to-llvm
        )
    )'
```

The output leaves the `gpu.module` alone:

```mlir
module {
  module {
    llvm.func @func1(%arg0: i32) -> i32 {
      %0 = llvm.add %arg0, %arg0 : i32
      %1 = llvm.add %0, %0 : i32
      llvm.return %1 : i32
    }
  }
  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

Specifying a pass pipeline with nested anchoring
is also beneficial for performance reasons:
passes with anchoring can run on IR subsets in parallel,
which provides better threaded runtime and cache locality
within threads.
For example,
even if a pass is not restricted to anchor on `func.func`,
running `builtin.module(func.func(cse, canonicalize))`
is more efficient than `builtin.module(cse, canonicalize)`.
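
When pipelines get deeply nested, it can be easier to assemble the string in pieces. The sketch below builds the `func.func`-anchored pipeline mentioned above from the inside out. The `anchor` helper is a hypothetical name introduced here, not part of MLIR.

```shell
# anchor OP PIPELINE   (hypothetical helper)
# Wraps a comma-separated pipeline in OP(...), i.e. one level of nesting.
anchor() {
  printf '%s(%s)\n' "$1" "$2"
}

# Innermost level first: passes that should run on each func.func.
per_function="$(anchor func.func 'cse,canonicalize')"

# Then wrap it in the top-level builtin.module anchor.
pipeline="$(anchor builtin.module "$per_function")"

echo "$pipeline"
# prints builtin.module(func.func(cse,canonicalize))
```

The resulting string can then be passed as `--pass-pipeline="$pipeline"` to `mlir-opt`.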

For a spec of the pass-pipeline textual description language,
see [the docs](/docs/PassManagement/#textual-pass-pipeline-specification).
For more general information on pass management, see [Pass Infrastructure](/docs/PassManagement/#).

## Useful CLI flags

- `--debug` prints all debug information produced by `LLVM_DEBUG` calls.
- `--debug-only="my-tag"` prints only the debug information produced by `LLVM_DEBUG`
  in files that have the macro `#define DEBUG_TYPE "my-tag"`.
  This often allows you to print only debug information associated with a specific pass.
    - `"greedy-rewriter"` only prints debug information
      for patterns applied with the greedy rewriter engine.
    - `"dialect-conversion"` only prints debug information
      for the dialect conversion framework.
- `--emit-bytecode` emits MLIR in the bytecode format.
- `--mlir-pass-statistics` prints statistics about the passes run.
  These are generated via [pass statistics](/docs/PassManagement/#pass-statistics).
- `--mlir-print-ir-after-all` prints the IR after each pass.
    - See also `--mlir-print-ir-after-change`, `--mlir-print-ir-after-failure`,
      and analogous versions of these flags with `before` instead of `after`.
    - When using `print-ir` flags, adding `--mlir-print-ir-tree-dir` writes the
      IRs to files in a directory tree, making them easier to inspect versus a
      large dump to the terminal.
- `--mlir-timing` displays execution times of each pass.

## Further reading

- [List of passes](/docs/Passes/)
- [List of dialects](/docs/Dialects/)