# Using `mlir-opt`

`mlir-opt` is a command-line entry point for running passes and lowerings on MLIR code.
This tutorial will explain how to use `mlir-opt`, show some examples of its usage,
and mention some useful tips for working with it.

Prerequisites:

- [Building MLIR from source](/getting_started/)
- [MLIR Language Reference](/docs/LangRef/)

[TOC]

## `mlir-opt` basics

The `mlir-opt` tool loads a textual IR or bytecode into an in-memory structure,
and optionally executes a sequence of passes
before serializing back the IR (textual form by default).
It is intended as a testing and debugging utility.

After building the MLIR project,
the `mlir-opt` binary (located in `build/bin`)
is the entry point for running passes and lowerings,
as well as emitting debug and diagnostic data.

Running `mlir-opt` with no flags will consume textual or bytecode IR
from the standard input, parse and run verifiers on it,
and write the textual format back to the standard output.
This is a good way to test if an input MLIR is well-formed.
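
The well-formedness check described above can be sketched as a small shell snippet.
This is an illustration only: the `MLIR_OPT` variable, the `check_wellformed` helper,
and the inline IR are assumptions introduced here, not part of MLIR,
and the snippet skips the check when the binary has not been built.

```bash
# Assumed location of the binary; override MLIR_OPT if your build differs.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# A tiny function to validate (illustrative IR, not taken from the test suite).
ir='func.func @main(%arg0: i32) -> i32 {
  func.return %arg0 : i32
}'

# Hypothetical helper: feeds stdin to mlir-opt with no flags, so the tool
# only parses and verifies the IR. A zero exit status means it was accepted.
check_wellformed() {
  "$MLIR_OPT" > /dev/null 2>&1
}

if [ -x "$MLIR_OPT" ]; then
  if printf '%s\n' "$ir" | check_wellformed; then
    status="well-formed"
  else
    status="rejected"
  fi
else
  status="skipped (mlir-opt not found at $MLIR_OPT)"
fi
echo "$status"
```

Discarding the output keeps the check quiet; drop the redirection to see the round-tripped IR or the parser diagnostics.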

`mlir-opt --help` shows a complete list of flags
(there are nearly 1000).
Each pass has its own flag,
though it is recommended to use `--pass-pipeline`
to run passes rather than bare flags.

## Running a pass

Next we run [`convert-math-to-llvm`](/docs/Passes/#-convert-math-to-llvm),
which converts `math` dialect ops to the `llvm` dialect,
on the following IR:

```mlir
// mlir/test/Examples/mlir-opt/ctlz.mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = math.ctlz %arg0 : i32
    func.return %0 : i32
  }
}
```

After building MLIR, and from the `llvm-project` base directory, run

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(convert-math-to-llvm)" mlir/test/Examples/mlir-opt/ctlz.mlir
```

which produces

```mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = "llvm.intr.ctlz"(%arg0) <{is_zero_poison = false}> : (i32) -> i32
    return %0 : i32
  }
}
```

Note that `llvm` here is MLIR's `llvm` dialect,
which would still need to be processed through `mlir-translate`
to generate LLVM-IR.

## Running a pass with options

Next we will show how to run a pass that takes configuration options.
Consider the following IR containing loops with poor cache locality.

```mlir
// mlir/test/Examples/mlir-opt/loop_fusion.mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %0 = memref.alloc() : memref<10xf32>
    %1 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %0[%arg2] : memref<10xf32>
      affine.store %cst, %1[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %0[%arg2] : memref<10xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %1[%arg2] : memref<10xf32>
      %3 = arith.mulf %2, %2 : f32
      affine.store %3, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Running this with the [`affine-loop-fusion`](/docs/Passes/#-affine-loop-fusion) pass
produces a fused loop.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion)" mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<1xf32>
    %alloc_0 = memref.alloc() : memref<1xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[0] : memref<1xf32>
      affine.store %cst, %alloc_0[0] : memref<1xf32>
      %0 = affine.load %alloc_0[0] : memref<1xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
      %2 = affine.load %alloc[0] : memref<1xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    return
  }
}
```

This pass has options that allow the user to configure its behavior.
For example, the `fusion-compute-tolerance` option
is described as the "fractional increase in additional computation tolerated while fusing."
If this value is set to zero on the command line,
the pass will not fuse the loops.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion{fusion-compute-tolerance=0})" \
mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<10xf32>
    %alloc_0 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[%arg2] : memref<10xf32>
      affine.store %cst, %alloc_0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc[%arg2] : memref<10xf32>
      %1 = arith.addf %0, %0 : f32
      affine.store %1, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc_0[%arg2] : memref<10xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Options passed to a pass
are specified via the syntax `{option1=value1 option2=value2 ...}`,
i.e., use space-separated `key=value` pairs for each option.
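
When pipelines are generated from scripts, the option syntax above can be assembled programmatically.
The following is a minimal sketch: `pass_with_options` is a hypothetical helper introduced here,
not something shipped with MLIR, and it only formats strings without invoking `mlir-opt`.

```bash
# Hypothetical helper (not part of MLIR): formats one pass with its options.
# Usage: pass_with_options NAME [key=value ...]
pass_with_options() {
  local name="$1"
  shift
  if [ "$#" -eq 0 ]; then
    # No options: emit the bare pass name.
    printf '%s' "$name"
  else
    # "$*" joins the remaining arguments with the first character of IFS,
    # which is set to a space here to match the option syntax.
    local IFS=' '
    printf '%s{%s}' "$name" "$*"
  fi
}

fusion="$(pass_with_options affine-loop-fusion fusion-compute-tolerance=0)"
pipeline="builtin.module($fusion)"
echo "--pass-pipeline=\"$pipeline\""
```

The printed flag matches the one used in the command above; quoting it on the command line keeps the shell from interpreting the braces and parentheses.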

## Building a pass pipeline on the command line

The `--pass-pipeline` flag supports combining multiple passes into a pipeline.
So far we have used the trivial pipeline with a single pass
that is "anchored" on the top-level `builtin.module` op.
[Pass anchoring](/docs/PassManagement/#oppassmanager)
is a way for passes to specify
that they only run on particular ops.
While many passes are anchored on `builtin.module`,
if you try to run a pass that is anchored on some other op
inside `--pass-pipeline="builtin.module(pass-name)"`,
it will not run.

Multiple passes can be chained together
by providing the pass names in a comma-separated list
in the `--pass-pipeline` string,
e.g.,
`--pass-pipeline="builtin.module(pass1,pass2)"`.
The passes will be run sequentially.

To use passes that have nontrivial anchoring,
the appropriate level of nesting must be specified
in the pass pipeline.
For example, consider the following IR which has the same redundant code,
but in two different levels of nesting.

```mlir
module {
  module {
    func.func @func1(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      func.return %2 : i32
    }
  }

  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

The following pipeline runs `cse` (common subexpression elimination) and `canonicalize`,
but only on the `func.func` nested inside the two `builtin.module` ops,
and then runs `convert-to-llvm` on the inner `builtin.module`.

```bash
build/bin/mlir-opt mlir/test/Examples/mlir-opt/ctlz.mlir --pass-pipeline='
    builtin.module(
        builtin.module(
            func.func(cse,canonicalize),
            convert-to-llvm
        )
    )'
```

The output leaves the `gpu.module` alone

```mlir
module {
  module {
    llvm.func @func1(%arg0: i32) -> i32 {
      %0 = llvm.add %arg0, %arg0 : i32
      %1 = llvm.add %0, %0 : i32
      llvm.return %1 : i32
    }
  }
  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

Specifying a pass pipeline with nested anchoring
is also beneficial for performance reasons:
passes with anchoring can run on IR subsets in parallel,
which provides better threaded runtime and cache locality
within threads.
For example,
even if a pass is not restricted to anchor on `func.func`,
running `builtin.module(func.func(cse, canonicalize))`
is more efficient than `builtin.module(cse, canonicalize)`.

For a spec of the pass-pipeline textual description language,
see [the docs](/docs/PassManagement/#textual-pass-pipeline-specification).
For more general information on pass management, see [Pass Infrastructure](/docs/PassManagement/#).
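
The nesting rules in this section (comma-separated passes inside an anchor op, anchors inside anchors)
can be sketched as a tiny string builder.
The `anchor` function below is a hypothetical helper introduced for illustration, not part of MLIR;
it only assembles the pipeline string and does not check that the nesting is valid for a given pass.

```bash
# Hypothetical helper (not part of MLIR): wraps passes or nested anchors
# in an anchor op. Usage: anchor OP ELEMENT [ELEMENT ...]
anchor() {
  local op="$1"
  shift
  # "$*" joins the elements with the first character of IFS, here a comma.
  local IFS=','
  printf '%s(%s)' "$op" "$*"
}

# Innermost level: cse and canonicalize anchored on func.func.
inner="$(anchor func.func cse canonicalize)"
# Middle level: the func.func pipeline followed by convert-to-llvm.
middle="$(anchor builtin.module "$inner" convert-to-llvm)"
# Outermost level: the top-level module.
pipeline="$(anchor builtin.module "$middle")"
echo "$pipeline"
```

The result is the same pipeline as the multi-line string used earlier in this section, with the whitespace removed.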

## Useful CLI flags

- `--debug` prints all debug information produced by `LLVM_DEBUG` calls.
- `--debug-only="my-tag"` prints only the debug information produced by `LLVM_DEBUG`
  in files that have the macro `#define DEBUG_TYPE "my-tag"`.
  This often allows you to print only debug information associated with a specific pass.
    - `"greedy-rewriter"` only prints debug information
      for patterns applied with the greedy rewriter engine.
    - `"dialect-conversion"` only prints debug information
      for the dialect conversion framework.
- `--emit-bytecode` emits MLIR in the bytecode format.
- `--mlir-pass-statistics` prints statistics about the passes run.
  These are generated via [pass statistics](/docs/PassManagement/#pass-statistics).
- `--mlir-print-ir-after-all` prints the IR after each pass.
    - See also `--mlir-print-ir-after-change`, `--mlir-print-ir-after-failure`,
      and analogous versions of these flags with `before` instead of `after`.
    - When using `print-ir` flags, adding `--mlir-print-ir-tree-dir` writes the
      IRs to files in a directory tree, making them easier to inspect versus a
      large dump to the terminal.
- `--mlir-timing` displays execution times of each pass.
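
Several of the flags above can be combined in a single invocation.
The sketch below is one way to do that from a script; the `MLIR_OPT`, `INPUT`, `OUT`, and `LOG` variables are assumptions introduced here,
and it assumes the per-pass IR dumps and the timing report are written to standard error while the final IR goes to standard output.
When the binary or the input file is missing, it only prints the command it would run.

```bash
# Assumed paths; override the variables to match your checkout and build.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"
INPUT="${INPUT:-mlir/test/Examples/mlir-opt/loop_fusion.mlir}"
OUT="${OUT:-${TMPDIR:-/tmp}/fused.mlir}"
LOG="${LOG:-${TMPDIR:-/tmp}/passes.log}"

# Flags taken from the list above, kept in an array so quoting stays intact.
flags=(
  --pass-pipeline="builtin.module(affine-loop-fusion)"
  --mlir-print-ir-after-all
  --mlir-timing
)

if [ -x "$MLIR_OPT" ] && [ -f "$INPUT" ]; then
  # Final IR to OUT; diagnostics (IR dumps, timing) to LOG.
  "$MLIR_OPT" "${flags[@]}" "$INPUT" > "$OUT" 2> "$LOG"
  echo "IR written to $OUT, diagnostics written to $LOG"
else
  echo "would run: $MLIR_OPT ${flags[*]} $INPUT"
fi
```

Separating the two streams keeps the transformed IR usable as input to a later `mlir-opt` run while the log can be inspected on its own.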

## Further reading

- [List of passes](/docs/Passes/)
- [List of dialects](/docs/Dialects/)