# Using `mlir-opt`

`mlir-opt` is a command-line entry point for running passes and lowerings on MLIR code.
This tutorial will explain how to use `mlir-opt`, show some examples of its usage,
and mention some useful tips for working with it.

Prerequisites:

- [Building MLIR from source](/getting_started/)
- [MLIR Language Reference](/docs/LangRef/)

[TOC]

## `mlir-opt` basics

The `mlir-opt` tool loads a textual IR or bytecode into an in-memory structure,
and optionally executes a sequence of passes
before serializing the IR back (textual form by default).
It is intended as a testing and debugging utility.

After building the MLIR project,
the `mlir-opt` binary (located in `build/bin`)
is the entry point for running passes and lowerings,
as well as emitting debug and diagnostic data.

Running `mlir-opt` with no flags will consume textual or bytecode IR
from the standard input, parse and run verifiers on it,
and write the textual format back to the standard output.
This is a good way to test whether an MLIR input is well-formed.

`mlir-opt --help` shows a complete list of flags
(there are nearly 1000).
Each pass has its own flag,
though it is recommended to use `--pass-pipeline`
to run passes rather than bare flags.

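As a minimal sketch of the no-flags behavior described above,
the snippet below pipes a small module into `mlir-opt` on standard input
and reports whether it parsed and verified.
The `MLIR_OPT` variable and the `check_mlir` function are names chosen for this illustration,
and the sketch assumes that `mlir-opt` exits with a nonzero status
when parsing or verification fails.

```bash
# Path to the binary; override MLIR_OPT if your build directory differs.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# Reads MLIR from standard input and reports whether it parses and verifies.
# The re-printed IR is discarded; only the exit status is used.
check_mlir() {
  if "$MLIR_OPT" > /dev/null; then
    echo "well-formed"
  else
    echo "malformed"
  fi
}

# Only run the example if the binary has been built.
if [ -x "$MLIR_OPT" ]; then
  check_mlir <<'EOF'
func.func @main(%arg0: i32) -> i32 {
  func.return %arg0 : i32
}
EOF
fi
```
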
## Running a pass

Next we run [`convert-math-to-llvm`](/docs/Passes/#-convert-math-to-llvm),
which converts `math` dialect operations to the `llvm` dialect,
on the following IR:

```mlir
// mlir/test/Examples/mlir-opt/ctlz.mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = math.ctlz %arg0 : i32
    func.return %0 : i32
  }
}
```

After building MLIR, and from the `llvm-project` base directory, run

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(convert-math-to-llvm)" mlir/test/Examples/mlir-opt/ctlz.mlir
```

which produces

```mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = "llvm.intr.ctlz"(%arg0) <{is_zero_poison = false}> : (i32) -> i32
    return %0 : i32
  }
}
```

Note that `llvm` here is MLIR's `llvm` dialect,
which would still need to be processed through `mlir-translate`
to generate LLVM IR.

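As a rough sketch of that last step,
the commands below lower the example far enough for `mlir-translate` to accept it
and then emit LLVM IR.
The `convert-func-to-llvm` pass and the `--mlir-to-llvmir` flag are not used elsewhere in this tutorial;
the sketch assumes they are available in your build
and that these two conversions are enough for this particular input.
The `MLIR_OPT`, `MLIR_TRANSLATE`, and `PIPELINE` variables are names chosen for illustration.

```bash
# Paths to the binaries; override them if your build directory differs.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"
MLIR_TRANSLATE="${MLIR_TRANSLATE:-build/bin/mlir-translate}"

# Lower both the math and func dialects so that only `llvm` dialect ops remain.
PIPELINE="builtin.module(convert-math-to-llvm,convert-func-to-llvm)"

# Only run the example if both binaries have been built.
if [ -x "$MLIR_OPT" ] && [ -x "$MLIR_TRANSLATE" ]; then
  "$MLIR_OPT" --pass-pipeline="$PIPELINE" mlir/test/Examples/mlir-opt/ctlz.mlir \
    | "$MLIR_TRANSLATE" --mlir-to-llvmir
fi
```
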
## Running a pass with options

Next we will show how to run a pass that takes configuration options.
Consider the following IR containing loops with poor cache locality.

```mlir
// mlir/test/Examples/mlir-opt/loop_fusion.mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %0 = memref.alloc() : memref<10xf32>
    %1 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %0[%arg2] : memref<10xf32>
      affine.store %cst, %1[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %0[%arg2] : memref<10xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %1[%arg2] : memref<10xf32>
      %3 = arith.mulf %2, %2 : f32
      affine.store %3, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Running this with the [`affine-loop-fusion`](/docs/Passes/#-affine-loop-fusion) pass
produces a fused loop.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion)" mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<1xf32>
    %alloc_0 = memref.alloc() : memref<1xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[0] : memref<1xf32>
      affine.store %cst, %alloc_0[0] : memref<1xf32>
      %0 = affine.load %alloc_0[0] : memref<1xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
      %2 = affine.load %alloc[0] : memref<1xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    return
  }
}
```

This pass has options that allow the user to configure its behavior.
For example, the `fusion-compute-tolerance` option
is described as the "fractional increase in additional computation tolerated while fusing."
If this value is set to zero on the command line,
the pass will not fuse the loops.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion{fusion-compute-tolerance=0})" \
mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<10xf32>
    %alloc_0 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[%arg2] : memref<10xf32>
      affine.store %cst, %alloc_0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc[%arg2] : memref<10xf32>
      %1 = arith.addf %0, %0 : f32
      affine.store %1, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc_0[%arg2] : memref<10xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Options passed to a pass
are specified via the syntax `{option1=value1 option2=value2 ...}`,
i.e., use space-separated `key=value` pairs for each option.

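To make the option syntax concrete,
here is a small sketch of a shell helper that assembles a pass name and its `key=value` pairs
into the `name{key1=value1 key2=value2}` form.
The helper name `pass_with_options` is hypothetical,
and the second option, `fusion-maximal`, is an assumption about this pass's option list;
check `mlir-opt --help` for the options your build actually accepts.

```bash
# Formats a pass name and its options as `name{key1=value1 key2=value2}`.
# With no options, prints the bare pass name.
pass_with_options() {
  local name="$1"
  shift
  if [ "$#" -eq 0 ]; then
    printf '%s\n' "$name"
  else
    # "$*" joins the remaining arguments with single spaces.
    printf '%s{%s}\n' "$name" "$*"
  fi
}

# `fusion-maximal` is an assumed option name, used only to show two options.
fusion="$(pass_with_options affine-loop-fusion fusion-compute-tolerance=0 fusion-maximal=true)"
echo "builtin.module($fusion)"
```

The resulting string can be passed to `--pass-pipeline`;
quote it on the command line so the shell does not split it at the space between options.
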
## Building a pass pipeline on the command line

The `--pass-pipeline` flag supports combining multiple passes into a pipeline.
So far we have used the trivial pipeline with a single pass
that is "anchored" on the top-level `builtin.module` op.
[Pass anchoring](/docs/PassManagement/#oppassmanager)
is a way for passes to specify
that they only run on particular ops.
While many passes are anchored on `builtin.module`,
a pass that is anchored on some other op will not run
if it is placed directly inside `--pass-pipeline="builtin.module(pass-name)"`.

Multiple passes can be chained together
by providing the pass names in a comma-separated list
in the `--pass-pipeline` string,
e.g.,
`--pass-pipeline="builtin.module(pass1,pass2)"`.
The passes will be run sequentially.

To use passes that have nontrivial anchoring,
the appropriate level of nesting must be specified
in the pass pipeline.
For example, consider the following IR, which has the same redundant code in two places:
once inside a nested `builtin.module` and once inside a `gpu.module`.

```mlir
module {
  module {
    func.func @func1(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      func.return %2 : i32
    }
  }

  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

With the IR above saved to a file (named `nested.mlir` here purely for illustration),
the following pipeline runs `cse` (common subexpression elimination) and `canonicalize`
only on the `func.func` ops nested inside the two `builtin.module` ops,
and then runs `convert-to-llvm` on the inner `builtin.module`.

```bash
build/bin/mlir-opt nested.mlir --pass-pipeline='
    builtin.module(
        builtin.module(
            func.func(cse,canonicalize),
            convert-to-llvm
        )
    )'
```

The output leaves the `gpu.module` unchanged:

```mlir
module {
  module {
    llvm.func @func1(%arg0: i32) -> i32 {
      %0 = llvm.add %arg0, %arg0 : i32
      %1 = llvm.add %0, %0 : i32
      llvm.return %1 : i32
    }
  }
  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

Specifying a pass pipeline with nested anchoring
is also beneficial for performance reasons:
passes with anchoring can run on IR subsets in parallel,
which provides better threaded runtime and cache locality
within threads.
For example,
even if a pass is not restricted to anchor on `func.func`,
running `builtin.module(func.func(cse, canonicalize))`
is more efficient than `builtin.module(cse, canonicalize)`.

For a spec of the pass-pipeline textual description language,
see [the docs](/docs/PassManagement/#textual-pass-pipeline-specification).
For more general information on pass management, see [Pass Infrastructure](/docs/PassManagement/).

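Nested pipeline strings like the one in this section can get hard to read when typed by hand.
The sketch below builds the same nested string from small pieces in the shell;
the `anchor` helper and the `inner` and `pipeline` variables are hypothetical names
introduced only for this illustration.

```bash
# Wraps a comma-separated list of passes (or nested pipelines) in an anchor op.
# $1 is the op name; the remaining arguments are the pipeline elements.
anchor() {
  local op="$1"
  shift
  # With IFS set to a comma, "$*" joins the arguments with commas.
  local IFS=','
  printf '%s(%s)\n' "$op" "$*"
}

# func.func(cse,canonicalize)
inner="$(anchor func.func cse canonicalize)"

# Nest the function-level passes and convert-to-llvm under two modules.
pipeline="$(anchor builtin.module "$(anchor builtin.module "$inner" convert-to-llvm)")"
echo "$pipeline"
# prints: builtin.module(builtin.module(func.func(cse,canonicalize),convert-to-llvm))
```

The printed string can be passed as `--pass-pipeline="$pipeline"`.
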
## Useful CLI flags

- `--debug` prints all debug information produced by `LLVM_DEBUG` calls.
- `--debug-only="my-tag"` prints only the debug information produced by `LLVM_DEBUG`
  in files that have the macro `#define DEBUG_TYPE "my-tag"`.
  This often allows you to print only debug information associated with a specific pass.
    - `"greedy-rewriter"` only prints debug information
      for patterns applied with the greedy rewriter engine.
    - `"dialect-conversion"` only prints debug information
      for the dialect conversion framework.
- `--emit-bytecode` emits MLIR in the bytecode format.
- `--mlir-pass-statistics` prints statistics about the passes run.
  These are generated via [pass statistics](/docs/PassManagement/#pass-statistics).
- `--mlir-print-ir-after-all` prints the IR after each pass.
    - See also `--mlir-print-ir-after-change`, `--mlir-print-ir-after-failure`,
      and analogous versions of these flags with `before` instead of `after`.
    - When using `print-ir` flags, adding `--mlir-print-ir-tree-dir` writes the
      IRs to files in a directory tree, making them easier to inspect versus a
      large dump to the terminal.
- `--mlir-timing` displays execution times of each pass.

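As a brief sketch of how some of these flags combine with a pass pipeline,
the snippet below runs the loop fusion example with per-pass IR dumps and timing enabled.
The `debug_run` function and the output file names `fused.mlir` and `debug.log` are hypothetical,
and the redirection assumes that the IR dumps and the timing report are written to standard error
while the final IR goes to standard output.

```bash
# Path to the binary; override MLIR_OPT if your build directory differs.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# Runs a pipeline with per-pass IR dumps and timing enabled.
# $1: pass pipeline string, $2: input file.
debug_run() {
  "$MLIR_OPT" --mlir-print-ir-after-all --mlir-timing \
    --pass-pipeline="$1" "$2"
}

# Only run the example if the binary has been built.
if [ -x "$MLIR_OPT" ]; then
  debug_run "builtin.module(affine-loop-fusion)" \
    mlir/test/Examples/mlir-opt/loop_fusion.mlir \
    > fused.mlir 2> debug.log
fi
```
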
## Further reading

- [List of passes](/docs/Passes/)
- [List of dialects](/docs/Dialects/)