# MLIR: Incremental Application to Graph Algorithms in ML Frameworks

The existing documentation about MLIR focuses on long term vision, how its
pieces fit together, and the benefits of modular and composable infrastructure
in the vast and distant future. While this viewpoint appeals to some, it causes
concern for others who are focused on the "here and now" - why does it make
sense to make a "revolutionary" change when any individual problem can be fixed
in place?

This document explains that adoption of MLIR to solve graph based problems
*isn't* a revolutionary change: it is an incremental series of steps which build
on each other, each of which delivers local value. This document also addresses
some points of confusion that keep coming up.

One note: even though a major advantage of MLIR is that it can span the full
spectrum from graph algorithms down to low-level code generation, this document
focuses on the use of MLIR for **graph-level algorithms**. MLIR will also unlock
exciting code generation opportunities (particularly given its novel approach to
integrating state of the art polyhedral techniques), but issues that touch on
MLIR's relationship to XLA, Eigen, etc, are out of scope for this particular
doc.

This document uses TensorFlow as the example, given that it is the focus of our
immediate work, but we believe that the same viewpoint could be useful for
people working in the context of other ML frameworks that may consider adopting
MLIR in the future.

### How is MLIR relevant?

MLIR is an overloaded acronym which unpacks as "Multi-Level Intermediate
Representation". Its high-level purpose is to provide mechanics for describing
and transforming programs and computations in a flexible way. It provides common
compiler infrastructure for things like constant folding, dead code elimination,
graph rewriting, and others - which are independent of the representational
choices picked by a given dialect (e.g. its concurrency semantics). It was built
with a specific focus on compile time and memory efficiency and accurate
propagation of source location information (important for reporting high quality
errors and warnings), and it is designed for testability.
TensorFlow has numerous subsystems (some of which are proprietary, e.g.
Tensor-RT, nGraph, CoreML, etc) as well as translation layers between these
different subsystems, and these translation layers face similar challenges. (As
an aside, the internals of each of these subsystems could often benefit from
MLIR infrastructure, but that isn't a focus of this doc.)

A key observation that MLIR makes is that these subsystems often have two things
going on: they are both particular data structures and encodings (e.g. HLO
graphs, TF-Lite's flat buffer format, TensorFlow's Graph format, the ONNX
abstraction, etc) as well as an abstraction of computation (a specific way of
modeling a convolution, a set of supported operations etc).

MLIR uses a standard IR (i.e., a set of data structures) for representing these
computations - this allows a huge amount of shared infrastructure across these
problem domains. MLIR then allows the definition of domain-specific "dialects"
that describe the set of operations that are legal and supported for a given
application. This means that the actual translations between data structures are
kept as simple as possible - and are thus relatively easy to make "correct".
This allows the common compiler infrastructure to handle the mapping problems
and the other issues within the domain.

MLIR's design is directly informed by the experience of building (and then
living with) intermediate representations like the LLVM IR, LLVM SelectionDAG,
the LLVM machine instruction representation, Swift SIL IR, and learns new
lessons from TensorFlow and XLA HLO, as well as learning from building countless
research and production systems on top of them. Our goal is to drag the state of
the art in compilers forward, not to merely apply a few well-known techniques to
the machine learning domain.
### What does adoption mean?

The point of this document is not to advocate for rewriting any particular
subsystem in TensorFlow - indeed, the burden required to justify a rewrite is
high, and often very specific to that subsystem. That said, there are several
subsystems that are about to get rewritten or substantially revised anyway, so
we use those as examples to concretely describe the benefits that MLIR provides
in these cases and what it will take. The subsystems discussed are:
1.  the TF Lite TOCO translator, whose error reporting and reliability we need
    to improve and which we need to generalize to support more ops, and
1.  the TF/XLA bridge, which needs to improve usability by merging some of its
    usage models, support dynamic shapes, and generalize guest subsystem support
    to Tensor-RT and nGraph.
1.  Grappler is another subsystem that is likely to get substantial revisions in
    the future, and would definitely benefit from the MLIR framework, but there
    are no known plans to do that work at this point, so we don't discuss it
    further.

Adopting MLIR for each of these works the same way - and, in fact, the work to
support TF Lite is mostly a subset of the larger work to support the
functionality of the TF/XLA bridge. TF Lite and the TF/XLA bridge include
several compiler passes (things like encapsulate, functionalize control flow,
lowering of ops, fusion, constant folding, shape inference, etc).

MLIR supports converting from TensorFlow Graphs to MLIR and back, which means
that we can start by putting a no-op translation to MLIR and back into the
pipeline and verifying that nothing breaks. Then we can work on replacing the
compiler transformations one by one by reimplementing them (with the improved
algorithms that we're planning).

This is a development plan; we wouldn't actually ship a TensorFlow that just
uses MLIR for a single pass. In practice, we'll keep the MLIR path gated behind
an option, build out a replacement for an entire subsystem (e.g. the TOCO
translator), and when the time is right, we'll do A/B comparisons and eventually
make the switch and phase out the old code over time.

## What benefit does MLIR provide?

The adoption plan above might sound like it only makes things worse in the
immediate term - we have two implementations of the same functionality, we are
dividing our efforts, etc. In order for this to be worth it, we should have a
good sense that we are building towards an improved future that will make
customers and TensorFlow engineers happier when it lands. Here we describe a few
of the benefits that MLIR provides, in no particular order:

### A Lossless Human Editable Textual Representation

The MLIR in-memory data structure has a human readable and writable format, as
well as [a specification](../LangRef.md) for that format - built just like any
other programming language. Important properties of this format are that it is
compact, easy to read, and lossless. You can dump an MLIR program out to disk
and munge around with it, then send it through a few more passes.
If you haven't worked with a system that works this way, it is hard to overstate
how big of a deal this is in practice: it means that you can call `foo->dump()`
on an IR object to see its full contents, you can diff the IR before and after a
change, you can delta reduce IR files, and many other things.
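
As a small, hedged illustration (the function name and ops below are invented
for this example), the textual form even doubles as a round-trip test: a module
can be printed, re-parsed by a second `mlir-opt` invocation, and checked against
itself without losing any information:

```mlir
  // RUN: mlir-opt %s | mlir-opt | FileCheck %s
  // CHECK-LABEL: func @round_trip
  // CHECK: arith.addi
  func.func @round_trip(%arg0: i32, %arg1: i32) -> i32 {
    %0 = arith.addi %arg0, %arg1 : i32
    return %0 : i32
  }
```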

### A Graph Verification Pass

Like many other popular compiler infrastructures, MLIR provides infrastructure
and implementation for a "verifier" which checks that the IR is well formed. The
MLIR verifier is a simple framework that makes it easy to provide a single
source of truth for those correctness properties and is general across all
Dialects (e.g. TF Graph, TF Lite flat buffer, XLA HLO, etc).

A verifier pass is sort of like a 'super assertion' that catches mistakes in
program transformations early, making you as an engineer more productive, making
the product more reliable, and making it easier to track down bugs when they
appear - because the verifier can be run at any time, either as a compiler pass
or with a single function call.

While MLIR provides a well-considered infrastructure for IR verification, and
has simple checks for existing TensorFlow operations, there is a lot that should
be added here and lots of opportunity to get involved!
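
As a hedged sketch of what this looks like in practice (the function name, the
choice of op, and the quoted diagnostic substring are illustrative), the generic
operation form below parses fine, but the verifier rejects it because the
operand types disagree:

```mlir
  // RUN: mlir-opt %s -verify-diagnostics
  func.func @verifier_rejects_this(%arg0: f32, %arg1: i32) {
    // The generic form bypasses the custom parser, so it is the verifier
    // that catches the mismatched operand types.
    // expected-error@+1 {{requires the same type}}
    %0 = "arith.addi"(%arg0, %arg1) : (f32, i32) -> i32
    return
  }
```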

### Designed for Testability

There are many aspects of this in MLIR, but we'll focus on compiler
transformations since they are the easiest to understand. Compiler
transformations are modeled as subclasses of the `Pass` C++ class, which are
driven by an `mlir-opt` tool. When combined with a lossless textual
representation, it becomes really easy to write unit tests for compiler
transformations. For example, this is a simple test that shows "x-x" being
turned into zero:
```mlir
  // RUN: mlir-opt %s -canonicalize | FileCheck %s
  func.func @test_subi_zero_cfg(%arg0: i32) -> i32 {
    %y = arith.subi %arg0, %arg0 : i32
    return %y: i32
  }
  // CHECK-LABEL: func @test_subi_zero_cfg(%arg0: i32)
  // CHECK-NEXT: %c0_i32 = arith.constant 0 : i32
  // CHECK-NEXT: return %c0
```

The "CHECK" comments are interpreted by the
[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
is sort of like a really advanced grep. This test is fully self-contained: it
feeds the input into the [canonicalize pass](../Canonicalization.md), and checks
that the output matches the CHECK lines. See the `test/Transforms` directory for
more examples. In contrast, standard unit testing exposes the API of the
underlying framework to lots and lots of tests (making it harder to refactor and
move the API), typically requires a lot more code, and exacerbates issues with
link time. For examples, see
[the TEST_F functions in TensorFlow's testsuite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/arithmetic_optimizer_test.cc).
MLIR has been pervasively designed around this sort of testability, allowing us
to put in place a culture that expects every behavior changing commit to include
a test case, and for these test cases to be stable and reliable over time, since
they are testing exactly what they are supposed to. End to end integration tests
are still super useful for some things of course!

### Infrastructure for Warnings and Error Diagnostics and Location Tracking

MLIR benefits from the lessons learned from building other compilers - including
Clang, which
[set the standard](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html)
for quality of implementation in C/C++ compiler diagnostics. Drawing from this
experience (and fixing mistakes in LLVM), MLIR requires that operations and
functions carry abstract location information and that transformations propagate
this information, and it provides standardized mechanisms to emit errors and
warnings, as well as hooks for clients to capture and report them in custom
ways.
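
As a hedged illustration (the file name and the ops here are invented for the
example), locations are first-class in the textual form, so a test can check
that an op produced by the pipeline still points back at the user's source:

```mlir
  // RUN: mlir-opt %s -mlir-print-debuginfo | FileCheck %s
  // CHECK: loc("model.py":42:11)
  func.func @squared_error(%arg0: f32, %arg1: f32) -> f32 {
    %0 = arith.subf %arg0, %arg1 : f32 loc("model.py":42:18)
    %1 = arith.mulf %0, %0 : f32 loc("model.py":42:11)
    return %1 : f32 loc("model.py":42:4)
  }
```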

Why is this important? In practice, many graph-to-graph translators can fail
(e.g. TF Lite when an unsupported op is used) and it is important to be able to
report the error up through to the user in the most precise way possible, in
order for it to be actionable. This includes tracking rewrites through fusions
and fissions of ops, mapping back into language / API specific domains, etc.

More selfishly for infrastructure hackers, this is a huge boon because it means
that it is easy to write good tests for this: the testing tools for MLIR capture
the diagnostics produced by passes (using the standard diagnostic hooks) and
check that they match the expected diagnostics in the testcase. For example, to
test the dependence analysis infra in the code generator, Andy Davis wrote a
simple pass that checks dependencies and emits them as "notes", allowing him to
write tests like this:

```mlir
  // RUN: mlir-opt %s -memref-dependence-check -verify-diagnostics
  func.func @different_memrefs() {
    %m.a = memref.alloc() : memref<100xf32>
    %m.b = memref.alloc() : memref<100xf32>
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1.0 : f32
    memref.store %c1, %m.a[%c0] : memref<100xf32>
    // expected-note@-1 {{dependence from memref access 0 to access 1 = false}}
    %v0 = memref.load %m.b[%c0] : memref<100xf32>
    return
  }
```

Note that a major limitation of this is that MLIR suffers from a problem of
"garbage in, garbage out": if the input locations to MLIR are imprecise, then
there is nothing that it can do to recover them. There is work underway in
TensorFlow/Python to improve the situation, and Swift for TensorFlow already has
perfect location tracking due to its design.

### Shape Information Captured in the IR

In TensorFlow Graphs, each op takes and returns values using a very simple type
system (TF_DataType) in which each value is a tensor of unknown rank and
dimensions. At the same time, many graphs have static shapes easily knowable for
wide swaths of the computation, and even dynamically shaped operations often
have statically knowable dimensions. Many analyses and transformations benefit
from and use this information when available, but because TensorFlow graphs
don't capture this (e.g. serialize it to proto), passes have to recompute it on
demand with ShapeRefiner.

The [MLIR Tensor Type](../Dialects/Builtin.md/#rankedtensortype) directly
captures shape information, so you can have things like:
```mlir
  %z = tf.Add %x, %y : tensor<128 x 8 x ? x f32>
```

Capturing this in the IR is expected to speed up transformations (avoiding
recomputing the same info over and over again) which therefore makes it
practical to apply stronger shape analysis algorithms. It also makes it easier
to work with the IR, because on-the-side representations can get out of date,
and the API is easier to work with from an ergonomics perspective.
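
To make the contrast concrete, here is a hedged sketch in the same illustrative
syntax as above (`tf.Relu` stands in for any op) of the levels of shape
knowledge the type system can express, from the TF_DataType-equivalent unranked
case down to a fully static shape:

```mlir
  %a = tf.Relu %in0 : tensor<*xf32>          // unknown rank and dimensions
  %b = tf.Relu %in1 : tensor<? x 8 x f32>    // known rank, one dynamic dimension
  %c = tf.Relu %in2 : tensor<128 x 8 x f32>  // fully static shape
```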

### Unified Graph Rewriting Infrastructure

This is still a work in progress, but we have sightlines towards a
[general rewriting infrastructure](RationaleGenericDAGRewriter.md) for
transforming DAG tiles into other DAG tiles, using a declarative pattern format.
DAG to DAG rewriting is a generalized solution for many common compiler
optimizations, lowerings, and other rewrites, and having a common IR enables us
to invest in building a single high-quality implementation.

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.
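
As a minimal sketch of the kind of DAG-to-DAG rewrite this infrastructure drives
(the function name is made up; folding away an add of a constant zero is one of
the stock canonicalizations), expressed as the same style of FileCheck test
shown earlier:

```mlir
  // RUN: mlir-opt %s -canonicalize | FileCheck %s
  // The two-op DAG tile (a constant zero feeding an add) is rewritten away.
  func.func @fold_add_zero(%arg0: i32) -> i32 {
    %c0 = arith.constant 0 : i32
    %0 = arith.addi %arg0, %c0 : i32
    return %0 : i32
  }
  // CHECK-LABEL: func @fold_add_zero
  // CHECK-NEXT: return %arg0 : i32
```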

### Clarified Semantics for TensorFlow Operations

One of the challenging things about working with TensorFlow is that there are
many invariants and behaviors that need to be preserved and known about when
working with Graphs, and these can be difficult to reason about and lead to
bugs. Things like 'dead values', Switch and Merge nodes, concurrency semantics,
nodes that execute even when passed a dead value, and multi-device program
representation all add complexities that can make it challenging to reason about
whether a transformation or analysis is correct in general. Even something as
simple as constant folding or transforming integer `x-x` into `0` is non-trivial
because you need to consider control dependence edges.

One of our major goals for the TensorFlow dialect of MLIR is to sort out these
situations and upgrade existing TensorFlow graphs to semantics that are easier
to reason about. The solutions to these problems are all still being debated,
but those discussions have already yielded a lot of potential answers:
introducing a `tf_dead_or<x>` type for switch/merge, modeling TF operations
using futures/async semantics, etc. None of these particular battles is critical
for MLIR to succeed (because of its "meta" nature, the abstraction decisions of
any given dialect are up to that dialect to decide), but each one that works out
will make it easier to work with and transform TensorFlow operations. We expect
these issues to get nailed down in the next couple of months when the MLIR
effort moves beyond TF Lite / TOCO support. The discussions that are happening
now are super valuable and making progress.

### Ergonomics

A minor-in-theory, but important-in-practice point is that MLIR is designed to
make it easier, more memory efficient, and less error prone to transform code
than other systems. `TensorFlow::Graph` has implementation issues where the same
information is stored redundantly in different places (which must be manually
kept up to date), has a somewhat unusual representation of certain constructs
(e.g. the function library, which makes it very difficult to add or remove
functions, e.g. during interprocedural transformations), and stores information
in the graph that is used by the executor, but isn't necessary for program
transformation.

TensorFlow has made a lot of progress in this area over the years, and there are
lots of ideas about further improvements in the future. We are happy that MLIR
addresses these needs (making it much easier to implement correct program
transformations) today, and we are committed to pushing hard to make it better.

### Compile Time Performance and Memory Use

MLIR has been designed to be memory and compile-time efficient in its algorithms
and data structures, using immutable and uniqued structures, low level
bit-packing, and other well-known techniques to avoid unnecessary heap
allocations, and allow simple and safe multithreaded optimization of MLIR
programs. There are other reasons to believe that the MLIR implementations of
common transformations will be more efficient than the Python and C++
TensorFlow::Graph implementations of the same things, given the current
implementation details of TensorFlow.
That said, this is very much a theory at this point. When the new
implementations of various subsystems are available, we will see what happens in
practice: there will be no reason to speculate - we can measure.

## Common Questions and Concerns

Here we address some frequently asked questions and concerns.

### Isn't MLIR a big dependency to take on?

We've heard that at least some people are concerned that MLIR is a "big"
dependency to take on, and could result in large code size. Here are some key
points about MLIR:

1.  The entire MLIR codebase is a pretty small C++ code base in absolute terms
    compared to what goes into a modern ML framework.
1.  Like LLVM, MLIR is designed as a set of libraries that clients can link in
    or ignore as they wish. For example, the transformations in MLIR are kept
    separate from the core IR abstractions, and dialect specific code (e.g.
    TensorFlow, TF-Lite, XLA, etc) is all independently selectable by the build
    system. Clients that don't care about XLA don't link in that code, whether
    they are a TF-Lite system or a client that is completely unrelated to
    TensorFlow.
1.  MLIR's only third party dependency is on LLVM, but it doesn't depend on LLVM
    IR or any other heavy dependency - it just depends on LLVM's support library
    which provides efficient hash tables and other
    [memory efficient data structures that the STL does not](http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task).
    There have been discussions about splitting this set of libraries out to its
    own subproject in LLVM that the LLVM IR project depends on. This would be
    great for MLIR as well as other LLVM subprojects.
1.  TensorFlow and many other frameworks already use LLVM - if so, MLIR would
    not be pulling in an additional dependency at all.
### How does MLIR represent {control flow, concurrency, …} semantics in TensorFlow?

MLIR provides a dialect that is an isomorphic 1-1 mapping between TensorFlow
graphs and MLIR, as well as a pretty complete translator back and forth (the
only known gap is that a few TF_DataType enums aren't handled yet). MLIR is a
"Multi-Level IR", which allows it to represent code with different abstraction
levels, so the ability to faithfully represent TensorFlow code in a completely
backwards compatible way (even if there are some historical warts!) is critical.

In *addition* to the isomorphic mapping, we are actively working on efforts to
raise the abstraction level for working with TensorFlow graphs in MLIR. Doing so
would make it even easier to write TensorFlow transformations than it is today,
and would provide a path to migrating TF 1.x graphs forward into the TF 2.x
world. For example, because MLIR has an extensible type system, we can directly
model whether it is impossible for a Tensor value to be a "dead" value - similar
to the use of optional types in modern programming languages.

These discussions occasionally cause confusion because there are several issues
being mixed up into one:

*   What are the current semantics of TensorFlow graphs, and what invariants can
    we rely on?
*   What should the semantics be in TensorFlow 2.0?
*   What do programs rely on in practice, and if it is unfriendly, can we
    migrate it?
*   Can we find a way to make it so transforms don't have to worry about the
    complexities of Switch/Merge, by using higher level control flow
    representations? (tentative answer: yes)
*   How should MLIR represent async vs sync operations, what invariants are
    provided, how does this dovetail with control flow?
*   When is it safe and beneficial to perform optimizations that might reduce
    parallelism?

All of these questions have a "conservative/safe fallback": we can continue
providing exactly the same abstractions that TensorFlow always has. That said,
we are trying hard to level-up the representation (taking advantage of the
"Multi-Level" part of MLIR) because doing so will make it much much easier to
write analyses and transformations than it currently is in TensorFlow.
### Non Goals

It is important to point out things that MLIR does not aim to do. For example,
there is no runtime component to MLIR: the TensorFlow executor, the TF Lite
FlatBuffer interpreter, and other existing runtimes should be used as-is.

Another non-goal is a stable binary encoding: MLIR doesn't currently provide
one. We will certainly add this at some point, but existing formats should be
used for serialization and distribution in the meantime.