# MLIR: Incremental Application to Graph Algorithms in ML Frameworks

The existing documentation about MLIR focuses on long term vision, how its
pieces fit together, and the benefits of modular and composable infrastructure
in the vast and distant future. While this viewpoint appeals to some, it causes
concern for others who are focused on the "here and now" - why does it make
sense to make a "revolutionary" change when any individual problem can be fixed
in place?

This document explains that adoption of MLIR to solve graph based problems
*isn't* a revolutionary change: it is an incremental series of steps which build
on each other, each of which delivers local value. This document also addresses
some points of confusion that keep coming up.

One note: even though a major advantage of MLIR is that it can span the full
spectrum from graph algorithms down to low-level code generation, this document
focuses on the use of MLIR for **graph-level algorithms**. MLIR will also unlock
exciting code generation opportunities (particularly given its novel approach to
integrating state of the art polyhedral techniques), but issues that touch on
MLIR's relationship to XLA, Eigen, etc, are out of scope for this particular
doc.

This document uses TensorFlow as the example given that it is the focus of our
immediate work, but we believe that the same viewpoint could be useful for
people working in the context of other ML frameworks that may consider adopting
MLIR in the future.

### How is MLIR relevant?

MLIR is an overloaded acronym which unpacks as "Multi-Level Intermediate
Representation". Its high-level purpose is to provide mechanics for describing
and transforming programs and computations in a flexible way. It provides common
compiler infrastructure for things like constant folding, dead code elimination,
graph rewriting, and others - which are independent of the representational
choices picked by a given dialect (e.g. its concurrency semantics). It was built
with a specific focus on compile time and memory efficiency, accurate
propagation of source location information (important for reporting high quality
errors and warnings), and is designed for testability.

TensorFlow has numerous subsystems (some of which are proprietary, e.g.
Tensor-RT, nGraph, CoreML, etc) as well as translation layers between these
different subsystems, and these translation layers face similar challenges.
((As
an aside, the internals of each of these subsystems could often benefit from
MLIR infrastructure, but that isn't a focus of this doc.))

A key observation that MLIR makes is that these subsystems often have two things
going on: they are both particular data structures and encodings (e.g. HLO
graphs, TF-Lite's flat buffer format, TensorFlow's Graph format, the ONNX
abstraction, etc) as well as an abstraction of computation (a specific way of
modeling a convolution, a set of supported operations, etc).

MLIR uses a standard IR (i.e., a set of data structures) for representing these
computations - this allows a huge amount of shared infrastructure across these
problem domains. MLIR then allows the definition of domain-specific "dialects"
that describe the set of operations that are legal and supported for a given
application. This means that the actual translations between data structures are
kept as simple as possible - and are thus relatively easy to make "correct".
This allows the common compiler infrastructure to handle the mapping problems
and the other issues within the domain.

MLIR's design is directly informed by the experience of building (and then
living with) intermediate representations like the LLVM IR, LLVM SelectionDAG,
the LLVM machine instruction representation, and Swift SIL IR; it also learns
new lessons from TensorFlow and XLA HLO, as well as from building countless
research and production systems on top of them. Our goal is to drag the state of
the art in compilers forward, not to merely apply a few well-known techniques to
the machine learning domain.

### What does adoption mean?

The point of this document is not to advocate for rewriting any particular
subsystem in TensorFlow - indeed, the burden required to justify a rewrite is
high, and often very specific to that subsystem. That said, there are several
subsystems that are about to get rewritten or substantially revised anyway, so
we use those as examples to concretely describe the benefits that MLIR provides
in these cases and what it will take. The subsystems discussed are:

1.  the TF Lite TOCO translator, whose error reporting and reliability we need
    to improve, and which we need to generalize to support more ops, and
1.  the TF/XLA bridge, which needs to improve usability by merging some of its
    usage models, support dynamic shapes, and generalize guest subsystem support
    to Tensor-RT and nGraph.
1.  Grappler is another subsystem that is likely to get substantial revisions in
    the future, and would definitely benefit from the MLIR framework, but there
    are no known plans to do that work at this point, so we don't discuss it
    further.

Adopting MLIR for each of these works the same way - and, in fact, the work to
support TF Lite is mostly a subset of the larger work to support the
functionality of the TF/XLA bridge. TF Lite and the TF/XLA bridge include
several compiler passes (things like encapsulate, functionalize control flow,
lowering of ops, fusion, constant folding, shape inference, etc).

MLIR supports converting from TensorFlow Graphs to MLIR and back, which means
that we can start by putting a no-op translation to MLIR and back into the
pipeline, and verify that nothing breaks. Then we can work on replacing the
compiler transformations one by one by reimplementing them (with the improved
algorithms that we're planning).

This is a development plan: we wouldn't actually ship a TensorFlow that just
uses MLIR for a single pass. In practice, we'll keep the MLIR path gated behind
a flag, build out a replacement for an entire subsystem (e.g. the TOCO
translator), and when the time is right, we'll do A/B comparisons and eventually
make the switch and phase out the old code over time.

## What benefit does MLIR provide?

The adoption plan above might sound like it only makes things worse in the
immediate term - we have two implementations of the same functionality, we are
dividing our efforts, etc. In order for this to be worth it, we should have a
good sense that we are building towards an improved future that will make
customers and TensorFlow engineers happier when it lands. Here we describe a few
of the benefits that MLIR provides, in no particular order:

### A Lossless Human Editable Textual Representation

The MLIR in-memory data structure has a human readable and writable format, as
well as [a specification](../LangRef.md) for that format - built just like any
other programming language. Important properties of this format are that it is
compact, easy to read, and lossless. You can dump an MLIR program out to disk
and munge around with it, then send it through a few more passes.

If you haven't worked with a system that works this way, it is hard to overstate
how big of a deal this is in practice: it means that you can call `foo->dump()`
on an IR object to see its full contents, you can diff the IR before and after a
change, delta reduce IR files, and many other things.
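
For a flavor of what that textual form looks like, here is a small,
self-contained sketch using only the builtin `func` and `arith` dialects
(nothing TensorFlow-specific, and the function name is made up for
illustration). A file like this can be written by hand, saved as `.mlir`, and
fed back through `mlir-opt`:

```mlir
  // A complete module: the operations, types, and SSA structure round-trip
  // losslessly between this text and the in-memory IR.
  func.func @scale_and_add(%x: i32, %y: i32) -> i32 {
    %c4 = arith.constant 4 : i32
    %scaled = arith.muli %x, %c4 : i32
    %sum = arith.addi %scaled, %y : i32
    return %sum : i32
  }
```

Running a pass pipeline over such a file (e.g. `mlir-opt example.mlir
-canonicalize`) prints the transformed module back out in the same format, which
is what makes diffing, delta reducing, and hand-editing IR so convenient.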

### A Graph Verification Pass

Like many other popular compiler infrastructures, MLIR provides infrastructure
and implementation for a "verifier" which checks that the IR is well formed. The
MLIR verifier is a simple framework that makes it easy to provide a single
source of truth for those correctness properties and is general across all
Dialects (e.g. TF Graph, TF Lite flat buffer, XLA HLO, etc).

A verifier pass is sort of like a 'super assertion' that catches mistakes in
program transformations early, making you as an engineer more productive, making
the product more reliable, and making it easier to track down bugs when they
appear - because the verifier can be run at any time, either as a compiler pass
or with a single function call.

While MLIR provides a well-considered infrastructure for IR verification, and
has simple checks for existing TensorFlow operations, there is a lot that should
be added here and lots of opportunity to get involved!

### Designed for Testability

There are many aspects of this in MLIR, but we'll focus on compiler
transformations since they are the easiest to understand. Compiler
transformations are modeled as subclasses of the `Pass` C++ class, which are
driven by an `mlir-opt` tool. When combined with a lossless textual
representation, it becomes really easy to write unit tests for compiler
transformations. For example, this is a simple test that shows `x-x` being
turned into zero:

```mlir
  // RUN: mlir-opt %s -canonicalize | FileCheck %s
  func.func @test_subi_zero_cfg(%arg0: i32) -> i32 {
    %y = arith.subi %arg0, %arg0 : i32
    return %y: i32
  }
  // CHECK-LABEL: func @test_subi_zero_cfg(%arg0: i32)
  // CHECK-NEXT: %c0_i32 = arith.constant 0 : i32
  // CHECK-NEXT: return %c0
```

The "CHECK" comments are interpreted by the
[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
is sort of like a really advanced grep. This test is fully self-contained: it
feeds the input into the [canonicalize pass](../Canonicalization.md), and checks
that the output matches the CHECK lines. See the `test/Transforms` directory for
more examples. In contrast, standard unit testing exposes the API of the
underlying framework to lots and lots of tests (making it harder to refactor and
move the API), typically requires a lot more code, and exacerbates issues with
link time.
For examples, see
[the TEST_F functions in TensorFlow's testsuite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/arithmetic_optimizer_test.cc).

MLIR has been pervasively designed around this sort of testability, allowing us
to put in place a culture that expects every behavior changing commit to include
a test case, and for these test cases to be stable and reliable over time, since
they are testing exactly what they are supposed to. End to end integration tests
are still super useful for some things of course!

### Infrastructure for Warnings and Error Diagnostics and Location Tracking

MLIR benefits from the lessons learned from building other compilers - including
Clang, which
[set the standard](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html)
for quality of implementation in C/C++ compiler diagnostics. Drawing from this
experience (and fixing mistakes in LLVM), MLIR requires that operations and
functions carry abstract location information, that transformations propagate
this information, and provides standardized mechanisms to emit errors and
warnings, as well as for clients to hook into them to capture and report them in
custom ways.

Why is this important? In practice, many graph-to-graph translators can fail
(e.g. TF Lite when an unsupported op is used) and it is important to be able to
report the error up through to the user in the most precise way possible, in
order for it to be actionable. This includes tracking rewrites through fusions
and fissions of ops, mapping back into language / API specific domains, etc.

More selfishly for infrastructure hackers, this is a huge boon because it means
that it is easy to write good tests for this: the testing tools for MLIR capture
the diagnostics produced by passes (using the standard diagnostic hooks) and
check that they match the expected diagnostics in the testcase.
For example, to
test the dependence analysis infra in the code generator, Andy Davis wrote a
simple pass that checks dependencies and emits them as "notes", allowing him to
write tests like this:

```mlir
  // RUN: mlir-opt %s -memref-dependence-check -verify-diagnostics
  func.func @different_memrefs() {
    %m.a = memref.alloc() : memref<100xf32>
    %m.b = memref.alloc() : memref<100xf32>
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1.0 : f32
    memref.store %c1, %m.a[%c0] : memref<100xf32>
    // expected-note@-1 {{dependence from memref access 0 to access 1 = false}}
    %v0 = memref.load %m.b[%c0] : memref<100xf32>
    return
  }
```

Note that a major limitation of this is that MLIR suffers from a problem of
"garbage in, garbage out": if the input locations to MLIR are imprecise, then
there is nothing that it can do to recover them. There is work underway in
TensorFlow/Python to improve the situation, and Swift for TensorFlow already has
perfect location tracking due to its design.

### Shape Information Captured in the IR

In TensorFlow Graphs, each op takes and returns values using a very simple type
system (TF_DataType) in which each value is a tensor of unknown rank and
dimensions. At the same time, many graphs have static shapes easily knowable for
wide swaths of the computation, and even dynamically shaped operations often
have statically knowable dimensions. Many analyses and transformations benefit
from and use this information when available, but because TensorFlow graphs
don't capture this (e.g. serialize it to proto), passes have to recompute it on
demand with ShapeRefiner.

The [MLIR Tensor Type](../Dialects/Builtin.md/#rankedtensortype) directly
captures shape information, so you can have things like:

```mlir
  %x = tf.Add %x, %y : tensor<128 x 8 x ? x f32>
```

Capturing this in the IR is expected to speed up transformations (avoiding
recomputing the same info over and over again), which in turn makes it practical
to apply stronger shape analysis algorithms. It also makes it easier to work
with the IR, because on-the-side representations can get out of date, and the
API is easier to work with from an ergonomics perspective.
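
To make this concrete, here is a rough sketch (again using only builtin tensor
types and the `func` dialect, with made-up names, rather than actual TensorFlow
ops) of how fully static, partially dynamic, and unranked shapes all live
directly in the types that a pass sees:

```mlir
  // Static dimensions, a dynamic leading (batch) dimension, and a fully
  // unranked tensor: a pass can query these types directly instead of
  // recomputing shape information on the side.
  func.func @shape_examples(%static: tensor<16x128xf32>,
                            %batched: tensor<?x128xf32>,
                            %unranked: tensor<*xf32>) -> tensor<?x128xf32> {
    return %batched : tensor<?x128xf32>
  }
```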

### Unified Graph Rewriting Infrastructure

This is still a work in progress, but we have sightlines towards a
[general rewriting infrastructure](RationaleGenericDAGRewriter.md) for
transforming DAG tiles into other DAG tiles, using a declarative pattern format.
DAG to DAG rewriting is a generalized solution for many common compiler
optimizations, lowerings, and other rewrites, and having a common IR enables us
to invest in building a single high-quality implementation.

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.

### Clarified Semantics for TensorFlow Operations

One of the challenging things about working with TensorFlow is that there are
many invariants and behaviors that need to be preserved and known about when
working with Graphs, and these can be difficult to reason about and can lead to
bugs. Things like 'dead values', Switch and Merge nodes, concurrency semantics,
nodes that execute even when passed a dead value, multi-device program
representation, etc. all add complexities that can make it challenging to reason
about whether a transformation or analysis is correct in general. Even something
as simple as constant folding or transforming integer `x-x` into `0` is
non-trivial because you need to consider control dependence edges.

One of our major goals for the TensorFlow dialect of MLIR is to sort out these
situations and upgrade existing TensorFlow graphs to semantics that are easier
to reason about. The solutions to these problems are all still being debated,
but those discussions have already yielded a lot of potential answers:
introducing a `tf_dead_or<x>` type for switch/merge, modeling of TF operations
using futures/async semantics, etc. None of these particular battles are
critical or important for MLIR to succeed (because of its "meta" nature, the
abstraction decisions of any given dialect are up to that dialect to decide),
but each one that works out will make it easier to work with and transform
TensorFlow operations. We expect these issues to get nailed down in the next
couple of months as the MLIR effort moves beyond TF Lite / TOCO support. The
discussions that are happening now are super valuable and making progress.

### Ergonomics

A minor-in-theory, but important-in-practice point is that MLIR is designed to
make it easy, memory efficient, and less error prone to transform code than
other systems. `TensorFlow::Graph` has implementation issues where the same
information is stored redundantly in different places (which must be manually
kept up to date), has a somewhat unusual representation of certain constructs
(e.g. the function library, which makes it very difficult to add or remove
functions, e.g. during interprocedural transformations), and stores information
in the graph that is used by the executor, but isn't necessary for program
transformation.

TensorFlow has made a lot of progress in this area over the years, and there are
lots of ideas about further improvements in the future. We are happy that MLIR
addresses these needs (making it much easier to implement correct program
transformations) today, and we are committed to pushing hard to make it better.

### Compile Time Performance and Memory Use

MLIR has been designed to be memory and compile-time efficient in its algorithms
and data structures, using immutable and uniqued structures, low level
bit-packing, and other well-known techniques to avoid unnecessary heap
allocations, and to allow simple and safe multithreaded optimization of MLIR
programs. There are other reasons to believe that the MLIR implementations of
common transformations will be more efficient than the Python and C++
`TensorFlow::Graph` implementations of the same things, given the current
implementation details of TensorFlow.

That said, this is very much a theory at this point. When the new
implementations of various subsystems are available, we will see what happens in
practice: there will be no reason to speculate - we can measure.

## Common Questions and Concerns

Here we address some frequently asked questions and concerns.

### Isn't MLIR a big dependency to take on?

We've heard that at least some people are concerned that MLIR is a "big"
dependency to take on, and could result in large code size. Here are some key
points about MLIR:

1.  The entire MLIR codebase is a pretty small C++ codebase in absolute terms
    compared to what goes into a modern ML framework.
1.  Like LLVM, MLIR is designed as a set of libraries that clients can link in
    or ignore as they wish.
    For example, the transformations in MLIR are kept
    separate from the core IR abstractions, and dialect specific code (e.g.
    TensorFlow, TF-Lite, XLA, etc) is all independently selectable by the build
    system. Clients that don't care about XLA don't link in that code, whether
    they are a TF-Lite system or a client that is completely unrelated to
    TensorFlow.
1.  MLIR's only third party dependency is on LLVM, but it doesn't depend on LLVM
    IR or any other heavy dependency - it just depends on LLVM's support library
    which provides efficient hash tables and other
    [memory efficient data structures that the STL does not](http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task).
    There have been discussions about splitting this set of libraries out to its
    own subproject in LLVM that the LLVM IR project depends on. This would be
    great for MLIR as well as other LLVM subprojects.
1.  TensorFlow and many other frameworks already use LLVM - if so, MLIR would
    not be pulling in an additional dependency at all.

### How does MLIR represent {control flow, concurrency, …} semantics in TensorFlow?

MLIR provides a dialect that is an isomorphic 1-1 mapping between TensorFlow
graphs and MLIR, as well as a pretty complete translator back and forth (the
only known gap is that a few TF_DataType enums aren't handled yet). MLIR is a
"Multi-Level IR", which allows it to represent code with different abstraction
levels, so the ability to faithfully represent TensorFlow code in a completely
backwards compatible way (even if there are some historical warts!) is critical.

In *addition* to the isomorphic mapping, we are actively working on efforts to
raise the abstraction level for working with TensorFlow graphs in MLIR. Doing so
would make it even easier to write TensorFlow transformations than it is today,
and would provide a path to migrating TF 1.x graphs forward into the TF 2.x
world. For example, because MLIR has an extensible type system, we can directly
model whether it is impossible for a Tensor value to be a "dead" value - similar
to the use of optional types in modern programming languages.

These discussions occasionally cause confusion because there are several issues
being mixed up into one:

*   What are the current semantics of TensorFlow graphs, and what invariants can
    we rely on?
*   What should the semantics be in TensorFlow 2.0?
*   What do programs rely on in practice, and if it is unfriendly, can we
    migrate it?
*   Can we find a way to make it so transforms don't have to worry about the
    complexities of Switch/Merge, by using higher level control flow
    representations? (tentative answer: yes)
*   How should MLIR represent async vs sync operations, what invariants are
    provided, and how does this dovetail with control flow?
*   When is it safe and beneficial to perform optimizations that might reduce
    parallelism?

All of these questions have a "conservative/safe fallback": we can continue
providing exactly the same abstractions that TensorFlow always has. That said,
we are trying hard to level-up the representation (taking advantage of the
"Multi-Level" part of MLIR) because doing so will make it much much easier to
write analyses and transformations than it currently is in TensorFlow.

### Non Goals

It is important to point out things that MLIR does not aim to do. For example,
there is no runtime component to MLIR: the TensorFlow executor, the TF Lite
FlatBuffer interpreter, and other existing runtimes should be used as-is.

Another non-goal is that MLIR currently doesn't support a stable binary
encoding. We will certainly add this at some point, but existing formats should
be used for serialization and distribution in the meantime.