# Chapter 2: Emitting Basic MLIR

[TOC]

Now that we're familiar with our language and the AST, let's see how MLIR can
help to compile Toy.

## Introduction: Multi-Level Intermediate Representation

Other compilers, like LLVM (see the
[Kaleidoscope tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html)),
offer a fixed set of predefined types and (usually *low-level* / RISC-like)
instructions. It is up to the frontend for a given language to perform any
language-specific type-checking, analysis, or transformation before emitting
LLVM IR. For example, Clang will use its AST to perform not only static analysis
but also transformations, such as C++ template instantiation through AST cloning
and rewriting. Finally, languages with higher-level constructs than C/C++ may
require non-trivial lowering from their AST to generate LLVM IR.

As a consequence, multiple frontends end up reimplementing significant pieces of
infrastructure to support the need for these analyses and transformations. MLIR
addresses this issue by being designed for extensibility. As such, there are few
pre-defined instructions (*operations* in MLIR terminology) or types.

## Interfacing with MLIR

[Language Reference](../../LangRef.md)

MLIR is designed to be a completely extensible infrastructure; there is no
closed set of attributes (think: constant metadata), operations, or types. MLIR
supports this extensibility with the concept of
[Dialects](../../LangRef.md/#dialects). Dialects provide a grouping mechanism for
abstraction under a unique `namespace`.

In MLIR, [`Operations`](../../LangRef.md/#operations) are the core unit of
abstraction and computation, similar in many ways to LLVM instructions.
Operations can have application-specific semantics and can be used to represent
all of the core IR structures in LLVM: instructions, globals (like functions),
modules, etc.

Here is the MLIR assembly for the Toy `transpose` operation:

```mlir
%t_tensor = "toy.transpose"(%tensor) {inplace = true} : (tensor<2x3xf64>) -> tensor<3x2xf64> loc("example/file/path":12:1)
```

Let's break down the anatomy of this MLIR operation:

-   `%t_tensor`

    *   The name given to the result defined by this operation (which includes
        [a prefixed sigil to avoid collisions](../../LangRef.md/#identifiers-and-keywords)).
        An operation may define zero or more results (in the context of Toy, we
        will limit ourselves to single-result operations), which are SSA values.
        The name is used during parsing but is not persistent (e.g., it is not
        tracked in the in-memory representation of the SSA value).

-   `"toy.transpose"`

    *   The name of the operation. It is expected to be a unique string, with
        the namespace of the dialect prefixed before the "`.`". This can be read
        as the `transpose` operation in the `toy` dialect.

-   `(%tensor)`

    *   A list of zero or more input operands (or arguments), which are SSA
        values defined by other operations or referring to block arguments.

-   `{inplace = true}`

    *   A dictionary of zero or more attributes, which are special operands that
        are always constant. Here we define a boolean attribute named `inplace`
        that has a constant value of true.

-   `(tensor<2x3xf64>) -> tensor<3x2xf64>`

    *   This refers to the type of the operation in a functional form, spelling
        the types of the arguments in parentheses and the type of the return
        values afterward.

-   `loc("example/file/path":12:1)`

    *   This is the location in the source code from which this operation
        originated.

Shown here is the general form of an operation. As described above,
the set of operations in MLIR is extensible. Operations are modeled
using a small set of concepts, enabling operations to be reasoned
about and manipulated generically. These concepts are:

-   A name for the operation.
-   A list of SSA operand values.
-   A list of [attributes](../../LangRef.md/#attributes).
-   A list of [types](../../LangRef.md/#type-system) for result values.
-   A [source location](../../Diagnostics.md/#source-locations) for debugging
    purposes.
-   A list of successor [blocks](../../LangRef.md/#blocks) (for branches,
    mostly).
-   A list of [regions](../../LangRef.md/#regions) (for structural operations
    like functions).

In MLIR, every operation has a mandatory source location associated with it.
Unlike LLVM, where debug info locations are metadata and can be dropped, in
MLIR, the location is a core requirement, and APIs depend on and manipulate it.
Dropping a location is thus an explicit choice which cannot happen by mistake.

To provide an illustration: if a transformation replaces an operation with
another, that new operation must still have a location attached. This makes it
possible to track where that operation came from.
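
To make this "location is mandatory" design concrete, here is a small
self-contained C++ sketch (it is *not* the MLIR API; `Location`, `Operation`,
and `replaceWith` are hypothetical stand-ins): because the only constructor
takes a location, a rewrite that produces a new operation is forced to make an
explicit choice about which location it carries.

```c++
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch: a location is a plain value every operation must carry,
// so "no location" can only be spelled explicitly (e.g. via unknown()).
struct Location {
  std::string file;
  unsigned line = 0, col = 0;
  static Location unknown() { return {"<unknown>", 0, 0}; }
};

struct Operation {
  std::string name;
  Location loc; // mandatory: there is no constructor without a location.
  Operation(std::string name, Location loc)
      : name(std::move(name)), loc(std::move(loc)) {}
};

// A rewrite that replaces one operation with another must decide what location
// the new operation gets; reusing the old one preserves provenance.
Operation replaceWith(const Operation &old, const std::string &newName) {
  return Operation(newName, old.loc); // explicit choice, not an accident
}
```

The point of the sketch is only the shape of the API: there is no code path
that creates an operation with a forgotten location.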

It's worth noting that the `mlir-opt` tool - a tool for testing compiler
passes - does not include locations in the output by default. The
`-mlir-print-debuginfo` flag specifies to include locations. (Run `mlir-opt
--help` for more options.)

### Opaque API

MLIR is designed to allow all IR elements, such as attributes, operations, and
types, to be customized. At the same time, IR elements can always be reduced to
the above fundamental concepts. This allows MLIR to parse, represent, and
[round-trip](../../../getting_started/Glossary.md/#round-trip) IR for *any*
operation. For example, we could place our Toy operation from above into an
`.mlir` file and round-trip through *mlir-opt* without registering any `toy`
related dialect:

```mlir
func.func @toy_func(%tensor: tensor<2x3xf64>) -> tensor<3x2xf64> {
  %t_tensor = "toy.transpose"(%tensor) { inplace = true } : (tensor<2x3xf64>) -> tensor<3x2xf64>
  return %t_tensor : tensor<3x2xf64>
}
```

In the case of unregistered attributes, operations, and types, MLIR will
enforce some structural constraints (e.g. dominance), but otherwise they
are completely opaque. For instance, MLIR has little information about whether
an unregistered operation can operate on particular data types, how many
operands it can take, or how many results it produces. This flexibility can be
useful for bootstrapping purposes, but it is generally advised against in mature
systems. Unregistered operations must be treated conservatively by
transformations and analyses, and they are much harder to construct and
manipulate.

This handling can be observed by crafting what should be invalid IR for Toy
and seeing it round-trip without tripping the verifier:

```mlir
func.func @main() {
  %0 = "toy.print"() : () -> tensor<2x3xf64>
}
```

There are multiple problems here: the `toy.print` operation is not a terminator;
it should take an operand; and it shouldn't return any values. In the next
section, we will register our dialect and operations with MLIR, plug into the
verifier, and add nicer APIs to manipulate our operations.

## Defining a Toy Dialect

To effectively interface with MLIR, we will define a new Toy dialect. This
dialect will model the structure of the Toy language, as well as provide an easy
avenue for high-level analysis and transformation.

```c++
/// This is the definition of the Toy dialect. A dialect inherits from
/// mlir::Dialect and registers custom attributes, operations, and types. It can
/// also override virtual methods to change some general behavior, which will be
/// demonstrated in later chapters of the tutorial.
class ToyDialect : public mlir::Dialect {
public:
  explicit ToyDialect(mlir::MLIRContext *ctx);

  /// Provide a utility accessor to the dialect namespace.
  static llvm::StringRef getDialectNamespace() { return "toy"; }

  /// An initializer called from the constructor of ToyDialect that is used to
  /// register attributes, operations, types, and more within the Toy dialect.
  void initialize();
};
```

This is the C++ definition of a dialect, but MLIR also supports defining
dialects declaratively via
[tablegen](https://llvm.org/docs/TableGen/ProgRef.html). Using the declarative
specification is much cleaner as it removes the need for a large portion of the
boilerplate when defining a new dialect. It also enables easy generation of
dialect documentation, which can be described directly alongside the dialect. In
this declarative format, the Toy dialect would be specified as:

```tablegen
// Provide a definition of the 'toy' dialect in the ODS framework so that we
// can define our operations.
def Toy_Dialect : Dialect {
  // The namespace of our dialect; this corresponds 1-1 with the string we
  // provided in `ToyDialect::getDialectNamespace`.
  let name = "toy";

  // A short one-line summary of our dialect.
  let summary = "A high-level dialect for analyzing and optimizing the "
                "Toy language";

  // A much longer description of our dialect.
  let description = [{
    The Toy language is a tensor-based language that allows you to define
    functions, perform some math computation, and print results. This dialect
    provides a representation of the language that is amenable to analysis and
    optimization.
  }];

  // The C++ namespace that the dialect class definition resides in.
  let cppNamespace = "toy";
}
```

To see what this generates, we can run the `mlir-tblgen` command with the
`gen-dialect-decls` action like so:

```shell
${build_root}/bin/mlir-tblgen -gen-dialect-decls ${mlir_src_root}/examples/toy/Ch2/include/toy/Ops.td -I ${mlir_src_root}/include/
```

After the dialect has been defined, it can now be loaded into an MLIRContext:

```c++
  context.loadDialect<ToyDialect>();
```

By default, an `MLIRContext` only loads the
[Builtin Dialect](../../Dialects/Builtin.md), which provides a few core IR
components, meaning that other dialects, such as our `Toy` dialect, must be
explicitly loaded.
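
The load-on-demand model can be pictured with a small self-contained C++
sketch (this is *not* the MLIR API; `Context`, `Dialect`, and `isLoaded` are
hypothetical stand-ins): a context holds only the dialects that were explicitly
loaded into it, keyed by their namespace.

```c++
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical sketch of a dialect registry keyed by namespace.
struct Dialect {
  virtual ~Dialect() = default;
  virtual std::string getNamespace() const = 0;
};

struct Context {
  std::map<std::string, std::unique_ptr<Dialect>> dialects;

  template <typename DialectT>
  void loadDialect() {
    auto d = std::make_unique<DialectT>();
    std::string ns = d->getNamespace();
    dialects.emplace(ns, std::move(d)); // no-op if already loaded
  }

  bool isLoaded(const std::string &ns) const {
    return dialects.count(ns) != 0;
  }
};

struct ToyDialect : Dialect {
  std::string getNamespace() const override { return "toy"; }
};
```

Until `loadDialect<ToyDialect>()` runs, the context knows nothing about the
`toy` namespace, which mirrors why the explicit load call above is required.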

## Defining Toy Operations

Now that we have a `Toy` dialect, we can start defining the operations. This
will allow for providing semantic information that the rest of the system can
hook into. As an example, let's walk through the creation of a `toy.constant`
operation. This operation will represent a constant value in the Toy language.

```mlir
%4 = "toy.constant"() {value = dense<1.0> : tensor<2x3xf64>} : () -> tensor<2x3xf64>
```

This operation takes zero operands, a
[dense elements](../../Dialects/Builtin.md/#denseintorfpelementsattr) attribute named
`value` to represent the constant value, and returns a single result of
[RankedTensorType](../../Dialects/Builtin.md/#rankedtensortype). An operation class
inherits from the [CRTP](https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern)
`mlir::Op` class, which also takes some optional [*traits*](../../Traits) to
customize its behavior. `Traits` are a mechanism with which we can inject
additional behavior into an Operation, such as additional accessors,
verification, and more. Let's look below at a possible definition for the
constant operation that we have described above:

```c++
class ConstantOp : public mlir::Op<
                     /// `mlir::Op` is a CRTP class, meaning that we provide the
                     /// derived class as a template parameter.
                     ConstantOp,
                     /// The ConstantOp takes zero input operands.
                     mlir::OpTrait::ZeroOperands,
                     /// The ConstantOp returns a single result.
                     mlir::OpTrait::OneResult,
                     /// We also provide a utility `getType` accessor that
                     /// returns the TensorType of the single result.
                     mlir::OpTrait::OneTypedResult<TensorType>::Impl> {

 public:
  /// Inherit the constructors from the base Op class.
  using Op::Op;

  /// Provide the unique name for this operation. MLIR will use this to register
  /// the operation and uniquely identify it throughout the system. The name
  /// provided here must be prefixed by the parent dialect namespace followed
  /// by a `.`.
  static llvm::StringRef getOperationName() { return "toy.constant"; }

  /// Return the value of the constant by fetching it from the attribute.
  mlir::DenseElementsAttr getValue();

  /// Operations may provide additional verification beyond what the attached
  /// traits provide. Here we will ensure that the specific invariants of the
  /// constant operation are upheld, for example that the result type must be
  /// of TensorType and match the type of the constant `value`.
  LogicalResult verifyInvariants();

  /// Provide an interface to build this operation from a set of input values.
  /// This interface is used by the `builder` classes to allow for easily
  /// generating instances of this operation:
  ///   mlir::OpBuilder::create<ConstantOp>(...)
  /// This method populates the given `state` that MLIR uses to create
  /// operations. This state is a collection of all of the discrete elements
  /// that an operation may contain.
  /// Build a constant with the given return type and `value` attribute.
  static void build(mlir::OpBuilder &builder, mlir::OperationState &state,
                    mlir::Type result, mlir::DenseElementsAttr value);
  /// Build a constant and reuse the type from the given 'value'.
  static void build(mlir::OpBuilder &builder, mlir::OperationState &state,
                    mlir::DenseElementsAttr value);
  /// Build a constant by broadcasting the given 'value'.
  static void build(mlir::OpBuilder &builder, mlir::OperationState &state,
                    double value);
};
```

and we can register this operation in the `ToyDialect` initializer:

```c++
void ToyDialect::initialize() {
  addOperations<ConstantOp>();
}
```
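
The way traits "inject" behavior through CRTP can be illustrated with a tiny
self-contained C++ sketch (again, *not* the MLIR API; the trait and state
classes here are hypothetical stand-ins): each trait is a base class
parameterized on the concrete op, and the op's `verify` folds every trait's
check together with no virtual dispatch.

```c++
#include <cassert>
#include <vector>

// Hypothetical stand-in for the discrete elements an operation carries.
struct OperationState {
  std::vector<int> operands;
  std::vector<int> results;
};

// Each trait contributes one structural check.
template <typename ConcreteOp>
struct ZeroOperands {
  bool verifyTrait(const OperationState &state) const {
    return state.operands.empty();
  }
};

template <typename ConcreteOp>
struct OneResult {
  bool verifyTrait(const OperationState &state) const {
    return state.results.size() == 1;
  }
};

// The op class inherits every trait instantiated with itself (CRTP) and
// combines all trait checks into a single verify() via a C++17 fold.
template <typename ConcreteOp, template <typename> class... Traits>
struct Op : Traits<ConcreteOp>... {
  bool verify(const OperationState &state) const {
    return (Traits<ConcreteOp>::verifyTrait(state) && ...);
  }
};

struct ConstantOp : Op<ConstantOp, ZeroOperands, OneResult> {};
```

Adding a trait to the template argument list is all it takes for its check
(and, in real MLIR, its accessors) to become part of the op class.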

### Op vs Operation: Using MLIR Operations

Now that we have defined an operation, we will want to access and transform it.
In MLIR, there are two main classes related to operations: `Operation` and `Op`.
The `Operation` class is used to generically model all operations. It is
'opaque', in the sense that it does not describe the properties of particular
operations or types of operations. Instead, the `Operation` class provides a
general API into an operation instance. On the other hand, each specific type of
operation is represented by an `Op` derived class. For instance, `ConstantOp`
represents an operation with zero inputs and one output, which is always set to
the same value. `Op` derived classes act as smart-pointer wrappers around an
`Operation*`, providing operation-specific accessor methods and type-safe
properties of operations. This means that when we define our Toy operations, we
are simply defining a clean, semantically useful interface for building and
interfacing with the `Operation` class. This is why our `ConstantOp` defines no
class fields; all of the data for this operation is stored in the referenced
`Operation`. A side effect of this design is that we always pass around `Op`
derived classes "by value", instead of by reference or pointer (*passing by
value* is a common idiom in MLIR and applies similarly to attributes, types,
etc.). Given a generic `Operation*` instance, we can always get a specific `Op`
instance using LLVM's casting infrastructure:

```c++
void processConstantOp(mlir::Operation *operation) {
  ConstantOp op = llvm::dyn_cast<ConstantOp>(operation);

  // This operation is not an instance of `ConstantOp`.
  if (!op)
    return;

  // Get the internal operation instance wrapped by the smart pointer.
  mlir::Operation *internalOperation = op.getOperation();
  assert(internalOperation == operation &&
         "these operation instances are the same");
}
```
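
The "smart pointer passed by value" idea can be sketched in a few lines of
self-contained C++ (this is *not* the MLIR/LLVM API; `dynCast` is a
hypothetical stand-in for `llvm::dyn_cast`, and the `classof` hook is modeled
after LLVM's casting convention): the wrapper owns no data, only a pointer to
the generic operation, so copying it is cheap and a failed cast yields a null
wrapper.

```c++
#include <cassert>
#include <string>

// Hypothetical generic operation: identified here only by its name.
struct Operation {
  std::string name;
};

// Value-type wrapper: no fields besides the pointer to the Operation.
struct ConstantOp {
  Operation *operation = nullptr;

  // LLVM-style casting hook: decides whether a generic operation is an
  // instance of this specific op class.
  static bool classof(const Operation *op) {
    return op && op->name == "toy.constant";
  }

  Operation *getOperation() const { return operation; }
  explicit operator bool() const { return operation != nullptr; }
};

// Minimal stand-in for llvm::dyn_cast: a null wrapper signals "not this kind".
template <typename OpT>
OpT dynCast(Operation *op) {
  return OpT::classof(op) ? OpT{op} : OpT{nullptr};
}
```

Because all state lives in the pointed-to `Operation`, two wrappers built from
the same `Operation*` are interchangeable, which is what makes by-value passing
safe.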

### Using the Operation Definition Specification (ODS) Framework

In addition to specializing the `mlir::Op` C++ template, MLIR also supports
defining operations in a declarative manner. This is achieved via the
[Operation Definition Specification](../../DefiningDialects/Operations.md) framework. Facts
regarding an operation are specified concisely in a TableGen record, which
will be expanded into an equivalent `mlir::Op` C++ template specialization at
compile time. Using the ODS framework is the recommended way of defining
operations in MLIR given the simplicity, conciseness, and general stability in
the face of C++ API changes.

Let's see how to define the ODS equivalent of our ConstantOp:

Operations in ODS are defined by inheriting from the `Op` class. To simplify our
operation definitions, we will define a base class for operations in the Toy
dialect.

```tablegen
// Base class for toy dialect operations. This operation inherits from the base
// `Op` class in OpBase.td, and provides:
//   * The parent dialect of the operation.
//   * The mnemonic for the operation, or the name without the dialect prefix.
//   * A list of traits for the operation.
class Toy_Op<string mnemonic, list<Trait> traits = []> :
    Op<Toy_Dialect, mnemonic, traits>;
```

With all of the preliminary pieces defined, we can begin to define the constant
operation.

We define a toy operation by inheriting from our base 'Toy_Op' class above. Here
we provide the mnemonic and a list of traits for the operation. The
[mnemonic](../../DefiningDialects/Operations.md/#operation-name) here matches the one given in
`ConstantOp::getOperationName` without the dialect prefix, `toy.`. Missing here
from our C++ definition are the `ZeroOperands` and `OneResult` traits; these
will be automatically inferred based upon the `arguments` and `results` fields
we define later.

```tablegen
def ConstantOp : Toy_Op<"constant"> {
}
```

At this point you might want to know what the C++ code generated by TableGen
looks like. Simply run the `mlir-tblgen` command with the `gen-op-decls` or the
`gen-op-defs` action like so:

```shell
${build_root}/bin/mlir-tblgen -gen-op-defs ${mlir_src_root}/examples/toy/Ch2/include/toy/Ops.td -I ${mlir_src_root}/include/
```

Depending on the selected action, this will print either the `ConstantOp` class
declaration or its implementation. Comparing this output to the hand-crafted
implementation is incredibly useful when getting started with TableGen.

#### Defining Arguments and Results

With the shell of the operation defined, we can now provide the
[inputs](../../DefiningDialects/Operations.md/#operation-arguments) and
[outputs](../../DefiningDialects/Operations.md/#operation-results) to our operation. The
inputs, or arguments, to an operation may be attributes or types for SSA operand
values. The results correspond to a set of types for the values produced by the
operation:

```tablegen
def ConstantOp : Toy_Op<"constant"> {
  // The constant operation takes an attribute as the only input.
  // `F64ElementsAttr` corresponds to a 64-bit floating-point ElementsAttr.
  let arguments = (ins F64ElementsAttr:$value);

  // The constant operation returns a single value of TensorType.
  // F64Tensor corresponds to a 64-bit floating-point TensorType.
  let results = (outs F64Tensor);
}
```

By providing a name to the arguments or results, e.g. `$value`, ODS will
automatically generate a matching accessor: `DenseElementsAttr
ConstantOp::value()`.

#### Adding Documentation

The next step after defining the operation is to document it. Operations may
provide
[`summary` and `description`](../../DefiningDialects/Operations.md/#operation-documentation)
fields to describe the semantics of the operation. This information is useful
for users of the dialect and can even be used to auto-generate Markdown
documents.

```tablegen
def ConstantOp : Toy_Op<"constant"> {
  // Provide a summary and description for this operation. This can be used to
  // auto-generate documentation of the operations within our dialect.
  let summary = "constant operation";
  let description = [{
    Constant operation turns a literal into an SSA value. The data is attached
    to the operation as an attribute. For example:

      %0 = "toy.constant"()
         { value = dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]> : tensor<2x3xf64> }
        : () -> tensor<2x3xf64>
  }];

  // The constant operation takes an attribute as the only input.
  // `F64ElementsAttr` corresponds to a 64-bit floating-point ElementsAttr.
  let arguments = (ins F64ElementsAttr:$value);

  // The constant operation returns a single value of TensorType.
  // F64Tensor corresponds to a 64-bit floating-point TensorType.
  let results = (outs F64Tensor);
}
```

#### Verifying Operation Semantics

At this point we've already covered a majority of the original C++ operation
definition. The next piece to define is the verifier. Luckily, much like the
named accessor, the ODS framework will automatically generate a lot of the
necessary verification logic based upon the constraints we have given. This
means that we don't need to verify the structure of the return type, or even the
input attribute `value`. In many cases, additional verification is not even
necessary for ODS operations. To add additional verification logic, an operation
can set the [`hasVerifier`](../../DefiningDialects/Operations.md/#custom-verifier-code)
field. This generates a declaration for a `ConstantOp::verify` method whose C++
implementation we provide ourselves. That implementation can assume that all of
the other invariants of the operation have already been verified:

```tablegen
def ConstantOp : Toy_Op<"constant"> {
  // Provide a summary and description for this operation. This can be used to
  // auto-generate documentation of the operations within our dialect.
  let summary = "constant operation";
  let description = [{
    Constant operation turns a literal into an SSA value. The data is attached
    to the operation as an attribute. For example:

      %0 = "toy.constant"()
         { value = dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]> : tensor<2x3xf64> }
        : () -> tensor<2x3xf64>
  }];

  // The constant operation takes an attribute as the only input.
  // `F64ElementsAttr` corresponds to a 64-bit floating-point ElementsAttr.
  let arguments = (ins F64ElementsAttr:$value);

  // The constant operation returns a single value of TensorType.
  // F64Tensor corresponds to a 64-bit floating-point TensorType.
  let results = (outs F64Tensor);

  // Add additional verification logic to the constant operation. Setting this bit
  // to `1` will generate a `::llvm::LogicalResult verify()` declaration on the
  // operation class that is called after ODS constructs have been verified, for
  // example the types of arguments and results. We implement additional verification
  // in the definition of this `verify` method in the C++ source file.
  let hasVerifier = 1;
}
```

#### Attaching `build` Methods

The final missing components from our original C++ example are the `build`
methods. ODS can generate some simple build methods automatically, and in this
case it will generate our first build method for us. For the rest, we define the
[`builders`](../../DefiningDialects/Operations.md/#custom-builder-methods) field. This field
takes a list of `OpBuilder` objects that take a string corresponding to a list
of C++ parameters, as well as an optional code block that can be used to specify
the implementation inline.

```tablegen
def ConstantOp : Toy_Op<"constant"> {
  ...

  // Add custom build methods for the constant operation. These methods populate
  // the `state` that MLIR uses to create operations, i.e. these are used when
  // using `builder.create<ConstantOp>(...)`.
  let builders = [
    // Build a constant with a given constant tensor value.
    OpBuilder<(ins "DenseElementsAttr":$value), [{
      // Call into an autogenerated `build` method.
      build(builder, result, value.getType(), value);
    }]>,

    // Build a constant with a given constant floating-point value. This builder
    // creates a declaration for `ConstantOp::build` with the given parameters.
    OpBuilder<(ins "double":$value)>
  ];
}
```

#### Specifying a Custom Assembly Format

At this point we can generate our "Toy IR". For example, the following:

```toy
# User defined generic function that operates on unknown shaped arguments.
def multiply_transpose(a, b) {
  return transpose(a) * transpose(b);
}

def main() {
  var a<2, 3> = [[1, 2, 3], [4, 5, 6]];
  var b<2, 3> = [1, 2, 3, 4, 5, 6];
  var c = multiply_transpose(a, b);
  var d = multiply_transpose(b, a);
  print(d);
}
```

Results in the following IR:

```mlir
module {
  "toy.func"() ({
  ^bb0(%arg0: tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":4:1), %arg1: tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":4:1)):
    %0 = "toy.transpose"(%arg0) : (tensor<*xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:10)
    %1 = "toy.transpose"(%arg1) : (tensor<*xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:25)
    %2 = "toy.mul"(%0, %1) : (tensor<*xf64>, tensor<*xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:25)
    "toy.return"(%2) : (tensor<*xf64>) -> () loc("test/Examples/Toy/Ch2/codegen.toy":5:3)
  }) {sym_name = "multiply_transpose", type = (tensor<*xf64>, tensor<*xf64>) -> tensor<*xf64>} : () -> () loc("test/Examples/Toy/Ch2/codegen.toy":4:1)
  "toy.func"() ({
    %0 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64> loc("test/Examples/Toy/Ch2/codegen.toy":9:17)
    %1 = "toy.reshape"(%0) : (tensor<2x3xf64>) -> tensor<2x3xf64> loc("test/Examples/Toy/Ch2/codegen.toy":9:3)
    %2 = "toy.constant"() {value = dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64>} : () -> tensor<6xf64> loc("test/Examples/Toy/Ch2/codegen.toy":10:17)
    %3 = "toy.reshape"(%2) : (tensor<6xf64>) -> tensor<2x3xf64> loc("test/Examples/Toy/Ch2/codegen.toy":10:3)
    %4 = "toy.generic_call"(%1, %3) {callee = @multiply_transpose} : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":11:11)
    %5 = "toy.generic_call"(%3, %1) {callee = @multiply_transpose} : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":12:11)
    "toy.print"(%5) : (tensor<*xf64>) -> () loc("test/Examples/Toy/Ch2/codegen.toy":13:3)
    "toy.return"() : () -> () loc("test/Examples/Toy/Ch2/codegen.toy":8:1)
  }) {sym_name = "main", type = () -> ()} : () -> () loc("test/Examples/Toy/Ch2/codegen.toy":8:1)
} loc(unknown)
```

One thing to notice here is that all of our Toy operations are printed using the
generic assembly format. This format is the one shown when breaking down
`toy.transpose` at the beginning of this chapter. MLIR allows for operations to
define their own custom assembly format, either
[declaratively](../../DefiningDialects/Operations.md/#declarative-assembly-format) or
imperatively via C++. Defining a custom assembly format allows for tailoring the
generated IR into something a bit more readable by removing a lot of the fluff
that is required by the generic format. Let's walk through an example of an
operation format that we would like to simplify.

##### `toy.print`

The current form of `toy.print` is a little verbose. There are a lot of
additional characters that we would like to strip away. Let's begin by thinking
of what a good format of `toy.print` would be, and see how we can implement it.
Looking at the basics of `toy.print`, we get:

```mlir
toy.print %5 : tensor<*xf64> loc(...)
```

Here we have stripped much of the format down to the bare essentials, and it has
become much more readable. To provide a custom assembly format, an operation can
either override the `hasCustomAssemblyFormat` field for a C++ format, or the
`assemblyFormat` field for the declarative format. Let's look at the C++ variant
first, as this is what the declarative format maps to internally.

```tablegen
/// Consider a stripped definition of `toy.print` here.
def PrintOp : Toy_Op<"print"> {
  let arguments = (ins F64Tensor:$input);

  // Divert the printer and parser to `parse` and `print` methods on our operation,
  // to be implemented in the .cpp file. More details on these methods are provided below.
  let hasCustomAssemblyFormat = 1;
}
```

A C++ implementation for the printer and parser is shown below:

```c++
/// The 'OpAsmPrinter' class is a stream that allows for formatting
/// strings, attributes, operands, types, etc.
void PrintOp::print(mlir::OpAsmPrinter &printer) {
  printer << "toy.print " << getInput();
  printer.printOptionalAttrDict((*this)->getAttrs());
  printer << " : " << getInput().getType();
}

/// The 'OpAsmParser' class provides a collection of methods for parsing
/// various punctuation, as well as attributes, operands, types, etc. Each of
/// these methods returns a `ParseResult`. This class is a wrapper around
/// `LogicalResult` that can be converted to a boolean `true` value on failure,
/// or `false` on success. This allows for easily chaining together a set of
/// parser rules. These rules are used to populate an `mlir::OperationState`
/// similarly to the `build` methods described above.
mlir::ParseResult PrintOp::parse(mlir::OpAsmParser &parser,
                                 mlir::OperationState &result) {
  // Parse the input operand, the attribute dictionary, and the type of the
  // input.
  mlir::OpAsmParser::UnresolvedOperand inputOperand;
  mlir::Type inputType;
  if (parser.parseOperand(inputOperand) ||
      parser.parseOptionalAttrDict(result.attributes) || parser.parseColon() ||
      parser.parseType(inputType))
    return mlir::failure();

  // Resolve the input operand to the type we parsed in.
  if (parser.resolveOperand(inputOperand, inputType, result.operands))
    return mlir::failure();

  return mlir::success();
}
```
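
The `||`-chaining idiom in the parser above relies on `ParseResult` converting
to `true` on *failure*. Here is a small self-contained C++ sketch of that idiom
(it is *not* the MLIR API; `parseStep` and `parseAll` are hypothetical
stand-ins): the first failing step short-circuits everything after it.

```c++
#include <cassert>

// Hypothetical sketch of the ParseResult idiom: a result type that converts
// to `true` on FAILURE, so parser steps chain naturally with `||`.
struct ParseResult {
  bool failed;
  operator bool() const { return failed; } // true == failure
};

inline ParseResult success() { return {false}; }
inline ParseResult failure() { return {true}; }

// A toy "parser rule"; it records that it ran, then succeeds or fails.
ParseResult parseStep(bool ok, int &stepsRun) {
  ++stepsRun;
  return ok ? success() : failure();
}

// Chaining: if any step fails, the later steps never execute.
bool parseAll(bool a, bool b, bool c, int &stepsRun) {
  if (parseStep(a, stepsRun) || parseStep(b, stepsRun) ||
      parseStep(c, stepsRun))
    return false; // overall failure
  return true;
}
```

This is why the real parser can list its rules in one `if` condition: each rule
only runs if every rule before it succeeded.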

With the C++ implementation defined, let's see how this can be mapped to the
[declarative format](../../DefiningDialects/Operations.md/#declarative-assembly-format). The
declarative format is largely composed of three different components:

*   Directives
    -   A type of builtin function, with an optional set of arguments.
*   Literals
    -   A keyword or punctuation surrounded by \`\`.
*   Variables
    -   An entity that has been registered on the operation itself, i.e. an
        argument (attribute or operand), result, successor, etc. In the `PrintOp`
        example above, a variable would be `$input`.

A direct mapping of our C++ format looks something like:

```tablegen
/// Consider a stripped definition of `toy.print` here.
def PrintOp : Toy_Op<"print"> {
  let arguments = (ins F64Tensor:$input);

  // In the following format we have two directives, `attr-dict` and `type`.
  // These correspond to the attribute dictionary and the type of a given
  // variable respectively.
  let assemblyFormat = "$input attr-dict `:` type($input)";
}
```

The [declarative format](../../DefiningDialects/Operations.md/#declarative-assembly-format) has
many more interesting features, so be sure to check it out before implementing a
custom format in C++. After beautifying the format of a few of our operations,
we now get much more readable IR:

```mlir
module {
  toy.func @multiply_transpose(%arg0: tensor<*xf64>, %arg1: tensor<*xf64>) -> tensor<*xf64> {
    %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:10)
    %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:25)
    %2 = toy.mul %0, %1 : tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:25)
    toy.return %2 : tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":5:3)
  } loc("test/Examples/Toy/Ch2/codegen.toy":4:1)
  toy.func @main() {
    %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64> loc("test/Examples/Toy/Ch2/codegen.toy":9:17)
    %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64> loc("test/Examples/Toy/Ch2/codegen.toy":9:3)
    %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64> loc("test/Examples/Toy/Ch2/codegen.toy":10:17)
    %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64> loc("test/Examples/Toy/Ch2/codegen.toy":10:3)
    %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":11:11)
    %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":12:11)
    toy.print %5 : tensor<*xf64> loc("test/Examples/Toy/Ch2/codegen.toy":13:3)
    toy.return loc("test/Examples/Toy/Ch2/codegen.toy":8:1)
  } loc("test/Examples/Toy/Ch2/codegen.toy":8:1)
} loc(unknown)
```

Above we introduced several of the concepts for defining operations in the ODS
framework, but there are many more that we haven't had a chance to cover:
regions, variadic operands, etc. Check out the
[full specification](../../DefiningDialects/Operations.md) for more details.

## Complete Toy Example

We can now generate our "Toy IR". You can build `toyc-ch2` and try it yourself
on the above example: `toyc-ch2 test/Examples/Toy/Ch2/codegen.toy -emit=mlir
-mlir-print-debuginfo`. We can also check the round-trip: `toyc-ch2
test/Examples/Toy/Ch2/codegen.toy -emit=mlir -mlir-print-debuginfo 2>
codegen.mlir` followed by `toyc-ch2 codegen.mlir -emit=mlir`. You should also
use `mlir-tblgen` on the final definition file and study the generated C++ code.

At this point, MLIR knows about our Toy dialect and operations. In the
[next chapter](Ch-3.md), we will leverage our new dialect to implement some
high-level language-specific analyses and transformations for the Toy language.
727