5 decisions we made. This is not intended to be a "finely groomed" document - we
14 three-address SSA representations (like
15 [LLVM IR](http://llvm.org/docs/LangRef.html) or
16 [SIL](https://github.com/apple/swift/blob/main/docs/SIL.rst)), but which
19 high level dataflow graphs as well as target-specific code generated for high
24 MLIR stands for one of "Multi-Level IR" or "Multi-dimensional Loop IR" or
26 provides the rationale behind MLIR -- its actual
31 The Multi-Level Intermediate Representation (MLIR) is intended for easy
33 matrices of high dimensionality. It is thus well-suited to deep learning
35 sequential computation. The representation allows high-level optimization and
37 deep memory hierarchies --- general-purpose multicores, GPUs, and specialized
45 MLIR is a multi-level IR, i.e., it represents code at a domain-specific
50 implementations (such as LLVM [Polly](https://polly.llvm.org/)) that are able to
60 underlying a polyhedral representation of high-dimensional loop nests and
70 addressed memory in accelerators, mapping to pre-tuned expert-written
73 subsume all traditional loop transformations (unimodular and non-unimodular)
79 MLIR's design allows a progressive lowering to target-specific forms. Besides
80 high-level transformations for loop nests and data layouts that a typical
81 mid-level optimizer is expected to deal with, MLIR is also designed to perform
82 certain low-level scheduling and mapping decisions that a typical backend IR is
84 auto-vectorization, and software pipelining. The need to support these
91 low-level IR close to assembly to lift and reconstruct loops and perform such a
94 MLIR also facilitates automatic mapping to expert pre-tuned primitives or vendor
99 needed to lower to general-purpose as well as specialized accelerators. It also
105 This section sheds light on some of the design decisions -- some of these are
112 n-ranked tensor. This disallows the equivalent of pointer arithmetic or the
115 follow use-def chains (e.g. through
116 [affine.apply operations](../Dialects/Affine.md/#affineapply-affineapplyop) or
119 references at compile-time using polyhedral techniques. This is possible because
121 [restrictions on dimensions and symbols](../Dialects/Affine.md/#restrictions-on-dimensions-and-symb…
123 A scalar of element-type (a primitive type or a vector type) that is stored in
124 memory is modeled as a 0-d memref. This is also necessary for scalars that are
126 have an SSA representation --
127 [an extension](#affineif-and-affinefor-extensions-for-escaping-scalars) to allow
166 type - memref<8x%Nxf32>. We went for the current approach in MLIR because it
167 simplifies the design --- types remain immutable when the values of symbols
173 rather than [PHI instructions](http://llvm.org/docs/LangRef.html#i-phi) used in
195 [landingpad instruction](http://llvm.org/docs/LangRef.html#landingpad-instruction)
198 [switch_enum instruction](https://github.com/apple/swift/blob/main/docs/SIL.rst#switch-enum).
201 [SIL Intermediate Representation](https://github.com/apple/swift/blob/main/docs/SIL.rst),
203 [a talk on YouTube](https://www.youtube.com/watch?v=Ntj8ab-5cvE). The section of
205 [starts here](https://www.youtube.com/watch?v=Ntj8ab-5cvE&t=596s).
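As a minimal sketch (not from the original text, and assuming the `func`, `arith`, and `cf` dialects), the following shows how a value that LLVM would merge with a PHI node is instead forwarded to the successor block as a block argument:

```mlir
func.func @select_min(%a: i32, %b: i32) -> i32 {
  %cond = arith.cmpi slt, %a, %b : i32
  // Each predecessor names the value it forwards at the branch site;
  // ^bb_ret receives it as the block argument %m, with no PHI node.
  cf.cond_br %cond, ^bb_ret(%a : i32), ^bb_ret(%b : i32)
^bb_ret(%m: i32):
  return %m : i32
}
```

Because the forwarded value is spelled out at each branch, there is no separate PHI instruction whose operand list must be kept in sync with the predecessor list.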
209 Index types are intended to be used for platform-specific "size" values and may
220 ### Data layout of non-primitive types
223 target and ABI-specific and thus should be configurable rather than imposed by
231 64-bit integers, so that its storage requirement is `4 x 64 / 8 = 32` bytes,
237 The data layout of dialect-specific types is undefined at MLIR level. Yet
259 LLVM uses the [same design](http://llvm.org/docs/LangRef.html#integer-type),
261 [in the LLVM 2.0 integer type](http://releases.llvm.org/2.0/docs/LangRef.html#t_derived).
263 [LLVM 1.0](http://releases.llvm.org/1.0/docs/LangRef.html#t_classifications) to
264 [1.9](http://releases.llvm.org/1.9/docs/LangRef.html#t_classifications), LLVM
288 ([following the design of LLVM](http://llvm.org/docs/LangRef.html#binary-operations)).
295 [integer comparisons](http://llvm.org/docs/LangRef.html#icmp-instruction) and
296 [floating point comparisons](http://llvm.org/docs/LangRef.html#fcmp-instruction)
299 ["fast math"](http://llvm.org/docs/LangRef.html#fadd-instruction), and integers
301 [various forms of wrapping](http://llvm.org/docs/LangRef.html#add-instruction)
314 Since integers are [signless](#integer-signedness-semantics), it is necessary to
318 results for unsigned (`8 > 3`) and signed (`-8 < 3`) interpretations. This
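A hedged sketch of what this means in practice, assuming the `arith` dialect's `cmpi` operation and a 4-bit integer type: the same bit pattern compares differently depending on whether the predicate treats it as unsigned or signed.

```mlir
func.func @signedness_demo() -> (i1, i1) {
  // Bit pattern 0b1000 in an i4: 8 if read as unsigned, -8 if read as signed.
  %x = arith.constant -8 : i4
  %y = arith.constant 3 : i4
  %u = arith.cmpi ugt, %x, %y : i4   // unsigned: 8 > 3  -> true
  %s = arith.cmpi sgt, %x, %y : i4   // signed:  -8 > 3  -> false
  return %u, %s : i1, i1
}
```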
343 logic redefinitions for custom assembly form of non-builtin operations.
361 We considered attaching a "force-inline" attribute on a function and/or a
381 Being able to use values defined outside the region implies that use-def chains
384 boundaries, for example in case of TableGen-defined canonicalization patterns.
387 to enable cross-region analyses and transformations that are simpler than
388 inter-procedural transformations. Having uses from different regions appear in
389 the same use-def chain, contrary to an additional data structure maintaining
397 pattern for example): traversing use-def chains potentially crosses implicitly
402 arguments to explicitly break the use-def chains in the current proposal. This
403 can be combined with an attribute-imposed semantic requirement disallowing the
418 - For standard/builtin operations, only builtin types are allowed. This
421 - Outside of standard/builtin operations, dialects are expected to verify
426 - For builtin types, these types are allowed to contain types from other
432 - For dialect types, the dialect is expected to verify any type
438 Following the separation between the built-in and standard dialect, it makes
439 sense to separate built-in types and standard dialect types. Built-in types are
450 for types to be round-tripped without needing to link in the dialect library
480 %s = "foo"() : () -> !llvm<"i32*">
492 [type aliases](../LangRef.md/#type-aliases).
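For illustration (a sketch assuming current alias syntax; older releases spelled it `!name = type <...>`), a type alias is purely textual shorthand, so the aliased form and the expanded form are interchangeable:

```mlir
!avx_m128 = vector<4xf32>

func.func @use_alias(%v : !avx_m128) -> vector<4xf32> {
  // The alias and the expanded type denote exactly the same type.
  return %v : vector<4xf32>
}
```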
505 provide a myriad of benefits, such as alleviating any need for tuple-extract
524 specific needs. The custom assembly form can de-duplicate information from the
533 ### Non-affine control flow
548 The presence of dynamic control flow leads to an inner non-affine function
594 ### Non-affine loop bounds
601 // Non-affine loop bound for k loop.
613 %pow = call @pow(2, %j) : (index, index) -> index
639 S3 - d0 - 1 >= 0, S4 - d1 - 1 >= 0)
641 #map0 = (d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3, d4, d5, d6)
647 // %out0 = %0#1 * %h_stride + %0#4 * %h_kernel_dilation - %h_pad_low
648 // %out1 = %0#2 * %w_stride + %0#5 * %w_kernel_dilation - %w_pad_low
649 #map1_0 = (d0, d1, d2, d3) [S0, S1, S2, S3, S4, S5] -> (d0 * S0 + d2 * S2 - S4)
650 #map1_1 = (d0, d1, d2, d3) [S0, S1, S2, S3, S4, S5] -> (d1 * S1 + d3 * S3 - S5)
652 // Semi-affine map to undilated input coordinate space.
654 #map2_0 = (d0, d1) [S0, S1] -> (d0 / S0)
655 #map2_1 = (d0, d1) [S0, S1] -> (d1 / S1)
744 are interleaved. We model domains as non-piece-wise convex integer sets, and
746 latter can be piece-wise affine relations. In the schedule tree representation,
747 domain and schedules for instructions are represented in a tree-like structure
748 called a schedule tree. Each non-leaf node of the tree is an abstract
755 // #map0 = (d0, d1, d2, d3, d4, d5) -> (128*d0 + d3, 128*d1 + d4, 128*d2 + d5)
756 #intset_ij = (i, j) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0, -j + N - 1 >= 0
757 #intset_ijk = (i, j, k) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0,
758 -j + M - 1 >= 0, k >= 0, -k + N - 1 >= 0
763 // (%i, %j) = affine.apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
768 // -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
771 // -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
783 // (%i, %j) = affine.apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
797 library calls where no implementation is available, high-performance vendor
798 libraries, or user-provided / user-tuned routines.
808 affine-rel-def ::= affine-rel-id `=` affine-relation-inline
810 affine-rel-id ::= `##` prefixed-id
812 affine-relation-inline ::=
813 `(` input-dims `)` (`[` symbols `]`)? `->`
814 `(` output-dims `)` `:` affine-constraint-conjunction
816 input-dims ::= bare-id-list
817 output-dims ::= bare-id-list
818 symbols ::= bare-id-list
820 affine-rel ::= affine-rel-id | affine-relation-inline
823 affine-rel-spec ::= affine-rel dim-and-symbol-use-list
826 All identifiers appearing in input-dims, output-dims, and symbols are
827 pairwise distinct. All affine-constraint non-terminals in the above syntax are
828 allowed to contain identifiers only from input-dims, output-dims, and
829 symbols.
839 ##aff_rel9 = (d0) -> (r0) : r0 - d0 >= 0, d0 - r0 + 1 >= 0
841 func.func @count(%A : memref<128xf32>, %pos : i32) -> f32
858 MLIR supports values of a Function type. Instead of having first-class IR
888 per-region attributes with array attributes attached to the entity containing
890 IR and enables more concise and op-specific forms, e.g., when all regions of an
895 This can be reconsidered in the future if we see a non-negligible amount of use
901 include opaque ones, high-performance vendor libraries such as cuDNN, CUB, MKL,
902 FFT libraries, user-provided/optimized functions, or data movement runtimes such
905 such calls on sub-tensors. For user-provided or custom hand-tuned functions, the
906 read/write/may_read/may_write sets could be provided a priori by a user as part
914 ##rel9 ( ) [s0] -> (r0, r1) : 0 <= r0 <= 1023, 0 <= r1 <= s0 - 1
917 -> f32 [
940 the IR and the in-memory form.
943 1. Allow piece-wise affine maps for layouts: allows clean modeling of
947 1. Allow many-to-one layout maps: Index and layout maps in the current
948 proposal are bijective. Extending them to many-to-one layout maps allows
952 Proposal 2(a) requires non-trivial changes to the IR and the in-memory
962 situations, where escaping is necessary, we use zero-dimensional tensors and
975 [<out-var-list> =]
976 for %<index-variable-name> = <lower-bound> ... <upper-bound> step <step>
977 [with <in-var-list>] { <loop-instruction-list> }
980 out-var-list is a comma-separated list of SSA values defined in the loop body
981 and used outside the loop body. in-var-list is a comma-separated list of SSA
982 values used inside the loop body and their initializers. loop-instruction-list
988 // Return the sum of the elements in a 1-dimensional memref A.
989 func.func @sum(%A : memref<?xi32>, %N : i32) -> (i32) {
1005 <out-var-list> = affine.if (<cond-list>) {...} [else {...}]
1008 Out-var-list is a list of SSA values defined by the if-instruction. The values
1009 are arguments to the yield-instruction that occurs in both then and else clauses
1019 func.func @sum_half(%A : memref<?xi32>, %N : i32) -> (i32) {
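For reference, a hedged sketch (not part of the original proposal text) of how the same escaping-scalar pattern is written in present-day MLIR, where `affine.for` carries values across iterations through `iter_args` and `affine.yield`; the trip count is assumed to be passed as an `index`:

```mlir
func.func @sum_iter_args(%A : memref<?xi32>, %N : index) -> i32 {
  %zero = arith.constant 0 : i32
  // %acc is the loop-carried accumulator; its final value becomes the
  // result of the affine.for, so nothing has to escape through memory.
  %sum = affine.for %i = 0 to %N iter_args(%acc = %zero) -> (i32) {
    %v = affine.load %A[%i] : memref<?xi32>
    %next = arith.addi %acc, %v : i32
    affine.yield %next : i32
  }
  return %sum : i32
}
```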
1036 multi-thread them. There are multiple strategies for this, but a simple one is
1057 2. Constants are defined in per-operation pools, instead of being globally
1059 3. Functions, and other global-like operations, themselves are not SSA values