# 'llvm' Dialect

This dialect maps [LLVM IR](https://llvm.org/docs/LangRef.html) into MLIR by
defining the corresponding operations and types. LLVM IR metadata is usually
represented as MLIR attributes, which offer additional structure verification.

We use "LLVM IR" to designate the
[intermediate representation of LLVM](https://llvm.org/docs/LangRef.html) and
"LLVM _dialect_" or "LLVM IR _dialect_" to refer to this MLIR dialect.

Unless explicitly stated otherwise, the semantics of the LLVM dialect operations
must correspond to the semantics of LLVM IR instructions and any divergence is
considered a bug. The dialect also contains auxiliary operations that smooth
out the differences in the IR structure, e.g., MLIR does not have `phi`
operations and LLVM IR does not have a `constant` operation. These auxiliary
operations are systematically prefixed with `mlir`, e.g., `llvm.mlir.constant`
where `llvm.` is the dialect namespace prefix.

[TOC]

## Dependency on LLVM IR

The LLVM dialect is not expected to depend on any object that requires an
`LLVMContext`, such as an LLVM IR instruction or type. Instead, MLIR provides
thread-safe alternatives compatible with the rest of the infrastructure. The
dialect is allowed to depend on the LLVM IR objects that don't require a
context, such as data layout and triple description.

## Module Structure

IR modules use the built-in MLIR `ModuleOp` and support all its features. In
particular, modules can be named, nested and are subject to symbol visibility.
Modules can contain any operations, including LLVM functions and globals.

### Data Layout and Triple

An IR module may have an optional data layout and triple information attached
using the MLIR attributes `llvm.data_layout` and `llvm.target_triple`,
respectively. Both are string attributes with the
[same syntax](https://llvm.org/docs/LangRef.html#data-layout) as in LLVM IR and
are verified to be correct. They can be defined as follows.

```mlir
module attributes {llvm.data_layout = "e",
                   llvm.target_triple = "aarch64-linux-android"} {
  // module contents
}
```

### Functions

LLVM functions are represented by a special operation, `llvm.func`, that has
syntax similar to that of the built-in function operation but supports
LLVM-related features such as linkage and variadic argument lists. See the
detailed description in the operation list [below](#llvmfunc-llvmllvmfuncop).
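
As a rough illustration, the sketch below declares a variadic external function
and defines an `internal` one. The symbol names `@printf` and `@add_i32` are
hypothetical examples, and the printed form may differ slightly between MLIR
revisions.

```mlir
// Declaration (no body) of a variadic function: `...` follows the fixed
// arguments. No linkage keyword is given, so `external` is assumed.
llvm.func @printf(!llvm.ptr, ...) -> i32

// Definition with `internal` linkage, written before the symbol name.
llvm.func internal @add_i32(%a: i32, %b: i32) -> i32 {
  %sum = llvm.add %a, %b : i32
  llvm.return %sum : i32
}
```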

### PHI Nodes and Block Arguments

MLIR uses block arguments instead of PHI nodes to communicate values between
blocks. Therefore, the LLVM dialect has no operation directly equivalent to
`phi` in LLVM IR. Instead, all terminators can pass values as successor operands
as these values will be forwarded as block arguments when the control flow is
transferred.

For example:

```mlir
^bb1:
  %0 = llvm.add %arg0, %cst : i32
  llvm.br ^bb2(%0 : i32)

// If the control flow comes from ^bb1, %arg1 == %0.
^bb2(%arg1: i32):
  // ...
```

is equivalent to LLVM IR

```llvm
bb1:
  %0 = add i32 %arg0, %cst
  br label %bb2

bb2:
  %arg1 = phi i32 [ %0, %bb1 ]  ; plus one entry per other predecessor
```

Since there is no need to use the block identifier to differentiate the source
of different values, the LLVM dialect supports terminators that transfer the
control flow to the same block with different arguments. For example:

```mlir
^bb1:
  llvm.cond_br %cond, ^bb2(%0 : i32), ^bb2(%1 : i32)

^bb2(%arg0: i32):
  // ...
```

### Context-Level Values

Some value kinds in LLVM IR, such as constants and undefs, are uniqued in
context and used directly in relevant operations. MLIR does not support such
values for thread-safety and concept parsimony reasons. Instead, regular values
are produced by dedicated operations that have the corresponding semantics:
[`llvm.mlir.constant`](#llvmmlirconstant-llvmconstantop),
[`llvm.mlir.undef`](#llvmmlirundef-llvmundefop),
[`llvm.mlir.poison`](#llvmmlirpoison-llvmpoisonop) and
[`llvm.mlir.zero`](#llvmmlirzero-llvmzeroop). Note how these operations are
prefixed with `mlir.` to indicate that they don't belong to LLVM IR but are only
necessary to model it in MLIR. The values produced by these operations are
usable just like any other value.

Examples:

```mlir
// Create an undefined value of structure type with a 32-bit integer followed
// by a float.
%0 = llvm.mlir.undef : !llvm.struct<(i32, f32)>

// Null pointer.
%1 = llvm.mlir.zero : !llvm.ptr

// Create a zero-initialized value of structure type with a 32-bit integer
// followed by a float.
%2 = llvm.mlir.zero : !llvm.struct<(i32, f32)>

// Constant 42 as i32.
%3 = llvm.mlir.constant(42 : i32) : i32

// Splat dense vector constant.
%4 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>
```

Note that constants list the type twice: once on the typed attribute and once
as the result type. This is an artifact of the LLVM dialect historically not
using built-in types, which are used for typed MLIR attributes. The syntax will
be reevaluated after considering composite constants.

### Globals

Global variables are also defined using a special operation,
[`llvm.mlir.global`](#llvmmlirglobal-llvmglobalop), located at the module
level. Globals are MLIR symbols and are identified by their name.

Since functions need to be isolated-from-above, i.e., values defined outside the
function cannot be directly used inside the function, an additional operation,
[`llvm.mlir.addressof`](#llvmmliraddressof-llvmaddressofop), is provided to
locally define a value containing the _address_ of a global. The actual value
can then be loaded from that pointer, or a new value can be stored into it if
the global is not declared constant. This is similar to LLVM IR where globals
are accessed by name and have a pointer type.
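
A minimal sketch of this pattern, assuming a hypothetical internal global
`@counter` and function `@increment`: the function materializes the global's
address with `llvm.mlir.addressof`, loads the current value, and stores an
updated one.

```mlir
// Module-level global with an initial value of 0.
llvm.mlir.global internal @counter(0 : i32) : i32

llvm.func @increment() -> i32 {
  // The global cannot be referenced directly inside the isolated-from-above
  // function, so its address is materialized as a local SSA value.
  %addr = llvm.mlir.addressof @counter : !llvm.ptr
  %old = llvm.load %addr : !llvm.ptr -> i32
  %one = llvm.mlir.constant(1 : i32) : i32
  %new = llvm.add %old, %one : i32
  // Storing is allowed because the global is not declared `constant`.
  llvm.store %new, %addr : i32, !llvm.ptr
  llvm.return %new : i32
}
```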

### Linkage

Module-level named objects in the LLVM dialect, namely functions and globals,
have an optional _linkage_ attribute derived from LLVM IR
[linkage types](https://llvm.org/docs/LangRef.html#linkage-types). Linkage is
specified by the same keyword as in LLVM IR and is located between the operation
name (`llvm.func` or `llvm.mlir.global`) and the symbol name. If no linkage
keyword is present, `external` linkage is assumed by default. Linkage is
_distinct_ from MLIR symbol visibility.
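
For illustration, a sketch with hypothetical symbol names showing where the
linkage keyword goes on both kinds of module-level objects:

```mlir
// No keyword: `external` linkage is assumed.
llvm.func @exported() {
  llvm.return
}

// Explicit `internal` linkage on a function.
llvm.func internal @helper() {
  llvm.return
}

// Explicit `weak` linkage on a global; the keyword again precedes the symbol.
llvm.mlir.global weak @weak_counter(0 : i32) : i32
```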

### Attribute Pass-Through

**WARNING:** this feature MUST NOT be used for any real workload. It is
exclusively intended for quick prototyping. After that, attributes must be
introduced as proper first-class concepts in the dialect.

The LLVM dialect provides a mechanism to forward function-level attributes to
LLVM IR using the `passthrough` attribute. This is an array attribute containing
either string attributes or array attributes. In the former case, the value of
the string is interpreted as the name of an LLVM IR function attribute. In the
latter case, the array is expected to contain exactly two string attributes, the
first corresponding to the name of an LLVM IR function attribute, and the second
corresponding to its value. Note that even integer LLVM IR function attributes
have their value represented in the string form.

Example:

```mlir
llvm.func @func() attributes {
  passthrough = ["readonly",           // value-less attribute
                 ["alignstack", "4"],  // integer attribute with value
                 ["other", "attr"]]    // attribute unknown to LLVM
} {
  llvm.return
}
```

If the attribute is not known to LLVM IR, it will be attached as a string
attribute.

## Types

The LLVM dialect uses built-in types whenever possible and defines a set of
complementary types, which correspond to the LLVM IR types that cannot be
directly represented with built-in types. Similarly to other MLIR context-owned
objects, the creation and manipulation of LLVM dialect types is thread-safe.

MLIR does not support module-scoped named type declarations, e.g., `%s = type
{i32, i32}` in LLVM IR. Instead, types must be fully specified at each use,
except for recursive types where only the first reference to a named type needs
to be fully specified. MLIR [type aliases](../LangRef.md/#type-aliases) can be
used to achieve more compact syntax.

The general syntax of LLVM dialect types is `!llvm.`, followed by a type kind
identifier (e.g., `ptr` for pointer or `struct` for structure) and by an
optional list of type parameters in angle brackets. The dialect follows MLIR
style for types with nested angle brackets and keyword specifiers rather than
using different bracket styles to differentiate types. Types inside the angle
brackets may omit the `!llvm.` prefix for brevity: the parser first attempts to
find a type (starting with `!` or a built-in type) and falls back to accepting a
keyword. For example, `!llvm.struct<(!llvm.ptr, f32)>` and
`!llvm.struct<(ptr, f32)>` are equivalent, with the latter being the canonical
form, and denote a struct containing a pointer and a float.
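
The sketch below, assuming a hypothetical alias name `!pair` and hypothetical
function names, combines an MLIR type alias with the short form of nested
types. The alias is expanded by the parser, so `@first` simply takes a literal
struct of two `i32`.

```mlir
// Type alias defined at the top level of the file; every use of `!pair`
// stands for the full struct type.
!pair = !llvm.struct<(i32, i32)>

// Nested types may omit the `!llvm.` prefix: `ptr` here is `!llvm.ptr`.
llvm.func @takes_ptr_float(!llvm.struct<(ptr, f32)>)

llvm.func @first(%p: !pair) -> i32 {
  %0 = llvm.extractvalue %p[0] : !pair
  llvm.return %0 : i32
}
```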

### Built-in Type Compatibility

The LLVM dialect accepts a subset of built-in types that are referred to as
_LLVM dialect-compatible types_. The following types are compatible:

-   Signless integers - `iN` (`IntegerType`).
-   Floating point types - `bf16`, `f16`, `f32`, `f64`, `f80`, `f128`
    (`FloatType`), corresponding to LLVM IR `bfloat`, `half`, `float`,
    `double`, `x86_fp80` and `fp128`.
-   1D vectors of signless integers or floating point types - `vector<NxT>`
    (`VectorType`).

Note that only a subset of types that can be represented by a given class is
compatible. For example, signed and unsigned integers are not compatible. The
LLVM dialect provides a function, `bool LLVM::isCompatibleType(Type)`, that can
be used as a compatibility check.

Each LLVM IR type corresponds to *exactly one* MLIR type, either built-in or
LLVM dialect type. For example, because `i32` is LLVM-compatible, there is no
`!llvm.i32` type. However, `!llvm.struct<(T, ...)>` is defined in the LLVM
dialect as there is no corresponding built-in type.
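
As a rough illustration with hypothetical function names, the first
declaration below uses only compatible built-in types. The commented-out
signatures use built-in types outside the compatible subset and are expected to
be rejected by the verifier.

```mlir
// Signless integers, floats and 1-D vectors of them are accepted directly.
llvm.func @compatible(i64, f32, vector<4xf16>) -> i1

// Not LLVM dialect-compatible (expected to be rejected):
//   llvm.func @signed(si32)              // signed integer
//   llvm.func @unsigned(ui8)             // unsigned integer
//   llvm.func @two_d(vector<2x2xf32>)    // multi-dimensional vector
```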

### Additional Simple Types

The following non-parametric types derived from the LLVM IR are available in the
LLVM dialect:

-   `!llvm.ppc_fp128` (`LLVMPPCFP128Type`) - 128-bit floating-point value (two
    64-bit values).
-   `!llvm.token` (`LLVMTokenType`) - a non-inspectable value associated with an
    operation.
-   `!llvm.metadata` (`LLVMMetadataType`) - LLVM IR metadata, to be used only if
    the metadata cannot be represented as structured MLIR attributes.
-   `!llvm.void` (`LLVMVoidType`) - does not represent any value; can only
    appear in function results.

These types represent a single value (or an absence thereof in case of `void`)
and correspond to their LLVM IR counterparts.
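
A few illustrative uses follow; all function names are hypothetical. The sketch
assumes the `llvm.mlir.none` operation, which produces an empty token value.
Note that in the custom `llvm.func` syntax a `void` result is expressed by
omitting the result, whereas the function type spells it out.

```mlir
// PowerPC double-double value as both argument and result.
llvm.func @ppc_identity(!llvm.ppc_fp128) -> !llvm.ppc_fp128

// A function returning nothing: `void` is implicit in the `llvm.func` syntax,
// and its function type is !llvm.func<void ()>.
llvm.func @make_token() {
  // An empty token value; it can be passed along but not inspected.
  %t = llvm.mlir.none : !llvm.token
  llvm.return
}
```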

### Additional Parametric Types

These types are parameterized by the types they contain, e.g., the element
type, which can be either a compatible built-in or an LLVM dialect type. Pointer
types are instead parameterized by an address space.

#### Pointer Types

Pointer types specify an address in memory.

Pointers are [opaque](https://llvm.org/docs/OpaquePointers.html), i.e., do not
indicate the type of the data pointed to, and are intended to simplify LLVM IR
by encoding behavior relevant to the pointee type into operations rather than
into types. Pointers can optionally be parameterized with an address space. The
address space is an integer, but this choice may be reconsidered if MLIR
implements named address spaces. The syntax of pointer types is as follows:

```
  llvm-ptr-type ::= `!llvm.ptr` (`<` integer-literal `>`)?
```

where the optional group containing the integer literal corresponds to the
address space. All cases are represented by `LLVMPointerType` internally.
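
Because pointers carry no element type, operations that need one state it
themselves. The sketch below (hypothetical function names) shows
`llvm.getelementptr` and `llvm.load` naming the element type explicitly, and a
declaration taking a pointer in address space 3.

```mlir
// Load the second `f32` after %base. The element type is given on the
// operations, not on the pointer type.
llvm.func @load_second(%base: !llvm.ptr) -> f32 {
  %p = llvm.getelementptr %base[1] : (!llvm.ptr) -> !llvm.ptr, f32
  %v = llvm.load %p : !llvm.ptr -> f32
  llvm.return %v : f32
}

// Pointer in address space 3; plain `!llvm.ptr` means address space 0.
llvm.func @use_addr_space_3(!llvm.ptr<3>)
```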

#### Array Types

Array types represent sequences of elements in memory. Array elements can be
addressed with a value unknown at compile time. Arrays can be nested, i.e., the
element type may itself be an array, but each array type is one-dimensional.

Array types are parameterized by the fixed size and the element type.
Syntactically, their representation is the following:

```
  llvm-array-type ::= `!llvm.array<` integer-literal `x` type `>`
```

and they are internally represented as `LLVMArrayType`.
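
For illustration, a sketch assuming a hypothetical function `@element` that
reads element `%i` of a 4-element array stored in memory; the index is a runtime
value, which is allowed for arrays accessed through memory.

```mlir
// Type examples:
//   !llvm.array<4 x i32>             4 32-bit integers
//   !llvm.array<2 x array<3 x f32>>  2 nested arrays of 3 floats each

// Read element %i of an array of 4 i32 stored at %arr.
llvm.func @element(%arr: !llvm.ptr, %i: i64) -> i32 {
  %p = llvm.getelementptr %arr[0, %i] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.array<4 x i32>
  %v = llvm.load %p : !llvm.ptr -> i32
  llvm.return %v : i32
}
```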

#### Function Types

Function types represent the type of a function, i.e., its signature.

Function types are parameterized by the result type, the list of argument types
and by an optional "variadic" flag. Unlike built-in `FunctionType`, LLVM dialect
functions (`LLVMFunctionType`) always have a single result, which may be
`!llvm.void` if the function does not return anything. The syntax is as follows:

```
  llvm-func-type ::= `!llvm.func<` type `(` type-list (`,` `...`)? `)` `>`
```

For example,

```mlir
!llvm.func<void ()>           // a function with no arguments;
!llvm.func<i32 (f32, i32)>    // a function with two arguments and a result;
!llvm.func<void (i32, ...)>   // a variadic function with at least one argument.
```

In the LLVM dialect, functions are not first-class objects and one cannot have a
value of function type. Instead, one can take the address of a function and
operate on pointers to functions.
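
A sketch of an indirect call, assuming hypothetical functions `@square` and
`@call_through_pointer`: `llvm.mlir.addressof` yields an opaque `!llvm.ptr` to
the function, and the call states the callee signature explicitly.

```mlir
llvm.func @square(%x: i32) -> i32 {
  %0 = llvm.mul %x, %x : i32
  llvm.return %0 : i32
}

llvm.func @call_through_pointer(%x: i32) -> i32 {
  // Take the address of @square; the result is a plain pointer.
  %fptr = llvm.mlir.addressof @square : !llvm.ptr
  // Indirect call: the callee type (i32) -> i32 is written on the call.
  %r = llvm.call %fptr(%x) : !llvm.ptr, (i32) -> i32
  llvm.return %r : i32
}
```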

### Vector Types

Vector types represent sequences of elements, typically when multiple data
elements are processed by a single instruction (SIMD). Vectors are thought of as
stored in registers and therefore vector elements can only be addressed through
constant indices.

Vector types are parameterized by the size, which may be either _fixed_ or a
multiple of some fixed size in case of _scalable_ vectors, and the element type.
Vectors cannot be nested and only 1D vectors are supported. Scalable vectors are
still considered 1D.

The LLVM dialect uses built-in vector types for _fixed_-size vectors of built-in
types, and provides additional types for fixed-size vectors of LLVM dialect
types (`LLVMFixedVectorType`) and scalable vectors of any types
(`LLVMScalableVectorType`). These two additional types share the following
syntax:

```
  llvm-vec-type ::= `!llvm.vec<` (`?` `x`)? integer-literal `x` type `>`
```

Note that the sets of element types supported by built-in and LLVM dialect
vector types are mutually exclusive, e.g., the built-in vector type does not
accept `!llvm.ptr` and the LLVM dialect fixed-width vector type does not
accept `i32`.

The following functions are provided to operate on any kind of the vector types
compatible with the LLVM dialect:

-   `bool LLVM::isCompatibleVectorType(Type)` - checks whether a type is a
    vector type compatible with the LLVM dialect;
-   `Type LLVM::getVectorElementType(Type)` - returns the element type of any
    vector type compatible with the LLVM dialect;
-   `llvm::ElementCount LLVM::getVectorNumElements(Type)` - returns the number
    of elements in any vector type compatible with the LLVM dialect;
-   `Type LLVM::getFixedVectorType(Type, unsigned)` - gets a fixed vector type
    with the given element type and size; the resulting type is either a
    built-in or an LLVM dialect vector type depending on which one supports the
    given element type.

#### Examples of Compatible Vector Types

```mlir
vector<42 x i32>                   // Vector of 42 32-bit integers.
!llvm.vec<42 x ptr>                // Vector of 42 pointers.
!llvm.vec<? x 4 x i32>             // Scalable vector of 32-bit integers with
                                   // size divisible by 4.
!llvm.array<2 x vector<2 x i32>>   // Array of 2 vectors of 2 32-bit integers.
!llvm.array<2 x vec<2 x ptr>>      // Array of 2 vectors of 2 pointers.
```

### Structure Types

The structure type is used to represent a collection of data members together in
memory. The elements of a structure may be any type that has a size.

Structure types are represented in a single dedicated class
`mlir::LLVM::LLVMStructType`. Internally, the struct type stores a (potentially
empty) name, a (potentially empty) list of contained types and a bitmask
indicating whether the struct is named, opaque, packed or uninitialized.
Structure types that don't have a name are referred to as _literal_ structs.
Such structures are uniquely identified by their contents. _Identified_ structs
on the other hand are uniquely identified by the name.

#### Identified Structure Types

Identified structure types are uniqued using their name in a given context.
Attempting to construct an identified structure with the same name as a
structure that already exists in the context *will result in the existing
structure being returned*. **MLIR does not auto-rename identified structs in
case of name conflicts** because there is no naming scope equivalent to a module
in LLVM IR since MLIR modules can be arbitrarily nested.

Programmatically, identified structures can be constructed in an _uninitialized_
state. In this case, they are given a name but the body must be set up by a
later call, using MLIR's type mutation mechanism. Such uninitialized types can
be used in type construction, but must eventually be initialized for the IR to
be valid. This mechanism allows for constructing _recursive_ or mutually
referring structure types: an uninitialized type can be used in its own
initialization.

Once the type is initialized, its body cannot be changed anymore. Any further
attempts to modify the body will fail and return failure to the caller _unless
the type is initialized with the exact same body_. Type initialization is
thread-safe; however, if a concurrent thread initializes the type before the
current thread, the initialization may return failure.

The syntax for identified structure types is as follows.

```
llvm-ident-struct-type ::= `!llvm.struct<` string-literal `,` `opaque` `>`
                         | `!llvm.struct<` string-literal `,` `packed`?
                           `(` type-or-ref-list `)` `>`
type-or-ref-list ::= <maybe empty comma-separated list of type-or-ref>
type-or-ref ::= <any compatible type with optional !llvm.>
              | `!llvm.`? `struct<` string-literal `>`
```
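
To illustrate uniquing by name, consider the sketch below with a hypothetical
struct name `"point"` and hypothetical function names. Both declarations refer
to the same identified type because they agree on its body. Reusing the name
with a different body in the same context is expected to be rejected rather
than renamed.

```mlir
// Both occurrences name the same identified struct with the same body.
llvm.func @make_point(i32, i32) -> !llvm.struct<"point", (i32, i32)>
llvm.func @point_x(!llvm.struct<"point", (i32, i32)>) -> i32

// Expected to be an error in the same context: "point" already has the body
// (i32, i32), and MLIR does not auto-rename identified structs.
//   llvm.func @bad(!llvm.struct<"point", (f32, f32)>)
```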

#### Literal Structure Types

Literal structures are uniqued according to the list of elements they contain,
and can optionally be packed. The syntax for such structs is as follows.

```
llvm-literal-struct-type ::= `!llvm.struct<` `packed`? `(` type-list `)` `>`
type-list ::= <maybe empty comma-separated list of types with optional !llvm.>
```

Literal structs cannot be recursive, but can contain other structs. Therefore,
they must be constructed in a single step with the entire list of contained
elements provided.

#### Examples of Structure Types

```mlir
!llvm.struct<>                  // NOT allowed
!llvm.struct<()>                // empty, literal
!llvm.struct<(i32)>             // literal
!llvm.struct<(struct<(i32)>)>   // struct containing a struct
!llvm.struct<packed (i8, i32)>  // packed struct
!llvm.struct<"a">               // recursive reference, only allowed within
                                // another struct, NOT allowed at top level
!llvm.struct<"a", ()>           // empty, named (necessary to differentiate from
                                // recursive reference)
!llvm.struct<"a", opaque>       // opaque, named
!llvm.struct<"a", (i32, ptr)>        // named
!llvm.struct<"a", packed (i8, i32)>  // named, packed
```

### Unsupported Types

LLVM IR `label` type does not have a counterpart in the LLVM dialect since, in
MLIR, blocks are not values and don't need a type.

## Operations

All operations in the LLVM IR dialect have a custom form in MLIR. The mnemonic
of an operation is that used in LLVM IR prefixed with "`llvm.`".
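
As an illustration of this naming scheme, the LLVM IR instruction
`%s = fadd float %a, %b` corresponds roughly to `llvm.fadd` in the sketch below,
where the type follows the colon instead of preceding the operands. The
function name is hypothetical.

```mlir
llvm.func @fadd_example(%a: f32, %b: f32) -> f32 {
  // LLVM IR: %s = fadd float %a, %b
  %s = llvm.fadd %a, %b : f32
  llvm.return %s : f32
}
```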

[include "Dialects/LLVMOps.md"]

## Operations for LLVM IR Intrinsics

The MLIR operation system is open, making it unnecessary to introduce a hard
boundary between "core" operations and "intrinsics". General LLVM IR intrinsics
are modeled as first-class operations in the LLVM dialect. Target-specific LLVM
IR intrinsics, e.g., NVVM or ROCDL, are modeled as separate dialects.
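
For example, the general LLVM IR intrinsic `llvm.fabs` is available as the
`llvm.intr.fabs` operation. A minimal sketch with a hypothetical function name:

```mlir
llvm.func @abs_value(%x: f32) -> f32 {
  // Translates to a call to the LLVM IR intrinsic @llvm.fabs.f32.
  %r = llvm.intr.fabs(%x) : (f32) -> f32
  llvm.return %r : f32
}
```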

[include "Dialects/LLVMIntrinsicOps.md"]

### Debug Info

Debug information within the LLVM dialect is represented using locations in
combination with a set of attributes that mirror the DINode structure defined by
the debug info metadata within LLVM IR. Debug scoping information is attached
to LLVM IR dialect operations using a fused location (`FusedLoc`) whose metadata
holds the `DIScopeAttr` representing the debug scope. Similarly, the subprogram
of LLVM IR dialect `llvm.func` operations is attached using a fused location
whose metadata is a `DISubprogramAttr`.
477