# LLVM IR Target

This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1. **conversion** of the IR to a set of dialects translatable to LLVM IR, for
   example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
   dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
   [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
2. **translation** of MLIR dialects to LLVM IR.

This flow allows the non-trivial transformation to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well
as reduces the churn in case of changes.

Note that many different dialects can be lowered to LLVM but are provided as
different sets of patterns and have different passes available to `mlir-opt`.
However, this is primarily useful for testing and prototyping, and using the
collection of patterns together is highly recommended. One place this is
important and visible is the ControlFlow dialect's branching operations which
will fail to apply if their types mismatch with the blocks they jump to in the
parent op.

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).

[TOC]

## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact.
For example,
the `-finalize-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md/#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations. After multiple partial
conversions to the LLVM dialect are performed, the cast operations that have
become no-ops can be removed by the `-reconcile-unrealized-casts` pass. The
latter pass is not specific to the LLVM dialect and can remove any no-op casts.

### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
this type converter to support other types. Extra care must be taken if the
conversion rules for built-in types are overridden: all conversions must use the
same type converter.

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md/#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

Complex type is converted into an LLVM dialect literal structure type with two
elements:

- real part;
- imaginary part.

The elemental type is converted recursively using these rules.

Example:

```mlir
  complex<f32>
  // ->
  !llvm.struct<(f32, f32)>
```

#### Index Type

Index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to `i64`.
This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
conversion passes.

Example:

```mlir
  index
  // -> on x86_64
  i64
```

#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format. Memrefs with other, less
trivial layouts should be converted into the strided form first, e.g., by
materializing the non-trivial address remapping due to layout as `affine.apply`
operations.

The default memref descriptor is a struct with the following fields:

1. The pointer to the data buffer as allocated, referred to as "allocated
   pointer". This is only useful for deallocating the memref.
2. The pointer to the properly aligned data pointer that the memref indexes,
   referred to as "aligned pointer".
3. A converted `index`-type integer containing the distance in number
   of elements between the beginning of the (aligned) buffer and the first
   element to be accessed through the memref, referred to as "offset".
4. An array containing as many converted `index`-type integers as the rank of
   the memref: the array represents the size, in number of elements, of the
   memref along the given dimension.
5. A second array containing as many converted `index`-type integers as the
   rank of the memref: the second array represents the "stride" (in tensor
   abstraction sense), i.e. the number of consecutive elements of the
   underlying buffer one needs to jump over to get to the next logically
   indexed element.

For constant memref dimensions, the corresponding size entry is a constant whose
runtime value matches the static value.
This normalization serves as an ABI for
the memref type to interoperate with externally linked functions. In the
particular case of rank `0` memrefs, the size and stride arrays are omitted,
resulting in a struct containing two pointers + offset.

Examples:

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr, ptr, i64)>
memref<1 x f32> -> !llvm.struct<(ptr, ptr, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr, ptr, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr, ptr, i64,
                                               array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr, ptr, i64,
                                             array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types.
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>,
                                             array<2 x i64>)>
```

#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *unranked descriptor*. It contains:

1. a converted `index`-typed integer representing the dynamic rank of the
   memref;
2. a type-erased pointer (`!llvm.ptr`) to a ranked memref descriptor with
   the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or in
the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack allocations
(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
current function.
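To make the unranked descriptor layout concrete, the following C++ sketch mirrors the two fields described above and shows how a rank-polymorphic callee can recover the ranked descriptor through the type-erased pointer. The struct and function names (`Ranked1D`, `Unranked`, `numElements1D`) are illustrative assumptions, not part of any MLIR API.

```cpp
#include <cassert>
#include <cstdint>

// C++ analog of the ranked descriptor for a 1-D memref of f32
// (illustrative names, assuming index converts to intptr_t-sized integers).
struct Ranked1D {
  float *allocated;
  float *aligned;
  intptr_t offset;
  intptr_t sizes[1];
  intptr_t strides[1];
};

// C++ analog of the unranked descriptor: a dynamic rank plus a type-erased
// pointer to a ranked descriptor stored elsewhere (e.g. on the caller's stack).
struct Unranked {
  intptr_t rank;
  void *descriptor;
};

// A rank-polymorphic callee first inspects the rank, then reinterprets the
// type-erased pointer as the matching ranked descriptor.
intptr_t numElements1D(const Unranked &u) {
  assert(u.rank == 1);
  return static_cast<const Ranked1D *>(u.descriptor)->sizes[0];
}
```

This mirrors the convention in which the caller materializes the ranked descriptor in memory and passes only its rank and address.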
#### Function Types

Function types are converted to LLVM dialect function types as follows:

- function argument and result types are converted recursively using these
  rules;
- if a function type has multiple results, they are wrapped into an LLVM
  dialect literal structure type since LLVM function types must have exactly
  one result;
- if a function type has no results, the corresponding LLVM dialect function
  type will have one `!llvm.void` result since LLVM function types must have a
  result;
- function types used in arguments of another function type are wrapped in an
  LLVM dialect pointer type to comply with LLVM IR expectations;
- the structs corresponding to `memref` types, both ranked and unranked,
  appearing as function arguments are unbundled into individual function
  arguments to allow for specifying metadata such as aliasing information on
  individual pointers;
- the conversion of `memref`-typed arguments is subject to
  [calling conventions](#calling-conventions);
- if a function type has the boolean attribute `func.varargs` set, the
  converted LLVM function will be variadic.

Examples:

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately.
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its results aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into opaque pointers.
!llvm.func<ptr (ptr)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr, ptr, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr, ptr, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr, ptr, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr, ptr, i64)>,
                   struct<(ptr, ptr, i64)>)> ()>

// If the "func.varargs" attribute is set:
(i32) -> () attributes { "func.varargs" = true }
// the corresponding LLVM function will be variadic:
!llvm.func<void (i32, ...)>
```

Conversion patterns are available to convert built-in function operations and
standard call operations targeting those functions using these conversion rules.

#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR.
In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
of one-dimensional vectors.

Examples:

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
```

#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
be [bufferized](Bufferization.md) before being converted.

### Conversion of LLVM Container Types with Non-Compatible Element Types

Progressive lowering may result in LLVM container types, such as LLVM dialect
structures, containing non-compatible types: `!llvm.struct<(index)>`. Such types
are converted recursively using the rules described above.

Identified structures are converted to _new_ structures that have their
identifiers prefixed with `_Converted.` since the bodies of identified types
cannot be updated once initialized. Such names are considered _reserved_ and
must not appear in the input code (in practice, C reserves names starting with
`_` and a capital letter, and `.` cannot appear in valid C types anyway). If they
do appear and have a different body than the result of the conversion, the type
conversion will stop.

### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of function
and function call operations without changing how individual types are handled
elsewhere. They are implemented simultaneously by the default type converter and
by the conversion patterns for the relevant operations.
#### Function Result Packing

In case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site. This transformation is a part of the conversion and is transparent to the
definitions and uses of the values being returned.

Example:

```mlir
func.func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func.func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
}
```

#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs [defined above](#ranked-memref-types) before
unbundling them into individual scalar arguments.
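To sketch what this unbundling means at the C level, a `memref<?xf32>` argument expands into five scalar arguments: allocated pointer, aligned pointer, offset, one size and one stride. The function below is a hypothetical C++ view of such a lowered signature; the name `fooUnbundled` and the element access inside it are illustrative, not generated code.

```cpp
#include <cstdint>

// Hypothetical C++ view of the signature that a function taking a
// memref<?xf32> lowers to under the default calling convention.
float fooUnbundled(float *allocated, float *aligned, intptr_t offset,
                   intptr_t size0, intptr_t stride0) {
  // Access element [i] the way a lowered body would:
  // aligned pointer, displaced by offset plus i * stride.
  intptr_t i = size0 - 1;  // last element, for illustration
  return aligned[offset + i * stride0];
}
```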
This convention is implemented in the conversion of `func.func` and `func.call`
to the LLVM dialect, with the former unpacking the descriptor into a set of
individual values and the latter packing those values back into a descriptor so
as to make it transparently usable by other operations. Conversions from other
dialects should take this convention into account.

This specific convention is motivated by the necessity to specify alignment and
aliasing attributes on the raw pointers underpinning the memref.

Examples:

```mlir
func.func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr,  // Allocated pointer.
               %arg1: !llvm.ptr,  // Aligned pointer.
               %arg2: i64,        // Offset.
               %arg3: i64,        // Size in dim 0.
               %arg4: i64) {      // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
      : (!llvm.ptr, !llvm.ptr, i64, i64, i64) -> ()
  llvm.return
}
```

#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr`) pointer to the ranked memref descriptor. Note that
while the *calling convention* does not require allocation, *casting* to
unranked memref does since one cannot take an address of an SSA value containing
the ranked memref, which must be stored in some memory instead. The caller is in
charge of ensuring the thread safety and management of the allocated memory, in
particular the deallocation.

Example:

```mlir
func.func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,          // Rank.
               %arg1: !llvm.ptr) {  // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr)>

  "use"(%2) : (!llvm.struct<(i64, ptr)>) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on the stack and has the lifetime of the function. (*Note:*
due to the function-length lifetime, creation of multiple unranked memref
descriptors, e.g., in a loop, may lead to stack overflows.) If an unranked
descriptor has to be returned from a function, the ranked descriptor it points
to is copied into dynamically allocated memory, and the pointer in the unranked
descriptor is updated accordingly. The allocation happens immediately before
returning. It is the responsibility of the caller to free the dynamically
allocated memory. The default conversion of `func.call` and `func.call_indirect`
copies the ranked descriptor to newly allocated memory on the caller's stack.
Thus, the convention of the ranked memref descriptor pointed to by an unranked
memref descriptor being stored on stack is respected.

#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments
to a *single* pointer to the aligned data. Note that this does *not* apply to
uses of `memref` outside of function signatures; the default descriptor
structures are still used. This convention further restricts the supported cases
to the following:

- `memref` types with default layout;
- `memref` types with all dimensions statically known;
- `memref` values allocated in such a way that the allocated and aligned
  pointers match.
  Alternatively, the same function must handle allocation and
  deallocation since only one pointer is passed to any callee.

Examples:

```mlir
func.func @callee(memref<2x4xf32>)

func.func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
}

// ->

!descriptor = !llvm.struct<(ptr, ptr, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr)

llvm.func @caller(%arg0: !llvm.ptr) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointers are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr) -> ()
}
```

#### Bare Pointer Calling Convention For Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as their
shape cannot be known at compile time.

### Generic allocation and deallocation functions

When converting the Memref dialect, allocations and deallocations are converted
into calls to `malloc` (`aligned_alloc` if aligned allocations are requested)
and `free`.
However, it is possible to convert them to more generic functions
which can be implemented by a runtime library, thus allowing custom allocation
strategies or runtime profiling. When the conversion pass is instructed to
perform such a conversion, the names of the callees are
`_mlir_memref_to_llvm_alloc`, `_mlir_memref_to_llvm_aligned_alloc` and
`_mlir_memref_to_llvm_free`. Their signatures are the same as those of `malloc`,
`aligned_alloc` and `free`.

### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to a MemRef argument. When interfacing with
LLVM IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with data types produced by Clang when compiling C sources. The generation of
such wrapper functions can additionally be controlled at a function granularity
by setting the `llvm.emit_c_interface` unit attribute.

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.

```cpp
template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
```

Furthermore, we also rewrite function results to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option will do the following.
For *external* functions declared
in the MLIR module:

1. Declare a new function `_mlir_ciface_<original name>` where memref arguments
   are converted to pointer-to-struct and the remaining arguments are converted
   as usual. Results are converted to a special argument if they are of struct
   type.
2. Add a body to the original function (making it non-external) that
   1. allocates memref descriptors,
   2. populates them,
   3. potentially allocates space for the result struct, and
   4. passes the pointers to these into the newly declared interface function,
      then
   5. collects the result of the call (potentially from the result struct),
      and
   6. returns it to the caller.

For (non-external) functions defined in the MLIR module:

1. Define a new function `_mlir_ciface_<original name>` where memref arguments
   are converted to pointer-to-struct and the remaining arguments are converted
   as usual. Results are converted to a special argument if they are of struct
   type.
2. Populate the body of the newly defined function with IR that
   1. loads descriptors from pointers;
   2. unpacks descriptors into individual non-aggregate values;
   3. passes these values into the original function;
   4. collects the results of the call, and
   5. either copies the results into the result struct or returns them to the
      caller.

Examples:

```mlir

func.func @qux(%arg0: memref<?x?xf32>) attributes {llvm.emit_c_interface}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr, %arg1: !llvm.ptr,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr
  llvm.store %7, %9 : !llvm.memref_2d, !llvm.ptr

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9) : (!llvm.ptr) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr)
```

```cpp
// The C function implementation for the interface function.
extern "C" {
void _mlir_ciface_qux(MemRefDescriptor<float, 2> *input) {
  // detailed impl
}
}
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) attributes {llvm.emit_c_interface} {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.ptr -> !llvm.memref_2d

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr, !llvm.ptr, i64, i64, i64,
       i64, i64) -> ()
  llvm.return
}
```

```cpp
// The C function signature for the interface function.
extern "C" {
void _mlir_ciface_foo(MemRefDescriptor<float, 2> *input);
}
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> attributes {llvm.emit_c_interface} {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
// NOTE: the returned memref becomes the first argument.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr) {
  %0 = llvm.load %arg1 : !llvm.ptr -> !llvm.memref_2d
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
     : (!llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64) -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d, !llvm.ptr
  llvm.return
}
```

```cpp
// The C function signature for the interface function.
extern "C" {
void _mlir_ciface_foo(MemRefDescriptor<float, 2> *output,
                      MemRefDescriptor<float, 2> *input);
}
```

Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it will minimize the effect
of C compatibility on intra-module calls or calls between MLIR-generated
functions. In particular, when calling external functions from an MLIR module in
a (parallel) loop, the fact of storing a memref descriptor on stack can lead to
stack exhaustion and/or concurrent access to the same address. The auxiliary
interface function serves as an allocation scope in this case. Furthermore, when
targeting accelerators with separate memory spaces such as GPUs, stack-allocated
descriptors passed by pointer would have to be transferred to the device memory,
which introduces significant overhead. In such situations, auxiliary interface
functions are executed on the host and only pass the values through the device
function invocation mechanism.
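To illustrate the caller's side of this interface, the sketch below shows how C++ code might populate a `MemRefDescriptor<float, 2>` for a dense row-major buffer before passing its address to a `_mlir_ciface_*` wrapper. The `MemRefDescriptor` template is the one shown earlier in this section; the helper `wrapDense2D` is an illustrative assumption, not something the conversion generates.

```cpp
#include <cstddef>
#include <cstdint>

// The descriptor template compatible with Clang-produced struct layout,
// as shown earlier in this section.
template <typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};

// Sketch: wrap a dense row-major 2-D array in a descriptor suitable for
// passing (by pointer) to a _mlir_ciface_* wrapper function.
MemRefDescriptor<float, 2> wrapDense2D(float *data, intptr_t rows,
                                       intptr_t cols) {
  MemRefDescriptor<float, 2> d;
  d.allocated = d.aligned = data;
  d.offset = 0;
  d.sizes[0] = rows;
  d.sizes[1] = cols;
  // Row-major strides: unit stride in the innermost dimension.
  d.strides[0] = cols;
  d.strides[1] = 1;
  return d;
}
```

A caller would then invoke, e.g., `_mlir_ciface_foo(&out, &in)` with the addresses of such descriptors.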

Limitation: Right now we cannot generate a C interface for variadic functions,
whether they are external or not, because C functions cannot "forward" variadic
arguments:

```c
void bar(int, ...);

void foo(int x, ...) {
  // ERROR: there is no way to forward the variadic arguments to bar.
  bar(x, ...);
}
```

### Address Computation

An access to a memref element is transformed into an access to an element of
the buffer pointed to by the descriptor. The position of the element in the
buffer is calculated by linearizing memref indices in row-major order (the
lexically first index is the slowest varying, similar to C, but accounting for
strides). The computation of the linear address is emitted as arithmetic
operations in the LLVM IR dialect. Strides are extracted from the memref
descriptor.

Examples:

An access to a memref with indices:

```mlir
%0 = memref.load %m[%1, %2, %3, %4] : memref<?x?x4x8xf32, offset: ?>
```

is transformed into the equivalent of the following code:

```mlir
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes are
// dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue[4, 0] : !llvm.struct<(ptr, ptr, i64,
                                                   array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue[2] : !llvm.struct<(ptr, ptr, i64,
                                               array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue[1] : !llvm.struct<(ptr, ptr, i64,
                                                array<4xi64>, array<4xi64>)>

// Get the address of the data pointer.
%ptr = llvm.getelementptr %aligned[%addr7]
    : !llvm.struct<(ptr, ptr, i64, array<4xi64>, array<4xi64>)> -> !llvm.ptr

// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr -> f32
```

For stores, the address computation code is identical and only the actual store
operation is different.

Note: the conversion does not perform any sort of common subexpression
elimination when emitting memref accesses.

### Utility Classes

Utility classes common to many conversions to the LLVM dialect can be found
under `lib/Conversion/LLVMCommon`. They include the following.

- `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
- `LLVMTypeConverter` implements the default type conversion as described
  above.
- `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
  dialect-specific functionality.
- `VectorConvertOpToLLVMPattern` extends the previous class to automatically
  unroll operations on higher-dimensional vectors into lists of operations on
  one-dimensional vectors before applying the pattern.
- `StructBuilder` provides a convenient API for building IR that creates or
  accesses values of LLVM dialect structure types; it is subclassed by
  `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexStructBuilder` for
  the built-in types convertible to LLVM dialect structure types.

## Translation to LLVM IR

MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
operations can be translated to LLVM IR modules using the following scheme.

- Module-level globals are translated to LLVM IR global values.
- Module-level metadata are translated to LLVM IR metadata, which can be later
  augmented with additional metadata defined on specific ops.
- All functions are declared in the module so that they can be referenced.
- Each function is then translated separately and has access to the complete
  mappings between MLIR and LLVM IR globals, metadata, and functions.
- Within a function, blocks are traversed in topological order and translated
  to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
  of the block arguments, but not yet connected to their source blocks.
- Within each block, operations are translated in their order. Each operation
  has access to the same mappings as the function and additionally to the
  mapping of values between MLIR and LLVM IR, including PHI nodes. Operations
  with regions are responsible for translating the regions they contain.
- After operations in a function are translated, the PHI nodes of blocks in
  this function are connected to their source values, which are now available.

The translation mechanism provides extension hooks for translating custom
operations to LLVM IR via the dialect interface
`LLVMTranslationDialectInterface`:

- `convertOperation` translates an operation that belongs to the current
  dialect to LLVM IR given an `IRBuilderBase` and various mappings;
- `amendOperation` performs additional actions on an operation if it contains
  a dialect attribute that belongs to the current dialect, for example sets up
  instruction-level metadata.

Dialects containing operations or attributes that need to be translated to LLVM
IR must provide an implementation of this interface and register it with the
system. Note that registration may happen without creating the dialect, for
example, in a separate library, to avoid the need for the "main" dialect
library to depend on LLVM IR libraries. The implementations of these methods
may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them, which holds the state of the translation and contains
numerous utilities.

Note that this extension mechanism is *intentionally restrictive*. LLVM IR has a
small, relatively stable set of instructions and types that MLIR intends to
model fully. Therefore, the extension mechanism is provided only for the LLVM IR
constructs that are more often extended -- intrinsics and metadata. The primary
goal of the extension mechanism is to support sets of intrinsics, for example
those representing a particular instruction set. The extension mechanism does
not allow for customizing type or block translation, nor does it support custom
module-level operations. Such transformations should be performed within MLIR
and target the corresponding MLIR constructs.

## Translation from LLVM IR

An experimental flow allows one to import a substantially limited subset of LLVM
IR into MLIR, producing LLVM dialect operations.

```
 mlir-translate -import-llvm filename.ll
```