1# High-Level Fortran IR (HLFIR) 2 3The approach of FIR and lowering design so far was to start with the minimal set 4of IR operations that could allow implementing the core aspects of Fortran (like 5memory allocations, array addressing, runtime descriptors, and structured 6control flow operations). One notable aspect of the current FIR is that array 7and character operations are buffered (some storage is allocated for the result, 8and the storage is addressed to implement the operation). While this proved 9functional so far, the code lowering expressions and assignments from the 10front-end representations (the evaluate::Expr and parser nodes) to FIR has 11significantly grown in complexity while it still lacks some F95 features around 12character array expressions or FORALL. This is mainly explained by the fact that 13the representation level gap is big, and a lot is happening in lowering. It 14appears more and more that some intermediate steps would help to split concerns 15between translating the front-end representation to MLIR, implementing some 16Fortran concepts at a lower-level (like character or derived type assignments), 17and how bufferizations of character and array expressions should be done. 18 19This document proposes the addition of two concepts and a set of related 20operations in a new dialect HLFIR to allow a simpler lowering to a higher-level 21FIR representation that would later be lowered to the current FIR representation 22via MLIR translation passes. As a result of these additions, it is likely that 23the fir.array_load/fir.array_merge_store and related array operations could be 24removed from FIR since array assignment analysis could directly happen on the 25higher-level FIR representation. 26 27 28The main principles of the new lowering design are: 29- Make expression lowering context independent and rather naive 30- Do not materialize temporaries while lowering to FIR 31- Preserve Fortran semantics/information for high-level optimizations 32 33The core impact on lowering will be: 34- Lowering expressions and assignments in the exact same way, regardless of 35 whether it is an array assignment context and/or an expression inside a 36 forall. 37- Lowering transformational intrinsics in a verbatim way (no runtime calls and 38 memory aspects yet). 39- Lowering character expressions in a verbatim way (no memcpy/runtime calls 40 and memory aspects yet). 41- Argument association side effects will be delayed (copy-in/copy-out) to help 42 inlining/function specialization to get rid of them when they are not 43 relevant. 44 45 46## Variable and Expression value concepts in HLFIR 47 48### Strengthening the variable concept 49 50Fortran variables are currently represented in FIR as mlir::Value with reference 51or box type coming from special operations or block arguments. They are either 52the result of a fir.alloca, fir.allocmem, or fir.address_of operations with the 53mangled name of the variable as attribute, or they are function block arguments 54with the mangled name of the variable as attribute. 55 56Fortran variables are defined with a Fortran type (both dynamic and static) that 57may have type parameters, a rank and shape (including lower bounds), and some 58attributes (like TARGET, OPTIONAL, VOLATILE...). All this information is 59currently not represented in FIR. Instead, lowering keeps track of all this 60information in the fir::ExtendedValue lowering data structure and uses it when 61needed. If unused in lowering, some information about variables is lost (like 62non-constant array bound expressions). In the IR, only the static type, the 63compile time constant extents, and compile time character lengths can be 64retrieved from the mlir::Value of a variable in the general case (more can be 65retrieved if the variable is tracked via a fir.box, but not if it is a bare 66memory reference). 67 68This makes reasoning about Fortran variables in FIR harder, and in general 69forces lowering to apply all decisions related to the information that is lost 70in FIR. A more problematic point is that it does not allow generating debug 71information for the variables from FIR, since the bounds and type parameters 72information is not tightly linked to the base mlir::Value. 73 74The proposal is to add a hlfir.declare operation that would anchor the 75fir::ExtendedValue information in the IR. A variable will be represented by a 76single SSA value with a memory type (fir.ref<T>, fir.ptr<T>, fir.heap<T>, 77fir.box<T>, fir.boxchar or fir.ref<fir.box<T>>). Not all memory types will be 78allowed for a variable: it should allow retrieving all the shape, type 79parameters, and dynamic type information without requiring to look-up for any 80defining operations. For instance, `fir.ref<fir.array<?xf32>>` will not be 81allowed as an HLFIR variable, and fir.box<fir.array<?xf32>> will be used 82instead. However, `fir.ref<fir.array<100xf32>>` will be allowed to represent an 83array whose lower bounds are all ones (if the lower bounds are different than 84one, a fir.box will still be needed). The hlfir.declare operation will be 85responsible for producing the SSA value with the right memory type given the 86variable specifications. One notable point is that, except for the POINTER and 87ALLOCATABLE attributes that are retrievable from the SSA value type, other 88Fortran attributes (OPTIONAL, TARGET, VOLATILE...) will not be retrievable from 89the SSA value alone, and it will be required to access the defining 90hlfir.declare to get the full picture. 91 92This means that semantically relevant attributes will need to be set by 93lowering on operations using variables when that is relevant (for instance when 94using an OPTIONAL variable in an intrinsic where it is allowed to be absent). 95Then, the optimizations passes will be allowed to look for the defining 96hlfir.declare, but not to assume that it must be visible. For instance, simple 97contiguity of fir.box can be deduced in certain case from the hlfir.declare, 98and if the hlfir.declare cannot be found, transformation passes will have to 99assume that the variable may be non-contiguous. 100 101In practice, it is expected that the passes will be able to leverage 102hlfir.declare in most cases, but that guaranteeing that it will always be the 103case would constraint the IR and optimizations too much. The goal is also to 104remove the fir.box usages when possible while lowering to FIR, to avoid 105needlessly creating runtime descriptors for variables that do not really 106require it. 107 108The hlfir.declare operation and restrained memory types will allow: 109- Pushing higher-level Fortran concepts into FIR operations (like array 110 assignments or transformational intrinsics). 111- Generating debug information for the variables based on the hlfir.declare 112 operation. 113- Generic Fortran aliasing analysis (currently implemented only around array 114 assignments with the fir.array_load concept). 115 116The hlfir.declare will have a sibling fir.declare operation in FIR that will 117allow keeping variable information until debug info is generated. The main 118difference is that the fir.declare will simply return its first operand, 119making its codegen a no-op, while hlfir.declare might change the type of 120its first operand to return an HLFIR variable compatible type. 121The fir.declare op is the only operation described by this change that will be 122added to FIR. The rational for this is that it is intended to survive until 123LLVM dialect codegeneration so that debug info generation can use them and 124alias information can take advantage of them even on FIR. 125 126Note that Fortran variables are not necessarily named objects, they can also be 127the result of function references returning POINTERs. hlfir.declare will also 128accept such variables to be described in the IR (a unique name will be built 129from the caller scope name and the function name.). In general, fir.declare 130will allow to view every memory storage as a variable, and this will be used to 131describe and use compiler created array temporaries. 132 133### Adding an expression value concept in HLFIR 134 135Currently, Fortran expressions can be represented as SSA values for scalar 136logical, integer, real, and complex expressions. Scalar character or 137derived-type expressions and all array expressions are buffered in lowering: 138their results are directly given a memory storage in lowering and are 139manipulated as variables. 140 141While this keeps FIR simple, this makes the amount of IR generated for these 142expressions higher, and in general makes later optimization passes job harder 143since they present non-trivial patterns (with memory operations) and cannot be 144eliminated by naive dead code elimination when the result is unused. This also 145forces lowering to combine elemental array expressions into single loop nests to 146avoid bufferizing all array sub-expressions (which would yield terrible 147performance). These combinations, which are implemented using C++ lambdas in 148lowering makes lowering code harder to understand. It also makes the expression 149lowering code context dependent (especially designators lowering). The lowering 150code paths may be different when lowering a syntactically similar expression in 151an elemental expression context, in a forall context, or in a normal context. 152 153Some of the combinations described in [Array Composition](ArrayComposition.md) 154are currently not implemented in lowering because they are less trivial 155optimizations, and do not really belong in lowering. However, deploying such 156combinations on the generated FIR with bufferizations requires the usage of 157non-trivial pattern matching and rewrites (recognizing temporary allocation, 158usage, and related runtime calls). Note that the goal of such combination is not 159only about inlining transformational runtime calls, it is mainly about never 160generating a temporary for an array expression sub-operand that is a 161transformational intrinsic call matching certain criteria. So the optimization 162pass will not only need to recognize the intrinsic call, it must understand the 163context it is being called in. 164 165The usage of memory manipulations also makes some of the alias analysis more 166complex, especially when dealing with foralls (the alias analysis cannot simply 167follow an operand tree, it must understand indirect dependencies from operations 168stored in memory). 169 170The proposal is to add a !hlfir.expr<T> SSA value type concept, and set of 171character operations (concatenation, TRIM, MAX, MIN, comparisons...), a set of 172array transformational operations (SUM, MATMUL, TRANSPOSE, ...), and a generic 173hlfir.elemental operation. The hlfir.expr<T> type is not intended to be used 174with scalar types that already have SSA value types (e.g., integer or real 175scalars). Instead, these existing SSA types will implicitly be considered as 176being expressions when used in high-level FIR operations, which will simplify 177interfacing with other dialects that define operations with these types (e.g., 178the arith dialect). 179 180These hlfir.expr values could then be placed in memory when needed (assigned to 181a variable, passed as a procedure argument, or an IO output item...) via 182hlfir.assign or hlfir.associate operations that will later be described. 183 184When no special optimization pass is run, a translation pass would lower the 185operations producing hlfir.expr to buffer allocations and memory operations just 186as in the currently generated FIR. 187 188However, these high-level operations should allow the writing of optimization 189passes combining chains of operations producing hlfir.expr into optimized forms 190via pattern matching on the operand tree. 191 192The hlfir.elemental operation will be discussed in more detail below. It allows 193simplifying lowering while keeping the ability to combine elemental 194sub-expressions into a single loop nest. It should also allow rewriting some of 195the transformational intrinsic operations to functions of the indices as 196described in [Array Composition](ArrayComposition.md). 197 198## Proposed design for HLFIR (High-Level Fortran IR) 199 200### HLFIR Operations and Types 201 202#### Introduce a hlfir.expr<T> type 203 204Motivation: avoid the need to materialize expressions in temporaries while 205lowering. 206 207Syntax: ``` !hlfir.expr<[extent x]* T [, class]> ``` 208 209- `[extent x]*` represents the shape for arrays similarly to !fir.array<> type, 210 except that the shape cannot be assumed rank (!hlfir.expr<..xT> is invalid). 211 This restriction can be added because it is impossible to create an assumed 212 rank expression in Fortran that is not a variable. 213- `T` is the element type of the static type 214- `class` flag can be set to denote that this a polymorphic expression (that the 215 dynamic type should not be assumed to be the static type). 216 217 218examples: !hlfir.expr<fir.char<?>>, !hlfir.expr<10xi32>, 219!hlfir.expr<?x10x?xfir.complex<4>> 220 221T in scalar hlfir.expr<T> can be: 222- A character type (fir.char<10, kind>, fir.char<?, kind>) 223- A derived type: (fir.type<t{...}>) 224 225T in an array hlfir.expr< e1 x ex2 .. : T> can be: 226- A character or derived type 227- A logical type (fir.logical<kind>) 228- An integer type (i1, i32, ….) 229- A floating point type (f32, f16…) 230- A complex type (fir.complex<4> or mlir::complex<f32>...) 231 232Some expressions may be polymorphic (for instance, MERGE can be used on 233polymorphic entities). The hlfir.expr type has an optional "class" flag to 234denote this: hlfir.expr<T, class>. 235 236Note that the ALLOCATABLE, POINTER, TARGET, VOLATILE, ASYNCHRONOUS, OPTIONAL 237aspects do not apply to expressions, they apply to variables. 238 239It is possible to query the following about an expression: 240- What is the extent : via hlfir.get_extent %expr, dim 241- What are the length parameters: via hlfir.get_typeparam %expr [, param_name] 242- What is the dynamic type: via hlfir.get_dynamic_type %expr 243 244It is possible to get the value of an array expression element: 245- %element = hlfir.apply %expr, %i, %j : (!hlfir.expr<T>, index index) -> 246 hlfir.expr<ScalarType> | AnyConstantSizeScalarType 247 248It is not directly possible to take an address for the expression, but an 249expression value can be associated to a new variable whose address can be used 250(required when passing the expression in a user call, or to concepts that are 251kept low level in FIR, like IO runtime calls). The variable created may be a 252compiler created temporary, or may relate to a Fortran source variable if this 253mechanism is used to implement ASSOCIATE. 254 255- %var = hlfir.associate %expr [attributes about the association]-> 256 AnyMemoryOrBoxType 257- hlfir.end_association %var 258 259The intention is that the hlfir.expr<T> is the result of an operation, and 260should most often not be a block argument. This is because the hlfir.expr is 261mostly intended to allow combining chains of operations into more optimal 262forms. But it is possible to represent any expression result via a Fortran 263runtime descriptor (fir.box<T>), implying that if a hlfir.expr<T> is passed as 264a block argument, the expression bufferization pass will evaluate the operation 265producing the expression in a temporary, and transform the block operand into a 266fir.box describing the temporary. Clean-up for the temporary will be inserted 267after the last use of the hlfir.expr. Note that, at least at first, lowering 268may help FIR to find the last use of a hlfir.expr by explicitly inserting a 269hlfir.finalize %expr operation that may turn into a no-op if the expression is 270not later materialized in memory. 271 272It is nonetheless not intended that such abstract types be used as block 273arguments to avoid introducing allocations and descriptor manipulations. 274 275#### hlfir.declare operation 276 277Motivation: represent variables, linking together a memory storage, shape, 278length parameters, attributes and the variable name. 279 280Syntax: 281``` 282%var = hlfir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T1, T2 283``` 284 285%var#0 will have a FIR memory type that is allowed for HLFIR variables. %var#1 286will have the same type as %base, it is intended to be used when lowering HLFIR 287to FIR in order to avoid creating unnecessary fir.box (that would become 288runtime descriptors). When an HLFIR operation has access to the defining 289hlfir.declare of its variable operands, the operation codegen will be allowed 290to replace the %var#0 reference by the simpler %var#1 reference. 291 292- Extents should only be provided if %base is not a fir.box and the entity is an 293 array. 294- lower bounds should only be provided if the entity is an array and the lower 295 bounds are not default (all ones). It should also not be provided for POINTERs 296 and ALLOCATABLES since the lower bounds may change. 297- type parameters should be provided for entities with length parameters, unless 298 the entity is a CHARACTER where the length is constant in %base type. 299- The attributes will include the Fortran attributes: TARGET (fir.target), 300 POINTER (fir.ptr), ALLOCATABLE (fir.alloc), CONTIGUOUS (fir.contiguous), 301 OPTIONAL (fir.optional), VOLATILE (fir.volatile), ASYNCHRONOUS (fir.async). 302 They will also indicate when an entity is part of an equivalence by giving the 303 equivalence name (fir.equiv = mangled_equivalence_name). 304 305hlfir.declare will be used for all Fortran variables, except the ones created via 306the ASSOCIATE construct that will use hlfir.associate described below. 307 308hlfir.declare will also be used when creating compiler created temporaries, in 309which case the fir.tmp attribute will be given. 310 311Examples: 312 313| FORTRAN | HLFIR | 314| ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | 315| REAL :: X | %mem = fir.alloca f32 <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<f32>, fir.ref<f32> | 316| REAL, TARGET :: X(10) | %mem = fir.alloca f32 <br> %nval = fir.load %n <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.target} : fir.ref<fir.array<10xf32>>, fir.ref<fir.array<10xf32>> | 317| REAL :: X(N) | %mem = // … alloc or dummy argument <br> %nval = fir.load %n : i64 <br> %x = hlfir.declare %mem shape %nval {fir.def = "\_QPfooEx"} : (i64) -> fir.box<fir.array<?xf32>>, fir.ref<fir.array<?xf32>> | 318| REAL :: X(0:) | %mem = // … dummy argument <br> %c0 = arith.constant 0 : index <br> %x = hlfir.declare %mem lbs %c0 {fir.def = "\_QPfooEx"} : (index) -> fir.box<fir.array<?xf32>>, fir.box<fir.array<?xf32>> | 319| <br>REAL, POINTER :: X(:) | %mem = // … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>>, fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>> | 320| REAL, ALLOCATABLE :: X(:) | %mem = // … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.alloc} : fir.ref<fir.box<fir.heap<fir.array<?xf32>>>>, fir.ref<fir.box<fir.heap<fir.array<?xf32>>>> | 321| CHARACTER(10) :: C | %mem = // … dummy argument, or local, or global <br> %c = hlfir.declare %mem lbs %c0 {fir.def = "\_QPfooEc"} : fir.ref<fir.char<10>>, fir.ref<fir.char<10>> | 322| CHARACTER(\*) :: C | %unbox = fir.unbox %bochar (fir.boxchar<1>) -> (fir.ref<fir.char<?>>, index) <br> %c = hlfir.declare %unbox#0 typeparams %unbox#1 {fir.def = "\_QPfooEc"} : (index) -> fir.boxchar<1>, fir.ref<fir.char<?>> | 323| CHARACTER(\*), OPTIONAL, ALLOCATABLE :: C | %mem = // … dummy argument <br> %c = hlfir.declare %mem {fir.def = "\_QPfooEc", fir.alloc, fir.optional, fir.assumed\_len\_alloc} : fir.ref<fir.box<fir.heap<fir.char<?>>>>, fir.ref<fir.box<fir.heap<fir.char<?>>>> | 324| TYPE(T) :: X | %mem = // … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<fir.type<t{...}>>, fir.ref<fir.type<t{...}>> | 325| TYPE(T(L)) :: X | %mem = // … dummy argument, or local, or global <br> %lval = fir.load %l <br> %x = hlfir.declare %mem typeparams %lval {fir.def = "\_QPfooEx"} : fir.box<fir.type<t{...}>>, fir.box<fir.type<t{...}>> | 326| CLASS(\*), POINTER :: X | %mem = // … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.class<fir.ptr<None>> fir.class<fir.ptr<None>> | 327| REAL :: X(..) | %mem = // … dummy argument <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx"} : fir.box<fir.array<..xf32>>, fir.box<fir.array<..xf32>> | 328 329#### fir.declare operation 330 331Motivation: keep variable information available in FIR, at least with 332the intent to be able to produce debug information. 333 334Syntax: 335``` 336%var = fir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T 337``` 338 339%var will have the same type as %base. When no debug info is generated, the 340operation can be replaced by %base when lowering to LLVM. Otherwise, the 341operation is similar to hlfir.declare and will be produced from it. 342 343#### hlfir.associate operation 344 345Motivation: represent Fortran associations (both from variables and expressions) 346and allow keeping actual/dummy argument association information after inlining. 347 348Syntax: 349``` 350%var = hlfir.associate %expr_or_var {fir.def = mangled_uniq_name, attributes} (AnyExprOrVarType) -> AnyVarType 351``` 352 353hlfir.associate is used to represent the following associations: 354- Dummy/Actual association on the caller side (the callee side uses 355 hlfir.declare). 356- Host association in block constructs when VOLATILE/ASYNC attributes are added 357 locally 358- ASSOCIATE construct (both from variable and expressions). 359 360When the operand is a variable, hlfir.associate allows changing the attributes 361of the variable locally, and to encode certain side-effects (like 362copy-in/copy-out when going from a non-contiguous variable to a contiguous 363variable, with the help of the related hlfir.end_association operation). 364 365When the operand is an expression, hlfir.associate allows associating a storage 366location to an expression value. 367 368A hlfir.associate must be followed by a related hlfir.end_association that will 369allow inserting any necessary finalization or copy-out later. 370 371#### hlfir.end_association operation 372 373Motivation: mark the place where some association should end and some side 374effects might need to occur. 375 376The hlfir.end_associate is a placeholder to later insert 377deallocation/finalization if the variable was associated with an expression, 378and to insert copy-out/deallocation if the variable was associated with another 379variable with a copy-in. 380 381Syntax: 382``` 383hlfir.end_association %var [%original_variable] {attributes} 384``` 385 386 387The attributes can be: 388- copy_out (copy out the associated variable back into the original variable 389 if a copy-in occurred) 390- finalize_copy_in (deallocate the temporary storage for the associated 391 variable if a copy-in occurred but the associated variable was not modified 392 (e.g., it is intent(in))). 393- finalize: indicate that a finalizer should be run on the entity associated 394 with the variable (There is currently no way to deduce this only from the 395 variable type in FIR). It will give the finalizer mangled name so that it 396 can be later called. 397 398If the copy_out or finalize_copy_in attribute is set, “original_variable” (the 399argument of the hlfir.associate that produced %var) must be provided. The 400rationale is that the original variable address is needed to verify if a 401temporary was created, and if needed, to copy the data back to it. 402 403#### hlfir.finalize 404 405Motivation: mark end of life of local variables 406 407Mark the place where a local variable will go out of scope. The main goal is to 408retain this information even after local variables are inlined. 409 410Syntax: 411``` 412hlfir.finalize %var {attributes} 413``` 414 415The attributes can be: 416- finalize: indicate that a finalizer should be run on the entity associated 417 with the variable (There is currently no way to deduce this only from the 418 variable type in FIR). 419 420Note that finalization will not free the local variable storage if it was 421allocated on the heap. If lowering created the storage passed to hlfir.declare 422via a fir.allocmem, lowering should insert a fir.freemem after the 423hlfir.finalize. This could help making fir.allocmem to fir.alloca promotion 424simpler, and also because finalization may be run without the intent to 425deallocate the variable storage (like on INTENT(OUT) dummies). 426 427 428#### hlfir.designate 429 430Motivation: Represent designators at a high-level and allow representing some 431information about derived type components that would otherwise be lost, like 432component lower bounds. 433 434Represent Fortran designators in a verbatim way: both triplet, and component 435parts. 436 437Syntax: 438``` 439%var = hlfir.designate %base [“component”,] [(%i, %k:l%:%m)] [substr ub, lb] [imag|real] [shape extent1, extent2, ....] [lbs lb1, lb2, .....] [typeparams %l1, ...] {attributes} 440``` 441 442hlfir.designate is intended to encode a single part-ref (as defined by the 443fortran standard). That means that a(:)%x(i, j, k) must be split into two 444hlfir.designate: one for a(:), and one for x(i, j, k). If the base is ranked, 445and the component is an array, the subscripts are mandatory and must not 446contain triplets. This ensures that the result of a fir.designator cannot be a 447"super-array". 448 449The subscripts passed to hlfir.designate must be based on the base lower bounds 450(one by default). 451 452A substring is built by providing the lower and upper character indices after 453`substr`. Implicit substring bounds must be made explicit by lowering. It is 454not possible to provide substr if a component is already provided. Instead the 455related Fortran designator must be split into two fir.designator. This is 456because the component character length will be needed to compute the right 457stride, and it might be lost if not placed on the first designator typeparams. 458 459Real and Imaginary complex parts are represented by an optional imag or real 460tag. It can be added even if there is already a component. 461 462The shape, lower bound, and type parameter operands represent the output entity 463properties. The point of having those made explicit is to allow early folding 464and hoisting of array section shape and length parameters (which especially in 465FORALL contexts, can simplify later assignment temporary insertion a lot). Also, 466if lower bounds of a derived type component array could not be added here, they 467would be lost since they are not represented by other means in FIR (the fir.type 468does not include this information). 469 470hlfir.designate is not intended to describe vector subscripted variables. 471Instead, lowering will have to introduce loops to do element by element 472addressing. See the Examples section. This helps keeping hlfir.designate simple, 473and since the contexts where a vector subscripted entity is considered to be a 474variable (in the sense that it can be modified) are very limited, it seems 475reasonable to have lowering deal with this aspect. For instance, a vector 476subscripted entity cannot be passed as a variable, it cannot be a pointer 477assignment target, and when it appears as an associated entity in an ASSOCIATE, 478the related variable cannot be modified. 479 480#### hlfir.assign 481 482Motivation: represent assignment at a high-level (mainly a change for array and 483character assignment) so that optimization pass can clearly reason about it 484(value propagation, inserting temporary for right-hand side evaluation only when 485needed), and that lowering does not have to implement it all. 486 487Syntax: 488``` 489hlfir.assign %expr_or_var to %var [attributes] 490``` 491 492The attributes can be: 493 494- realloc: mark that assignment has F2003 semantics and that the left-hand 495 side may have to be deallocated/reallocated… 496- use_assign=@function: mark a user defined assignment 497- no_overlap: mark that an assignment does not need a temporary (added by an 498 analysis pass). 499- unordered : mark that an assignment can happen in any element order (not 500 true if there is an impure elemental function being called). 501- temporary_lhs: mark that the left hand side of the assignment is 502 a compiler generated temporary. 503 504This will replace the current array_load/array_access/array_merge semantics. 505Instead, a more generic alias analysis will be performed on the LHS and RHS to 506detect aliasing, and a temporary inserted if needed. The alias analysis will 507look at all the memory references in the RHS operand tree and base overlap 508decisions on the related variable declaration operations. This same analysis 509should later allow moving/merging some expression evaluation between different 510statements. 511 512Note about user defined assignments: semantics is resolving them and building 513the related subroutine call. So a fir.call could directly be made in lowering if 514the right hand side was always evaluated in a temporary. The motivation to use 515hlfir.assign is to help the temporary removal, and also to deal with two edge 516cases: user assignment in a FORALL (the forall pass will need to understand that 517this an assignment), and allocatable assignment mixed with user assignment 518(implementing this as a call in lowering would require lowering the whole 519reallocation logic in lowering already, duplicating the fact that hlfir.assign 520should deal with it). 521 522#### hlfir.ptr_assign 523 524Motivation: represent pointer assignment without lowering the exact pointer 525implementation (descriptor address, fir.ref<fir.box> or simple pointer scalar 526fir.llvm_ptr<fir.ptr>). 527 528Syntax: 529``` 530hlfir.ptr_assign %var [[reshape %reshape] | [lbounds %lb1, …., %lbn]] to %ptr 531``` 532 533It is important to keep pointer assignment at a high-level so that they can 534later correctly be processed in hlfir.forall. 535 536#### hlfir.allocate 537 538Motivation: keep POINTER and ALLOCATABLE allocation explicit in HLFIR, while 539allowing later lowering to either inlined fir.allocmem or Fortran runtime 540calls. Generating runtime calls allow the runtime to do Fortran specific 541bookkeeping or flagging and to provide better runtime error reports. 542 543The main difference with the ALLOCATE statement is that one distinct 544hlfir.allocate has to be created for each element of the allocation-list. 545Otherwise, it is a naive lowering of the ALLOCATE statement. 546 547Syntax: 548``` 549%stat = hlfir.allocate %var [%shape] [%type_params] [[src=%source] | [mold=%mold]] [errmsg =%errmsg] 550``` 551 552#### hlfir.deallocate 553 554Motivation: keep deallocation explicit in HLFIR, while allowing later lowering 555to Fortran runtime calls to allow the runtime to do Fortran specific 556bookkeeping or flagging of allocations. 557 558Similarly to hlfir.allocate, one operation must be created for each 559allocate-object-list object. 560 561Syntax: 562``` 563%stat = hlfir.deallocate %var [errmsg=err]. 564``` 565 566#### hlfir.elemental 567 568Motivation: represent elemental operations without defining array level 569operations for each of them, and allow the representation of array expressions 570as function of the indices. 571 572The hlfir.elemental operation can be seen as a closure: it is defining a 573function of the indices that returns the value of the element of the 574represented array expression at the given indices. This an operation with an 575MLIR region. It allows detailing how an elemental expression is implemented at 576the element level, without yet requiring materializing the operands and result 577in memory. The hlfir.expr<T> elements value can be obtained using hlfir.apply. 578 579The element result is built with a fir.result op, whose result type can be a 580scalar hlfir.expr<T> or any scalar constant size types (e.g. i32, or f32). 581 582Syntax: 583``` 584%op = hlfir.elemental (%indices) %shape [%type_params] [%dynamic_type] { 585 …. 586 fir.result %result_element 587} 588``` 589 590 591Note that %indices are not operands, they are the elemental region block 592arguments, representing the array iteration space in a one based fashion. 593The choice of using one based indices is to match Fortran default for 594array variables, so that there is no need to generate bound adjustments 595when working with one based array variables in an expression. 596 597Illustration: “A + B” represented with a hlfir.elemental. 598 599``` 600%add = hlfir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !hlfir.expr<?x?xf32> { 601 %belt = hlfir.designate %b, %i, %j : (!fir.box<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32> 602 %celt = hlfir.designate %c, %i, %j : (!fir.box<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32> 603 %bval = fir.load %belt : (!fir.ref<f32>) -> f32 604 %cval = fir.load %celt : (!fir.ref<f32>) -> f32 605 %add = arith.addf %bval, %cval : f32 606 fir.result %res : f32 607} 608``` 609 610In contexts where it can be proved that the array operands were not modified 611between the hlfir.elemental and the hlfir.apply, the region of the 612hlfir.elemental can be inlined at the hlfir.apply. Otherwise, if there is no 613such guarantee, or if the hlfir.elemental is not “visible” (because its result 614is passed as a block argument), the hlfir.elemental will be lowered to an array 615temporary. This will be done as a HLFIR to HLFIR optimization pass. Note that 616MLIR inlining could be used if hlfir.elemental implemented the 617CallableInterface and hlfir.apply the CallInterface. But MLIR generic inlining 618is probably too generic for this case: no recursion is possible here, the call 619graphs are trivial, and using MLIR inlining here could introduce later 620conflicts or make normal function inlining more complex because FIR inlining 621hooks would already be used. 622 623hlfir.elemental allows delaying elemental array expression buffering and 624combination. Its generic aspect has two advantages: 625- It avoids defining one operation per elemental operation or intrinsic, 626 instead, the related arith dialect operations can be used directly in the 627 elemental regions. This avoids growing HLFIR and having to maintain about a 628 hundred operations. 629- It allows representing transformational intrinsics as functions of the indices 630 while doing optimization as described in 631 [Array Composition](ArrayComposition.md). This because the indices can be 632 transformed inside the region before being applied to array variables 633 according to any kind of transformation (semi-affine or not). 634 635 636#### Introducing the hlfir.apply operation 637 638Motivation: provide a way to get the element of an array expression 639(hlfir.expr<?x…xT>) 640 641This is the addressing equivalent for expressions. A notable difference is that 642it can only take simple scalar indices (no triplets) because it is not clear 643why supporting triplets would be needed, and keeping the indexing simple makes 644inlining of hlfir.elemental much easier. 645 646If hlfir.elemental inlining is not performed, or if the hlfir.expr<T> array 647expression is produced by another operation (like fir.intrinsic) that is not 648rewritten, hlfir.apply will be lowered to an actual addressing operation that 649will address the temporary that was created for the hlfir.expr<T> value that 650was materialized in memory. 651 652hlfir.apply indices will be one based to make further lowering simpler. 653 654Syntax: 655``` 656%element = hlfir.apply %array_expr %i, %j: (hlfir.expr<?x?xi32>) -> i32 657``` 658 659#### Introducing operations for transformational intrinsic functions 660 661Motivation: Represent transformational intrinsics functions at a high-level so 662that they can be manipulated easily by the optimizer, and do not require 663materializing the result as a temporary in lowering. 664 665An operation will be added for each Fortran transformational functions (SUM, 666MATMUL, TRANSPOSE....). It translates the Fortran expression verbatim: it takes 667the same number of arguments as the Fortran intrinsics and returns a 668hlfir.expr<T>. The arguments may be hlfir.expr<T>, simple scalar types (e.g., 669i32, f32), or variables. 670 671The exception being that the arguments that are statically absent would be 672passed to it (passing results of fir.absent operation), so that the arguments 673can be identified via their positions. 674 675This operation is meant for the transformational intrinsics, not the elemental 676intrinsics, that will be implemented using hlfir.elemental + mlir math dialect 677operations, nor the intrinsic subroutines (like random_seed or system_clock), 678that will be directly lowered in lowering. 679 680Syntax: 681``` 682%res = hlfir."intrinsic_name" %expr_or_var, ... 683``` 684 685These operations will all inherit a same operation base in tablegen to make 686their definition and identification easy. 687 688Without any optimization, codegen would then translate the operations to 689exactly the same FIR as currently generated by IntrinsicCall.cpp (runtime calls 690or inlined code with temporary allocation for array results). The fact that 691they are the verbatim Fortran translations should allow to move the lowering 692code to a translation pass without massive changes. 693 694An operation will at least be created for each of the following transformational 695intrinsics: all, any, count, cshift, dot_product, eoshift, findloc, iall, iany, 696iparity, matmul, maxloc, maxval, minloc, minval, norm2, pack, parity, product, 697reduce, repeat, reshape, spread, sum, transfer, transpose, trim, unpack. 698 699For the following transformational intrinsics, the current lowering to runtime 700call will probably be used since there is little point to keep them high level: 701- command_argument_count, get_team, null, num_images, team_number, this_image 702 that are more program related (and cannot appear for instance in constant 703 expressions) 704- selected_char_kind, selected_int_kind, selected_real_kind that returns scalar 705 integers 706 707#### Introducing operations for composed intrinsic functions 708 709Motivation: optimize commonly composed intrinsic functions (e.g. 710MATMUL(TRANSPOSE(a), b)). This optimization is implemented in Classic Flang. 711 712An operation and runtime function will be added for each commonly used 713composition of intrinsic functions. The operation will be the canonical way to 714write this chained operation (the MLIR canonicalization pass will rewrite the 715operations for the composed intrinsics into this one operation). 716 717These new operations will be treated as though they were standard 718transformational intrinsic functions. 719 720The composed intrinsic operation will return a hlfir.expr<T>. The arguments 721may be hlfir.expr<T>, boxed arrays, simple scalar types (e.g. i32, f32), or 722variables. 723 724To keep things simple, these operations will only match one form of the composed 725intrinsic functions: therefore there will be no optional arguments. 726 727Syntax: 728``` 729%res = hlfir."intrinsic_name" %expr_or_var, ... 730``` 731 732The composed intrinsic operation will be lowered to a `fir.call` to the newly 733added runtime implementation of the operation. 734 735These operations should not be added where the only improvement is to avoid 736creating a temporary intermediate buffer which would otherwise be removed by 737intelligent bufferization of a hlfir.expr. Similarly, these should not replace 738profitable uses of hlfir.elemental. 739 740#### Introducing operations for character operations and elemental intrinsic functions 741 742 743Motivation: represent character operations without requiring the operand and 744results to be materialized in memory. 745 746fir.char_op is intended to represent: 747- Character concatenation (//) 748- Character MIN/MAX 749- Character MERGE 750- “SET_LENGTH” 751- Character conversions 752- REPEAT 753- INDEX 754- CHAR 755- Character comparisons 756- LEN_TRIM 757 758The arguments must be scalars, the elemental aspect should be handled by a 759hlfir.elemental operation. 760 761Syntax: 762``` 763%res = hlfir.“char_op” %expr_or_var 764``` 765 766Just like for the transformational intrinsics, if no optimization occurs, these 767operations will be lowered to memory operations with temporary results (if the 768result is a character), using the same generation code as the one currently used 769in lowering. 770 771#### hlfir.array_ctor 772 773Motivation: represent array constructor without creating temporary 774 775Many array constructors have a limited number of elements (less than 10), the 776current lowering of array constructor is rather complex because it must deal 777with the generic cases. 778 779Having a representation to represent array constructor will allow an easier 780lowering of array constructor, and make array ctor a lot easier to manipulate. 781For instance, for small array constructors, loops could could be unrolled with 782the array ctor elements without ever creating a dynamically allocated array 783temporary and loop nest using it. 784 785Syntax: 786``` 787%array_ctor = hlfir.array_ctor %expr1, %expr2 …. 788``` 789 790Note that hlfir.elemental could be used to implement some ac-implied-do, 791although this is not yet clarified since ac-implied-do may contain more than 792one scalar element (they may contain a list of scalar and array values, which 793would render the representation in a hlfir.elemental tricky, but maybe not 794impossible using if/then/else and hlfir.elemental nests using the index value). 795One big issue though is that hlfir.elemental requires the result shape to be 796pre-computed (it is an operand), and with an ac-implied-do containing user 797transformational calls returning allocatable or pointer arrays, it is 798impossible to pre-evaluate the shape without evaluating all the function calls 799entirely (and therefore all the array constructor elements). 800 801#### hlfir.get_extent 802 803Motivation: inquire about the extent of a hlfir.expr, variable, or fir.shape 804 805Syntax: 806``` 807%extent = hlfir.get_extent %shape_expr_or_var, dim 808``` 809 810dim is a constant integer attribute. 811 812This allows inquiring about the extents of expressions whose shape may not be 813yet computable without generating detailed, low level operations (e.g, for some 814transformational intrinsics), or to avoid going into low level details for 815pointer and allocatable variables (where the descriptor needs to be read and 816loaded). 817 818#### hlfir.get_typeparam 819 820Motivation: inquire about the type parameters of a hlfir.expr, or variable. 821 822Syntax: 823``` 824%param = hlfir.get_typeparam %expr_or_var [, param_name] 825``` 826- param_name is an optional string attribute that must contain the length 827 parameter name if %expr_or_var is a derived type. 828 829#### hlfir.get_dynamic_type 830 831Motivation: inquire about the dynamic type of a polymorphic hlfir.expr or 832variable. 833 834Syntax: 835``` 836%dynamic_type = hlfir.get_dynamic_type %expr_or_var 837``` 838 839#### hlfir.get_lbound 840 841Motivation: inquire about the lower bounds of variables without digging into 842the implementation details of pointers and allocatables. 843 844Syntax: 845``` 846%lb = hlfir.get_lbound %var, n 847``` 848 849Note: n is an integer constant attribute for the (zero based) dimension. 850 851#### hlfir.shape_meet 852 853Motivation: represent conformity requirement/information between two array 854operands so that later optimization can choose the best shape information 855source, or insert conformity runtime checks. 856 857Syntax: 858``` 859%shape = hlfir.shape_meet %shape1, %shape2 860``` 861 862Suppose A(n), B(m) are two explicit shape arrays. Currently, when A+B is 863lowered, lowering chose which operand shape gives the result shape information, 864and it is later not retrievable that both n and m can be used. If lowering 865chose n, but m later gets folded thanks to inlining or constant propagation, the 866optimization passes have no way to use this constant information to optimize the 867result storage allocation or vectorization of A+B. hlfir.shape_meet intends to 868delay this choice until constant propagation or inlining can provide better 869information about n and m. 870 871#### hlfir.forall 872 873Motivation: segregate the Forall lowering complexity in its own unit. 874 875Forall is tough to lower because: 876- Lowering it in an optimal way requires analyzing several assignments/mask 877 expressions. 878- The shape of the temporary needed to store intermediate evaluation values is 879 not a Fortran array in the general case, and cannot in the general case be 880 maximized/pre-computed without executing the forall to compute the bounds of 881 inner forall, and the shape of the assignment operands that may depend on 882 the bound values. 883- Mask expressions evaluation should be affected by previous assignment 884 statements, but not by the following ones. Array temporaries may be 885 required for the masks to cover this. 886- On top of the above points, Forall can contain user assignments, pointer 887 assignments, and assignment to whole allocatable. 888 889 890The hlfir.forall syntax would be exactly the one of a fir.do_loop. The 891difference would be that hlfir.assign and hlfir.ptr_assign inside hlfir.forall 892have specific semantics (the same as in Fortran): 893- Given one hlfir.assign, all the iteration values of the LHS/RHS must be 894 evaluated before the assignment of any value is done. 895- Given two hlfir.assign, the first hlfir.assign must be fully performed 896 before any evaluation of the operands of the second assignment is done. 897- Masks (fir.if arguments), if any, should be evaluated before any nested 898 assignments. Any assignments syntactically before the where mask occurrence 899 must be performed before the mask evaluation. 900 901Note that forall forbids impure function calls, hence, no calls should modify 902any other expression evaluation and can be removed if unused. 903 904The translation of hlfir.forall will happen by: 905- 1. Determining if the where masks value may be modified by any assignments 906 - Yes, pre-compute all masks in a pre-run of the forall loop, creating 907 a “forall temps” (we may need a FIR concept to help here). 908 - No, Do nothing (or indicate it is safe to evaluate masks while evaluating 909 the rest). 910- 2. Determining if a hlfir.assign operand expression depends on the 911 previous hlfir.assign left-hand side base value. 912 - Yes, split the hlfir.assign into their own nest of hlfir.forall loops. 913 - No, do nothing (or indicate it is safe to evaluate the assignment while 914 evaluating previous assignments) 915- 3. For each assignments, check if the RHS/LHS operands value may depend 916 on the LHS base: 917 - Yes, split the forall loops. Insert a “forall temps” before the loops for 918 the “smallest” part that may overlap (which may be the whole RHS, or some 919 RHS sub-part, or some LHS indices). In the first nest, evaluate this 920 overlapping part into the temp. In the next forall loop nest, modify the 921 assignment to use the temporary, and add the [no_overlap] flag to indicate 922 no further temporary is needed. Insert code to finalize the temp after its 923 usage. 924 925## New HLFIR Transformation Passes 926 927### Mandatory Passes (translation towards lower-level representation) 928 929Note that these passes could be implemented as a single MLIR pass, or successive 930passes. 931 932- Forall rewrites (getting rid of hlfir.forall) 933- Array assignment rewrites (getting rid of array hlfir.assign) 934- Bufferization: expression temporary materialization (getting rid of 935 hlfir.expr, and all the operations that may produce it like transformational 936 intrinsics and hlfir.elemental, hlfir.apply). 937- Call interface argument association lowering (getting rid of hlfir.associate 938 and hlfir.end_associate) 939- Lowering high level operations using variables into FIR operations 940 operating on memory (translating hlfir.designate, scalar hlfir.assign, 941 hlfir.finalize into fir.array_coor, fir.do_loop, fir.store, fir.load. 942 fir.embox/fir.rebox operations). 943 944Note that these passes do not have to be the first one run after lowering. It is 945intended that CSE, DCE, algebraic simplification, inlining and some other new 946high-level optimization passes discused below be run before doing any of these 947translations. 948 949After that, the current FIR pipeline could be used to continue lowering towards 950LLVM. 951 952### Optimization Passes 953 954- Elemental expression inlining (inlining of hlfir.elemental in hlfir.apply) 955- User function Inlining 956- Transformational intrinsic rewrites as hlfir.elemental expressions 957- Assignments propagation 958- Shape/Rank/dynamic type propagation 959 960These high level optimization passes can be run any number of times in any 961order. 962 963## Transition Plan 964 965The new higher-level steps proposed in this document will require significant 966refactoring of lowering. Codegen should not be impacted since the current FIR 967will remain untouched. 968 969A lot of the code in lowering generating Fortran features (like an intrinsic or 970how to do assignments) is based on the fir::ExtendedValue concept. This 971currently is a collection of mlir::Value that allows describing a Fortran object 972(either a variable or an evaluated expression result). The variable and 973expression concepts described above should allow to keep an interface very 974similar to the fir::ExtendedValue, but having the fir::ExtendedValue wrap a 975single value or mlir::Operation* from which all of the object entity 976information can be inferred. 977 978That way, all the helpers currently generating FIR from fir::ExtendedValue could 979be kept and used with the new variable and expression concepts with as little 980modification as possible. 981 982The proposed plan is to: 983- 1. Introduce the new HLFIR operations. 984- 2. Refactor fir::ExtendedValue so that it can work with the new variable and 985 expression concepts (requires part of 1.). 986- 3. Introduce the new translation passes, using the fir::ExtendedValue helpers 987 (requires 1.). 988- 3.b Introduce the new optimization passes (requires 1.). 989- 4. Introduce the fir.declare and hlfir.finalize usage in lowering (requires 1. 990 and 2. and part of 3.). 991 992The following steps might have to be done in parallel of the current lowering, 993to avoid disturbing the work on performance until the new lowering is complete 994and on par. 995 996- 5. Introduce hlfir.designate and hlfir.associate usage in lowering. 997- 6. Introduce lowering to hlfir.assign (with RHS that is not a hlfir.expr), 998 hlfir.ptr_assign. 999- 7. Introduce lowering to hlfir.expr and related operations. 1000- 8. Introduce lowering to hlfir.forall. 1001 1002At that point, lowering using the high-level FIR should be in place, allowing 1003extensive testing. 1004- 9. Debugging correctness. 1005- 10. Debugging execution performance. 1006 1007The plan is to do these steps incrementally upstream, but for lowering this will 1008most likely be safer to do have the new expression lowering implemented in 1009parallel upstream, and to add an option to use the new lowering rather than to 1010directly modify the current expression lowering and have it step by step 1011equivalent functionally and performance wise. 1012 1013## Examples 1014 1015### Example 1: simple array assignment 1016 1017```Fortran 1018subroutine foo(a, b) 1019 real :: a(:), b(:) 1020 a = b 1021end subroutine 1022``` 1023 1024Lowering output: 1025 1026``` 1027func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) { 1028 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1029 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1030 hlfir.assign %b#0 to %a#0 : !fir.box<!fir.array<?xf32>> 1031 return 1032} 1033``` 1034 1035HLFIR array assignment lowering pass: 1036- Query: can %b value depend on %a? No, they are two different argument 1037 associated variables that are neither target nor pointers. 1038- Lower to assignment to loop: 1039 1040``` 1041func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) { 1042 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1043 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1044 1045 %ashape = hlfir.shape_of %a#0 1046 %bshape = hlfir.shape_of %b#0 1047 %shape = hlfir.shape_meet %ashape, %bshape 1048 %extent = hlfir.get_extent %shape, 0 1049 1050 %c1 = arith.constant 1 : index 1051 1052 fir.do_loop %i = %c1 to %extent step %c1 unordered { 1053 %belt = hlfir.designate %b#0, %i 1054 %aelt = hlfir.designate %a#0, %i 1055 hlfir.assign %belt to %aelt : fir.ref<f32>, fir.ref<f32> 1056 } 1057 return 1058} 1059``` 1060 1061HLFIR variable operations to memory translation pass: 1062- hlfir.designate is rewritten into fir.array_coor operation on the variable 1063 associated memory buffer, and returns the element address 1064- For numerical scalar, hlfir.assign is rewritten to fir.store (and fir.load 1065 of the operand if needed), for derived type and characters, memory copy 1066 (and padding for characters) is done. 1067- hlfir.shape_of are lowered to fir.box_dims, here, no constant information 1068 was obtained from any of the source shape, so hlfir.shape_meet is a no-op, 1069 selecting the first shape (a conformity runtime check could be inserted 1070 under debug options). 1071- hlfir.declare are translated into fir.declare that are no-ops and will allow 1072 generating debug information for LLVM. 1073 1074This pass would wrap operations defining variables (hlfir.declare/hlfir.designate) 1075as fir::ExtendedValue, and use all the current helpers operating on it 1076(e.g.: fir::factory::genScalarAssignment). 1077 1078``` 1079func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: 1080 !fir.box<!fir.array<?xf32>>) { 1081 %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>> 1082 %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>> 1083 %c1 = arith.constant 1 : index 1084 %dims = fir.box_dims %a, 1 1085 fir.do_loop %i = %c1 to %dims#1 step %c1 unordered { 1086 %belt = fir.array_coor %b, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32> 1087 %aelt = fir.array_coor %a, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32> 1088 %bval = fir.load %belt : f32 1089 fir.store %bval to %aelt : fir.ref<f32> 1090 } 1091 return 1092} 1093``` 1094 1095This reaches the current FIR level (except fir.declare that can be kept until 1096LLVM codegen and dropped on the floor if there is no debug information 1097generated). 1098 1099### Example 2: array assignment with elemental expression 1100 1101```Fortran 1102subroutine foo(a, b, p, c) 1103 real, target :: a(:) 1104 real :: b(:), c(100) 1105 real, pointer :: p(:) 1106 a = b*p + c 1107end subroutine 1108``` 1109 1110Lowering output: 1111 1112``` 1113func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) { 1114 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32> 1115 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1116 %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>> 1117 %c = hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>> 1118 %bshape = hlfir.shape_of %b#0 1119 %pshape = hlfir.shape_of %p#0 1120 %shape1 = hlfir.shape_meet %bshape, %pshape 1121 %mul = hlfir.elemental(%i:index) %shape1 { 1122 %belt = hlfir.designate %b#0, %i 1123 %p_lb = hlfir.get_lbound %p#0, 1 1124 %i_zero = arith.subi %i, %c1 1125 %i_p = arith.addi %i_zero, %p_lb 1126 %pelt = hlfir.designate %p#0, %i_p 1127 %bval = fir.load %belt : f32 1128 %pval = fir.load %pelt : f32 1129 %mulres = arith.mulf %bval, %pval : f32 1130 fir.result %mulres : f32 1131 } 1132 %cshape = hlfir.shape_of %c 1133 %shape2 = hlfir.shape_meet %cshape, %shape1 1134 %add = hlfir.elemental(%i:index) %shape2 { 1135 %mulval = hlfir.apply %mul, %i : f32 1136 %celt = hlfir.designate %c#0, %i 1137 %cval = fir.load %celt 1138 %add_res = arith.addf %mulval, %cval 1139 fir.result %add_res 1140 } 1141 hlfir.assign %add to %a#0 : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32> 1142 return 1143} 1144``` 1145 1146Step 1: hlfir.elemental inlining: inline the first hlfir.elemental into the 1147second one at the hlfir.apply. 1148 1149 1150``` 1151func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) { 1152 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32> 1153 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1154 %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>> 1155 %c = hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>> 1156 %bshape = hlfir.shape_of %b#0 1157 %pshape = hlfir.shape_of %p#0 1158 %shape1 = hlfir.shape_meet %bshape, %pshape 1159 %cshape = hlfir.shape_of %c 1160 %shape2 = hlfir.shape_meet %cshape, %shape1 1161 %add = hlfir.elemental(%i:index) %shape2 { 1162 %belt = hlfir.designate %b#0, %i 1163 %p_lb = hlfir.get_lbound %p#0, 1 1164 %i_zero = arith.subi %i, %c1 1165 %i_p = arith.addi %i_zero, %p_lb 1166 %pelt = hlfir.designate %p#0, %i_p 1167 %bval = fir.load %belt : f32 1168 %pval = fir.load %pelt : f32 1169 %mulval = arith.mulf %bval, %pval : f32 1170 %celt = hlfir.designate %c#0, %i 1171 %cval = fir.load %celt 1172 %add_res = arith.addf %mulval, %cval 1173 fir.result %add_res 1174 } 1175 hlfir.assign %add to %a#0 : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32> 1176 return 1177} 1178``` 1179 1180Step2: alias analysis around the array assignment: 1181 1182- May %add value depend on %a variable? 1183- Gather variable and function calls in %add operand tree (visiting 1184 hlfir.elemental regions) 1185- Gather references to %b, %p, and %c. %p is a pointer variable according to 1186 its defining operations. It may alias with %a that is a target. -> answer 1187 yes. 1188- Insert temporary, and duplicate array assignments, that can be lowered to 1189 loops at that point 1190 1191Note that the alias analysis could have already occurred without inlining the 1192%add hlfir.elemental. 1193 1194 1195``` 1196func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) { 1197 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32> 1198 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1199 %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>> 1200 %c = hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>> 1201 %bshape = hlfir.shape_of %b#0 1202 %pshape = hlfir.shape_of %p#0 1203 %shape1 = hlfir.shape_meet %bshape, %pshape 1204 %cshape = hlfir.shape_of %c 1205 %shape2 = hlfir.shape_meet %cshape, %shape1 1206 %add = hlfir.elemental(%i:index) %shape2 { 1207 %belt = hlfir.designate %b#0, %i 1208 %p_lb = hlfir.get_lbound %p#0, 1 1209 %i_zero = arith.subi %i, %c1 1210 %i_p = arith.addi %i_zero, %p_lb 1211 %pelt = hlfir.designate %p#0, %i_p 1212 %bval = fir.load %belt : f32 1213 %pval = fir.load %pelt : f32 1214 %mulval = arith.mulf %bval, %pval : f32 1215 %celt = hlfir.designate %c#0, %i 1216 %cval = fir.load %celt 1217 %add_res = arith.addf %mulval, %cval 1218 fir.result %add_res 1219 } 1220 %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index 1221 %tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>> 1222 %temp = hlfir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.box<fir.array<?xf32>>, fir.heap<fir.array<?xf32>> 1223 hlfir.assign %add to %temp#0 no_overlap : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>> 1224 hlfir.assign %temp to %a#0 : no_overlap : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1225 hlfir.finalize %temp#0 1226 fir.freemem %tempstorage 1227 return 1228} 1229``` 1230 1231Step 4: Lower assignments to regular loops since they have the no_overlap 1232attribute, and inline the hlfir.elemental into the first loop nest. 1233 1234``` 1235func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) { 1236 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32> 1237 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>> 1238 %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>> 1239 %c = hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>> 1240 %bshape = hlfir.shape_of %b#0 1241 %pshape = hlfir.shape_of %p#0 1242 %shape1 = hlfir.shape_meet %bshape, %pshape 1243 %cshape = hlfir.shape_of %c 1244 %shape2 = hlfir.shape_meet %cshape, %shape1 1245 %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index 1246 %tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>> 1247 %temp = hlfir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.box<fir.array<?xf32>>, fir.heap<fir.array<?xf32>> 1248 fir.do_loop %i = %c1 to %shape2 step %c1 unordered { 1249 %belt = hlfir.designate %b#0, %i 1250 %p_lb = hlfir.get_lbound %p#0, 1 1251 %i_zero = arith.subi %i, %c1 1252 %i_p = arith.addi %i_zero, %p_lb 1253 %pelt = hlfir.designate %p#0, %i_p 1254 %bval = fir.load %belt : f32 1255 %pval = fir.load %pelt : f32 1256 %mulval = arith.mulf %bval, %pval : f32 1257 %celt = hlfir.designate %c#0, %i 1258 %cval = fir.load %celt 1259 %add_res = arith.addf %mulval, %cval 1260 %tempelt = hlfir.designate %temp#0, %i 1261 hlfir.assign %add_res to %tempelt : f32, fir.ref<f32> 1262 } 1263 fir.do_loop %i = %c1 to %shape2 step %c1 unordered { 1264 %aelt = hlfir.designate %a#0, %i 1265 %tempelt = hlfir.designate %temp#0, %i 1266 hlfir.assign %add_res to %tempelt : f32, fir.ref<f32> 1267 } 1268 hlfir.finalize %temp#0 1269 fir.freemem %tempstorage 1270 return 1271} 1272``` 1273 1274Step 5 (may also occur earlier or several times): shape propagation. 1275- %shape2 can be inferred from %cshape that has constant shape: the 1276 hlfir.shape_meet results can be replaced by it, and if the option is set, 1277 conformance checks can be added for %a, %b and %p. 1278- %temp is small, and its fir.allocmem can be promoted to a stack allocation 1279 1280``` 1281func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) { 1282 // ..... 1283 %cshape = fir.shape %c100 1284 %extent = %c100 1285 // updated fir.alloca 1286 %tempstorage = fir.alloca %extent : fir.ref<fir.array<100xf32>> 1287 %temp = hlfir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.box<fir.array<?xf32>>, fir.heap<fir.array<?xf32>> 1288 fir.do_loop %i = %c1 to %c100 step %c1 unordered { 1289 // ... 1290 } 1291 fir.do_loop %i = %c1 to %c100 step %c1 unordered { 1292 // ... 1293 } 1294 hlfir.finalize %temp#0 1295 // deleted fir.freemem %tempstorage 1296 return 1297} 1298``` 1299 1300Step 6: lower hlfir.designate/hlfir.assign in a translation pass: 1301 1302At this point, the representation is similar to the current representation after 1303the array value copy pass, and the existing FIR flow is used (lowering 1304fir.do_loop to cfg and doing codegen to LLVM). 1305 1306### Example 3: assignments with vector subscript 1307 1308```Fortran 1309subroutine foo(a, b, v) 1310 real :: a(*), b(*) 1311 integer :: v(:) 1312 a(v) = b(v) 1313end subroutine 1314``` 1315 1316Lowering of vector subscripted entities would happen as follow: 1317- vector subscripted entities would be lowered as a hlfir.elemental implementing 1318 the vector subscript addressing. 1319- If the vector appears in a context where it can be modified (which can only 1320 be an assignment LHS, or in input IO), lowering could transform the 1321 hlfir.elemental into hlfir.forall (for assignments), or a fir.iter_while (for 1322 input IO) by inlining the elemental body into the created loops, and 1323 identifying the hlfir.designate producing the result. 1324 1325``` 1326func.func @_QPfoo(%arg0: !fir.ref<!fir.array<?xf32>>, %arg1: !fir.ref<!fir.array<?xf32>>, %arg2: !fir.box<<!fir.array<?xi32>>) { 1327 %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>, !fir.ref<!fir.array<?xf32>> 1328 %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.ref<!fir.array<?xf32>> 1329 %v = hlfir.declare %arg2 {fir.def = "_QPfooEv"} : !fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>> 1330 %vshape = hlfir.shape_of %v : fir.shape<1> 1331 %bsection = hlfir.elemental(%i:index) %vshape : (fir.shape<1>) -> hlfir.expr<?xf32> { 1332 %v_elt = hlfir.designate %v#0, %i : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32> 1333 %v_val = fir.load %v_elt : fir.ref<i32> 1334 %cast = fir.convert %v_val : (i32) -> index 1335 %b_elt = hlfir.designate %b#0, %v_val : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32> 1336 %b_val = fir.load %b_elt : fir.ref<f32> 1337 fir.result %b_elt 1338 } 1339 %extent = hlfir.get_extent %vshape, 0 : (fir.shape<1>) -> index 1340 %c1 = arith.constant 1 : index 1341 hlfir.forall (%i from %c1 to %extent step %c1) { 1342 %b_section_val = hlfir.apply %bsection, %i : (hlfir.expr<?xf32>, index) -> f32 1343 %v_elt = hlfir.designate %v#0, %i : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32> 1344 %v_val = fir.load %v_elt : fir.ref<i32> 1345 %cast = fir.convert %v_val : (i32) -> index 1346 %a_elt = hlfir.designate %a#0, %v_val : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32> 1347 hlfir.assign %b_section_val to %a_elt : f32, fir.ref<f32> 1348 } 1349 return 1350} 1351``` 1352 1353This would then be lowered as described in the examples above (hlfir.elemental 1354will be inlined, hlfir.forall will be rewritten into normal loops taking into 1355account the alias analysis, and hlfir.assign/hlfir.designate operations will be 1356lowered to fir.array_coor and fir.store operations). 1357 1358## Alternatives that were not retained 1359 1360### Using a non-MLIR based mutable CFG representation 1361 1362An option would have been to extend the PFT to describe expressions in a way 1363that can be annotated and modified with the ability to introduce temporaries. 1364This has been rejected because this would imply a whole new set of 1365infrastructure and data structures while FIR is already using MLIR 1366infrastructure, so enriching FIR seems a smoother approach and will benefit from 1367the MLIR infrastructure experience that was gained. 1368 1369### Using symbols for HLFIR variables 1370 1371#### Using attributes as pseudo variable symbols 1372 1373Instead of restricting the memory types an HLFIR variable can have, it was 1374force the defining operation of HLFIR variable SSA values to always be 1375retrievable. The idea was to add a fir.ref attribute that would repeat the name 1376of the HLFIR variable. Using such an attribute would prevent MLIR from merging 1377two operations using different variables when merging IR blocks. (which is the 1378main reason why the defining op may become inaccessible). The advantage of 1379forcing the defining operation to be retrievable is that it allowed all Fortran 1380information of variables (like attributes) to always be accessible in HLFIR 1381when looking at their uses, and avoids requiring the introduction of fir.box 1382usages for simply contiguous variables. The big drawback is that this implies 1383naming all HLFIR variables, and there are many more of them than there are 1384Fortran named variables. Naming designators with unique names was not very 1385natural, and would make designator CSE harder. It also made inlining harder, 1386because inlining HLFIR code without any fir.def/fir.ref attributes renaming 1387would break the name uniqueness, which could lead to some operations using 1388different variables to be merged, and to break the assumption that parent 1389operations must be visible. Renaming would be possible, but would increase 1390complexity and risks. Besides, inlining may not be the only transformation 1391doing code motion, and whose complexity would be increased by the naming 1392constraints. 1393 1394 1395#### Using MLIR symbols for variables 1396 1397Using MLIR symbols for HLFIR variables has been rejected because MLIR symbols 1398are mainly intended to deal with globals and functions that may refer to each 1399other before being defined. Their processing is not as light as normal values, 1400and would require to turn every FIR operation with a region into an MLIR symbol 1401table. This would especially be annoying since fir.designator also produces 1402variables with their own properties, which would imply creating a lot of MLIR 1403symbols. All the operations that both accept variable and expression operands 1404would also either need to be more complex in order to both accept SSA values or 1405MLIR symbol operands (or some fir.as_expr %var operation should be added to 1406turn a variable into an expression). Given all variable definitions will 1407dominate their uses, it seems better to use an SSA model with named attributes. 1408Using SSA values also makes the transition and mixture with lower-level FIR 1409operations smoother: a variable SSA usage can simply be replaced by lower-level 1410FIR operations using the same SSA value. 1411 1412### Using some existing MLIR dialects for the high-level Fortran. 1413 1414#### Why not using Linalg dialect? 1415 1416The linalg dialects offers a powerful way to represent array operations: the 1417linalg.generic operation takes a set of input and output arrays, a related set 1418of affine maps to represent how these inputs/outputs are to be addressed, and a 1419region detailing what operation should happen at each iteration point, given the 1420input and output array elements. It seems mainly intended to optimize matmul, 1421dot, and sum. 1422 1423Issues: 1424 1425- The linalg dialect is tightly linked to the tensor/memref concepts that 1426 cannot represent byte stride based discontinuity and would most likely 1427 require FIR to use MLIR memref descriptor format to take advantage of it. 1428- It is not clear whether all Fortran array expression addressing can be 1429 represented as semi affine maps. For instance, vector subscripted entities 1430 can probably not, which may force creating temporaries for the related 1431 designator expressions to fit in this framework. Fortran has a lot more 1432 transformational intrinsics than matmul, dot, and sum that can and should 1433 still be optimized. 1434 1435So while there may be benefits to use linalg at the optimization level (like 1436rewriting fir.sum/fir.matmul to a linalg sum, with dialect types plumbing 1437around the operand and results, to get tiling done by linalg), using it as a 1438lowering target would not cover all Fortran needs (especially for the non 1439semi-affine cases). 1440So using linalg is for now left as an optimization pass opportunity in some 1441cases that could be experimented. 1442 1443#### Why not using Shape dialect? 1444 1445MLIR shape dialect gives a set of operations to manipulate shapes. The 1446shape.meet operation is exactly similar with hlfir.shape_meet, except that it 1447returns a tensor or a shape.shape. 1448 1449The main issue with using the shape dialect is that it is dependent on tensors. 1450Bringing the tensor toolchain in flang for the sole purpose of manipulating 1451shape is not seen as beneficial given that the only thing Fortran needs is 1452shape.meet The shape dialect is a lot more complex because it is intended to 1453deal with computations involving dynamically ranked entity, which is not the 1454case in Fortran (assumed rank usage in Fortran is greatly limited). 1455 1456### Using embox/rebox and box as an alternative to fir.declare/hlfir.designate and hlfir.expr/ variable concept 1457 1458All Fortran entities (*) can be described at runtime by a fir.box, except for 1459some attributes that are not part of the runtime descriptors (like TARGET, 1460OPTIONAL or VOLATILE). In that sense, it would be possible to have 1461fir.declare, hlfir.designate, and hlfir.associate be replaced by embox/rebox, 1462and also to have all operation creating hlfir.expr to create fir.box. 1463 1464This was rejected because this would lack clarity, and make embox/rebox 1465semantics way too complex (their codegen is already non-trivial), and also 1466because it would then not really be possible to know if a fir.box is an 1467expression or a variable when it is an operand, which would make reasoning 1468harder: this would already imply that expressions have been buffered, and it is 1469not clear when looking at a fir.box if the value it describe may change or not, 1470while a hlfir.expr value cannot change, which allows moving its usages more 1471easily. 1472 1473This would also risk generating too many runtime descriptors read and writes 1474that could make later optimizations harder. 1475 1476Hence, while this would be functionally possible, this makes the reasoning about 1477the IR harder and would not benefit high-level optimizations. 1478 1479(*) This not true for vector subscripted variables, but the proposed plan will 1480also not allow creating vector subscripted variables as the result of a 1481hlfir.designate. Lowering will deal with the assignment and input IO special 1482case using hlfir.elemental. 1483