1*2d416219STom Eccles# Stack arrays pass 25922b886STom Eccles## Problem Description 35922b886STom EcclesIn gfortran, `-fstack-arrays` will cause all local arrays, including those of 45922b886STom Ecclesunknown size, to be allocated from stack memory. Gfortran enables this flag by 55922b886STom Ecclesdefault at `-Ofast`. 65922b886STom Eccles 75922b886STom EcclesFlang already allocates all local arrays on the stack (the 85922b886STom Eccles`memory-allocation-opt` pass can move large allocations to the heap, but by 95922b886STom Ecclesdefault it will not). But there are some cases where temporary arrays are 105922b886STom Ecclescreated on the heap by Flang. Moving these temporary arrays is the goal here. 115922b886STom Eccles 125922b886STom Eccles## Proposed Solution 135922b886STom EcclesOne approach would be to write a pass which attempts to move heap allocations to 145922b886STom Ecclesthe stack (like the `memory-allocation-opt` pass in reverse). This approach has 155922b886STom Ecclesthe advantage of a self-contained implementation and ensuring that newly added 165922b886STom Ecclescases are covered. 175922b886STom Eccles 185922b886STom EcclesAnother approach would be to modify each place in which arrays are allocated on 195922b886STom Ecclesthe heap to instead generate a stack allocation. The advantage of the second 205922b886STom Ecclesapproach is that it would be difficult to match up all `fir.allocmem` operations 215922b886STom Eccleswith their associated `fir.freemem`. In general, the lifetimes of heap allocated 225922b886STom Ecclesmemory are not constrained by the current function's stack frame and so cannot 235922b886STom Ecclesbe always converted to stack allocations. It is much easier to swap heap 245922b886STom Ecclesallocations for stack allocations when they are first generated because the 255922b886STom Eccleslifetime information is conveniently available. 265922b886STom Eccles 275922b886STom EcclesFor example, to rewrite the heap allocation in the `array-value-copy` pass with 285922b886STom Ecclesa stack allocation using the first approach would require analysis to ensure 295922b886STom Ecclesthat the heap allocation is always freed before the function returns. This is 305922b886STom Ecclesmuch more complex than never generating a heap allocation (and free) in the 315922b886STom Ecclesfirst place (the second approach). 325922b886STom Eccles 335922b886STom EcclesThe plan is to take the more complex first approach so that newly added changes 345922b886STom Ecclesto lowering code do not need to be made to support the stack arrays option. The 355922b886STom Ecclesgeneral problem of determining heap allocation lifetimes can be simplified in 365922b886STom Ecclesthis case because only those allocations which are always freed before exit from 375922b886STom Ecclesthe same function are possible to be moved to heap allocations in that 385922b886STom Ecclesfunction's stack frame. Furthermore, the aim of this pass would be to only move 395922b886STom Ecclesthose allocations generated by Flang, and so some of the more difficult cases 405922b886STom Ecclescan be avoided. Where the transformation is not possible, the existing heap 415922b886STom Ecclesallocation will be kept. 425922b886STom Eccles 435922b886STom Eccles## Implementation details overview 445922b886STom EcclesIn order to limit the complexity of the proposed pass, it is useful to 455922b886STom Ecclesunderstand the situations in which Flang will generate heap allocations. 465922b886STom Eccles 475922b886STom Eccles### Known Heap Array Allocations 485922b886STom EcclesFlang allocates most arrays on the stack by default, but there are a few cases 495922b886STom Eccleswhere temporary arrays are allocated on the heap: 505922b886STom Eccles- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp` 515922b886STom Eccles- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp` 525922b886STom Eccles- `flang/lib/Lower/ConvertExpr.cpp` 535922b886STom Eccles- `flang/lib/Lower/IntrinsicCall.cpp` 545922b886STom Eccles- `flang/lib/Lower/ConvertVariable.cpp` 555922b886STom Eccles 565922b886STom EcclesLowering code is being updated and in the future, temporaries for expressions 575922b886STom Eccleswill be created in the HLFIR bufferization pass in 585922b886STom Eccles`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`. 595922b886STom Eccles 605922b886STom Eccles#### `ArrayValueCopy.cpp` 615922b886STom EcclesMemory is allocated for a temporary array in `allocateArrayTemp()`. This 625922b886STom Ecclestemporary array is used to ensure that assignments of one array to itself 635922b886STom Ecclesproduce the required value. E.g. 645922b886STom Eccles 655922b886STom Eccles``` 665922b886STom Ecclesinteger, dimension(5), intent(inout) :: x 675922b886STom Ecclesx(3,4) = x(1,2) 685922b886STom Eccles``` 695922b886STom Eccles 705922b886STom Eccles#### `MemoryAllocation.cpp` 715922b886STom EcclesThe default options for the Memory Allocation transformation ensure that no 725922b886STom Ecclesarray allocations, no matter how large, are moved from the stack to the heap. 735922b886STom Eccles 745922b886STom Eccles#### `ConvertExpr.cpp` 755922b886STom Eccles`ConvertExpr.cpp` allocates many array temporaries on the heap: 765922b886STom Eccles - Passing array arguments by value or when they need re-shaping 775922b886STom Eccles - Lowering elemental array expressions 785922b886STom Eccles - Lowering mask expressions 795922b886STom Eccles - Array constructors 805922b886STom Eccles 815922b886STom EcclesThe last two of these cases are **not** covered by the current stack arrays pass 825922b886STom Ecclesdesign. 835922b886STom Eccles 845922b886STom EcclesThe FIR code generated for mask expressions (the WHERE construct) sets a 855922b886STom Ecclesboolean variable to indicate whether a heap allocation was necessary. The 865922b886STom Ecclesallocation is only freed if the variable indicates that the allocation was 875922b886STom Ecclesperformed to begin with. The proposed dataflow analysis is not intelligent 885922b886STom Ecclesenough to statically determine that the boolean variable will always be true 895922b886STom Eccleswhen the allocation is performed. Beyond this, the control flow in the generated 905922b886STom EcclesFIR code passes the allocated memory through `fir.result`, resulting in a 915922b886STom Ecclesdifferent SSA value to be allocated and freed, causing the analysis not to 925922b886STom Ecclesrealise that the allocated memory is freed. The most convenient solution here 935922b886STom Eccleswould be to generate less complicated FIR code, as the existing codegen has 945922b886STom Ecclesknown bugs: https://github.com/llvm/llvm-project/issues/56921, 955922b886STom Eccleshttps://github.com/llvm/llvm-project/issues/59803. 965922b886STom Eccles 975922b886STom EcclesCode generated for array constructors uses `realloc()` to grow the allocated 985922b886STom Ecclesbuffer because the size of the resulting array cannot always be determined 995922b886STom Ecclesahead of running the constructor. This makes this temporary unsuitable 1005922b886STom Ecclesfor allocation on the stack. 1015922b886STom Eccles 1025922b886STom Eccles#### `IntrinsicCall.cpp` 1035922b886STom EcclesThe existing design is for the runtime to do the allocation and the lowering 1045922b886STom Ecclescode to insert `fir.freemem` to remove the allocation. It is not clear whether 1055922b886STom Ecclesthis can be replaced by adapting lowering code to do stack allocations and to 1065922b886STom Ecclespass these to the runtime. This would be a significant change and so is out of 1075922b886STom Ecclesscope of `-fstack-arrays`. 1085922b886STom Eccles 1095922b886STom Eccles#### `ConvertVariable.cpp` 1105922b886STom EcclesSee `Fortran::lower::MapSymbolAttributes`: 1115922b886STom Eccles 1125922b886STom EcclesDummy arguments that are not declared in the current entry point require a 1135922b886STom Ecclesskeleton definition. Most such "unused" dummies will not survive into final 1145922b886STom Ecclesgenerated code, but some will. It is illegal to reference one at run time if it 1155922b886STom Ecclesdoes. There are three ways these dummies can be mapped to a value: 1165922b886STom Eccles- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic 1175922b886STom Eccles bounds or used to define another dummy. 1185922b886STom Eccles- A stack slot. This has intermediate-weight cost but may not be usable for 1195922b886STom Eccles objects with dynamic bounds. 1205922b886STom Eccles- A heap box/descriptor is the heaviest weight option, only used for dynamic 1215922b886STom Eccles objects. 1225922b886STom Eccles 1235922b886STom EcclesThese heap allocations will be out of scope for `-fstack-arrays` because the 1245922b886STom Ecclesexisting implementation already uses stack allocations (or better) where 1255922b886STom Ecclespossible, and most such dummy arguments will be removed from generated code. 1265922b886STom Eccles 1275922b886STom Eccles#### `BufferizeHLFIR.cpp` 1285922b886STom EcclesTODO 1295922b886STom Eccles 1305922b886STom Eccles### Detecting Allocations to Move 1315922b886STom EcclesAllocations which could be moved to the stack will be detected by performing a 132b2b7efb9SAlex Zinenkoforward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`. 1335922b886STom EcclesThis analysis will search for SSA values created by a `fir.allocmem` which are 1345922b886STom Ecclesalways freed using `fir.freemem` within the same function. 1355922b886STom Eccles 1365922b886STom EcclesTracking allocations by SSA values will miss some cases where address 1375922b886STom Ecclescalculations are duplicated in different blocks: resulting in different SSA 1385922b886STom Ecclesvalues as arguments for the allocation and the free. It is believed that simple 1395922b886STom Ecclestracking of SSA values is sufficient for detecting the allocations for array 1405922b886STom Ecclestemporaries added by Flang, because these temporaries should be simple arrays: 1415922b886STom Ecclesnot nested inside of derived types or other arrays. 1425922b886STom Eccles 1435922b886STom EcclesArrays allocated by an `allocate` statement in source code should not be moved 1445922b886STom Ecclesto the stack. These will be identified by adding an attribute to these 1455922b886STom Eccles`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated` 1465922b886STom Ecclesand `move_alloc` do not need to be supported because the array temporaries moved 1475922b886STom Ecclesto the stack will never be visible in user code. 1485922b886STom Eccles 1495922b886STom EcclesOnly allocations for arrays will be considered for moving to the stack. 1505922b886STom Eccles 1515922b886STom EcclesWhen conducting the dense data flow analysis, the pass will assume that existing 1525922b886STom Ecclesallocations are not freed inside called functions. This is true for the existing 1535922b886STom Ecclesheap array temporary allocations generated by Flang. If an allocation were to be 1545922b886STom Ecclesfreed inside of a called function, the allocation would presumably not also be 1555922b886STom Ecclesfreed later in the caller function (as this would be a double-free). Therefore 1565922b886STom Ecclesthe stack arrays pass would not find where the allocation was freed and so not 1575922b886STom Ecclestransform the allocation. It is necessary to make this assumption so that the 1585922b886STom Ecclesstack arrays pass can transform array allocations created for pass-by-value 1595922b886STom Ecclesfunction arguments. 1605922b886STom Eccles 1615922b886STom Eccles### Moving Allocations 1625922b886STom EcclesThe `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments. 1635922b886STom EcclesThe `fir.freemem` will be removed entirely. 1645922b886STom Eccles 1655922b886STom EcclesWhere possible, the `fir.alloca` should be placed in the function entry block 1665922b886STom Eccles(so we can be sure it only happens once). This may not be possible if the array 1675922b886STom Eccleshas non-constant extents (causing the `fir.alloca` to have SSA values as 1685922b886STom Ecclesoperands). In this case, the `fir.alloca` will be placed immediately after the 1695922b886STom Eccleslast operand becomes available. 1705922b886STom Eccles 1715922b886STom EcclesIf this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a 1725922b886STom Ecclescyclic CFG), the transformation should attempt to use the `llvm.stacksave`/ 1735922b886STom Eccles`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every 1745922b886STom Ecclesloop iteration. Use of these intrinsics is considered valid when the allocation 1755922b886STom Ecclesand deallocation occur within the same block and there are no other stack 1765922b886STom Ecclesallocations between them. If this is not the case, the existing heap allocation 1775922b886STom Eccleswill be preserved. 1785922b886STom Eccles 1795922b886STom Eccles### FIR attribute 1805922b886STom EcclesA FIR attribute will be added to distinguish `fir.allocmem` arising from 1815922b886STom Eccles`allocate` statements in source from `fir.allocmem` operations added by Flang. 1825922b886STom EcclesThe attribute will be called `"fir.must_be_heap"` and will have a boolean value: 1835922b886STom Eccles`true` meaning that the stack arrays pass must not move the allocation, `false` 1845922b886STom Ecclesmeaning that stack arrays may move the allocation. Not specifying the attribute 1855922b886STom Eccleswill be equivalent to setting it to `false`. 1865922b886STom Eccles 1875922b886STom Eccles## Testing Plan 1885922b886STom EcclesFileCheck tests will be written to check each of the above identified sources of 1895922b886STom Ecclesheap allocated array temporaries are detected and converted by the new pass. 1905922b886STom Eccles 1915922b886STom EcclesAnother test will check that `allocate` statements in source code will not be 1925922b886STom Ecclesmoved to the stack. 193