1# Stack arrays pass 2## Problem Description 3In gfortran, `-fstack-arrays` will cause all local arrays, including those of 4unknown size, to be allocated from stack memory. Gfortran enables this flag by 5default at `-Ofast`. 6 7Flang already allocates all local arrays on the stack (the 8`memory-allocation-opt` pass can move large allocations to the heap, but by 9default it will not). But there are some cases where temporary arrays are 10created on the heap by Flang. Moving these temporary arrays is the goal here. 11 12## Proposed Solution 13One approach would be to write a pass which attempts to move heap allocations to 14the stack (like the `memory-allocation-opt` pass in reverse). This approach has 15the advantage of a self-contained implementation and ensuring that newly added 16cases are covered. 17 18Another approach would be to modify each place in which arrays are allocated on 19the heap to instead generate a stack allocation. The advantage of the second 20approach is that it would be difficult to match up all `fir.allocmem` operations 21with their associated `fir.freemem`. In general, the lifetimes of heap allocated 22memory are not constrained by the current function's stack frame and so cannot 23be always converted to stack allocations. It is much easier to swap heap 24allocations for stack allocations when they are first generated because the 25lifetime information is conveniently available. 26 27For example, to rewrite the heap allocation in the `array-value-copy` pass with 28a stack allocation using the first approach would require analysis to ensure 29that the heap allocation is always freed before the function returns. This is 30much more complex than never generating a heap allocation (and free) in the 31first place (the second approach). 32 33The plan is to take the more complex first approach so that newly added changes 34to lowering code do not need to be made to support the stack arrays option. The 35general problem of determining heap allocation lifetimes can be simplified in 36this case because only those allocations which are always freed before exit from 37the same function are possible to be moved to heap allocations in that 38function's stack frame. Furthermore, the aim of this pass would be to only move 39those allocations generated by Flang, and so some of the more difficult cases 40can be avoided. Where the transformation is not possible, the existing heap 41allocation will be kept. 42 43## Implementation details overview 44In order to limit the complexity of the proposed pass, it is useful to 45understand the situations in which Flang will generate heap allocations. 46 47### Known Heap Array Allocations 48Flang allocates most arrays on the stack by default, but there are a few cases 49where temporary arrays are allocated on the heap: 50- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp` 51- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp` 52- `flang/lib/Lower/ConvertExpr.cpp` 53- `flang/lib/Lower/IntrinsicCall.cpp` 54- `flang/lib/Lower/ConvertVariable.cpp` 55 56Lowering code is being updated and in the future, temporaries for expressions 57will be created in the HLFIR bufferization pass in 58`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`. 59 60#### `ArrayValueCopy.cpp` 61Memory is allocated for a temporary array in `allocateArrayTemp()`. This 62temporary array is used to ensure that assignments of one array to itself 63produce the required value. E.g. 64 65``` 66integer, dimension(5), intent(inout) :: x 67x(3,4) = x(1,2) 68``` 69 70#### `MemoryAllocation.cpp` 71The default options for the Memory Allocation transformation ensure that no 72array allocations, no matter how large, are moved from the stack to the heap. 73 74#### `ConvertExpr.cpp` 75`ConvertExpr.cpp` allocates many array temporaries on the heap: 76 - Passing array arguments by value or when they need re-shaping 77 - Lowering elemental array expressions 78 - Lowering mask expressions 79 - Array constructors 80 81The last two of these cases are **not** covered by the current stack arrays pass 82design. 83 84The FIR code generated for mask expressions (the WHERE construct) sets a 85boolean variable to indicate whether a heap allocation was necessary. The 86allocation is only freed if the variable indicates that the allocation was 87performed to begin with. The proposed dataflow analysis is not intelligent 88enough to statically determine that the boolean variable will always be true 89when the allocation is performed. Beyond this, the control flow in the generated 90FIR code passes the allocated memory through `fir.result`, resulting in a 91different SSA value to be allocated and freed, causing the analysis not to 92realise that the allocated memory is freed. The most convenient solution here 93would be to generate less complicated FIR code, as the existing codegen has 94known bugs: https://github.com/llvm/llvm-project/issues/56921, 95https://github.com/llvm/llvm-project/issues/59803. 96 97Code generated for array constructors uses `realloc()` to grow the allocated 98buffer because the size of the resulting array cannot always be determined 99ahead of running the constructor. This makes this temporary unsuitable 100for allocation on the stack. 101 102#### `IntrinsicCall.cpp` 103The existing design is for the runtime to do the allocation and the lowering 104code to insert `fir.freemem` to remove the allocation. It is not clear whether 105this can be replaced by adapting lowering code to do stack allocations and to 106pass these to the runtime. This would be a significant change and so is out of 107scope of `-fstack-arrays`. 108 109#### `ConvertVariable.cpp` 110See `Fortran::lower::MapSymbolAttributes`: 111 112Dummy arguments that are not declared in the current entry point require a 113skeleton definition. Most such "unused" dummies will not survive into final 114generated code, but some will. It is illegal to reference one at run time if it 115does. There are three ways these dummies can be mapped to a value: 116- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic 117 bounds or used to define another dummy. 118- A stack slot. This has intermediate-weight cost but may not be usable for 119 objects with dynamic bounds. 120- A heap box/descriptor is the heaviest weight option, only used for dynamic 121 objects. 122 123These heap allocations will be out of scope for `-fstack-arrays` because the 124existing implementation already uses stack allocations (or better) where 125possible, and most such dummy arguments will be removed from generated code. 126 127#### `BufferizeHLFIR.cpp` 128TODO 129 130### Detecting Allocations to Move 131Allocations which could be moved to the stack will be detected by performing a 132forward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`. 133This analysis will search for SSA values created by a `fir.allocmem` which are 134always freed using `fir.freemem` within the same function. 135 136Tracking allocations by SSA values will miss some cases where address 137calculations are duplicated in different blocks: resulting in different SSA 138values as arguments for the allocation and the free. It is believed that simple 139tracking of SSA values is sufficient for detecting the allocations for array 140temporaries added by Flang, because these temporaries should be simple arrays: 141not nested inside of derived types or other arrays. 142 143Arrays allocated by an `allocate` statement in source code should not be moved 144to the stack. These will be identified by adding an attribute to these 145`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated` 146and `move_alloc` do not need to be supported because the array temporaries moved 147to the stack will never be visible in user code. 148 149Only allocations for arrays will be considered for moving to the stack. 150 151When conducting the dense data flow analysis, the pass will assume that existing 152allocations are not freed inside called functions. This is true for the existing 153heap array temporary allocations generated by Flang. If an allocation were to be 154freed inside of a called function, the allocation would presumably not also be 155freed later in the caller function (as this would be a double-free). Therefore 156the stack arrays pass would not find where the allocation was freed and so not 157transform the allocation. It is necessary to make this assumption so that the 158stack arrays pass can transform array allocations created for pass-by-value 159function arguments. 160 161### Moving Allocations 162The `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments. 163The `fir.freemem` will be removed entirely. 164 165Where possible, the `fir.alloca` should be placed in the function entry block 166(so we can be sure it only happens once). This may not be possible if the array 167has non-constant extents (causing the `fir.alloca` to have SSA values as 168operands). In this case, the `fir.alloca` will be placed immediately after the 169last operand becomes available. 170 171If this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a 172cyclic CFG), the transformation should attempt to use the `llvm.stacksave`/ 173`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every 174loop iteration. Use of these intrinsics is considered valid when the allocation 175and deallocation occur within the same block and there are no other stack 176allocations between them. If this is not the case, the existing heap allocation 177will be preserved. 178 179### FIR attribute 180A FIR attribute will be added to distinguish `fir.allocmem` arising from 181`allocate` statements in source from `fir.allocmem` operations added by Flang. 182The attribute will be called `"fir.must_be_heap"` and will have a boolean value: 183`true` meaning that the stack arrays pass must not move the allocation, `false` 184meaning that stack arrays may move the allocation. Not specifying the attribute 185will be equivalent to setting it to `false`. 186 187## Testing Plan 188FileCheck tests will be written to check each of the above identified sources of 189heap allocated array temporaries are detected and converted by the new pass. 190 191Another test will check that `allocate` statements in source code will not be 192moved to the stack. 193