xref: /llvm-project/flang/docs/fstack-arrays.md (revision 2d416219af5c0091f7887e4d4463e63f5a37d811)
1*2d416219STom Eccles# Stack arrays pass
25922b886STom Eccles## Problem Description
35922b886STom EcclesIn gfortran, `-fstack-arrays` will cause all local arrays, including those of
45922b886STom Ecclesunknown size, to be allocated from stack memory. Gfortran enables this flag by
55922b886STom Ecclesdefault at `-Ofast`.
65922b886STom Eccles
75922b886STom EcclesFlang already allocates all local arrays on the stack (the
85922b886STom Eccles`memory-allocation-opt` pass can move large allocations to the heap, but by
95922b886STom Ecclesdefault it will not). But there are some cases where temporary arrays are
105922b886STom Ecclescreated on the heap by Flang. Moving these temporary arrays is the goal here.
115922b886STom Eccles
125922b886STom Eccles## Proposed Solution
135922b886STom EcclesOne approach would be to write a pass which attempts to move heap allocations to
145922b886STom Ecclesthe stack (like the `memory-allocation-opt` pass in reverse). This approach has
155922b886STom Ecclesthe advantage of a self-contained implementation and ensuring that newly added
165922b886STom Ecclescases are covered.
175922b886STom Eccles
185922b886STom EcclesAnother approach would be to modify each place in which arrays are allocated on
195922b886STom Ecclesthe heap to instead generate a stack allocation. The advantage of the second
205922b886STom Ecclesapproach is that it would be difficult to match up all `fir.allocmem` operations
215922b886STom Eccleswith their associated `fir.freemem`. In general, the lifetimes of heap allocated
225922b886STom Ecclesmemory are not constrained by the current function's stack frame and so cannot
235922b886STom Ecclesbe always converted to stack allocations. It is much easier to swap heap
245922b886STom Ecclesallocations for stack allocations when they are first generated because the
255922b886STom Eccleslifetime information is conveniently available.
265922b886STom Eccles
275922b886STom EcclesFor example, to rewrite the heap allocation in the `array-value-copy` pass with
285922b886STom Ecclesa stack allocation using the first approach would require analysis to ensure
295922b886STom Ecclesthat the heap allocation is always freed before the function returns. This is
305922b886STom Ecclesmuch more complex than never generating a heap allocation (and free) in the
315922b886STom Ecclesfirst place (the second approach).
325922b886STom Eccles
335922b886STom EcclesThe plan is to take the more complex first approach so that newly added changes
345922b886STom Ecclesto lowering code do not need to be made to support the stack arrays option. The
355922b886STom Ecclesgeneral problem of determining heap allocation lifetimes can be simplified in
365922b886STom Ecclesthis case because only those allocations which are always freed before exit from
375922b886STom Ecclesthe same function are possible to be moved to heap allocations in that
385922b886STom Ecclesfunction's stack frame. Furthermore, the aim of this pass would be to only move
395922b886STom Ecclesthose allocations generated by Flang, and so some of the more difficult cases
405922b886STom Ecclescan be avoided. Where the transformation is not possible, the existing heap
415922b886STom Ecclesallocation will be kept.
425922b886STom Eccles
435922b886STom Eccles## Implementation details overview
445922b886STom EcclesIn order to limit the complexity of the proposed pass, it is useful to
455922b886STom Ecclesunderstand the situations in which Flang will generate heap allocations.
465922b886STom Eccles
475922b886STom Eccles### Known Heap Array Allocations
485922b886STom EcclesFlang allocates most arrays on the stack by default, but there are a few cases
495922b886STom Eccleswhere temporary arrays are allocated on the heap:
505922b886STom Eccles- `flang/lib/Optimizer/Transforms/ArrayValueCopy.cpp`
515922b886STom Eccles- `flang/lib/Optimizer/Transforms/MemoryAllocation.cpp`
525922b886STom Eccles- `flang/lib/Lower/ConvertExpr.cpp`
535922b886STom Eccles- `flang/lib/Lower/IntrinsicCall.cpp`
545922b886STom Eccles- `flang/lib/Lower/ConvertVariable.cpp`
555922b886STom Eccles
565922b886STom EcclesLowering code is being updated and in the future, temporaries for expressions
575922b886STom Eccleswill be created in the HLFIR bufferization pass in
585922b886STom Eccles`flang/lib/Optimizer/HLFIR/Trnasforms/BufferizeHLFIR.cpp`.
595922b886STom Eccles
605922b886STom Eccles#### `ArrayValueCopy.cpp`
615922b886STom EcclesMemory is allocated for a temporary array in `allocateArrayTemp()`. This
625922b886STom Ecclestemporary array is used to ensure that assignments of one array to itself
635922b886STom Ecclesproduce the required value. E.g.
645922b886STom Eccles
655922b886STom Eccles```
665922b886STom Ecclesinteger, dimension(5), intent(inout) :: x
675922b886STom Ecclesx(3,4) = x(1,2)
685922b886STom Eccles```
695922b886STom Eccles
705922b886STom Eccles#### `MemoryAllocation.cpp`
715922b886STom EcclesThe default options for the Memory Allocation transformation ensure that no
725922b886STom Ecclesarray allocations, no matter how large, are moved from the stack to the heap.
735922b886STom Eccles
745922b886STom Eccles#### `ConvertExpr.cpp`
755922b886STom Eccles`ConvertExpr.cpp` allocates many array temporaries on the heap:
765922b886STom Eccles  - Passing array arguments by value or when they need re-shaping
775922b886STom Eccles  - Lowering elemental array expressions
785922b886STom Eccles  - Lowering mask expressions
795922b886STom Eccles  - Array constructors
805922b886STom Eccles
815922b886STom EcclesThe last two of these cases are **not** covered by the current stack arrays pass
825922b886STom Ecclesdesign.
835922b886STom Eccles
845922b886STom EcclesThe FIR code generated for mask expressions (the WHERE construct) sets a
855922b886STom Ecclesboolean variable to indicate whether a heap allocation was necessary. The
865922b886STom Ecclesallocation is only freed if the variable indicates that the allocation was
875922b886STom Ecclesperformed to begin with. The proposed dataflow analysis is not intelligent
885922b886STom Ecclesenough to statically determine that the boolean variable will always be true
895922b886STom Eccleswhen the allocation is performed. Beyond this, the control flow in the generated
905922b886STom EcclesFIR code passes the allocated memory through `fir.result`, resulting in a
915922b886STom Ecclesdifferent SSA value to be allocated and freed, causing the analysis not to
925922b886STom Ecclesrealise that the allocated memory is freed. The most convenient solution here
935922b886STom Eccleswould be to generate less complicated FIR code, as the existing codegen has
945922b886STom Ecclesknown bugs: https://github.com/llvm/llvm-project/issues/56921,
955922b886STom Eccleshttps://github.com/llvm/llvm-project/issues/59803.
965922b886STom Eccles
975922b886STom EcclesCode generated for array constructors uses `realloc()` to grow the allocated
985922b886STom Ecclesbuffer because the size of the resulting array cannot always be determined
995922b886STom Ecclesahead of running the constructor. This makes this temporary unsuitable
1005922b886STom Ecclesfor allocation on the stack.
1015922b886STom Eccles
1025922b886STom Eccles#### `IntrinsicCall.cpp`
1035922b886STom EcclesThe existing design is for the runtime to do the allocation and the lowering
1045922b886STom Ecclescode to insert `fir.freemem` to remove the allocation. It is not clear whether
1055922b886STom Ecclesthis can be replaced by adapting lowering code to do stack allocations and to
1065922b886STom Ecclespass these to the runtime. This would be a significant change and so is out of
1075922b886STom Ecclesscope of `-fstack-arrays`.
1085922b886STom Eccles
1095922b886STom Eccles#### `ConvertVariable.cpp`
1105922b886STom EcclesSee `Fortran::lower::MapSymbolAttributes`:
1115922b886STom Eccles
1125922b886STom EcclesDummy arguments that are not declared in the current entry point require a
1135922b886STom Ecclesskeleton definition. Most such "unused" dummies will not survive into final
1145922b886STom Ecclesgenerated code, but some will. It is illegal to reference one at run time if it
1155922b886STom Ecclesdoes. There are three ways these dummies can be mapped to a value:
1165922b886STom Eccles- a `fir::UndefOp` value: preferable but can't be used for dummies with dynamic
1175922b886STom Eccles  bounds or used to define another dummy.
1185922b886STom Eccles- A stack slot. This has intermediate-weight cost but may not be usable for
1195922b886STom Eccles  objects with dynamic bounds.
1205922b886STom Eccles- A heap box/descriptor is the heaviest weight option, only used for dynamic
1215922b886STom Eccles  objects.
1225922b886STom Eccles
1235922b886STom EcclesThese heap allocations will be out of scope for `-fstack-arrays` because the
1245922b886STom Ecclesexisting implementation already uses stack allocations (or better) where
1255922b886STom Ecclespossible, and most such dummy arguments will be removed from generated code.
1265922b886STom Eccles
1275922b886STom Eccles#### `BufferizeHLFIR.cpp`
1285922b886STom EcclesTODO
1295922b886STom Eccles
1305922b886STom Eccles### Detecting Allocations to Move
1315922b886STom EcclesAllocations which could be moved to the stack will be detected by performing a
132b2b7efb9SAlex Zinenkoforward dense data flow analysis using `mlir::dataflow::DenseForwardDataFlowAnalysis`.
1335922b886STom EcclesThis analysis will search for SSA values created by a `fir.allocmem` which are
1345922b886STom Ecclesalways freed using `fir.freemem` within the same function.
1355922b886STom Eccles
1365922b886STom EcclesTracking allocations by SSA values will miss some cases where address
1375922b886STom Ecclescalculations are duplicated in different blocks: resulting in different SSA
1385922b886STom Ecclesvalues as arguments for the allocation and the free. It is believed that simple
1395922b886STom Ecclestracking of SSA values is sufficient for detecting the allocations for array
1405922b886STom Ecclestemporaries added by Flang, because these temporaries should be simple arrays:
1415922b886STom Ecclesnot nested inside of derived types or other arrays.
1425922b886STom Eccles
1435922b886STom EcclesArrays allocated by an `allocate` statement in source code should not be moved
1445922b886STom Ecclesto the stack. These will be identified by adding an attribute to these
1455922b886STom Eccles`fir.allocmem` operations when they are lowered. Intrinsics such as `allocated`
1465922b886STom Ecclesand `move_alloc` do not need to be supported because the array temporaries moved
1475922b886STom Ecclesto the stack will never be visible in user code.
1485922b886STom Eccles
1495922b886STom EcclesOnly allocations for arrays will be considered for moving to the stack.
1505922b886STom Eccles
1515922b886STom EcclesWhen conducting the dense data flow analysis, the pass will assume that existing
1525922b886STom Ecclesallocations are not freed inside called functions. This is true for the existing
1535922b886STom Ecclesheap array temporary allocations generated by Flang. If an allocation were to be
1545922b886STom Ecclesfreed inside of a called function, the allocation would presumably not also be
1555922b886STom Ecclesfreed later in the caller function (as this would be a double-free). Therefore
1565922b886STom Ecclesthe stack arrays pass would not find where the allocation was freed and so not
1575922b886STom Ecclestransform the allocation. It is necessary to make this assumption so that the
1585922b886STom Ecclesstack arrays pass can transform array allocations created for pass-by-value
1595922b886STom Ecclesfunction arguments.
1605922b886STom Eccles
1615922b886STom Eccles### Moving Allocations
1625922b886STom EcclesThe `fir.allocmem` will be replaced by a `fir.alloca` with the same arguments.
1635922b886STom EcclesThe `fir.freemem` will be removed entirely.
1645922b886STom Eccles
1655922b886STom EcclesWhere possible, the `fir.alloca` should be placed in the function entry block
1665922b886STom Eccles(so we can be sure it only happens once). This may not be possible if the array
1675922b886STom Eccleshas non-constant extents (causing the `fir.alloca` to have SSA values as
1685922b886STom Ecclesoperands). In this case, the `fir.alloca` will be placed immediately after the
1695922b886STom Eccleslast operand becomes available.
1705922b886STom Eccles
1715922b886STom EcclesIf this location is inside a loop (either an `mlir::LoopLikeOpInterface` or a
1725922b886STom Ecclescyclic CFG), the transformation should attempt to use the `llvm.stacksave`/
1735922b886STom Eccles`llvm.stackrestore` intrinsics to ensure that the stack does not grow on every
1745922b886STom Ecclesloop iteration. Use of these intrinsics is considered valid when the allocation
1755922b886STom Ecclesand deallocation occur within the same block and there are no other stack
1765922b886STom Ecclesallocations between them. If this is not the case, the existing heap allocation
1775922b886STom Eccleswill be preserved.
1785922b886STom Eccles
1795922b886STom Eccles### FIR attribute
1805922b886STom EcclesA FIR attribute will be added to distinguish `fir.allocmem` arising from
1815922b886STom Eccles`allocate` statements in source from `fir.allocmem` operations  added by Flang.
1825922b886STom EcclesThe attribute will be called `"fir.must_be_heap"` and will have a boolean value:
1835922b886STom Eccles`true` meaning that the stack arrays pass must not move the allocation, `false`
1845922b886STom Ecclesmeaning that stack arrays may move the allocation. Not specifying the attribute
1855922b886STom Eccleswill be equivalent to setting it to `false`.
1865922b886STom Eccles
1875922b886STom Eccles## Testing Plan
1885922b886STom EcclesFileCheck tests will be written to check each of the above identified sources of
1895922b886STom Ecclesheap allocated array temporaries are detected and converted by the new pass.
1905922b886STom Eccles
1915922b886STom EcclesAnother test will check that `allocate` statements in source code will not be
1925922b886STom Ecclesmoved to the stack.
193