<!--===- docs/ArrayComposition.md

Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
See https://llvm.org/LICENSE.txt for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```{contents}
---
local:
---
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions and
makes them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.
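To make this concrete, here is a minimal sketch (illustrative Python only, not part of flang or its runtime) of a rank-2 array wrapped as a map from 1-based index tuples to element values:

```python
# Illustrative sketch only: a rank-2 array viewed as a function from
# 1-based index tuples (j, k) to its element values, as for REAL :: A(N,M).
def as_function(storage, n, m):
    """Wrap an n x m array (a list of rows) as a map (j, k) -> value,
    defined for 1 <= j <= n and 1 <= k <= m."""
    def element(j, k):
        assert 1 <= j <= n and 1 <= k <= m, "index outside the array's shape"
        return storage[j - 1][k - 1]
    return element

a = as_function([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]], 2, 3)  # plays the role of A(2,3)
print(a(2, 1))  # the element A(2,1) -> 4.0
```

The shape `(2, 3)` delimits the domain of the map, and the rank 2 is the length of the index tuple.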
## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions like `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `S`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.
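As a sketch of this decoupling (illustrative Python with a toy representation, not anything in flang), an array expression can be modeled as a pair of a shape and an element function, so that the shape of an expression like `TRANSPOSE(A + 1.0)` is available without evaluating any element:

```python
# Illustrative sketch only (not flang's design): an array expression is
# modeled as a (shape, element_function) pair, so the shape of a result
# is known before -- or without -- evaluating any of its elements.

def transpose(expr):
    shape, f = expr
    # TRANSPOSE reverses the shape and swaps the incoming indices.
    return (shape[::-1], lambda j, k: f(k, j))

def add_scalar(expr, c):
    shape, f = expr
    # An elemental operation leaves the shape unchanged.
    return (shape, lambda *ix: f(*ix) + c)

# A 2x3 "array" with A(j,k) = 10*j + k, 1-based.
a = ((2, 3), lambda j, k: 10.0 * j + k)

shape, g = transpose(add_scalar(a, 1.0))
print(shape)    # (3, 2), computed without touching any element
print(g(3, 1))  # A(1,3) + 1.0 -> 14.0
```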
The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array and the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of
stack storage, if the scope of the variable is local.

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If `X` is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.

(more...)
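As one further illustration of the rules above (a hypothetical Python model, not flang code), the `SPREAD` rule drops the replicated index, and `CSHIFT` composes addition and modulus onto the incoming index:

```python
# Hypothetical model of two of the rewriting rules above; not flang code.
# Subexpressions are modeled as functions over 1-based index tuples.

def spread_dim2(x):
    # SPREAD(X,DIM=2,...){j,k} => X{j}: the replicated index is dropped.
    return lambda j, k: x(j)

def cshift1(x, shift, size):
    # CSHIFT composes addition and modulus onto the incoming index.
    return lambda j: x((j - 1 + shift) % size + 1)

v = lambda j: float(j)   # a rank-1 "array" V with V(j) = j, of size 4
s = spread_dim2(v)
c = cshift1(v, 1, 4)
print(s(3, 2))  # 3.0: the second (replicated) index is ignored
print(c(4))     # 1.0: the shift wraps around the end
```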