1932aae77SSourabh Singh Tomar<!--===- docs/ArrayComposition.md 2932aae77SSourabh Singh Tomar 3932aae77SSourabh Singh Tomar Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 4932aae77SSourabh Singh Tomar See https://llvm.org/LICENSE.txt for license information. 5932aae77SSourabh Singh Tomar SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception 6932aae77SSourabh Singh Tomar 7932aae77SSourabh Singh Tomar--> 8932aae77SSourabh Singh Tomar 9271a7bb1SRichard Barton# Array Composition 10271a7bb1SRichard Barton 11*b7ff0320Scor3ntin```{contents} 12*b7ff0320Scor3ntin--- 13*b7ff0320Scor3ntinlocal: 14*b7ff0320Scor3ntin--- 15271a7bb1SRichard Barton``` 16271a7bb1SRichard Barton 17eaff2004Ssameeran joshiThis note attempts to describe the motivation for and design of an 18eaff2004Ssameeran joshiimplementation of Fortran 90 (and later) array expression evaluation that 19eaff2004Ssameeran joshiminimizes the use of dynamically allocated temporary storage for 20eaff2004Ssameeran joshithe results of calls to transformational intrinsic functions, and 21eaff2004Ssameeran joshimaking them more amenable to acceleration. 22eaff2004Ssameeran joshi 23eaff2004Ssameeran joshiThe transformational intrinsic functions of Fortran of interest to 24eaff2004Ssameeran joshius here include: 25eaff2004Ssameeran joshi 26eaff2004Ssameeran joshi* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`, 27eaff2004Ssameeran joshi `DOT_PRODUCT`, 28eaff2004Ssameeran joshi `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`) 29eaff2004Ssameeran joshi* Axial reductions (`SUM(X,DIM=)`, &c.) 30eaff2004Ssameeran joshi* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`) 31eaff2004Ssameeran joshi* Axial location reductions (`MAXLOC(DIM=`, &c.) 32eaff2004Ssameeran joshi* `TRANSPOSE(M)` matrix transposition 33eaff2004Ssameeran joshi* `RESHAPE` without `ORDER=` 34eaff2004Ssameeran joshi* `RESHAPE` with `ORDER=` 35eaff2004Ssameeran joshi* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=` 36eaff2004Ssameeran joshi* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=` 37eaff2004Ssameeran joshi* `PACK` and `UNPACK` 38eaff2004Ssameeran joshi* `MATMUL` 39eaff2004Ssameeran joshi* `SPREAD` 40eaff2004Ssameeran joshi 41eaff2004Ssameeran joshiOther Fortran intrinsic functions are technically transformational (e.g., 42eaff2004Ssameeran joshi`COMMAND_ARGUMENT_COUNT`) but not of interest for this note. 43eaff2004Ssameeran joshiThe generic `REDUCE` is also not considered here. 44eaff2004Ssameeran joshi 45271a7bb1SRichard Barton## Arrays as functions 46271a7bb1SRichard Barton 47eaff2004Ssameeran joshiA whole array can be viewed as a function that maps its indices to the values 48eaff2004Ssameeran joshiof its elements. 49eaff2004Ssameeran joshiSpecifically, it is a map from a tuple of integers to its element type. 50eaff2004Ssameeran joshiThe rank of the array is the number of elements in that tuple, 51eaff2004Ssameeran joshiand the shape of the array delimits the domain of the map. 52eaff2004Ssameeran joshi 53eaff2004Ssameeran joshi`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers 54eaff2004Ssameeran joshi`(J,K)` with `1<=J<=N` and `1<=J<=M` to real values. 55eaff2004Ssameeran joshi 56271a7bb1SRichard Barton## Array expressions as functions 57271a7bb1SRichard Barton 58eaff2004Ssameeran joshiThe same perspective can be taken of an array expression comprising 59eaff2004Ssameeran joshiintrinsic operators and elemental functions. 60eaff2004Ssameeran joshiFortran doesn't allow one to apply subscripts directly to an expression, 61eaff2004Ssameeran joshibut expressions have rank and shape, and one can view array expressions 62eaff2004Ssameeran joshias functions over index tuples by applying those indices to the arrays 63eaff2004Ssameeran joshiand subexpressions in the expression. 64eaff2004Ssameeran joshi 65eaff2004Ssameeran joshiConsider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`). 66eaff2004Ssameeran joshiThe right-hand side of that assignment could be evaluated into a 67eaff2004Ssameeran joshitemporary array `T` and then subscripted as it is copied into `B`. 68eaff2004Ssameeran joshi``` 69eaff2004Ssameeran joshiREAL, ALLOCATABLE :: T(:,:) 70eaff2004Ssameeran joshiALLOCATE(T(N,M)) 71eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M) 72eaff2004Ssameeran joshi T(J,K)=A(J,K) + 1.0 73eaff2004Ssameeran joshiEND DO 74eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M) 75eaff2004Ssameeran joshi B(J,K)=T(J,K) 76eaff2004Ssameeran joshiEND DO 77eaff2004Ssameeran joshiDEALLOCATE(T) 78eaff2004Ssameeran joshi``` 79eaff2004Ssameeran joshiBut we can avoid the allocation, population, and deallocation of 80eaff2004Ssameeran joshithe temporary by treating the right-hand side expression as if it 81eaff2004Ssameeran joshiwere a statement function `F(J,K)=A(J,K)+1.0` and evaluating 82eaff2004Ssameeran joshi``` 83eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M) 84eaff2004Ssameeran joshi A(J,K)=F(J,K) 85eaff2004Ssameeran joshiEND DO 86eaff2004Ssameeran joshi``` 87eaff2004Ssameeran joshi 88eaff2004Ssameeran joshiIn general, when a Fortran array assignment to a non-allocatable array 89eaff2004Ssameeran joshidoes not include the left-hand 90eaff2004Ssameeran joshiside variable as an operand of the right-hand side expression, and any 91eaff2004Ssameeran joshifunction calls on the right-hand side are elemental or scalar-valued, 92eaff2004Ssameeran joshiwe can avoid the use of a temporary. 93eaff2004Ssameeran joshi 94271a7bb1SRichard Barton## Transformational intrinsic functions as function composition 95271a7bb1SRichard Barton 96eaff2004Ssameeran joshiMany of the transformational intrinsic functions listed above 97eaff2004Ssameeran joshican, when their array arguments are viewed as functions over their 98eaff2004Ssameeran joshiindex tuples, be seen as compositions of those functions with 99eaff2004Ssameeran joshifunctions of the "incoming" indices -- yielding a function for 100eaff2004Ssameeran joshian entire right-hand side of an array assignment statement. 101eaff2004Ssameeran joshi 102eaff2004Ssameeran joshiFor example, the application of `TRANSPOSE(A + 1.0)` to the index 103eaff2004Ssameeran joshituple `(J,K)` becomes `A(K,J) + 1.0`. 104eaff2004Ssameeran joshi 105eaff2004Ssameeran joshiPartial (axial) reductions can be similarly composed. 106eaff2004Ssameeran joshiThe application of `SUM(A,DIM=2)` to the index `J` is the 107eaff2004Ssameeran joshicomplete reduction `SUM(A(J,:))`. 108eaff2004Ssameeran joshi 109eaff2004Ssameeran joshiMore completely: 110eaff2004Ssameeran joshi* Reductions to scalars (`SUM(X)` without `DIM=`) become 111eaff2004Ssameeran joshi runtime calls; the result needs no dynamic allocation, 112eaff2004Ssameeran joshi being a scalar. 113eaff2004Ssameeran joshi* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)` 114eaff2004Ssameeran joshi become scalar values like `SUM(X(J,K,:))` if `d=3`. 115eaff2004Ssameeran joshi* Location reductions to indices (`MAXLOC(X)` without `DIM=`) 116eaff2004Ssameeran joshi do not require dynamic allocation, since their results are 117eaff2004Ssameeran joshi either scalar or small vectors of length `RANK(X)`. 118eaff2004Ssameeran joshi* Axial location reductions (`MAXLOC(X,DIM=)`, &c.) 119eaff2004Ssameeran joshi are handled like other axial reductions like `SUM(DIM=)`. 120eaff2004Ssameeran joshi* `TRANSPOSE(M)` exchanges the two components of the index tuple. 121eaff2004Ssameeran joshi* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape 122eaff2004Ssameeran joshi vector `S`, and then use it to linearize indices into offsets 123eaff2004Ssameeran joshi in the storage order of `A` (whose shape must also be captured). 124eaff2004Ssameeran joshi These conversions can involve division and/or modulus, which 125eaff2004Ssameeran joshi can be optimized into a fixed-point multiplication using the 126eaff2004Ssameeran joshi usual technique. 127eaff2004Ssameeran joshi* `RESHAPE` with `ORDER=` is similar, but must permute the 128eaff2004Ssameeran joshi components of the index tuple; it generalizes `TRANSPOSE`. 129eaff2004Ssameeran joshi* `CSHIFT` applies addition and modulus. 130eaff2004Ssameeran joshi* `EOSHIFT` applies addition and a conditional move (`MERGE`). 131eaff2004Ssameeran joshi* `PACK` and `UNPACK` are likely to require a runtime call. 132eaff2004Ssameeran joshi* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but 133eaff2004Ssameeran joshi might benefit from calling a highly optimized runtime 134eaff2004Ssameeran joshi routine. 135eaff2004Ssameeran joshi* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply 136eaff2004Ssameeran joshi applies `A` to a reduced index tuple. 137eaff2004Ssameeran joshi 138271a7bb1SRichard Barton## Determination of rank and shape 139271a7bb1SRichard Barton 140eaff2004Ssameeran joshiAn important part of evaluating array expressions without the use of 141eaff2004Ssameeran joshitemporary storage is determining the shape of the result prior to, 142eaff2004Ssameeran joshior without, evaluating the elements of the result. 143eaff2004Ssameeran joshi 144eaff2004Ssameeran joshiThe shapes of array objects, results of elemental intrinsic functions, 145eaff2004Ssameeran joshiand results of intrinsic operations are obvious. 146eaff2004Ssameeran joshiBut it is possible to determine the shapes of the results of many 147eaff2004Ssameeran joshitransformational intrinsic function calls as well. 148eaff2004Ssameeran joshi 149eaff2004Ssameeran joshi* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed: 150eaff2004Ssameeran joshi `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general. 151eaff2004Ssameeran joshi (The `DIM=` argument is commonly a compile-time constant.) 152eaff2004Ssameeran joshi* `SHAPE(MAXLOC(X))` is `[RANK(X)]`. 153eaff2004Ssameeran joshi* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed. 154eaff2004Ssameeran joshi* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`. 155eaff2004Ssameeran joshi* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`. 156eaff2004Ssameeran joshi* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`. 157eaff2004Ssameeran joshi* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)` 158eaff2004Ssameeran joshi* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`. 159eaff2004Ssameeran joshi* `RANK(PACK(...))` is always 1. 160eaff2004Ssameeran joshi* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`. 161eaff2004Ssameeran joshi* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`. 162eaff2004Ssameeran joshi* `SHAPE(SHAPE(X))` is `[RANK(X)]`. 163eaff2004Ssameeran joshi* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at 164eaff2004Ssameeran joshi dimension `d`. 165eaff2004Ssameeran joshi 166eaff2004Ssameeran joshiThis is useful because expression evaluations that *do* require temporaries 167eaff2004Ssameeran joshito hold their results (due to the context in which the evaluation occurs) 168eaff2004Ssameeran joshican be implemented with a separation of the allocation 169eaff2004Ssameeran joshiof the temporary array and the population of the array. 170eaff2004Ssameeran joshiThe code that evaluates the expression, or that implements a transformational 171eaff2004Ssameeran joshiintrinsic in the runtime library, can be designed with an API that includes 172eaff2004Ssameeran joshia pointer to the destination array as an argument. 173eaff2004Ssameeran joshi 174eaff2004Ssameeran joshiStatements like `ALLOCATE(A,SOURCE=expression)` should thus be capable 175eaff2004Ssameeran joshiof evaluating their array expressions directly into the newly-allocated 176eaff2004Ssameeran joshistorage for the allocatable array. 177eaff2004Ssameeran joshiThe implementation would generate code to calculate the shape, use it 178eaff2004Ssameeran joshito allocate the memory and populate the descriptor, and then drive a 179eaff2004Ssameeran joshiloop nest around the expression to populate the array. 180eaff2004Ssameeran joshiIn cases where the analyzed shape is known at compile time, we should 181eaff2004Ssameeran joshibe able to have the opportunity to avoid heap allocation in favor of 182eaff2004Ssameeran joshistack storage, if the scope of the variable is local. 183eaff2004Ssameeran joshi 184271a7bb1SRichard Barton## Automatic reallocation of allocatables 185271a7bb1SRichard Barton 186eaff2004Ssameeran joshiFortran 2003 introduced the ability to assign non-conforming array expressions 187eaff2004Ssameeran joshito ALLOCATABLE arrays with the implied semantics of reallocation to the 188eaff2004Ssameeran joshinew shape. 189eaff2004Ssameeran joshiThe implementation of this feature also becomes more straightforward if 190eaff2004Ssameeran joshiour implementation of array expressions has decoupled calculation of shapes 191eaff2004Ssameeran joshifrom the evaluation of the elements of the result. 192eaff2004Ssameeran joshi 193271a7bb1SRichard Barton## Rewriting rules 194271a7bb1SRichard Barton 195eaff2004Ssameeran joshiLet `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into 196eaff2004Ssameeran joshithe result of an array expression or subexpression. 197eaff2004Ssameeran joshi 198eaff2004Ssameeran joshi* Array constructors always yield vectors; higher-rank arrays that appear as 199eaff2004Ssameeran joshi constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)})`. 200eaff2004Ssameeran joshi* Array constructors with multiple constituents are concatenations of 201eaff2004Ssameeran joshi their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},J>SIZE(X))`. 202eaff2004Ssameeran joshi* Array constructors with implied DO loops are difficult when nested 203eaff2004Ssameeran joshi triangularly. 204eaff2004Ssameeran joshi* Whole array references can have lower bounds other than 1, so 205eaff2004Ssameeran joshi `A => A(LBOUND(A,1):UBOUND(A,1),...)`. 206eaff2004Ssameeran joshi* Array sections simply apply indices: `A(i:...:n){j} => A(i1+n*(j-1))`. 207eaff2004Ssameeran joshi* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`. 208eaff2004Ssameeran joshi* Scalar operands ignore indices: `X{j,k} => X`. 209eaff2004Ssameeran joshi Further, they are evaluated at most once. 210eaff2004Ssameeran joshi* Elemental operators and functions apply indices to their arguments: 211eaff2004Ssameeran joshi `(A(:,:) + B(:,:)){j,k}` => A(:,:){j,k} + B(:,:){j,k}`. 212eaff2004Ssameeran joshi* `TRANSPOSE(X){j,k} => X{k,j}`. 213eaff2004Ssameeran joshi* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated. 214eaff2004Ssameeran joshi If X is sufficiently expensive to compute elementally, it might be evaluated 215eaff2004Ssameeran joshi into a temporary. 216eaff2004Ssameeran joshi 217eaff2004Ssameeran joshi(more...) 218