<!--===- docs/ArrayComposition.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```{contents}
---
local:
---
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions and
makes those calls more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.

## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.
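
The restriction on the left-hand side variable can be illustrated with a small sketch. The program below is only an illustration (the names `alias_demo`, `a`, and `b` are made up for it): it compares a standard array assignment whose right-hand side overlaps its left-hand side with a naive temporary-free loop over the same elements.

```fortran
program alias_demo
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), j

  a = [1, 2, 3, 4, 5]
  b = a

  ! Array assignment: the right-hand side is conceptually evaluated in full
  ! before any element of A is stored, so A becomes [1, 1, 2, 3, 4].
  a(2:n) = a(1:n-1)

  ! Naive temporary-free loop in ascending order: each store feeds the
  ! next load, so B becomes [1, 1, 1, 1, 1].
  do j = 2, n
    b(j) = b(j-1)
  end do

  if (any(a /= [1, 1, 2, 3, 4])) error stop 'array assignment mismatch'
  if (any(b /= [1, 1, 1, 1, 1])) error stop 'loop result mismatch'
  print *, a
  print *, b
end program alias_demo
```

In this particular case a descending loop would also give the right answer without a temporary, but that depends on analyzing the overlap; the conservative rule above sidesteps the question.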

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.
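
As a rough sketch of these two compositions, the following self-checking program writes each result element directly from the composed index function and compares it against the intrinsic. The program name and the array names `b` and `s` are illustrative only, and this is hand-written Fortran, not what the compiler actually emits.

```fortran
program compose_demo
  implicit none
  integer, parameter :: n = 3, m = 4
  real :: a(n,m), b(m,n), s(n)
  integer :: j, k

  ! Fill A with distinct small values that are exact in REAL.
  do k = 1, m
    do j = 1, n
      a(j,k) = real(10*j + k)
    end do
  end do

  ! B = TRANSPOSE(A + 1.0) without a temporary: the index pair (J,K) of the
  ! result is applied to the operand with its components exchanged.
  do concurrent (j = 1:m, k = 1:n)
    b(j,k) = a(k,j) + 1.0
  end do

  ! S = SUM(A,DIM=2): element J of the result is the complete reduction
  ! of row J of A.
  do concurrent (j = 1:n)
    s(j) = sum(a(j,:))
  end do

  if (any(b /= transpose(a + 1.0))) error stop 'TRANSPOSE mismatch'
  if (any(s /= sum(a, dim=2))) error stop 'SUM(DIM=2) mismatch'
  print *, 'ok'
end program compose_demo
```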

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=S)` without `ORDER=` must precompute the shape
  vector `S`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.
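
A few of the index mappings in this list can be sketched in plain Fortran and checked against the intrinsics. The program below is a minimal illustration for rank-1 shifts, a `SPREAD` along `DIM=2`, and a rank-2 `RESHAPE`; all names (`index_maps`, `c`, `e`, `sp`, `r`, `l`) are hypothetical. The `EOSHIFT` case clamps the subscript because `MERGE` is an ordinary elemental function whose operands may both be evaluated.

```fortran
program index_maps
  implicit none
  integer, parameter :: n = 6, s = 2, nc = 3
  integer :: v(n), c(n), e(n), sp(n,nc), a(2,3), r(3,2)
  integer :: j, k, l

  v = [(10*j, j = 1, n)]
  a = reshape([(j, j = 1, 6)], [2, 3])

  ! CSHIFT(V,SHIFT=s) at index j: addition and modulus.
  do concurrent (j = 1:n)
    c(j) = v(1 + modulo(j - 1 + s, n))
  end do

  ! EOSHIFT(V,SHIFT=s) at index j: addition and a conditional move.
  ! The subscript is clamped so that it is valid even when not selected.
  do concurrent (j = 1:n)
    e(j) = merge(v(min(max(j + s, 1), n)), 0, j + s >= 1 .and. j + s <= n)
  end do

  ! SPREAD(V,DIM=2,NCOPIES=nc) at (j,k): the index k is simply dropped.
  do concurrent (j = 1:n, k = 1:nc)
    sp(j,k) = v(j)
  end do

  ! RESHAPE(A,SHAPE=[3,2]) at (j,k): linearize (j,k) with the result shape,
  ! then split the offset with the shape of A using modulus and division.
  do k = 1, 2
    do j = 1, 3
      l = (j - 1) + 3*(k - 1)
      r(j,k) = a(1 + mod(l, 2), 1 + l/2)
    end do
  end do

  if (any(c /= cshift(v, s))) error stop 'CSHIFT mismatch'
  if (any(e /= eoshift(v, s))) error stop 'EOSHIFT mismatch'
  if (any(sp /= spread(v, dim=2, ncopies=nc))) error stop 'SPREAD mismatch'
  if (any(r /= reshape(a, [3, 2]))) error stop 'RESHAPE mismatch'
  print *, 'ok'
end program index_maps
```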

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.

The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops the last extent of `SHAPE(A)` and the first
  extent of `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.
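
Several of these shape identities can be spot-checked with a short program. The sketch below assumes a compiler that supports the `RANK` intrinsic and `ERROR STOP`; the array names and extents are arbitrary choices for the illustration.

```fortran
program shape_rules
  implicit none
  real :: x(2,3,4), a(2,3), b(3,5)
  logical :: mask(2,3)
  integer :: sa(2), d, j

  x = 0.0
  a = 0.0
  b = 0.0
  mask = reshape([.true., .false., .true., .true., .false., .true.], [2, 3])
  sa = shape(a)

  ! SHAPE(SUM(X,DIM=d)) is SHAPE(X) with element d removed.
  do d = 1, rank(x)
    if (any(shape(sum(x, dim=d)) /= pack(shape(x), [(j, j = 1, rank(x))] /= d))) &
      error stop 'SUM(DIM=) shape rule'
  end do

  if (any(shape(maxloc(x)) /= [rank(x)])) error stop 'MAXLOC shape rule'
  if (any(shape(transpose(a)) /= sa(2:1:-1))) error stop 'TRANSPOSE shape rule'
  if (any(shape(pack(a, mask)) /= [count(mask)])) error stop 'PACK shape rule'
  if (any(shape(unpack([1.0, 2.0, 3.0, 4.0], mask, 0.0)) /= shape(mask))) &
    error stop 'UNPACK shape rule'
  if (any(shape(matmul(a, b)) /= [size(a, 1), size(b, 2)])) &
    error stop 'MATMUL shape rule'
  if (any(shape(spread(a, dim=2, ncopies=7)) /= [2, 7, 3])) &
    error stop 'SPREAD shape rule'
  print *, 'ok'
end program shape_rules
```

None of the right-hand sides of these comparisons needs the element values of the result, which is the property the following discussion relies on.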

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array and the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of
stack storage, if the scope of the variable is local.
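
The three steps (calculate the shape, allocate, populate) can be sketched by hand for an expression like `TRANSPOSE(A + 1.0)`. This is only a source-level analogy: the module `fill_mod` and the subroutine `fill_transpose_plus_one` are hypothetical stand-ins for generated code or a runtime routine that takes its destination as an argument, not part of any actual flang API.

```fortran
module fill_mod
  implicit none
contains
  ! Hypothetical helper: writes TRANSPOSE(a + 1.0) into storage that the
  ! caller has already allocated with the right shape.
  subroutine fill_transpose_plus_one(dest, a)
    real, intent(out) :: dest(:,:)
    real, intent(in) :: a(:,:)
    integer :: j, k
    do concurrent (j = 1:size(dest, 1), k = 1:size(dest, 2))
      dest(j,k) = a(k,j) + 1.0
    end do
  end subroutine fill_transpose_plus_one
end module fill_mod

program alloc_source_demo
  use fill_mod
  implicit none
  real :: a(2,3)
  real, allocatable :: c(:,:)
  integer :: shp(2)

  a = reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])

  ! Step 1: the result shape comes from operand shapes alone
  ! (a reversal of SHAPE(A)); no elements are evaluated yet.
  shp = [size(a, 2), size(a, 1)]

  ! Step 2: allocate the destination with that shape.
  allocate(c(shp(1), shp(2)))

  ! Step 3: evaluate the expression directly into the destination.
  call fill_transpose_plus_one(c, a)

  if (any(shape(c) /= [3, 2])) error stop 'unexpected shape'
  if (any(c /= transpose(a + 1.0))) error stop 'unexpected contents'
  print *, 'ok'
end program alloc_source_demo
```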

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.
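
As an illustration, the program below performs one assignment that relies on automatic reallocation and then spells out an equivalent sequence by hand, using the `[COUNT(m)]` shape rule for `PACK` to decide whether to reallocate before any element is computed. The names are illustrative, and the explicit sequence is a sketch of the idea rather than a description of generated code. It assumes the compiler's Fortran 2003 reallocation semantics are enabled.

```fortran
program realloc_demo
  implicit none
  real :: x(2,3)
  real, allocatable :: a(:), b(:)
  integer :: new_shape(1)

  x = reshape([1.0, -2.0, 3.0, -4.0, 5.0, 6.0], [2, 3])

  ! Implicit form: A has 2 elements, the right-hand side has 4, so the
  ! assignment reallocates A to the new shape.
  allocate(a(2))
  a = 0.0
  a = pack(x, x > 0.0)

  ! Explicit form: compute the shape first, reallocate only if it differs
  ! from the current shape, then populate the conforming destination.
  allocate(b(2))
  new_shape = [count(x > 0.0)]
  if (any(shape(b) /= new_shape)) then
    deallocate(b)
    allocate(b(new_shape(1)))
  end if
  b = pack(x, x > 0.0)

  if (size(a) /= 4) error stop 'A was not reallocated'
  if (any(a /= [1.0, 3.0, 5.0, 6.0])) error stop 'unexpected contents of A'
  if (any(a /= b)) error stop 'explicit and implicit forms differ'
  print *, 'ok'
end program realloc_demo
```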

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If `X` is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.
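
Three of these rules (concatenation, array sections, and vector-valued subscripts) can be checked with a small program. In this sketch the concatenation rule uses an `IF` construct instead of `MERGE`, since a literal `MERGE` reference may evaluate `Y(j-SIZE(X))` with an out-of-range subscript; all names are illustrative.

```fortran
program rewrite_rules
  implicit none
  integer, parameter :: i0 = 2, stride = 3
  integer :: x(3), y(2), a(0:9), idx(3)
  integer :: cat(5), sec(3), vs(3)
  integer :: j

  x = [1, 2, 3]
  y = [4, 5]
  a = [(100 + j, j = 0, 9)]
  idx = [7, 0, 4]

  ! [X,Y]{j}: take X{j} for the leading elements, then Y{j-SIZE(X)}.
  do j = 1, size(cat)
    if (j > size(x)) then
      cat(j) = y(j - size(x))
    else
      cat(j) = x(j)
    end if
  end do

  ! A(i0::stride){j} => A(i0 + stride*(j-1)); A has a lower bound of 0.
  do j = 1, size(sec)
    sec(j) = a(i0 + stride*(j - 1))
  end do

  ! A(IDX(:)){j} => A(IDX(j)): the index is applied to the subscript.
  do j = 1, size(vs)
    vs(j) = a(idx(j))
  end do

  if (any(cat /= [x, y])) error stop 'concatenation rule'
  if (any(sec /= a(i0::stride))) error stop 'array section rule'
  if (any(vs /= a(idx))) error stop 'vector subscript rule'
  print *, 'ok'
end program rewrite_rules
```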

(more...)