<!--===- docs/ArrayComposition.md

  Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  See https://llvm.org/LICENSE.txt for license information.
  SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```{contents}
---
local:
---
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions and
makes them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.

## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand side variable as an operand of the
right-hand side expression, and any function calls on the right-hand
side are elemental or scalar-valued, we can avoid the use of a temporary.
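
For instance, here is a minimal sketch (the array names and extents are
illustrative, not taken from the text above) contrasting an assignment that
needs no temporary with one in which the left-hand side appears as an
operand of the right-hand side:
```
PROGRAM TEMPORARY_NEEDED_OR_NOT
  INTEGER, PARAMETER :: N = 4
  REAL :: A(N,N), B(N,N)
  CALL RANDOM_NUMBER(A)
  ! B does not appear on the right-hand side, so each element can be
  ! computed and stored directly, with no temporary.
  B = A + 1.0
  ! A appears as an operand of its own right-hand side with a different
  ! index mapping; evaluating element by element in place could read
  ! elements that have already been overwritten, so the compiler must
  ! use a temporary (or other special handling).
  A = A(N:1:-1,:) + 1.0
END PROGRAM
```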

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.
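
Here is a minimal sketch (the names `A` and `B` and the extents `N` and `M`
are illustrative) of evaluating `B = TRANSPOSE(A + 1.0)` by composing the
index exchange with the elemental addition, so that no temporary holds the
transposed operand:
```
PROGRAM TRANSPOSE_COMPOSITION
  INTEGER, PARAMETER :: N = 3, M = 4
  REAL :: A(N,M), B(M,N)
  INTEGER :: J, K
  CALL RANDOM_NUMBER(A)
  ! Element (J,K) of TRANSPOSE(A + 1.0) is A(K,J) + 1.0.
  DO CONCURRENT(J=1:M,K=1:N)
    B(J,K) = A(K,J) + 1.0
  END DO
  PRINT *, ALL(B == TRANSPOSE(A + 1.0))  ! should print T
END PROGRAM
```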

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `s`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique; see the sketch after this list.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.
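
As a sketch of that index arithmetic for `RESHAPE` without `ORDER=` (the
extents and array names below are illustrative), the result's index tuple is
linearized in array element order and then split back, by modulus and
division, into indices of the source:
```
PROGRAM RESHAPE_AS_INDEX_MAP
  INTEGER, PARAMETER :: N = 3, M = 4, P = 2, Q = 6  ! P*Q == N*M
  REAL :: A(N,M), R(P,Q)
  INTEGER :: J, K
  CALL RANDOM_NUMBER(A)
  DO CONCURRENT(J=1:P,K=1:Q)
    ! Linear offset of result element (J,K) in array element order,
    ! then modulus/division to recover the corresponding indices of A.
    ASSOCIATE(OFFSET => (J-1) + P*(K-1))
      R(J,K) = A(MOD(OFFSET,N)+1, OFFSET/N+1)
    END ASSOCIATE
  END DO
  PRINT *, ALL(R == RESHAPE(A, SHAPE=[P,Q]))  ! should print T
END PROGRAM
```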

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.

The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general; see the sketch
  after this list.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` is `SHAPE(A)` without its last element, followed by
  `SHAPE(B)` without its first element.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.
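
The first identity above can be checked directly; a minimal sketch (the
array `X` and the value of `D` are arbitrary):
```
PROGRAM AXIAL_REDUCTION_SHAPE
  INTEGER, PARAMETER :: D = 2
  REAL :: X(3,4,5)
  INTEGER :: J
  X = 0.
  ! SHAPE(SUM(X,DIM=D)) is SHAPE(X) with dimension D removed.
  PRINT *, ALL(SHAPE(SUM(X,DIM=D)) == PACK(SHAPE(X), [(J,J=1,RANK(X))] /= D))  ! should print T
END PROGRAM
```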

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented by separating the allocation of the temporary array
from the population of that array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of stack storage when the
scope of the variable is local.
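
Here is a minimal sketch of that strategy (the names `A` and `B` and the
shape are illustrative) for something like `ALLOCATE(A, SOURCE=TRANSPOSE(B) + 1.0)`:
the shape is computed first, the allocation is performed, and only then are
the elements evaluated directly into the new storage:
```
PROGRAM ALLOCATE_SOURCE_SKETCH
  REAL :: B(3,5)
  REAL, ALLOCATABLE :: A(:,:)
  INTEGER :: J, K
  CALL RANDOM_NUMBER(B)
  ! The shape of TRANSPOSE(B) + 1.0 is the reversal of SHAPE(B);
  ! allocate the result with that shape before evaluating any element.
  ALLOCATE(A(SIZE(B,2), SIZE(B,1)))
  ! Populate the freshly allocated storage directly; no temporary is needed.
  DO CONCURRENT(J=1:SIZE(B,2),K=1:SIZE(B,1))
    A(J,K) = B(K,J) + 1.0
  END DO
  PRINT *, ALL(A == TRANSPOSE(B) + 1.0)  ! should print T
END PROGRAM
```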

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.
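
For example (a minimal sketch; the values are arbitrary):
```
PROGRAM AUTOMATIC_REALLOCATION
  REAL, ALLOCATABLE :: A(:)
  A = [1., 2., 3.]   ! allocates A with shape [3]
  A = [A, 4.]        ! non-conforming: A is reallocated with shape [4]
  PRINT *, SHAPE(A)  ! should print 4
END PROGRAM
```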

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`;
  see the sketch after this list.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If `X` is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.
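
A minimal sketch of the array-section rule above (the array `A`, the lower
bound `I0`, and the stride `STRIDE` are illustrative):
```
PROGRAM SECTION_AS_INDEX_MAP
  INTEGER, PARAMETER :: I0 = 2, STRIDE = 3
  REAL :: A(10)
  REAL, ALLOCATABLE :: W(:)
  INTEGER :: J
  CALL RANDOM_NUMBER(A)
  ALLOCATE(W(SIZE(A(I0::STRIDE))))
  ! The section A(I0::STRIDE) applied to index j is A(I0+STRIDE*(j-1)).
  DO CONCURRENT(J=1:SIZE(W))
    W(J) = A(I0 + STRIDE*(J-1))
  END DO
  PRINT *, ALL(W == A(I0::STRIDE))  ! should print T
END PROGRAM
```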

(more...)