xref: /llvm-project/flang/docs/RuntimeDescriptor.md (revision b7ff03206d668cd5a620a9d4e1b22ea112ed56e3)
1<!--===- docs/RuntimeDescriptor.md
2
3   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4   See https://llvm.org/LICENSE.txt for license information.
5   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6
7-->
8
9# Runtime Descriptors
10
11```{contents}
12---
13local:
14---
15```
16
17## Concept
18The properties that characterize data values and objects in Fortran
19programs must sometimes be materialized when the program runs.
20
21Some properties are known during compilation and constant during
22execution, yet must be reified anyway for execution in order to
23drive the interfaces of a language support library or the mandated
24interfaces of interoperable (i.e., C) procedure calls.
25
26Note that many Fortran intrinsic subprograms have interfaces
27that are more flexible and generic than actual Fortran subprograms
28can be, so properties that must be known during compilation and
29are constant during execution may still need to be materialized
30for calls to the library, even if only by modifying names to
31distinguish types or their kind specializations.
32
33Other properties are deferred to execution, and need to be represented
34to serve the needs of compiled code and the run time support library.
35
36Previous implementations of Fortran have typically defined a small
37sheaf of _descriptor_ data structures for this purpose, and attached
38these descriptors as additional hidden arguments, type components,
39and local variables so as to convey dynamic characteristics between
40subprograms and between user code and the run-time support library.
41
42### References
43References are to the 12-2017 draft of the Fortran 2018 standard
44(N2146).
45
46Section 15.4.2.2 can be interpreted as a decent list of things that
47might need descriptors or other hidden state passed across a
48subprogram call, since such features (apart from assumed-length
49`CHARACTER` function results) trigger a requirement for the
50subprogram to have an explicit interface visible to their callers.
51
52Section 15.5.2 has good laundry lists of situations that can arise
53across subprogram call boundaries.
54
55## A survey of dynamic characteristics
56
57### Length of assumed-length `CHARACTER` function results (B.3.6)
58```
59CHARACTER*8 :: FOO
60PRINT *, FOO('abcdefghijklmnopqrstuvwxyz')
61...
62CHARACTER*(*) FUNCTION FOO(STR)
63  CHARACTER*26 STR
64  FOO=STR
65END
66```
67
68prints `abcdefgh` because the length parameter of the character type
69of the result of `FOO` is passed across the call -- even in the absence
70of an explicit interface!
71
72### Assumed length type parameters (7.2)
73Dummy arguments and associate names for `SELECT TYPE` can have assumed length
74type parameters, which are denoted by asterisks (not colons).
75Their values come from actual arguments or the associated expression (resp.).
76
77### Explicit-shape arrays (8.5.8.2)
78The expressions used for lower and upper bounds must be captured and remain
79invariant over the scope of an array, even if they contain references to
80variables that are later modified.
81
82Explicit-shape arrays can be dummy arguments, "adjustable" local variables,
83and components of derived type (using specification expressions in terms
84of constants and KIND type parameters).
85
86### Leading dimensions of assumed-size arrays (8.5.8.5)
87```
88SUBROUTINE BAR(A)
89  REAL A(2,3,*)
90END
91```
92The total size and final dimension's extent do not constitute dynamic
93properties.
94The called subprogram has no means to extract the extent of the
95last (major) dimension, and may not depend upon it implicitly by using
96the array in any context that demands a known shape.
97
98The values of the expressions used as the bounds of the dimensions
99that appear prior to
100the last dimension are, however, effectively captured on entry to the
101subprogram, and remain invariant even if the variables that appear in
102those expressions have their values modified later.
103This is similar to the requirements for an explicit-shape array.
104
105### Some function results
1061. Deferred-shape
1072. Deferred length type parameter values
1083. Stride information for `POINTER` results
109
110Note that while function result variables can have the `ALLOCATABLE`
111attribute, the function itself and the value returned to the caller
112do not possess the attribute.
113
114### Assumed-shape arrays
115The extents of the dimensions of assumed-shape dummy argument arrays
116are conveyed from those of the actual effective arguments.
117The bounds, however, are not.  The called subprogram can define the
118lower bound to be a value other than 1, but that is a local effect
119only.
120
121### Deferred-shape arrays
122The extents and bounds of `POINTER` and `ALLOCATABLE` arrays are
123established by pointer assignments and `ALLOCATE` statements.
124Note that dummy arguments and function results that are `POINTER`
125or `ALLOCATABLE` can be deferred-shape, not assumed-shape -- one cannot
126supply a lower bound expression as a local effect.
127
128### Strides
129Some arrays can have discontiguous (or negative) strides.
130These include assumed-shape dummy arguments and deferred-shape
131`POINTER` variables, components, and function results.
132
133Fortran disallows some conceivable cases that might otherwise
134require implied strides, such as passing an array of an extended
135derived type as an actual argument that corresponds to a
136nonpolymorphic dummy array of a base type, or the similar
137case of pointer assignment to a base of an extended derived type.
138
139Other arrays, including `ALLOCATABLE`, can be assured to
140be contiguous, and do not necessarily need to manage or
141convey dynamic stride information.
142`CONTIGUOUS` dummy arguments and `POINTER` arrays need not
143record stride information either.
144(The standard notes that a `CONTIGUOUS POINTER` occupies a
145number of storage units that is distinct from that required
146to hold a non-`CONTIGUOUS` pointer.)
147
148Note that Fortran distinguishes the `CONTIGUOUS` attribute from
149the concept of being known or required to be _simply contiguous_ (9.5.4),
150which includes `CONTIGUOUS` entities as well as many others, and
151the concept of actually _being_ contiguous (8.5.7) during execution.
152I believe that the property of being simply contiguous implies
153that an entity is known at compilation time to not require the
154use or maintenance of hidden stride values.
155
156### Derived type component initializers
157Fortran allows components of derived types to be declared with
158initial values that are to be assigned to the components when an
159instance of the derived type is created.
160These include `ALLOCATABLE` components, which are always initialized
161to a deallocated state.
162
163These can be implemented with constructor subroutines, inline
164stores or block copies from static initializer blocks, or a sequence
165of sparse offset/size/value component initializers to be emplaced
166by the run-time library.
167
168N.B. Fortran allows kind type parameters to appear in component
169initialization constant expressions, but not length type parameters,
170so the initialization values are constants.
171
172N.B. Initialization is not assignment, and cannot be implemented
173with assignments to uninitialized derived type instances from
174static constant initializers.
175
176### Polymorphic `CLASS()`, `CLASS(*)`, and `TYPE(*)`
177Type identification for `SELECT TYPE`.
178Default initializers (see above).
179Offset locations of `ALLOCATABLE` and polymorphic components.
180Presence of `FINAL` procedures.
181Mappings to overridable type-bound specific procedures.
182
183### Deferred length type parameters
184Derived types with length type parameters, and `CHARACTER`, may be used
185with the values of those parameters deferred to execution.
186Their actual values must be maintained as characteristics of the dynamic
187type that is associated with a value or object
188.
189A single copy of the deferred length type parameters suffices for
190all of the elements of an array of that parameterized derived type.
191
192### Components whose types and/or shape depends on length type parameters
193Non-pointer, non-allocatable components whose types or shapes are expressed
194in terms of length type parameters will probably have to be implemented as
195if they had deferred type and/or shape and were `ALLOCATABLE`.
196The derived type instance constructor must allocate them and possibly
197initialize them; the instance destructor must deallocate them.
198
199### Assumed rank arrays
200Rank is almost always known at compilation time and would be redundant
201in most circumstances if also managed dynamically.
202`DIMENSION(..)` dummy arguments (8.5.8.7), however, are a recent feature
203with which the rank of a whole array is dynamic outside the cases of
204a `SELECT RANK` construct.
205
206The lower bounds of the dimensions of assumed rank arrays
207are always 1.
208
209### Cached invariant subexpressions for addressing
210Implementations of Fortran have often maintained precalculated integer
211values to accelerate subscript computations.
212For example, given `REAL*8 :: A(2:4,3:5)`, the data reference `A(I,J)`
213resolves to something like `&A + 8*((I-2)+3*(J-3))`, and this can be
214effectively reassociated to `&A - 88 + 8*I + 24*J`
215or `&A - 88 + 8*(I + 3*J)`.
216When the offset term and coefficients are not compile-time constants,
217they are at least invariant and can be precomputed.
218
219In the cases of dummy argument arrays, `POINTER`, and `ALLOCATABLE`,
220these addressing invariants could be managed alongside other dynamic
221information like deferred extents and lower bounds to avoid their
222recalculation.
223It's not clear that it's worth the trouble to do so, since the
224expressions are invariant and cheap.
225
226### Coarray state (8.5.6)
227A _coarray_ is an `ALLOCATABLE` variable or component, or statically
228allocated variable (`SAVE` attribute explicit or implied), or dummy
229argument whose ultimate effective argument is one of such things.
230
231Each image in a team maintains its portion of each coarray and can
232access those portions of the coarray that are maintained by other images
233in the team.
234Allocations and deallocations are synchronization events at which
235the several images can exchange whatever information is needed by
236the underlying intercommunication interface to access the data
237of their peers.
238(Strictly speaking, an implementation could synchronize
239images at allocations and deallocations with simple barriers, and defer
240the communication of remote access information until it is needed for a
241given coarray on a given image, so long as it could be acquired in a
242"one-sided" fashion.)
243
244### Presence of `OPTIONAL` dummy arguments
245Typically indicated with null argument addresses.
246Note that `POINTER` and `ALLOCATABLE` objects can be passed to
247non-`POINTER` non-`ALLOCATABLE` dummy arguments, and their
248association or allocation status (resp.) determines the presence
249of the dummy argument.
250
251### Stronger contiguity enforcement or indication
252Some implementations of Fortran guarantee that dummy argument arrays
253are, or have been made to be, contiguous on one or more dimensions
254when the language does not require them to be so (8.5.7 p2).
255Others pass a flag to identify contiguous arrays (or could pass the
256number of contiguous leading dimensions, although I know of no such
257implementation) so that optimizing transformations that depend on
258contiguity can be made conditional with multiple-version code generation
259and selected during execution.
260
261In the absence of a contiguity guarantee or flag, the called side
262would have to determine contiguity dynamically, if it cares,
263by calculating addresses of elements in the array whose subscripts
264differ by exactly 1 on exactly 1 dimension of interest, and checking
265whether that difference exactly matches the byte size of the type times
266the product of the extents of any prior dimensions.
267
268### Host instances for dummy procedures and procedure pointers
269A static link or other means of accessing the imported state of the
270host procedure must be available when an internal procedure is
271used as an actual argument or as a pointer assignment target.
272
273### Alternate returns
274Subroutines (only) with alternate return arguments need a
275means, such as the otherwise unused function return value, by which
276to distinguish and identify the use of an alternate `RETURN` statement.
277The protocol can be a simple nonzero integer that drives a switch
278in the caller, or the caller can pass multiple return addresses as
279arguments for the callee to substitute on the stack for the original
280return address in the event of an alternate `RETURN`.
281
282## Implementation options
283
284### A note on array descriptions
285Some arrays require dynamic management of distinct combinations of
286values per dimension.
287
288One can extract the extent on a dimension from its bounds, or extract
289the upper bound from the extent and the lower bound.  Having distinct
290extent and upper bound would be redundant.
291
292Contiguous arrays can assume a stride of 1 on each dimension.
293
294Assumed-shape and assumed-size dummy argument arrays need not convey
295lower bounds.
296
297So there are examples of dimensions with
298 * extent only (== upper bound): `CONTIGUOUS` assumed-shape, explict shape and multidimensional assumed-size with constant lower bound
299 * lower bound and either extent or upper bound: `ALLOCATABLE`, `CONTIGUOUS` `POINTER`, general explicit-shape and multidimensional assumed-size
300 * extent (== upper bound) and stride: general (non-`CONTIGUOUS`) assumed-shape
301 * lower bound, stride, and either extent or upper bound: general (non-`CONTIGUOUS`) `POINTER`, assumed-rank
302
303and these cases could be accompanied by precomputed invariant
304addressing subexpressions to accelerate indexing calculations.
305
306### Interoperability requirements
307
308Fortran 2018 requires that a Fortran implementation supply a header file
309`ISO_Fortran_binding.h` for use in C and C++ programs that defines and
310implements an interface to Fortran objects from the _interoperable_
311subset of Fortran objects and their types suitable for use when those
312objects are passed to C functions.
313This interface mandates a fat descriptor that is passed by address,
314containing (at least)
315 * a data base address
316 * explicit rank and type
317 * flags to distinguish `POINTER` and `ALLOCATABLE`
318 * elemental byte size, and
319 * (per-dimension) lower bound, extent, and byte stride
320
321The requirements on the interoperability API do not mandate any
322support for features like derived type component initialization,
323automatic deallocation of `ALLOCATABLE` components, finalization,
324derived type parameters, data contiguity flags, &c.
325But neither does the Standard preclude inclusion of additional
326interfaces to describe and support such things.
327
328Given a desire to fully support the Fortran 2018 language, we need
329to either support the interoperability requirements as a distinct
330specialization of the procedure call protocol, or use the
331`ISO_Fortran_binding.h` header file requirements as a subset basis for a
332complete implementation that adds representations for all the
333missing capabilities, which would be isolated and named so as
334to prevent user C code from relying upon them.
335
336### Design space
337There is a range of possible options for representing the
338properties of values and objects during the execution of Fortran
339programs.
340
341At one extreme, the amount of dynamic information is minimized,
342and is packaged in custom data structures or additional arguments
343for each situation to convey only the values that are unknown at
344compilation time and actually needed at execution time.
345
346At the other extreme, data values and objects are described completely,
347including even the values of properties are known at compilation time.
348This is not as silly as it sounds -- e.g., Fortran array descriptors
349have historically materialized the number of dimensions they cover, even
350though rank will be (nearly) always be a known constant during compilation.
351
352When data are packaged, their containers can be self-describing to
353some degree.
354Description records can have tag values or strings.
355Their fields can have presence flags or identifying tags, and fields
356need not have fixed offsets or ordering.
357This flexibility can increase binary compatibility across revisions
358of the run-time support library, and is convenient for debugging
359that library.
360However, it is not free.
361
362Further, the requirements of the representation of dynamic
363properties of values and objects depend on the execution model:
364specifically, are the complicated semantics of intrinsic assignment,
365deallocation, and finalization of allocatables implemented entirely
366in the support library, in generated code for non-recursive cases,
367or by means of a combination of the two approaches?
368
369Consider how to implement the following:
370```
371TYPE :: LIST
372  REAL :: HEAD
373  TYPE(LIST), ALLOCATABLE :: REST
374END TYPE LIST
375TYPE(LIST), ALLOCATABLE :: A, B
376...
377A = B
378```
379
380Fortran requires that `A`'s arbitrary-length linked list be deleted and
381replaced with a "deep copy" of `B`'s.
382So either a complicated pair of loops must be generated by the compiler,
383or a sophisticated run time support library needs to be driven with
384an expressive representation of type information.
385
386## Proposal
387We need to write `ISO_Fortran_binding.h` in any event.
388It is a header that is published for use in user C code for interoperation
389with compiled Fortran and the Fortran run time support library.
390
391There is a sole descriptor structure defined in `ISO_Fortran_binding.h`.
392It is suitable for characterizing scalars and array sections of intrinsic
393types.
394It is essentially a "fat" data pointer that encapsulates a raw data pointer,
395a type code, rank, elemental byte size, and per-dimension bounds and stride.
396
397Please note that the mandated interoperable descriptor includes the data
398pointer.
399This design in the Standard precludes the use of static descriptors that
400could be associated with dynamic base addresses.
401
402The F18 runtime cannot use just the mandated interoperable
403`struct CFI_cdesc_t` argument descriptor structure as its
404all-purpose data descriptor.
405It has no information about derived type components, overridable
406type-bound procedure bindings, type parameters, &c.
407
408However, we could extend the standard interoperable argument descriptor.
409The `struct CFI_cdesc_t` structure is not of fixed size, but we
410can efficiently locate the first address after an instance of the
411standard descriptor and attach our own data record there to
412hold what we need.
413There's at least one unused padding byte in the standard argument
414descriptor that can be used to hold a flag indicating the presence
415of the addenda.
416
417The definitions of our additional run time data structures must
418appear in a header file that is distinct from `ISO_Fortran_binding.h`,
419and they should never be used by user applications.
420
421This expanded descriptor structure can serve, at least initially for
422simplicity, as the sole representation of `POINTER` variables and
423components, `ALLOCATABLE` variables and components, and derived type
424instances, including length parameter values.
425
426An immediate concern with this concept is the amount of space and
427initialization time that would be wasted when derived type components
428needing a descriptor would have to be accompanied by an instance
429of the general descriptor.
430(In the linked list example close above, what could be done with a
431single pointer for the `REST` component would become at least
432a four-word dynamic structure.)
433This concern is amplified when derived type instances
434are allocated as arrays, since the overhead is per-element.
435
436We can reduce this wastage in two ways.
437First, when the content of the component's descriptor is constant
438at compilation apart from its base address, a static descriptor
439can be placed in read-only storage and attached to the description
440of the derived type's components.
441Second, we could eventually optimize the storage requirements by
442omitting all static fields from the dynamic descriptor, and
443expand the compressed dynamic descriptor during execution when
444needed.
445