xref: /llvm-project/flang/docs/F202X.md (revision eb5ffa58f53a56308e97df8789e73e8b8230c4ec)
1<!--===- docs/F202X.md
2
3   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4   See https://llvm.org/LICENSE.txt for license information.
5   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6
7-->
8
9# A first take on Fortran 202X features for LLVM Flang
10
11I (Peter Klausler) have been studying the draft PDF of the
12[Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf),
13which will soon be published as ISO Fortran 2023.
14I have compiled this summary of its changes relative to
15the current Fortran 2018 standard from the perspective
16of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang)
17implementor.
18
19## TL;DR
20
21Fortran 202X doesn't make very many changes to the language
22relative to Fortran 2018, which was itself a small increment
23over Fortran 2008.
24Apart from `REDUCE` clauses that were added to the
25[still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
26`DO CONCURRENT` construct, there's little here for Fortran users
27to get excited about.
28
29## Priority of implementation in LLVM Flang
30
31We are working hard to ensure that existing working applications will
32port successfully to LLVM Flang with minimal effort.
33I am not particularly concerned with conforming to a new
34standard as an end in itself.
35
36The only features below that appear to have already been implemented
37in other compilers are the `REDUCE` clauses and the degree trigonometric
38intrinsic functions, so those should have priority as an aid to
39portability.
40We would want to support them earlier even if they were not in a standard.
41
42The `REDUCE` clause also merits early implementation due to
43its potential for performance improvements in real codes.
44I don't see any other feature here that would be relevant to
45performance (maybe a weak argument could be made for `SIMPLE`).
46The bulk of this revision unfortunately comprises changes to Fortran that
47are neither performance-related, already available in
48some compilers, nor (obviously) in use in existing codes.
49I will not prioritize implementing them myself over
50other work until they become portability concerns or are
51requested by actual users.
52
53Given Fortran's history of the latency between new
54standards and the support for their features in real compilers,
55and then the extra lag before the features are then actually used
56in codes meant to be portable, I doubt that many of the items
57below will have to be worked on any time soon due to user demand.
58
59If J3 had chosen to add more features that were material improvements
60to Fortran -- and there's quite a long list of worthy candidates that
61were passed over, like read-only pointers -- it would have made sense
62for me to prioritize their implementation in LLVM Flang more
63urgently.
64
65## Specific change descriptions
66
67The individual features added to the language are summarized
68in what I see as their order of significance to Fortran users.
69
70### Alert: There's a breaking change!
71
72The Fortran committee used to abhor making breaking changes,
73apart from fixes, so that conforming codes could be portable across
74time as well as across compilers.
75Fortran 202X, however, uncharacteristically perpetrates one such
76change to existing semantics that will silently cause existing
77codes to work differently, if that change were to be implemented
78and enabled by default.
79
80Specifically, automatic reallocation of whole deferred-length character
81allocatable scalars is now mandated when they appear for internal output
82(e.g., `WRITE(A,*) ...`)
83or as output arguments for some statements and intrinsic procedures
84(e.g., `IOMSG=`, `ERRMSG=`).
85So existing codes that allocate output buffers
86for such things will, or would, now observe that their buffers are
87silently changing their lengths during execution, rather than being
88padded with blanks or being truncated.  For example:
89
90```
91  character(:), allocatable :: buffer
92  allocate(character(20)::buffer)
93  write(buffer,'F5.3') 3.14159
94  print *, len(buffer)
95```
96
97prints 20 with Fortran 2018 but would print 5 with Fortran 202X.
98
99There would have no problem with the new standard changing the
100behavior in the current error case of an unallocated variable;
101defining new semantics for old errors is a generally safe means
102for extending a programming language.
103However, in this case, we'll need to protect existing conforming
104codes from the surprising new reallocation semantics, which
105affect cases that are not errors.
106
107When/if there are requests from real users to implement this breaking
108change, and if it is implemented, I'll have to ensure that users
109have the ability to control this change in behavior via an option &/or the
110runtime environment, and when it's enabled, emit a warning at code
111sites that are at risk.
112This warning should mention a source change they can make to protect
113themselves from this change by passing the complete substring (`A(:)`)
114instead of a whole character allocatable.
115
116This feature reminds me of Fortran 2003's change to whole
117allocatable array assignment, although in that case users were
118put at risk only of extra runtime overhead that was needless in
119existing codes, not a change in behavior, and users learned to
120assign to whole array sections (`A(:)=...`) rather than to whole
121allocatable arrays where the performance hit mattered.
122
123### Major Items
124
125The features in this section are expensive to implement in
126terms of engineering time to design, code, refactor, and test
127(i.e., weeks or months, not days).
128
129#### `DO CONCURRENT REDUCE`
130
131J3 continues to ignore the
132[serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
133with `DO CONCURRENT`, despite the simplicity of the necessary fix and their
134admirable willingness to repair the standard to fix problems with
135other features (e.g., plugging holes in `PURE` procedure requirements)
136and their less admirable willingness to make breaking changes (see above).
137They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be
138immediately useful to HPC codes and worth implementing soon.
139
140#### `SIMPLE` procedures
141
142The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE`
143procedures.
144There are things that one can do in a `PURE` procedure
145but cannot in a `SIMPLE` one.  But the virtue of being `SIMPLE` seems
146to be its own reward, not a requirement to access any other
147feature.
148
149`SIMPLE` procedures might have been more useful had `DO CONCURRENT` been
150changed to require callees to be `SIMPLE`, not just `PURE`.
151
152The implementation of `SIMPLE` will be nontrivial: it involves
153some parsing and symbol table work, and some generalization of the
154predicate function `IsPureProcedure()`, extending the semantic checking on
155calls in `PURE` procedures to ensure that `SIMPLE` procedures
156only call other `SIMPLE` procedures, and modifying the intrinsic
157procedure table to note that most intrinsics are now `SIMPLE`
158rather than just `PURE`.
159
160I don't expect any codes to rush to change their `PURE` procedures
161to be `SIMPLE`, since it buys little and reduces portability.
162This makes `SIMPLE` a lower-priority feature.
163
164#### Conditional expressions and actual arguments
165
166Next on the list of "big ticket" items are C-style conditional
167expressions.  These come in two forms, each of which is a distinct
168feature that would be nontrivial to implement, and I would not be
169surprised to see some compilers implement one before the other.
170
171The first form is a new parenthesized expression primary that any C programmer
172would recognize.  It has straightforward parsing and semantics,
173but will require support in folding and all other code that
174processes expressions.  Lowering will be nontrivial due to
175control flow.
176
177The second form is a conditional actual argument syntax
178that allows runtime selection of argument associations, as well
179as a `.NIL.` syntax for optional arguments to signify an absent actual
180argument.  This would have been more useful if it had also been
181allowed as a pointer assignment statement right-hand side, and
182that might be a worthwhile extension.  As this form is essentially
183a conditional variable reference it may be cleaner to have a
184distinct representation from the conditional expression primary
185in the parse tree and strongly-typed `Expr<T>` representations.
186
187#### `ENUMERATION TYPE`
188
189Fortran 202X has a new category of type.  The new non-interoperable
190`ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately,
191a powerful sum data type as in Haskell or Rust.  Unlike the
192current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new
193type name and its distinct values.
194
195This feature may well be the item requiring the largest patch to
196the compiler for its implementation, as it affects parsing,
197type checking on assignment and argument association, generic
198resolution, formatted I/O, NAMELIST, debugging symbols, &c.
199It will indirectly affect every switch statement in the compiler
200that switches over the six (now seven) type categories.
201This will be a big project for little useful return to users.
202
203#### `TYPEOF` and `CLASSOF`
204
205Last on the list of "big ticket" items are the new TYPEOF and CLASSOF
206type specifiers, which allow declarations to indirectly use the
207types of previously-defined entities.  These would have obvious utility
208in a language with type polymorphism but aren't going to be very
209useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth
210supporting as a utility feature for a parametric module extension.
211
212`CLASSOF` has implications for semantics and lowering that need to
213be thought through as it seems to provide a means of
214declaring polymorphic local variables and function results that are
215neither allocatables nor pointers.
216
217#### Coarray extensions:
218
219 * `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector
220 * Arrays with coarray components
221
222#### "Rank Independent" Features
223
224The `RANK(n)` attribute declaration syntax is equivalent to
225`DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons.
226As `n` must be a constant expression, that's straightforward to implement,
227though not terribly useful until the language acquires additional features.
228(I can see some utility in being able to declare PDT components with a
229`RANK` that depends on a `KIND` type parameter.)
230
231It is now possible to declare the lower and upper bounds of an explicit
232shape entity using a constant-length vector specification expression
233in a declaration, `ALLOCATE` statement, or pointer assignment with
234bounds remapping.
235For example, `real A([2,3])` is equivalent to `real A(2,3)`.
236
237The new `A(@V)` "multiple subscript" indexing syntax uses an integer
238vector to supply a list of subscripts or of triplet bounds/strides.  This one
239has tough edge cases for lowering that need to be thought through;
240for example, when the lengths of two or more of the vectors in
241`A(@U,@V,@W)` are not known at compilation time, implementing the indexing
242would be tricky in generated code and might just end up filling a
243temporary with `[U,V,W]` first.
244
245The obvious use case for "multiple subscripts" would be as a means to
246index into an assumed-rank dummy argument without the bother of a `SELECT RANK`
247construct, but that usage is not supported in Fortran 202X.
248
249This feature may well turn out to be Fortran 202X's analog to Fortran 2003's
250`LEN` derived type parameters.
251
252### Minor Items
253
254So much for the major features of Fortran 202X.  The longer list
255of minor features can be more briefly summarized.
256
257#### New Edit Descriptors
258
259Fortran 202X has some noncontroversial small tweaks to formatted output.
260The `AT` edit descriptor automatically trims character output.  The `LZP`,
261`LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a
262means for controlling the output of leading zero digits.
263
264#### Intrinsic Module Extensions
265
266Addressing some issues and omissions in intrinsic modules:
267
268 * LOGICAL8/16/32/64 and REAL16
269 * IEEE module facilities upgraded to match latest IEEE FP standard
270 * C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
271 * C_F_POINTER(LOWER=)
272
273#### Intrinsic Procedure Extensions
274
275The `SYSTEM_CLOCK` intrinsic function got some semantic tweaks.
276
277There are new intrinsic functions for trigonometric functions in
278units of degrees and half-circles.
279GNU Fortran already supports the forms that use degree units.
280These should call into math library implementations that are
281specialized for those units rather than simply multiplying
282arguments or results with conversion factors.
283 * `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND`
284 * `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI`
285
286`SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL`
287
288There are two new character utility intrinsic
289functions whose implementations have very low priority: `SPLIT` and `TOKENIZE`.
290`TOKENIZE` requires memory allocation to return its results,
291and could and should have been implemented once in some Fortran utility
292library for those who need a slow tokenization facility rather than
293requiring implementations in each vendor's runtime support library with
294all the extra cost and compatibilty risk that entails.
295
296`SPLIT` is worse -- not only could it, like `TOKENIZE`,
297have been supplied by a Fortran utility library rather than being
298added to the standard, it's redundant;
299it provides nothing that cannot be already accomplished by
300composing today's `SCAN` intrinsic function with substring indexing:
301
302```
303module m
304  interface split
305    module procedure :: split
306  end interface
307  !instantiate for all possible ck/ik/lk combinations
308  integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
309 contains
310  simple elemental subroutine split(string, set, pos, back)
311    character(*, kind=ck), intent(in) :: string, set
312    integer(kind=ik), intent(in out) :: pos
313    logical(kind=lk), intent(in), optional :: back
314    if (present(back)) then
315      if (back) then
316        pos = scan(string(:pos-1), set, .true.)
317        return
318      end if
319    end if
320    npos = scan(string(pos+1:), set)
321    pos = merge(pos + npos, len(string) + 1, npos /= 0)
322  end
323end
324```
325
326(The code above isn't a proposed implementation for `SPLIT`, just a
327demonstration of how programs could use `SCAN` to accomplish the same
328results today.)
329
330## Source limitations
331
332Fortran 202X raises the maximum number of characters per free form
333source line and the maximum total number of characters per statement.
334Both of these have always been unlimited in this compiler (or
335limited only by available memory, to be more accurate.)
336
337## More BOZ usage opportunities
338
339BOZ literal constants (binary, octal, and hexadecimal constants,
340also known as "typeless" values) have more conforming usage in the
341new standard in contexts where the type is unambiguously known.
342They may now appear as initializers, as right-hand sides of intrinsic
343assignments to integer and real variables, in explicitly typed
344array constructors, and in the definitions of enumerations.
345
346## Citation updates
347
348The source base contains hundreds of references to the subclauses,
349requirements, and constraints of the Fortran 2018 standard, mostly in code comments.
350These will need to be mapped to their Fortran 202X counterparts once the
351new standard is published, as the Fortran committee does not provide a
352means for citing these items by names that are fixed over time like the
353C++ committee does.
354If we had access to the LaTeX sources of the standard, we could generate
355a mapping table and automate this update.
356