1<!--===- docs/F202X.md 2 3 Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 4 See https://llvm.org/LICENSE.txt for license information. 5 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception 6 7--> 8 9# A first take on Fortran 202X features for LLVM Flang 10 11I (Peter Klausler) have been studying the draft PDF of the 12[Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf), 13which will soon be published as ISO Fortran 2023. 14I have compiled this summary of its changes relative to 15the current Fortran 2018 standard from the perspective 16of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang) 17implementor. 18 19## TL;DR 20 21Fortran 202X doesn't make very many changes to the language 22relative to Fortran 2018, which was itself a small increment 23over Fortran 2008. 24Apart from `REDUCE` clauses that were added to the 25[still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md) 26`DO CONCURRENT` construct, there's little here for Fortran users 27to get excited about. 28 29## Priority of implementation in LLVM Flang 30 31We are working hard to ensure that existing working applications will 32port successfully to LLVM Flang with minimal effort. 33I am not particularly concerned with conforming to a new 34standard as an end in itself. 35 36The only features below that appear to have already been implemented 37in other compilers are the `REDUCE` clauses and the degree trigonometric 38intrinsic functions, so those should have priority as an aid to 39portability. 40We would want to support them earlier even if they were not in a standard. 41 42The `REDUCE` clause also merits early implementation due to 43its potential for performance improvements in real codes. 44I don't see any other feature here that would be relevant to 45performance (maybe a weak argument could be made for `SIMPLE`). 46The bulk of this revision unfortunately comprises changes to Fortran that 47are neither performance-related, already available in 48some compilers, nor (obviously) in use in existing codes. 49I will not prioritize implementing them myself over 50other work until they become portability concerns or are 51requested by actual users. 52 53Given Fortran's history of the latency between new 54standards and the support for their features in real compilers, 55and then the extra lag before the features are then actually used 56in codes meant to be portable, I doubt that many of the items 57below will have to be worked on any time soon due to user demand. 58 59If J3 had chosen to add more features that were material improvements 60to Fortran -- and there's quite a long list of worthy candidates that 61were passed over, like read-only pointers -- it would have made sense 62for me to prioritize their implementation in LLVM Flang more 63urgently. 64 65## Specific change descriptions 66 67The individual features added to the language are summarized 68in what I see as their order of significance to Fortran users. 69 70### Alert: There's a breaking change! 71 72The Fortran committee used to abhor making breaking changes, 73apart from fixes, so that conforming codes could be portable across 74time as well as across compilers. 75Fortran 202X, however, uncharacteristically perpetrates one such 76change to existing semantics that will silently cause existing 77codes to work differently, if that change were to be implemented 78and enabled by default. 79 80Specifically, automatic reallocation of whole deferred-length character 81allocatable scalars is now mandated when they appear for internal output 82(e.g., `WRITE(A,*) ...`) 83or as output arguments for some statements and intrinsic procedures 84(e.g., `IOMSG=`, `ERRMSG=`). 85So existing codes that allocate output buffers 86for such things will, or would, now observe that their buffers are 87silently changing their lengths during execution, rather than being 88padded with blanks or being truncated. For example: 89 90``` 91 character(:), allocatable :: buffer 92 allocate(character(20)::buffer) 93 write(buffer,'F5.3') 3.14159 94 print *, len(buffer) 95``` 96 97prints 20 with Fortran 2018 but would print 5 with Fortran 202X. 98 99There would have no problem with the new standard changing the 100behavior in the current error case of an unallocated variable; 101defining new semantics for old errors is a generally safe means 102for extending a programming language. 103However, in this case, we'll need to protect existing conforming 104codes from the surprising new reallocation semantics, which 105affect cases that are not errors. 106 107When/if there are requests from real users to implement this breaking 108change, and if it is implemented, I'll have to ensure that users 109have the ability to control this change in behavior via an option &/or the 110runtime environment, and when it's enabled, emit a warning at code 111sites that are at risk. 112This warning should mention a source change they can make to protect 113themselves from this change by passing the complete substring (`A(:)`) 114instead of a whole character allocatable. 115 116This feature reminds me of Fortran 2003's change to whole 117allocatable array assignment, although in that case users were 118put at risk only of extra runtime overhead that was needless in 119existing codes, not a change in behavior, and users learned to 120assign to whole array sections (`A(:)=...`) rather than to whole 121allocatable arrays where the performance hit mattered. 122 123### Major Items 124 125The features in this section are expensive to implement in 126terms of engineering time to design, code, refactor, and test 127(i.e., weeks or months, not days). 128 129#### `DO CONCURRENT REDUCE` 130 131J3 continues to ignore the 132[serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md) 133with `DO CONCURRENT`, despite the simplicity of the necessary fix and their 134admirable willingness to repair the standard to fix problems with 135other features (e.g., plugging holes in `PURE` procedure requirements) 136and their less admirable willingness to make breaking changes (see above). 137They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be 138immediately useful to HPC codes and worth implementing soon. 139 140#### `SIMPLE` procedures 141 142The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE` 143procedures. 144There are things that one can do in a `PURE` procedure 145but cannot in a `SIMPLE` one. But the virtue of being `SIMPLE` seems 146to be its own reward, not a requirement to access any other 147feature. 148 149`SIMPLE` procedures might have been more useful had `DO CONCURRENT` been 150changed to require callees to be `SIMPLE`, not just `PURE`. 151 152The implementation of `SIMPLE` will be nontrivial: it involves 153some parsing and symbol table work, and some generalization of the 154predicate function `IsPureProcedure()`, extending the semantic checking on 155calls in `PURE` procedures to ensure that `SIMPLE` procedures 156only call other `SIMPLE` procedures, and modifying the intrinsic 157procedure table to note that most intrinsics are now `SIMPLE` 158rather than just `PURE`. 159 160I don't expect any codes to rush to change their `PURE` procedures 161to be `SIMPLE`, since it buys little and reduces portability. 162This makes `SIMPLE` a lower-priority feature. 163 164#### Conditional expressions and actual arguments 165 166Next on the list of "big ticket" items are C-style conditional 167expressions. These come in two forms, each of which is a distinct 168feature that would be nontrivial to implement, and I would not be 169surprised to see some compilers implement one before the other. 170 171The first form is a new parenthesized expression primary that any C programmer 172would recognize. It has straightforward parsing and semantics, 173but will require support in folding and all other code that 174processes expressions. Lowering will be nontrivial due to 175control flow. 176 177The second form is a conditional actual argument syntax 178that allows runtime selection of argument associations, as well 179as a `.NIL.` syntax for optional arguments to signify an absent actual 180argument. This would have been more useful if it had also been 181allowed as a pointer assignment statement right-hand side, and 182that might be a worthwhile extension. As this form is essentially 183a conditional variable reference it may be cleaner to have a 184distinct representation from the conditional expression primary 185in the parse tree and strongly-typed `Expr<T>` representations. 186 187#### `ENUMERATION TYPE` 188 189Fortran 202X has a new category of type. The new non-interoperable 190`ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately, 191a powerful sum data type as in Haskell or Rust. Unlike the 192current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new 193type name and its distinct values. 194 195This feature may well be the item requiring the largest patch to 196the compiler for its implementation, as it affects parsing, 197type checking on assignment and argument association, generic 198resolution, formatted I/O, NAMELIST, debugging symbols, &c. 199It will indirectly affect every switch statement in the compiler 200that switches over the six (now seven) type categories. 201This will be a big project for little useful return to users. 202 203#### `TYPEOF` and `CLASSOF` 204 205Last on the list of "big ticket" items are the new TYPEOF and CLASSOF 206type specifiers, which allow declarations to indirectly use the 207types of previously-defined entities. These would have obvious utility 208in a language with type polymorphism but aren't going to be very 209useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth 210supporting as a utility feature for a parametric module extension. 211 212`CLASSOF` has implications for semantics and lowering that need to 213be thought through as it seems to provide a means of 214declaring polymorphic local variables and function results that are 215neither allocatables nor pointers. 216 217#### Coarray extensions: 218 219 * `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector 220 * Arrays with coarray components 221 222#### "Rank Independent" Features 223 224The `RANK(n)` attribute declaration syntax is equivalent to 225`DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons. 226As `n` must be a constant expression, that's straightforward to implement, 227though not terribly useful until the language acquires additional features. 228(I can see some utility in being able to declare PDT components with a 229`RANK` that depends on a `KIND` type parameter.) 230 231It is now possible to declare the lower and upper bounds of an explicit 232shape entity using a constant-length vector specification expression 233in a declaration, `ALLOCATE` statement, or pointer assignment with 234bounds remapping. 235For example, `real A([2,3])` is equivalent to `real A(2,3)`. 236 237The new `A(@V)` "multiple subscript" indexing syntax uses an integer 238vector to supply a list of subscripts or of triplet bounds/strides. This one 239has tough edge cases for lowering that need to be thought through; 240for example, when the lengths of two or more of the vectors in 241`A(@U,@V,@W)` are not known at compilation time, implementing the indexing 242would be tricky in generated code and might just end up filling a 243temporary with `[U,V,W]` first. 244 245The obvious use case for "multiple subscripts" would be as a means to 246index into an assumed-rank dummy argument without the bother of a `SELECT RANK` 247construct, but that usage is not supported in Fortran 202X. 248 249This feature may well turn out to be Fortran 202X's analog to Fortran 2003's 250`LEN` derived type parameters. 251 252### Minor Items 253 254So much for the major features of Fortran 202X. The longer list 255of minor features can be more briefly summarized. 256 257#### New Edit Descriptors 258 259Fortran 202X has some noncontroversial small tweaks to formatted output. 260The `AT` edit descriptor automatically trims character output. The `LZP`, 261`LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a 262means for controlling the output of leading zero digits. 263 264#### Intrinsic Module Extensions 265 266Addressing some issues and omissions in intrinsic modules: 267 268 * LOGICAL8/16/32/64 and REAL16 269 * IEEE module facilities upgraded to match latest IEEE FP standard 270 * C_F_STRPOINTER, F_C_STRING for NUL-terminated strings 271 * C_F_POINTER(LOWER=) 272 273#### Intrinsic Procedure Extensions 274 275The `SYSTEM_CLOCK` intrinsic function got some semantic tweaks. 276 277There are new intrinsic functions for trigonometric functions in 278units of degrees and half-circles. 279GNU Fortran already supports the forms that use degree units. 280These should call into math library implementations that are 281specialized for those units rather than simply multiplying 282arguments or results with conversion factors. 283 * `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND` 284 * `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI` 285 286`SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL` 287 288There are two new character utility intrinsic 289functions whose implementations have very low priority: `SPLIT` and `TOKENIZE`. 290`TOKENIZE` requires memory allocation to return its results, 291and could and should have been implemented once in some Fortran utility 292library for those who need a slow tokenization facility rather than 293requiring implementations in each vendor's runtime support library with 294all the extra cost and compatibilty risk that entails. 295 296`SPLIT` is worse -- not only could it, like `TOKENIZE`, 297have been supplied by a Fortran utility library rather than being 298added to the standard, it's redundant; 299it provides nothing that cannot be already accomplished by 300composing today's `SCAN` intrinsic function with substring indexing: 301 302``` 303module m 304 interface split 305 module procedure :: split 306 end interface 307 !instantiate for all possible ck/ik/lk combinations 308 integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.) 309 contains 310 simple elemental subroutine split(string, set, pos, back) 311 character(*, kind=ck), intent(in) :: string, set 312 integer(kind=ik), intent(in out) :: pos 313 logical(kind=lk), intent(in), optional :: back 314 if (present(back)) then 315 if (back) then 316 pos = scan(string(:pos-1), set, .true.) 317 return 318 end if 319 end if 320 npos = scan(string(pos+1:), set) 321 pos = merge(pos + npos, len(string) + 1, npos /= 0) 322 end 323end 324``` 325 326(The code above isn't a proposed implementation for `SPLIT`, just a 327demonstration of how programs could use `SCAN` to accomplish the same 328results today.) 329 330## Source limitations 331 332Fortran 202X raises the maximum number of characters per free form 333source line and the maximum total number of characters per statement. 334Both of these have always been unlimited in this compiler (or 335limited only by available memory, to be more accurate.) 336 337## More BOZ usage opportunities 338 339BOZ literal constants (binary, octal, and hexadecimal constants, 340also known as "typeless" values) have more conforming usage in the 341new standard in contexts where the type is unambiguously known. 342They may now appear as initializers, as right-hand sides of intrinsic 343assignments to integer and real variables, in explicitly typed 344array constructors, and in the definitions of enumerations. 345 346## Citation updates 347 348The source base contains hundreds of references to the subclauses, 349requirements, and constraints of the Fortran 2018 standard, mostly in code comments. 350These will need to be mapped to their Fortran 202X counterparts once the 351new standard is published, as the Fortran committee does not provide a 352means for citing these items by names that are fixed over time like the 353C++ committee does. 354If we had access to the LaTeX sources of the standard, we could generate 355a mapping table and automate this update. 356