<!--===- docs/FortranForCProgrammers.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Fortran For C Programmers

```{contents}
---
local:
---
```

This note is limited to essential information about Fortran so that
a C or C++ programmer can get started more quickly with the language,
at least as a reader, and avoid some common pitfalls when starting
to write or modify Fortran code.
Please see other sources to learn about Fortran's rich history,
current applications, and modern best practices in new code.

## Know This At Least

* There have been many implementations of Fortran, often from competing
  vendors, and the standard language has been defined by U.S. and
  international standards organizations.  The various editions of
  the standard are known as the '66, '77, '90, '95, 2003, 2008, 2018,
  and 2023 standards.
* Backward compatibility is important.  Fortran has outlasted many
  generations of computer systems hardware and software.  Standard
  compliance notwithstanding, Fortran programmers generally expect that
  code that has compiled successfully in the past will continue to
  compile and work indefinitely.  The standards sometimes designate
  features as being deprecated, obsolescent, or even deleted, but that
  can be read only as discouraging their use in new code -- they'll
  probably always work in any serious implementation.
* Fortran has two source forms, which are typically distinguished by
  filename suffixes.  `foo.f` is old-style "fixed-form" source, and
  `foo.f90` is new-style "free-form" source.  All language features
  are available in both source forms.  Neither form has reserved words
  in the sense that C does.  Spaces are not required between tokens
  in fixed form, and case is not significant in either form.
* Variable declarations are optional by default.  Variables whose
  names begin with the letters `I` through `N` are implicitly
  `INTEGER`, and others are implicitly `REAL`.  These implicit typing
  rules can be changed in the source.
* Fortran uses parentheses in both array references and function calls.
  All arrays must be declared as such; other names followed by parenthesized
  expressions are assumed to be function calls.
* Fortran has a _lot_ of built-in "intrinsic" functions.  They are always
  available without a need to declare or import them.  Their names reflect
  the implicit typing rules, so you will encounter names that have been
  modified so that they have the right type (e.g., `AIMAG` has a leading `A`
  so that it's `REAL` rather than `INTEGER`).
* The modern language has means for declaring types, data, and subprogram
  interfaces in compiled "modules", as well as legacy mechanisms for
  sharing data and interconnecting subprograms.
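
The implicit typing and case-insensitivity rules from the list above can be seen in a tiny free-form program.
This is only a sketch; the program and variable names are made up for illustration.

```fortran
PROGRAM implicit_demo
  ! No IMPLICIT NONE here, so the legacy implicit typing rules apply.
  k = 7 / 2        ! k begins with K, so it is an INTEGER: integer division gives 3
  x = 7 / 2.0      ! x begins with X, so it is a default REAL: 3.5
  PRINT *, K, X    ! case is not significant: K and k are the same variable
END PROGRAM implicit_demo
```

Most modern code disables these rules by putting `IMPLICIT NONE` at the top of each program unit, as the other sketches in this note do.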

## A Rosetta Stone

Fortran's language standard and other documentation use some terminology
in particular ways that might be unfamiliar.

| Fortran | English |
| ------- | ------- |
| Association | Making a name refer to something else |
| Assumed | Some attribute of an argument or interface that is not known until a call is made |
| Companion processor | A C compiler |
| Component | Class member |
| Deferred | Some attribute of a variable that is not known until an allocation or assignment |
| Derived type | C++ class |
| Dummy argument | C++ reference argument |
| Final procedure | C++ destructor |
| Generic | Overloaded function, resolved by actual arguments |
| Host procedure | The subprogram that contains a nested one |
| Implied DO | There's a loop inside a statement |
| Interface | Prototype |
| Internal I/O | `sscanf` and `snprintf` |
| Intrinsic | Built-in type or function |
| Polymorphic | Dynamically typed |
| Processor | Fortran compiler |
| Rank | Number of dimensions that an array has |
| `SAVE` attribute | Statically allocated |
| Type-bound procedure | Kind of a C++ member function but not really |
| Unformatted | Raw binary |

## Data Types

There are five built-in ("intrinsic") types: `INTEGER`, `REAL`, `COMPLEX`,
`LOGICAL`, and `CHARACTER`.
They are parameterized with "kind" values, which should be treated as
non-portable integer codes, although in practice today these are the
byte sizes of the data.
(For `COMPLEX`, the kind type parameter value is the byte size of one of the
two `REAL` components, or half of the total size.)
The legacy `DOUBLE PRECISION` intrinsic type is an alias for a kind of `REAL`
that should be more precise, and bigger, than the default `REAL`.

`COMPLEX` is a simple structure that comprises two `REAL` components.

`CHARACTER` data also have length, which may or may not be known at compilation
time.
`CHARACTER` variables are fixed-length strings and they get padded out
with space characters when not completely assigned.
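
Here is a short sketch of kinds and `CHARACTER` padding.
It assumes a compiler that provides the `real64` and `int32` named constants from the intrinsic `ISO_FORTRAN_ENV` module (Fortran 2008); the variable names are illustrative only.

```fortran
PROGRAM kinds_demo
  USE, INTRINSIC :: iso_fortran_env, ONLY: real64, int32
  IMPLICIT NONE
  INTEGER(KIND=int32)  :: n
  REAL(KIND=real64)    :: x     ! named kind constant instead of a hard-coded 8
  DOUBLE PRECISION     :: d     ! legacy spelling of a more precise REAL kind
  COMPLEX(KIND=real64) :: z     ! two real64 components
  CHARACTER(LEN=8)     :: name

  n = 42
  x = 1.0_real64 / 3.0_real64
  d = 1.0D0 / 3.0D0
  z = (1.0_real64, -2.0_real64)
  name = 'abc'                  ! stored as 'abc' followed by five spaces

  PRINT *, KIND(n), KIND(x), KIND(d), KIND(z)   ! the compiler's kind codes
  PRINT *, REAL(z), AIMAG(z)                    ! the two components of z
  PRINT *, '[' // name // ']', LEN(name), LEN_TRIM(name)
END PROGRAM kinds_demo
```

`LEN(name)` stays 8 no matter what is assigned, while `LEN_TRIM(name)` reports 3 because it ignores the trailing padding.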

User-defined ("derived") data types can be synthesized from the intrinsic
types and from previously-defined user types, much like a C `struct`.
Derived types can be parameterized with integer values that either have
to be constant at compilation time ("kind" parameters) or deferred to
execution ("len" parameters).

Derived types can inherit ("extend") from at most one other derived type.
They can have user-defined destructors (`FINAL` procedures).
They can specify default initial values for their components.
With some work, one can also specify a general constructor function,
since Fortran allows a generic interface to have the same name as that
of a derived type.
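
The following sketch shows a derived type with default component values, a type that extends it, and a constructor function supplied through a generic interface named after the type.
All of the names (`shapes`, `point`, `labeled_point`, `point_on_diagonal`) are hypothetical, and `FINAL` procedures are left out to keep it short.

```fortran
MODULE shapes
  IMPLICIT NONE

  TYPE :: point                           ! much like a C struct
    REAL :: x = 0.0                       ! default initial values
    REAL :: y = 0.0
  END TYPE point

  TYPE, EXTENDS(point) :: labeled_point   ! single inheritance
    CHARACTER(LEN=16) :: label = 'unnamed'
  END TYPE labeled_point

  ! A generic interface that shares the type's name adds a user-written
  ! constructor next to the built-in structure constructor.
  INTERFACE point
    MODULE PROCEDURE point_on_diagonal
  END INTERFACE point

CONTAINS

  FUNCTION point_on_diagonal(i) RESULT(p)
    INTEGER, INTENT(IN) :: i
    TYPE(point) :: p
    p%x = REAL(i)
    p%y = REAL(i)
  END FUNCTION point_on_diagonal

END MODULE shapes

PROGRAM shapes_demo
  USE shapes
  IMPLICIT NONE
  TYPE(point) :: a, b
  TYPE(labeled_point) :: c

  a = point(1.0, 2.0)   ! built-in structure constructor
  b = point(3)          ! resolves to point_on_diagonal
  c%x = 5.0             ! component inherited from point
  PRINT *, a%x, a%y, b%x, b%y
  PRINT *, c%x, c%y, TRIM(c%label)
END PROGRAM shapes_demo
```

A reference to `point(...)` goes to a matching specific function when there is one and falls back to the structure constructor otherwise, which is why both calls above work.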

Last, there are "typeless" binary constants (e.g., `Z'FF'`) that can be
used in a few situations, like static data initialization or immediate
conversion, where type is not necessary.

## Arrays

Arrays are not types in Fortran.
Being an array is a property of an object or function, not of a type.
Unlike C, one cannot have an array of arrays or an array of pointers,
although one can have an array of a derived type that has arrays or
pointers as components.
Arrays are multidimensional, and the number of dimensions is called
the _rank_ of the array.
In storage, arrays are stored such that the last subscript has the
largest stride in memory, e.g. `A(1,1)` is followed by `A(2,1)`, not `A(1,2)`.
And yes, the default lower bound on each dimension is 1, not 0.

Expressions can manipulate arrays as multidimensional values, and
the compiler will create the necessary loops.
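
A small sketch of storage order and whole-array expressions follows; the names are illustrative.
`RESHAPE` copies elements in storage order, which makes the column-major layout visible.

```fortran
PROGRAM array_demo
  IMPLICIT NONE
  INTEGER :: a(2,3)        ! rank 2; bounds default to 1:2 and 1:3
  INTEGER :: flat(6)
  INTEGER :: i, j

  DO j = 1, 3
    DO i = 1, 2
      a(i,j) = 10*i + j    ! encode the subscripts in the value
    END DO
  END DO

  ! The first subscript varies fastest in memory:
  ! a(1,1), a(2,1), a(1,2), a(2,2), a(1,3), a(2,3)
  flat = RESHAPE(a, [6])
  PRINT *, flat            ! values: 11 21 12 22 13 23

  ! Whole-array expression: no explicit loops needed.
  a = 2*a + 1
  PRINT *, a(2,3), SUM(a)
  PRINT *, LBOUND(a), UBOUND(a)
END PROGRAM array_demo
```

Note that the loop nest above runs the first subscript in the inner loop, which is the cache-friendly order in Fortran and the opposite of the usual C habit.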

## Allocatables

Modern Fortran programs use `ALLOCATABLE` data extensively.
Such variables and derived type components are allocated dynamically.
They are automatically deallocated when they go out of scope, much
like C++'s `std::vector<>` class template instances are.
The array bounds, derived type `LEN` parameters, and even the
type of an allocatable can all be deferred to run time.
(If you really want to learn all about modern Fortran, I suggest
that you study everything that can be done with `ALLOCATABLE` data,
and follow up all the references that are made in the documentation
from the description of `ALLOCATABLE` to other topics; it's a feature
that interacts with much of the rest of the language.)
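
As a minimal sketch (names are made up), here are an allocatable array and a deferred-length allocatable string.
It assumes Fortran 2003 semantics, in which assignment to an allocatable allocates or reallocates it as needed.

```fortran
PROGRAM alloc_demo
  IMPLICIT NONE
  REAL, ALLOCATABLE :: v(:)              ! bounds deferred to run time
  CHARACTER(LEN=:), ALLOCATABLE :: s     ! length deferred to run time

  CALL fill(5)
  s = 'hello'                            ! allocated with length 5
  s = s // ', world'                     ! reallocated with length 12
  PRINT *, s, LEN(s), SIZE(v)

CONTAINS

  SUBROUTINE fill(n)
    INTEGER, INTENT(IN) :: n
    REAL, ALLOCATABLE :: work(:)
    INTEGER :: i
    ALLOCATE(work(n))                    ! explicit allocation
    work = [(REAL(i)**2, i = 1, n)]
    v = work                             ! v is allocated by the assignment
    ! No DEALLOCATE is needed: work is released when fill returns.
  END SUBROUTINE fill

END PROGRAM alloc_demo
```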

## I/O

Fortran's input/output features are built into the syntax of the language,
rather than being defined by library interfaces as in C and C++.
There are means for raw binary I/O and for "formatted" transfers to
character representations.
There are means for random-access I/O using fixed-size records as well as for
sequential I/O.
One can scan data from or format data into `CHARACTER` variables via
"internal" formatted I/O.
I/O from and to files uses a scheme of integer "unit" numbers that is
similar to the open file descriptors of UNIX; i.e., one opens a file
and assigns it a unit number, then uses that unit number in subsequent
`READ` and `WRITE` statements.

Formatted I/O relies on format specifications to map values to fields of
characters, similar to the format strings used with C's `printf` family
of standard library functions.
These format specifications can appear in `FORMAT` statements and
be referenced by their labels, in character literals directly in I/O
statements, or in character variables.

One can also use compiler-generated formatting in "list-directed" I/O,
in which the compiler derives reasonable default formats based on
data types.
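
The sketch below touches each of these mechanisms once: internal I/O, the three places a format can live, list-directed output, and a unit number.
It uses a scratch file and the Fortran 2008 `NEWUNIT=` specifier so that no named file is created; all variable names are illustrative.

```fortran
PROGRAM io_demo
  IMPLICIT NONE
  CHARACTER(LEN=32) :: buffer
  CHARACTER(LEN=*), PARAMETER :: fmt = '(A,1X,I0,1X,F0.2)'
  INTEGER :: n, m, u
  REAL :: x

  ! Internal WRITE: roughly snprintf into a CHARACTER variable.
  WRITE(buffer, '(I3,1X,F6.2)') 42, 3.5
  ! Internal list-directed READ: roughly sscanf from that variable.
  READ(buffer, *) n, x

  WRITE(*, fmt) 'n and x:', n, x    ! format held in a character constant
  WRITE(*, 100) n, x                ! format in a labeled FORMAT statement
100 FORMAT('n=', I0, ' x=', F0.2)
  PRINT *, n, x                     ! list-directed: default formats

  ! File I/O through a unit number, here on a temporary scratch file.
  OPEN(NEWUNIT=u, STATUS='SCRATCH', FORM='FORMATTED', ACTION='READWRITE')
  WRITE(u, '(I0)') n
  REWIND(u)
  READ(u, *) m
  CLOSE(u)
  PRINT *, m
END PROGRAM io_demo
```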

## Subprograms

Fortran has both `FUNCTION` and `SUBROUTINE` subprograms.
They share the same name space, but functions cannot be called as
subroutines or vice versa.
Subroutines are called with the `CALL` statement, while functions are
invoked with function references in expressions.

There is one level of subprogram nesting.
A function, subroutine, or main program can have functions and subroutines
nested within it, but these "internal" procedures cannot themselves have
their own internal procedures.
As is the case with C++ lambda expressions, internal procedures can
reference names from their host subprograms.
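
A minimal sketch of both kinds of subprogram as internal procedures of a main program, with hypothetical names:

```fortran
PROGRAM host_demo
  IMPLICIT NONE
  INTEGER :: counter
  counter = 0
  CALL bump(2)               ! subroutines are invoked with CALL
  CALL bump(3)
  PRINT *, twice(counter)    ! functions are referenced in expressions: 10
CONTAINS
  SUBROUTINE bump(by)
    INTEGER, INTENT(IN) :: by
    counter = counter + by   ! counter is the host's variable
  END SUBROUTINE bump

  INTEGER FUNCTION twice(n)
    INTEGER, INTENT(IN) :: n
    twice = 2 * n            ! the result is assigned to the function's name
  END FUNCTION twice
END PROGRAM host_demo
```

Everything after `CONTAINS` is internal to the host; neither `bump` nor `twice` could contain a `CONTAINS` section of its own.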

## Modules

Modern Fortran has good support for separate compilation and namespace
management.
The *module* is the basic unit of compilation, although independent
subprograms still exist, of course, as well as the main program.
Modules define types, constants, interfaces, and nested
subprograms.

Objects from a module are made available for use in other compilation
units via the `USE` statement, which has options for limiting the objects
that are made available as well as for renaming them.
All references to objects in modules are done with direct names or
aliases that have been added to the local scope, as Fortran has no means
of qualifying references with module names.
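
For illustration, here is a hypothetical module and a program that imports two of its names, renaming one of them:

```fortran
MODULE constants
  IMPLICIT NONE
  REAL, PARAMETER :: pi = 3.14159265
  REAL, PARAMETER :: e  = 2.71828183
CONTAINS
  REAL FUNCTION circle_area(r)
    REAL, INTENT(IN) :: r
    circle_area = pi * r**2
  END FUNCTION circle_area
END MODULE constants

PROGRAM use_demo
  ! ONLY limits what is imported; "area => circle_area" renames locally.
  USE constants, ONLY: area => circle_area, pi
  IMPLICIT NONE
  PRINT *, area(1.0), pi
  ! e was not imported, and there is no qualified spelling that could reach it.
END PROGRAM use_demo
```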

## Arguments

Functions and subroutines have "dummy" arguments that are dynamically
associated with actual arguments during calls.
Essentially, all argument passing in Fortran is by reference, not value.
One may restrict access to argument data by declaring that dummy
arguments have `INTENT(IN)`, but that corresponds to the use of
a `const` reference in C++ and does not imply that the data are
copied; use `VALUE` for that.

When it is not possible to pass a reference to an object, or a sparse
regular array section of an object, as an actual argument, Fortran
compilers must allocate temporary space to hold the actual argument
across the call.
This is guaranteed to happen when an actual argument is enclosed
in parentheses.
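
The difference between by-reference dummies, `INTENT(IN)`, and `VALUE` can be sketched as follows; the procedure and variable names are made up.

```fortran
PROGRAM args_demo
  IMPLICIT NONE
  INTEGER :: a, b, c
  a = 1
  b = 1
  c = 1
  CALL modify(a, b, c)
  PRINT *, a, b, c    ! a is now 2; b and c are still 1

  ! Parentheses turn the actual argument into an expression,
  ! so show receives a temporary copy rather than a itself.
  CALL show((a))
CONTAINS
  SUBROUTINE modify(x, y, z)
    INTEGER, INTENT(INOUT) :: x   ! by reference, writable
    INTEGER, INTENT(IN)    :: y   ! by reference, read-only (like const&)
    INTEGER, VALUE         :: z   ! by value: a private copy
    x = x + y
    z = 99                        ! changes only the copy
  END SUBROUTINE modify

  SUBROUTINE show(n)
    INTEGER, INTENT(IN) :: n
    PRINT *, n
  END SUBROUTINE show
END PROGRAM args_demo
```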

The compiler is free to assume that any aliasing between dummy arguments
and other data is safe.
In other words, if some object can be written to under one name, it's
never going to be read or written using some other name in that same
scope.
```
  SUBROUTINE FOO(X,Y,Z)
  X = 3.14159
  Y = 2.1828  ! Y IS ASSUMED NOT TO ALIAS X
  Z = 2 * X ! CAN BE FOLDED AT COMPILE TIME
  END
```
This is the opposite of the assumptions under which a C or C++ compiler must
labor when trying to optimize code with pointers.

## Overloading

Fortran supports a form of overloading via its interface feature.
By default, an interface is a means for specifying prototypes for a
set of subroutines and functions.
But when an interface is named, that name becomes a *generic* name
for its specific subprograms, and calls via the generic name are
mapped at compile time to one of the specific subprograms based
on the types, kinds, and ranks of the actual arguments.
A similar feature can be used for generic type-bound procedures.

This feature can be used to overload the built-in operators and some
I/O statements, too.
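
Here is a sketch of a generic name resolved by type and rank.
The module and its procedures are hypothetical, and operator overloading is not shown.

```fortran
MODULE describe_mod
  IMPLICIT NONE
  INTERFACE describe            ! the generic name
    MODULE PROCEDURE describe_int, describe_real, describe_vec
  END INTERFACE describe
CONTAINS
  FUNCTION describe_int(i) RESULT(s)
    INTEGER, INTENT(IN) :: i
    CHARACTER(LEN=16) :: s
    s = 'integer scalar'
  END FUNCTION describe_int

  FUNCTION describe_real(x) RESULT(s)
    REAL, INTENT(IN) :: x
    CHARACTER(LEN=16) :: s
    s = 'real scalar'
  END FUNCTION describe_real

  FUNCTION describe_vec(v) RESULT(s)
    REAL, INTENT(IN) :: v(:)
    CHARACTER(LEN=16) :: s
    s = 'real vector'
  END FUNCTION describe_vec
END MODULE describe_mod

PROGRAM generic_demo
  USE describe_mod
  IMPLICIT NONE
  PRINT *, describe(1)            ! picks describe_int
  PRINT *, describe(1.0)          ! picks describe_real
  PRINT *, describe([1.0, 2.0])   ! picks describe_vec (rank 1)
END PROGRAM generic_demo
```

The choice is made entirely at compile time; a call whose arguments match none of the specific procedures is a compilation error.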

## Polymorphism

Fortran code can be written to accept data of some derived type or
any extension thereof using `CLASS`, deferring the actual type to
execution, rather than the usual `TYPE` syntax.
This is somewhat similar to the use of `virtual` functions in C++.

Fortran's `SELECT TYPE` construct is used to distinguish between
possible specific types dynamically, when necessary.  It's a
little like C++17's `std::visit()` on a discriminated union.
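
A minimal sketch of a polymorphic dummy argument and `SELECT TYPE`, using made-up types:

```fortran
MODULE animals
  IMPLICIT NONE
  TYPE :: animal
    CHARACTER(LEN=8) :: name = ''
  END TYPE animal
  TYPE, EXTENDS(animal) :: dog
    INTEGER :: tricks = 0
  END TYPE dog
CONTAINS
  FUNCTION species(a) RESULT(s)
    CLASS(animal), INTENT(IN) :: a   ! accepts animal or any extension of it
    CHARACTER(LEN=8) :: s
    SELECT TYPE (a)
    TYPE IS (dog)
      s = 'dog'                      ! a%tricks is accessible in this branch
    CLASS DEFAULT
      s = 'animal'
    END SELECT
  END FUNCTION species
END MODULE animals

PROGRAM poly_demo
  USE animals
  IMPLICIT NONE
  PRINT *, species(animal('Generic'))
  PRINT *, species(dog('Rex', 3))
END PROGRAM poly_demo
```

Inside the `TYPE IS (dog)` branch the selector is treated as a `dog`, so its extra components become accessible without any cast.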

## Pointers

Pointers are objects in Fortran, not data types.
Pointers can point to data, arrays, and subprograms.
A pointer can only point to data that has the `TARGET` attribute.
Outside of the pointer assignment statement (`P=>X`) and some intrinsic
functions and cases with pointer dummy arguments, pointers are implicitly
dereferenced, and the use of their name is a reference to the data to which
they point instead.

Unlike C, a pointer cannot point to a pointer *per se*, nor can it be
used to implement a level of indirection to the management structure of
an allocatable.
If you assign to a Fortran pointer to make it point at another pointer,
you are making the pointer point to the data (if any) to which the other
pointer points.
Similarly, if you assign to a Fortran pointer to make it point to an allocatable,
you are making the pointer point to the current content of the allocatable,
not to the metadata that manages the allocatable.

Unlike allocatables, pointers do not deallocate their data when they go
out of scope.
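
These rules can be sketched in a few lines; the variable names are illustrative.

```fortran
PROGRAM pointer_demo
  IMPLICIT NONE
  INTEGER, TARGET  :: x, y(4)
  INTEGER, POINTER :: p, q, every_other(:)

  x = 1
  y = [10, 20, 30, 40]

  p => x                   ! pointer assignment: p now refers to x
  p = 5                    ! ordinary assignment: implicit dereference, x becomes 5
  q => p                   ! q points at x (the target of p), not at p itself
  p => y(2)                ! re-pointing p does not affect q
  every_other => y(1:4:2)  ! a pointer can refer to a strided array section

  PRINT *, x, q, p         ! values: 5 5 20
  PRINT *, every_other     ! values: 10 30
  PRINT *, ASSOCIATED(q, x)
  NULLIFY(p)
  PRINT *, ASSOCIATED(p)
END PROGRAM pointer_demo
```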

A legacy feature, "Cray pointers", implements dynamic base addressing of
one variable using an address stored in another.

## Preprocessing

There is no standard preprocessing feature, but every real Fortran implementation
has some support for passing Fortran source code through a variant of
the standard C source preprocessor.
Since Fortran is very different from C at the lexical level (e.g., line
continuations, Hollerith literals, no reserved words, fixed form), using
a stock modern C preprocessor on Fortran source can be difficult.
Preprocessing behavior varies across implementations and one should not depend on
much portability.
Preprocessing is typically requested by the use of a capitalized filename
suffix (e.g., "foo.F90") or a compiler command line option.
(Since the F18 compiler always runs its built-in preprocessing stage,
no special option or filename suffix is required.)

## "Object Oriented" Programming

Fortran doesn't have member functions (or subroutines) in the sense
that C++ does, in which a function has immediate access to the members
of a specific instance of a derived type.
But Fortran does have an analog to C++'s `this` via *type-bound
procedures*.
This is a means of binding a particular subprogram name to a derived
type, possibly with aliasing, in such a way that the subprogram can
be called as if it were a component of the type (e.g., `X%F(Y)`)
and receive the object to the left of the `%` as an additional actual argument,
exactly as if the call had been written `F(X,Y)`.
The object is passed as the first argument by default, but that can be
changed; indeed, the same specific subprogram can be used for multiple
type-bound procedures by choosing different dummy arguments to serve as
the passed object.
The equivalent of a `static` member function is also available by saying
that no argument is to be associated with the object via `NOPASS`.
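
The three binding styles described above might look like this; `counter_mod`, `counter`, and the procedure names are hypothetical.

```fortran
MODULE counter_mod
  IMPLICIT NONE
  TYPE :: counter
    INTEGER :: n = 0
  CONTAINS
    PROCEDURE :: add                           ! passed object is the first dummy
    PROCEDURE, PASS(c) :: add_to => add_into   ! passed object is the dummy named c
    PROCEDURE, NOPASS :: limit                 ! no passed object, like a static member
  END TYPE counter
CONTAINS
  SUBROUTINE add(this, k)
    CLASS(counter), INTENT(INOUT) :: this   ! the passed object is declared with CLASS
    INTEGER, INTENT(IN) :: k
    this%n = this%n + k
  END SUBROUTINE add

  SUBROUTINE add_into(k, c)
    INTEGER, INTENT(IN) :: k
    CLASS(counter), INTENT(INOUT) :: c
    c%n = c%n + k
  END SUBROUTINE add_into

  INTEGER FUNCTION limit()
    limit = 100
  END FUNCTION limit
END MODULE counter_mod

PROGRAM tbp_demo
  USE counter_mod
  IMPLICIT NONE
  TYPE(counter) :: x
  CALL x%add(2)       ! same as CALL add(x, 2)
  CALL x%add_to(3)    ! same as CALL add_into(3, x)
  PRINT *, x%n, x%limit()
END PROGRAM tbp_demo
```

Components must always be spelled through the passed object (`this%n`, `c%n`); there is no implicit member lookup as in a C++ member function.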

There's a lot more that can be said about type-bound procedures (e.g., how they
support overloading) but this should be enough to get you started with
the most common usage.

## Pitfalls

Variable initializers, e.g. `INTEGER :: J=123`, are _static_ initializers!
They imply that the variable is stored in static storage, not on the stack,
and the initialized value lasts only until the variable is assigned.
One must use an assignment statement to implement a dynamic initializer
that will apply to every fresh instance of the variable.
Be especially careful when using initializers in the newish `BLOCK` construct,
which perpetuates the interpretation as static data.
(Derived type component initializers, however, do work as expected.)
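
A short sketch of the difference, with made-up function names:

```fortran
PROGRAM save_pitfall
  IMPLICIT NONE
  INTEGER :: i
  DO i = 1, 3
    PRINT *, surprising(), expected()
  END DO
CONTAINS
  INTEGER FUNCTION surprising()
    INTEGER :: calls = 0   ! static initializer: implies SAVE, applied once
    calls = calls + 1
    surprising = calls     ! 1, 2, 3 on successive calls
  END FUNCTION surprising

  INTEGER FUNCTION expected()
    INTEGER :: calls
    calls = 0              ! assignment: executed on every call
    calls = calls + 1
    expected = calls       ! always 1
  END FUNCTION expected
END PROGRAM save_pitfall
```

A C programmer reading `surprising` would expect `calls` to be reset on entry; in Fortran it behaves like a C `static` local.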

If you see an assignment to an array that's never been declared as such,
it's probably a definition of a *statement function*, which is like
a parameterized macro definition, e.g. `A(X)=SQRT(X)**3`.
In the original Fortran language, this was the only means for user
function definitions.
Today, of course, one should use an external or internal function instead.

Fortran expressions don't bind exactly like C's do.
Watch out for exponentiation with `**`, which of course C lacks; it
binds more tightly than negation does (e.g., `-2**2` is -4),
and it binds to the right, unlike every other Fortran operator and most
C operators; e.g., `2**2**3` is 256, not 64.
Logical values must be compared with special logical equivalence
relations (`.EQV.` and `.NEQV.`) rather than the usual equality
operators.
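
These cases are easy to check directly:

```fortran
PROGRAM operator_demo
  IMPLICIT NONE
  LOGICAL :: p, q
  p = .TRUE.
  q = .FALSE.
  PRINT *, -2**2         ! -(2**2) = -4
  PRINT *, (-2)**2       ! 4
  PRINT *, 2**2**3       ! 2**(2**3) = 256
  PRINT *, (2**2)**3     ! 64
  PRINT *, p .EQV. q     ! F
  PRINT *, p .NEQV. q    ! T
END PROGRAM operator_demo
```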

A Fortran compiler is allowed to short-circuit expression evaluation,
but not required to do so.
If one needs to protect a use of an `OPTIONAL` argument or possibly
disassociated pointer, use an `IF` statement, not a logical `.AND.`
operation.
In fact, Fortran can remove function calls from expressions if their
values are not required to determine the value of the expression's
result; e.g., if there is a `PRINT` statement in function `F`, it
may or may not be executed by the assignment statement `X=0*F()`.
(Well, it probably will be, in practice, but compilers always reserve
the right to optimize better.)
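
A sketch of the safe pattern for an `OPTIONAL` argument, with a hypothetical function `scaled`:

```fortran
PROGRAM optional_demo
  IMPLICIT NONE
  PRINT *, scaled(10), scaled(10, 3)   ! values: 10 30
CONTAINS
  INTEGER FUNCTION scaled(x, factor)
    INTEGER, INTENT(IN) :: x
    INTEGER, INTENT(IN), OPTIONAL :: factor
    scaled = x
    ! Unsafe: IF (PRESENT(factor) .AND. factor /= 0) ...
    ! Both operands of .AND. may be evaluated, so an absent
    ! factor could be referenced.
    IF (PRESENT(factor)) THEN
      IF (factor /= 0) scaled = x * factor
    END IF
  END FUNCTION scaled
END PROGRAM optional_demo
```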

Unless they have an explicit suffix (`1.0_8`, `2.0_8`) or a `D`
exponent (`3.0D0`), real literal constants in Fortran have the
default `REAL` type -- *not* `double` as is the case in C and C++.
If you're not careful, you can lose precision at compilation time
from your constant values and never know it.
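
The loss is easy to demonstrate.
This sketch assumes the common case in which default `REAL` is 32 bits wide and `real64` (from the intrinsic `ISO_FORTRAN_ENV` module) is available.

```fortran
PROGRAM literal_demo
  USE, INTRINSIC :: iso_fortran_env, ONLY: real64
  IMPLICIT NONE
  REAL(real64) :: a, b, c
  a = 0.1           ! default REAL constant: rounded to single precision, then widened
  b = 0.1_real64    ! constant with an explicit kind suffix
  c = 0.1D0         ! DOUBLE PRECISION constant
  PRINT *, a
  PRINT *, b
  PRINT *, c
  PRINT *, a == b   ! F when default REAL is narrower than real64
END PROGRAM literal_demo
```

Declaring the variable as `REAL(real64)` does nothing for the constant on the right-hand side; the suffix has to go on the literal itself.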