# Debug Generation

Application developers spend a significant amount of time debugging the
applications that they create. Hence it is important that a compiler provide
support for a good debug experience. DWARF[1] is the standard debugging file
format used by compilers and debuggers. The LLVM infrastructure supports debug
info generation using metadata[2]. Support for generating debug metadata is
present in MLIR by way of MLIR attributes. Flang can leverage these MLIR
attributes to generate good debug information.

We can break the work for debug generation into two separate tasks:
1) Line Table generation
2) Full debug generation

The support for Fortran Debug in LLVM infrastructure[3] has made great progress
due to many Fortran frontends adopting LLVM as the backend as well as the
availability of the Classic Flang compiler.

## Driver Flags
By default, Flang will not generate any debug or line table information.
Debug information will be generated if one of the following flags is present.

- `-gline-tables-only`, `-g1`: Emit debug line number tables only
- `-g`: Emit full debug info

## Line Table Generation

The existing AddDebugFoundationPass adds a `FusedLoc` with a
`SubprogramAttr` to each FuncOp. This allows MLIR to generate LLVM IR metadata
for that function. However, the following values are hardcoded at the moment.
These will instead be passed from the driver.

- Details of the compiler (name, version and git hash).
- Language Standard. We can set it to Fortran95 for now and periodically
revise it when full support for later standards is available.
- Optimisation Level.
- Type of debug generated (line table/full debug).
- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
the main program.

`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
match the signature of the actual function/subroutine.

## Full Debug Generation

Full debug info will include metadata to describe functions, variables and
types. Flang will generate debug metadata in the form of MLIR attributes. These
attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].

Debug metadata generation can be broken down into 2 steps.

1. MLIR attributes are generated by reading information from AST or FIR. This
step can happen anytime before or during conversion to LLVM dialect. Examples
of the metadata generated in this step are `DILocalVariableAttr` and
`DIDerivedTypeAttr`.

2. Changes that can only happen during or after conversion to LLVM dialect. One
example is passing `DIGlobalVariableExpressionAttr` while
creating `LLVM::GlobalOp`. Another example is the generation of `DbgDeclareOp`,
which is required for local variables. It can only be created after conversion to
LLVM dialect as it requires the LLVM.Ptr type. The changes required for step 2 are
quite minimal. The bulk of the work happens in step 1.

One design decision that we need to make is where to perform step 1.
Here are some possible options:

**During conversion to LLVM dialect**

Pros:
1. Do step 1 and 2 in one place.
2. No chance of missing any change introduced by an earlier transformation.

Cons:
1. Passing a lot of information from the driver, as discussed in the line table
section above, may muddle the interface of FIRToLLVMConversion.
2. `DeclareOp` is removed before this pass.
3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
been converted to LLVM dialect and others have not may cause its own issues. We
have to walk the chain of ops to extract the information, which may be
problematic in this case.
4. Some source information is lost by this point. Examples include
information about namelists and source line information about fields of derived
types.

**During a pass before conversion to LLVM dialect**

This is similar to what AddDebugFoundationPass is currently doing.

Pros:
1. One central location dedicated to debug information processing. This can
result in a cleaner implementation.
2. Similar to above, less chance of missing any change introduced by an earlier
transformation.

Cons:
1. Step 2 still needs to happen during conversion to LLVM dialect. But the
changes required for step 2 are quite minimal.
2. Similar to above, some source information may be lost by this point.

**During Lowering from AST**

Pros:
1. We have better source information.

Cons:
1. The code may change after lowering, and those changes may not be
reflected in the debug information.
2. Comments on an earlier PR [5] advised against this approach.

## Design

The design below assumes that we are extracting the information from FIR.
If we generate debug metadata during lowering then the description below
may need to change. The generated metadata, however, remains the same in
both cases.

The AddDebugFoundationPass will be renamed to AddDebugInfo Pass. The
information mentioned in the Line Table Generation section above will be passed
to it from the driver. This pass will run quite late in the pipeline but before
`DeclareOp` is removed.

In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
and `DeclareOp` to extract the source information and build the MLIR
attributes. A class will be added to handle conversion of MLIR and FIR types to
`DITypeAttr`.

The following sections provide details of how various language constructs will
be handled. In these sections, the LLVM IR metadata and MLIR attributes are
used interchangeably. As an example, `DILocalVariableAttr` is an MLIR attribute
which gets translated to LLVM IR's `DILocalVariable`.

### Variables

#### Local Variables
In MLIR, local variables are represented by `DILocalVariableAttr` which
stores information like source location and type. They also require a
`DbgDeclareOp` which binds `DILocalVariableAttr` with a location.

In FIR, `DeclareOp` has source information about the variable. The
`DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is
attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.

During conversion to LLVM dialect, when an op is encountered that has a
`DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
binds the attr with its location.

The IR changes as follows:

```
  // Original FIR
  %2 = fir.alloca i32  loc(#loc4)
  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}

  // FIR with FusedLoc

  %2 = fir.alloca i32  loc(#loc38)
  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
  #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
  #loc38 = loc(fused<#di_local_variable5>[#loc4])

  // After conversion to LLVM dialect

  #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
  %1 = llvm.alloca %0 x i64
  llvm.intr.dbg.declare #di_local_variable = %1
```

#### Function Arguments

Arguments work in a similar way, but they present a difficulty: the
`DeclareOp`'s memref points to a `BlockArgument`. Unlike the op in the local
variable case, a `BlockArgument` is not handled by the FIRToLLVMLowering. This
can easily be handled by adding the `DbgDeclareOp` for arguments after
conversion to LLVM dialect, either in FIRToLLVMLowering or in a separate pass.

### Module

In debug metadata, the Fortran module will be represented by `DIModuleAttr`.
The variables and functions inside a module will have a scope pointing to the
parent module.

```
module helper
   real glr
   ...
end module helper

!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
!2 = !DIModule(scope: !1, name: "helper" ...)
!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)

Use of a module results in the following metadata.
!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
```

Modules are not first class entities in FIR, so there is no way to get
the location where they are declared in the source file.

However, the fact that a variable or function is part of a module, along with
the name of that module, can be extracted from its mangled name. There is
a `GlobalOp` generated for each module variable in FIR and there is also a
`DeclareOp` in each function where the module variable is used.

We will use the `GlobalOp` to generate the `DIModuleAttr` and associated
`DIGlobalVariableAttr`. A `DeclareOp` for a module variable will be used
to generate `DIImportedEntityAttr`. Care will be taken to avoid generating
duplicate `DIImportedEntityAttr` entries in the same function.

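As a rough illustration of how the module name can be recovered from a mangled
name such as `_QMhelperFchangeEi`, the Python sketch below splits the name at
its uppercase markers. It assumes only the `M` (module), `F` (procedure) and
`E` (entity) markers and lowercase Fortran names. The helpers
`split_mangled_name` and `enclosing_module` are hypothetical and far simpler
than the name handling Flang actually uses.

```python
import re

# Assumed markers (a simplified subset): M = module, F = procedure, E = entity.
_MARKER = re.compile(r"([MFE])([a-z0-9_]+)")


def split_mangled_name(mangled):
    """Split a name like '_QMhelperFchangeEi' into (marker, name) pairs."""
    if not mangled.startswith("_Q"):
        raise ValueError("not a mangled Fortran name: " + mangled)
    body = mangled[2:]
    parts = []
    pos = 0
    while pos < len(body):
        match = _MARKER.match(body, pos)
        if match is None:
            raise ValueError("unsupported marker at: " + body[pos:])
        parts.append((match.group(1), match.group(2)))
        pos = match.end()
    return parts


def enclosing_module(mangled):
    """Return the outermost module name, or None if the entity is not in a module."""
    for marker, name in split_mangled_name(mangled):
        if marker == "M":
            return name
    return None


print(enclosing_module("_QMhelperFchangeEi"))  # helper
```

The returned module name could then serve as the key for looking up or creating
the single `DIModuleAttr` shared by all entities of that module.
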
### Derived Types

A derived type will be represented in metadata by `DICompositeType` with a tag of
`DW_TAG_structure_type`. It will have elements which point to the components.

```
  type :: t_pair
    integer :: i
    real :: x
  end type
!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
!2 = !{!3, !4}
!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
```

In FIR, `RecordType` and `TypeInfoOp` can be used to get information about the
location of the derived type and the types of its components. We may also use
`FusedLoc` on `TypeInfoOp` to encode location information for all the components
of the derived type.

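The member `size` and `offset` values in the metadata above are in bits. As a
minimal sketch of where they come from, the Python below lays out components in
declaration order with natural alignment. The `layout_members` helper and the
assumption that each component is aligned to its own size are illustrative
only; in practice the values would come from the target data layout.

```python
def layout_members(components):
    """Return (name, size_bits, offset_bits) for components given as (name, size_bits).

    Assumes each component is aligned to its own size (natural alignment).
    """
    members = []
    offset = 0
    for name, size in components:
        # Round the running offset up to the next multiple of the alignment.
        offset = (offset + size - 1) // size * size
        members.append((name, size, offset))
        offset += size
    return members


# type t_pair: integer :: i; real :: x (both 32 bits wide)
for name, size, offset in layout_members([("i", 32), ("x", 32)]):
    print(f"{name}: size={size}, offset={offset}")
# i: size=32, offset=0
# x: size=32, offset=32
```
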
### Common Blocks

A common block will be represented in metadata by `DICommonBlockAttr` which
will be used as the scope by the variables inside the common block. `DIExpression`
can be used to give the offset of any given variable inside the global storage
for the common block.

```
integer a, b
common /test/ a, b

;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8
!1 = !DISubprogram()
!2 = !DICommonBlock(scope: !1, name: "test" ...)
!3 = !DIGlobalVariable(scope: !2, name: "a" ...)
!4 = !DIExpression()
!5 = !DIGlobalVariableExpression(var: !3, expr: !4)
!6 = !DIGlobalVariable(scope: !2, name: "b" ...)
!7 = !DIExpression(DW_OP_plus_uconst, 4)
!8 = !DIGlobalVariableExpression(var: !6, expr: !7)
```

In FIR, a common block results in a `GlobalOp` with common linkage. Every
function where the common block is used has a `DeclareOp` for each of its
variables. This `DeclareOp` will point to global storage through
`CoordinateOp` and `AddrOfOp`. The `CoordinateOp` has the offset of the
location of this variable in global storage. There is enough information to
generate the required metadata, although it requires walking up the chain from
`DeclareOp` to locate `CoordinateOp` and `AddrOfOp`.

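To make the role of the offset concrete, the following Python sketch derives
the `DIExpression` operands for each variable of the `/test/` common block from
its byte offset in the shared storage. The helper name and the tuple-based
input are hypothetical, and the sketch assumes the variables are laid out back
to back without padding.

```python
def common_block_expressions(variables):
    """Map each variable to the DIExpression operands locating it in the block.

    `variables` is a list of (name, size_in_bytes) in storage order. Members of
    the common block are laid out back to back in this sketch (no padding).
    """
    expressions = {}
    offset = 0
    for name, size in variables:
        # The first variable sits at the start of the storage: empty expression.
        expressions[name] = [] if offset == 0 else ["DW_OP_plus_uconst", offset]
        offset += size
    return expressions


# integer a, b; common /test/ a, b  (integers are 4 bytes in this example)
exprs = common_block_expressions([("a", 4), ("b", 4)])
print(exprs["a"])  # []
print(exprs["b"])  # ['DW_OP_plus_uconst', 4]
```
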
### Arrays

The type of a fixed size array is represented using `DICompositeType`. The
`DISubrangeAttr` is used to provide the bounds in each dimension.

```
integer abc(4,5)

!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
!2 = !{ !3, !4 }
!3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
!4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
```

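As a small illustration, the Python sketch below turns the declared bounds of a
fixed size array into the per-dimension `DISubrange` entries shown above. The
`subranges` helper is hypothetical and only formats text; an actual pass would
build `DISubrangeAttr` objects instead.

```python
def subranges(dims):
    """Format one DISubrange per dimension.

    Each entry of `dims` is either an extent (the lower bound defaults to 1, as
    in Fortran) or an explicit (lower, upper) pair.
    """
    entries = []
    for dim in dims:
        lower, upper = dim if isinstance(dim, tuple) else (1, dim)
        entries.append(f"!DISubrange(lowerBound: {lower}, upperBound: {upper})")
    return entries


# integer abc(4,5)
for entry in subranges([4, 5]):
    print(entry)
# !DISubrange(lowerBound: 1, upperBound: 4)
# !DISubrange(lowerBound: 1, upperBound: 5)
```
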
#### Adjustable

The debug metadata for an adjustable array looks similar to that of a fixed
size array with one change. The bounds are not constant values but point to a
`DILocalVariableAttr`.

In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain
to get the value that represents the array bound in any dimension. We will
create a `DILocalVariableAttr` that will point to that location. This
variable will be used in the `DISubrangeAttr`. Note that this
`DILocalVariableAttr` does not correspond to any source variable.

#### Assumed Size

This is treated as a raw array. Debug information will not provide any upper
bound information for the last dimension.

#### Assumed Shape
An assumed shape array will use a representation similar to that of a fixed
size array, with 2 differences.

1. There will be a `dataLocation` field which will be an expression. This will
enable the debugger to get the data pointer from the array descriptor.

2. The fields in `DISubrangeAttr` for array bounds will be expressions which
will allow the debugger to get the bounds from the descriptor.

```
integer(4), intent(out) :: a(:,:)

!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3)
!2 = !{!5, !7}
!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
!5 = !DISubrange(lowerBound: 1, upperBound: !4 ...)
!6 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 56, DW_OP_deref)
!7 = !DISubrange(lowerBound: 1, upperBound: !6 ...)
!8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
```

In the assumed shape case, the rank can be determined from the FIR's
`SequenceType`. This allows us to generate a `DISubrangeAttr` in each dimension.

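The constants 32 and 56 in the expressions above are byte offsets into the
array descriptor. The Python sketch below shows one way to derive them,
assuming a 64-bit target with a 24-byte descriptor header followed by one
24-byte `(lower bound, extent, stride)` triple per dimension. That layout is an
assumption chosen to be consistent with the offsets used in this document, and
the helper names are hypothetical; real code should query the descriptor layout
rather than hardcode it.

```python
HEADER_BYTES = 24   # assumed: base address, element length, and packed info fields
DIM_BYTES = 24      # assumed: three 8-byte fields per dimension
FIELD_BYTES = 8


def dim_field_offset(dim, field):
    """Byte offset of field 0/1/2 (lower bound, extent, stride) of dimension `dim`."""
    return HEADER_BYTES + dim * DIM_BYTES + field * FIELD_BYTES


def field_expression(dim, field):
    """DIExpression operands that load a descriptor field for one dimension."""
    return ["DW_OP_push_object_address", "DW_OP_plus_uconst",
            dim_field_offset(dim, field), "DW_OP_deref"]


# integer(4), intent(out) :: a(:,:)
print(field_expression(0, 1))
# ['DW_OP_push_object_address', 'DW_OP_plus_uconst', 32, 'DW_OP_deref']
print(field_expression(1, 1))
# ['DW_OP_push_object_address', 'DW_OP_plus_uconst', 56, 'DW_OP_deref']
```
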
#### Assumed Rank

This is currently unsupported in flang. Its representation will be similar to
the array representation for assumed shape arrays with the following differences.

1. `DICompositeTypeAttr` will have a rank field which will be an expression.
It will be used to get the rank value from the descriptor.
2. Instead of a `DISubrangeAttr` for each dimension, there will be a single
`DIGenericSubrange` which will allow debuggers to calculate bounds in any
dimension.

### Pointers and Allocatables
Pointers and allocatables will be represented using `DICompositeTypeAttr`. It
is a quirk of DWARF that scalar allocatable or pointer variables will show up in
the debug info as pointers to scalars while array pointer or allocatable
variables show up as arrays. The behavior is the same in gfortran and classic
flang.

```
  integer, allocatable :: ar(:)
  integer, pointer :: sc

!1 = !DILocalVariable(name: "sc", type: !2)
!2 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !3, associated: !9 ...)
!3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer", ...)
!4 = !DILocalVariable(name: "ar", type: !5 ...)
!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, elements: !6, dataLocation: !8, allocated: !9)
!6 = !{!7}
!7 = !DISubrange(lowerBound: !10, upperBound: !11 ...)
!8 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!9 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_lit0, DW_OP_ne)
!10 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 24, DW_OP_deref)
!11 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
```

In FIR, these variables are represented as `!fir.box<!fir.heap<>>` or
`!fir.box<!fir.ptr<>>`. There is also an `allocatable` or `pointer` attribute on
the `DeclareOp`. This allows us to generate the allocated/associated status of
these variables. The metadata to get the information from the descriptor is
similar to that for arrays.

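To show how a debugger might use these expressions, the Python sketch below
evaluates the `allocated` expression against a simulated descriptor. It
implements only the five operations that appear in the example above, models
memory as a dictionary from address to 8-byte value, and uses made-up
addresses. It illustrates the stack-machine semantics and is not a DWARF
implementation.

```python
def evaluate(expr, object_address, memory):
    """Evaluate a flat list of DWARF operations and return the top of the stack."""
    stack = []
    ops = iter(expr)
    for op in ops:
        if op == "DW_OP_push_object_address":
            stack.append(object_address)
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_plus_uconst":
            stack.append(stack.pop() + next(ops))  # operand follows the opcode
        elif op == "DW_OP_lit0":
            stack.append(0)
        elif op == "DW_OP_ne":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(1 if lhs != rhs else 0)
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack[-1]


ALLOCATED = ["DW_OP_push_object_address", "DW_OP_deref", "DW_OP_lit0", "DW_OP_ne"]
DESCRIPTOR = 0x1000  # hypothetical address of the descriptor for `ar`

# The data pointer is read from the start of the descriptor; null means unallocated.
print(evaluate(ALLOCATED, DESCRIPTOR, {DESCRIPTOR: 0}))       # 0
print(evaluate(ALLOCATED, DESCRIPTOR, {DESCRIPTOR: 0x2000}))  # 1
```

The same evaluator handles the bound expressions: the operand of
`DW_OP_plus_uconst` is added to the descriptor address before the final
`DW_OP_deref` loads the value stored there.
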
### Strings

The `DIStringTypeAttr` can represent both fixed size and allocatable strings. For
the allocatable case, the `stringLengthExpression` and `stringLocationExpression`
are used to provide the length and the location of the string respectively.

```
  character(len=:), allocatable :: var
  character(len=20) :: fixed

!1 = !DILocalVariable(name: "var", type: !2)
!2 = !DIStringType(name: "character(*)", stringLengthExpression: !4, stringLocationExpression: !3 ...)
!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8)

!5 = !DILocalVariable(name: "fixed", type: !6)
!6 = !DIStringType(name: "character (20)", size: 160)
```

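The `size: 160` for `fixed` is expressed in bits. A minimal sketch of the
arithmetic, assuming a character kind that occupies 1 byte per character (the
helper name is hypothetical):

```python
def fixed_string_size_bits(length, kind_bytes=1):
    """Size in bits of a fixed-length character variable."""
    return length * kind_bytes * 8


# character(len=20) :: fixed
print(fixed_string_size_bits(20))  # 160
```
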
### Association

Associations will be treated like normal variables, although we may need to
handle the case where the `DeclareOp` of one variable points to the `DeclareOp`
of another variable (e.g. a => b).

### Namelists

FIR does not seem to have a way to extract information about namelists.

```
namelist /abc/ x3, y3

(gdb) p abc
$1 = ( x3 = 100, y3 = 500 )
(gdb) p x3
$2 = 100
(gdb) p y3
$3 = 500
```

Even without namelist support, we should be able to see the values of the
individual variables like `x3` and `y3` in the above example. But we would not
be able to evaluate the namelist and have the debugger print the values of all
the variables in it as shown above for `abc`.

## Missing metadata in MLIR

Some metadata types that are needed for Fortran are present in LLVM IR but are
absent from MLIR. A non-comprehensive list is given below.

1. `DICommonBlockAttr`
2. `DIGenericSubrangeAttr`
3. `DISubrangeAttr` in MLIR takes `IntegerAttr` at the moment so it only works
with fixed size arrays. It also needs to accept `DIExpressionAttr` or
`DILocalVariableAttr` to support assumed shape and adjustable arrays.
4. The `DICompositeTypeAttr` will need to have fields for `dataLocation`,
`rank`, `allocated` and `associated`.
5. `DIStringTypeAttr`

# Testing

- LLVM LIT tests will be added to test:
  - the driver and ensure that it passes the line table and full debug
    info generation options appropriately.
  - that the pass works as expected and generates debug info. These tests will
    use `fir-opt`.
  - with `flang -fc1` that end-to-end debug info generation works.
- Manual external tests will be written to ensure that the following works
  in debug tools:
  - Break at lines.
  - Break at functions.
  - Print type (ptype) of function names.
  - Print values and types (ptype) of various types of variables.
- Manually run `GDB`'s gdb.fortran testsuite with llvm-flang.

# Resources
- [1] https://dwarfstd.org/doc/DWARF5.pdf
- [2] https://llvm.org/docs/LangRef.html#metadata
- [3] https://archive.fosdem.org/2022/schedule/event/llvm_fortran_debug/
- [4] https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/DebugTranslation.cpp
- [5] https://github.com/llvm/llvm-project/pull/84202