# Debug Generation

Application developers spend a significant amount of time debugging the
applications that they create. Hence it is important that a compiler provide
support for a good debug experience. DWARF[1] is the standard debugging file
format used by compilers and debuggers. The LLVM infrastructure supports debug
info generation using metadata[2]. Support for generating debug metadata is
present in MLIR by way of MLIR attributes. Flang can leverage these MLIR
attributes to generate good debug information.

We can break the work for debug generation into two separate tasks:
1) Line Table generation
2) Full debug generation

The support for Fortran Debug in LLVM infrastructure[3] has made great progress
due to many Fortran frontends adopting LLVM as the backend as well as the
availability of the Classic Flang compiler.

## Driver Flags
By default, Flang will not generate any debug or linetable information.
Debug information will be generated if the following flags are present.

- `-gline-tables-only`, `-g1` : Emit debug line number tables only
- `-g` : Emit full debug info

## Line Table Generation

There is an existing AddDebugFoundationPass which adds a `FusedLoc` with a
`SubprogramAttr` on FuncOp. This allows MLIR to generate LLVM IR metadata
for that function. However, the following values are hardcoded at the moment.
These will instead be passed from the driver.

- Details of the compiler (name, version and git hash).
- Language Standard. We can set it to Fortran95 for now and periodically
revise it when full support for later standards is available.
- Optimisation Level.
- Type of debug generated (linetable/full debug).
- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
the main program.

`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
match the signature of the actual function/subroutine.
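
The handoff from the driver described in this section can be sketched as follows. This is only an illustration in Python, not Flang code: the names `DebugLevel`, `DebugOptions`, `debug_level_from_flags` and `calling_convention` are hypothetical, and the rule that the last debug flag on the command line wins is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class DebugLevel(Enum):
    NONE = "none"                # default: no debug or line table information
    LINE_TABLES = "line-tables"  # -gline-tables-only, -g1
    FULL = "full"                # -g


def debug_level_from_flags(flags):
    """Return the debug level requested by a list of driver flags.

    Assumption: when several debug flags are present, the last one wins.
    """
    level = DebugLevel.NONE
    for flag in flags:
        if flag in ("-gline-tables-only", "-g1"):
            level = DebugLevel.LINE_TABLES
        elif flag == "-g":
            level = DebugLevel.FULL
    return level


@dataclass
class DebugOptions:
    """Hypothetical bundle of the values the driver would pass to the pass."""
    producer: str           # compiler name, version and git hash
    language_standard: str  # e.g. "Fortran95"
    opt_level: int
    level: DebugLevel


def calling_convention(is_main_program):
    """DW_CC_program for the main program, DW_CC_normal otherwise."""
    return "DW_CC_program" if is_main_program else "DW_CC_normal"
```

Grouping the values into a single options object is one way to keep the interface of the pass small as more driver settings are added.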

## Full Debug Generation

Full debug info will include metadata to describe functions, variables and
types. Flang will generate debug metadata in the form of MLIR attributes. These
attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].

Debug metadata generation can be broken down into two steps.

1. MLIR attributes are generated by reading information from AST or FIR. This
step can happen anytime before or during conversion to LLVM dialect. An example
of the metadata generated in this step is `DILocalVariableAttr` or
`DIDerivedTypeAttr`.

2. Changes that can only happen during or after conversion to LLVM dialect. An
example of this is passing `DIGlobalVariableExpressionAttr` while
creating `LLVM::GlobalOp`. Another example is the generation of `DbgDeclareOp`
that is required for local variables. It can only be created after conversion to
LLVM dialect as it requires LLVM.Ptr type. The changes required for step 2 are
quite minimal. The bulk of the work happens in step 1.

One design decision that we need to make is where to perform step 1.
Here are some possible options:

**During conversion to LLVM dialect**

Pros:
1. Do step 1 and 2 in one place.
2. No chance of missing any change introduced by an earlier transformation.

Cons:
1. Passing a lot of information from the driver as discussed in the line table
section above may muddle the interface of FIRToLLVMConversion.
2. `DeclareOp` is removed before this pass.
3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
been converted to LLVM dialect and others have not may cause its own issues. We
have to walk the ops chain to extract the information, which may be problematic
in this case.
4. Some source information is lost by this point. Examples include
information about namelists, source line information about fields of derived
types etc.

**During a pass before conversion to LLVM dialect**

This is similar to what AddDebugFoundationPass is currently doing.

Pros:
1. One central location dedicated to debug information processing. This can
result in a cleaner implementation.
2. Similar to above, less chance of missing any change introduced by an earlier
transformation.

Cons:
1. Step 2 still needs to happen during conversion to LLVM dialect. But the
changes required for step 2 are quite minimal.
2. Similar to above, some source information may be lost by this point.

**During Lowering from AST**

Pros:
1. We have better source information.

Cons:
1. There may be changes in the code after lowering which may not be
reflected in debug information.
2. Comments on an earlier PR [5] advised against this approach.

## Design

The design below assumes that we are extracting the information from FIR.
If we generate debug metadata during lowering then the description below
may need to change, although the generated metadata remains the same in
both cases.

The AddDebugFoundationPass will be renamed to AddDebugInfo Pass. The
information mentioned in the line table section above will be passed to it from
the driver. This pass will run quite late in the pipeline but before
`DeclareOp` is removed.

In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
and `DeclareOp` to extract the source information and build the MLIR
attributes. A class will be added to handle conversion of MLIR and FIR types to
`DITypeAttr`.

The following sections provide details of how various language constructs will
be handled. In these sections, the LLVM IR metadata and MLIR attributes have
been used interchangeably. As an example, `DILocalVariableAttr` is an MLIR
attribute which gets translated to LLVM IR's `DILocalVariable`.
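
As a rough illustration of the type-conversion class mentioned above, the sketch below maps a type spelling to a description of a debug type and caches the result, so that the same type always yields the same attribute. It is written in Python purely for illustration. The class name `DebugTypeConverter`, the dictionary representation of an attribute and the placeholder returned for unhandled types are all assumptions, not part of Flang or MLIR.

```python
class DebugTypeConverter:
    """Hypothetical stand-in for the class that maps FIR/MLIR types to DI types."""

    # Scalar types used in the examples of this document:
    # type spelling -> (source-level name, size in bits).
    _BASIC_TYPES = {
        "i32": ("integer", 32),
        "f32": ("real", 32),
    }

    def __init__(self):
        # Cache so that converting the same type twice returns the same object.
        self._cache = {}

    def convert(self, type_name):
        if type_name in self._cache:
            return self._cache[type_name]
        if type_name in self._BASIC_TYPES:
            name, bits = self._BASIC_TYPES[type_name]
            attr = {"kind": "DIBasicTypeAttr", "name": name, "size": bits}
        else:
            # Assumption: unhandled types get a placeholder instead of failing.
            attr = {"kind": "DIBasicTypeAttr", "name": "unsupported", "size": 0}
        self._cache[type_name] = attr
        return attr
```

Caching matters because many variables share a type, and emitting one attribute per use would bloat the generated metadata.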

### Variables

#### Local Variables
In MLIR, local variables are represented by `DILocalVariableAttr` which
stores information like source location and type. They also require a
`DbgDeclareOp` which binds `DILocalVariableAttr` with a location.

In FIR, `DeclareOp` has source information about the variable. The
`DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is
attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.

During conversion to LLVM dialect, when an op is encountered that has a
`DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
binds the attr with its location.

The changes in the IR look as follows:

```
// Original FIR
%2 = fir.alloca i32 loc(#loc4)
%3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}

// FIR with FusedLoc
%2 = fir.alloca i32 loc(#loc38)
%3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
#di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
#loc38 = loc(fused<#di_local_variable5>[#loc4])

// After conversion to LLVM dialect
#di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
%1 = llvm.alloca %0 x i64
llvm.intr.dbg.declare #di_local_variable = %1
```

#### Function Arguments

Arguments work in a similar way, but they present a difficulty: the
`DeclareOp`'s memref points to a `BlockArgument`. Unlike the op in the local
variable case, `BlockArgument`s are not handled by the FIRToLLVMLowering. This
can easily be handled by adding the `DbgDeclareOp` after conversion to LLVM
dialect, either in FIRToLLVMLowering or in a separate pass.

### Module

In debug metadata, the Fortran module will be represented by `DIModuleAttr`.
The variables or functions inside a module will have a scope pointing to the
parent module.

```
module helper
  real glr
  ...
end module helper

!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
!2 = !DIModule(scope: !1, name: "helper" ...)
!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)

; Use of a module results in the following metadata.
!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
```

Modules are not first class entities in FIR, so there is no way to get
the location where they are declared in the source file.

But the information that a variable or function is part of a module,
along with the name of the module, can be extracted from its mangled name.
There is a `GlobalOp` generated for each module variable in FIR and there is
also a `DeclareOp` in each function where the module variable is used.

We will use the `GlobalOp` to generate the `DIModuleAttr` and associated
`DIGlobalVariableAttr`. A `DeclareOp` for a module variable will be used
to generate `DIImportedEntityAttr`. Care will be taken to avoid generating
duplicate `DIImportedEntityAttr` entries in the same function.

### Derived Types

A derived type will be represented in metadata by `DICompositeType` with a tag of
`DW_TAG_structure_type`. It will have elements which point to the components.

```
  type :: t_pair
    integer :: i
    real :: x
  end type
!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
!2 = !{!3, !4}
!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
```

In FIR, `RecordType` and `TypeInfoOp` can be used to get information about the
location of the derived type and the types of its components.
We may also use `FusedLoc` on `TypeInfoOp` to encode location information for
all the components of the derived type.

### CommonBlocks

A common block will be represented in metadata by `DICommonBlockAttr` which
will be used as the scope by the variables inside the common block.
`DIExpression` can be used to give the offset of any given variable inside the
global storage for the common block.

```
integer a, b
common /test/ a, b

;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8
!1 = !DISubprogram()
!2 = !DICommonBlock(scope: !1, name: "test" ...)
!3 = !DIGlobalVariable(scope: !2, name: "a" ...)
!4 = !DIExpression()
!5 = !DIGlobalVariableExpression(var: !3, expr: !4)
!6 = !DIGlobalVariable(scope: !2, name: "b" ...)
!7 = !DIExpression(DW_OP_plus_uconst, 4)
!8 = !DIGlobalVariableExpression(var: !6, expr: !7)
```

In FIR, a common block results in a `GlobalOp` with common linkage. Every
function where the common block is used has a `DeclareOp` for each of its
variables. This `DeclareOp` will point to global storage through
`CoordinateOp` and `AddrOfOp`. The `CoordinateOp` has the offset of the
location of this variable in global storage. There is enough information to
generate the required metadata, although it requires walking up the chain from
`DeclareOp` to locate `CoordinateOp` and `AddrOfOp`.

### Arrays

The type of a fixed size array is represented using `DICompositeType`.
`DISubrangeAttr` is used to provide the bounds in each dimension.

```
integer abc(4,5)

!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
!2 = !{ !3, !4 }
!3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
!4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
```

#### Adjustable

The debug metadata for an adjustable array looks similar to that of a fixed
size array with one change. The bounds are not constant values but point to a
`DILocalVariableAttr`.

In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain
to get the value that represents the array bound in any dimension. We will
create a `DILocalVariableAttr` that will point to that location. This
variable will be used in the `DISubrangeAttr`. Note that this
`DILocalVariableAttr` does not correspond to any source variable.

#### Assumed Size

This is treated as a raw array. Debug information will not provide any upper
bound information for the last dimension.

#### Assumed Shape
The assumed shape array will use a representation similar to that of the fixed
size array, with two differences.

1. There will be a `dataLocation` field which will be an expression. This will
enable the debugger to get the data pointer from the array descriptor.

2. The fields in `DISubrangeAttr` for array bounds will be expressions which
will allow the debugger to get the bounds from the descriptor.

```
integer(4), intent(out) :: a(:,:)

!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3)
!2 = !{!5, !7}
!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
!5 = !DISubrange(lowerBound: 1, upperBound: !4 ...)
!6 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 56, DW_OP_deref)
!7 = !DISubrange(lowerBound: 1, upperBound: !6 ...)
!8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
```

In the assumed shape case, the rank can be determined from FIR's `SequenceType`.
This allows us to generate a `DISubrangeAttr` in each dimension.

#### Assumed Rank

This is currently unsupported in flang.
Its representation will be similar to the array representation for assumed
shape arrays, with the following differences.

1. `DICompositeTypeAttr` will have a rank field which will be an expression.
It will be used to get the rank value from the descriptor.
2. Instead of a `DISubrangeAttr` for each dimension, there will be a single
`DIGenericSubrange` which will allow debuggers to calculate bounds in any
dimension.

### Pointers and Allocatables
Pointers and allocatables will be represented using `DICompositeTypeAttr`. It
is a quirk of DWARF that scalar allocatable or pointer variables will show up in
the debug info as pointer to scalar while array pointer or allocatable
variables show up as arrays. The behavior is the same in gfortran and classic
flang.

```
  integer, allocatable :: ar(:)
  integer, pointer :: sc

!1 = !DILocalVariable(name: "sc", type: !2)
!2 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !3, associated: !9 ...)
!3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer", ...)
!4 = !DILocalVariable(name: "ar", type: !5 ...)
!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, elements: !6, dataLocation: !8, allocated: !9)
!6 = !{!7}
!7 = !DISubrange(lowerBound: !10, upperBound: !11 ...)
!8 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!9 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_lit0, DW_OP_ne)
!10 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 24, DW_OP_deref)
!11 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
```

In FIR, these variables are represented as `!fir.box<!fir.heap<>>` or
`!fir.box<!fir.ptr<>>`. There is also an `allocatable` or `pointer` attribute on
the `DeclareOp`. This allows us to generate the allocated/associated status of
these variables. The metadata to get the information from the descriptor is
similar to arrays.
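
To make the descriptor-based expressions above more concrete, the following Python sketch evaluates them the way a debugger would, using a small stack machine. It handles only the five DWARF operations that appear in the example. The `evaluate` function, the word-addressed `memory` dictionary and the values stored in the toy descriptor are assumptions made for illustration. The offsets 24 and 32 are simply the ones used in the example and are not meant to document Flang's actual descriptor layout.

```python
def evaluate(expr, object_address, memory):
    """Evaluate a small subset of DWARF expression operations.

    `expr` is a list of operation names, each followed by its operand if it
    has one. `memory` maps an address to the value stored at that address.
    """
    stack = []
    i = 0
    while i < len(expr):
        op = expr[i]
        if op == "DW_OP_push_object_address":
            stack.append(object_address)
        elif op == "DW_OP_plus_uconst":
            i += 1  # the next element is the constant operand
            stack.append(stack.pop() + expr[i])
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_lit0":
            stack.append(0)
        elif op == "DW_OP_ne":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(1 if lhs != rhs else 0)
        else:
            raise ValueError(f"unsupported operation: {op}")
        i += 1
    return stack[-1]


# Toy descriptor at address 1000: data pointer at offset 0, lower bound at
# offset 24 and upper bound at offset 32. All stored values are made up.
memory = {1000: 5000, 1024: 1, 1032: 10}

allocated = evaluate(
    ["DW_OP_push_object_address", "DW_OP_deref", "DW_OP_lit0", "DW_OP_ne"],
    1000, memory)
lower = evaluate(
    ["DW_OP_push_object_address", "DW_OP_plus_uconst", 24, "DW_OP_deref"],
    1000, memory)
upper = evaluate(
    ["DW_OP_push_object_address", "DW_OP_plus_uconst", 32, "DW_OP_deref"],
    1000, memory)
```

With this toy descriptor, `allocated` evaluates to 1 because the data pointer is non-zero, and `lower` and `upper` evaluate to 1 and 10. A descriptor whose data pointer is zero would make the `allocated` expression evaluate to 0.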

### Strings

The `DIStringTypeAttr` can represent both fixed size and allocatable strings. For
the allocatable case, the `stringLengthExpression` and `stringLocationExpression`
are used to provide the length and the location of the string respectively.

```
  character(len=:), allocatable :: var
  character(len=20) :: fixed

!1 = !DILocalVariable(name: "var", type: !2)
!2 = !DIStringType(name: "character(*)", stringLengthExpression: !4, stringLocationExpression: !3 ...)
!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8)

!5 = !DILocalVariable(name: "fixed", type: !6)
!6 = !DIStringType(name: "character (20)", size: 160)
```

### Association

Associated variables will be treated like normal variables, although we may
need to handle the case where the `DeclareOp` of one variable points to the
`DeclareOp` of another variable (e.g. a => b).

### Namelists

FIR does not seem to have a way to extract information about namelists.

```
namelist /abc/ x3, y3

(gdb) p abc
$1 = ( x3 = 100, y3 = 500 )
(gdb) p x3
$2 = 100
(gdb) p y3
$3 = 500
```

Even without namelist support, we should be able to see the value of the
individual variables like `x3` and `y3` in the above example. But we would not
be able to evaluate the namelist and have the debugger print the value of all
the variables in it as shown above for `abc`.

## Missing metadata in MLIR

Some metadata types that are needed for Fortran are present in LLVM IR but are
absent from MLIR. A non-comprehensive list is given below.

1. `DICommonBlockAttr`
2. `DIGenericSubrangeAttr`
3. `DISubrangeAttr` in MLIR takes `IntegerAttr` at the moment so it only works
with fixed size arrays. It needs to also accept `DIExpressionAttr` or
`DILocalVariableAttr` to support assumed shape and adjustable arrays.
4. The `DICompositeTypeAttr` will need to have fields for `dataLocation`,
`rank`, `allocated` and `associated`.
5. `DIStringTypeAttr`

# Testing

- LLVM LIT tests will be added to test:
  - the driver and ensure that it passes the line table and full debug
    info generation appropriately.
  - that the pass works as expected and generates debug info. Tests will be
    run with `fir-opt`.
  - with `flang -fc1` that end-to-end debug info generation works.
- Manual external tests will be written to ensure that the following works
  in debug tools:
  - Break at lines.
  - Break at functions.
  - Print the type (ptype) of function names.
  - Print values and types (ptype) of various types of variables.
- Manually run `GDB`'s gdb.fortran testsuite with llvm-flang.

# Resources
- [1] https://dwarfstd.org/doc/DWARF5.pdf
- [2] https://llvm.org/docs/LangRef.html#metadata
- [3] https://archive.fosdem.org/2022/schedule/event/llvm_fortran_debug/
- [4] https://github.com/llvm/llvm-project/blob/main/mlir/lib/Target/LLVMIR/DebugTranslation.cpp
- [5] https://github.com/llvm/llvm-project/pull/84202