# Debug Info Assignment Tracking

Assignment Tracking is an alternative technique for tracking variable location
debug info through optimisations in LLVM. It provides accurate variable
locations for assignments where a local variable (or a field of one) is the
LHS. In rare and complicated circumstances, indirect assignments might be
optimized away without being tracked, but otherwise we make our best effort to
track all variable locations.

The core idea is to track more information about source assignments (in
program order) and to preserve enough information to defer decisions about
whether to use non-memory locations (register, constant) or memory locations
until after middle-end optimisations have run. This contrasts with using
`#dbg_declare` and `#dbg_value`, which make the decision for most variables
early on. That can result in suboptimal variable locations that are either
incorrect or incomplete.

A secondary goal of assignment tracking is to cause minimal additional work for
LLVM pass writers, and minimal disruption to LLVM in general.

## Status and usage

**Status**: Enabled by default in Clang but disabled under some circumstances
(which can be overridden with the `forced` option, see below). `opt` will not
run the pass unless asked (`-passes=declare-to-assign`).

**Flag**:
`-Xclang -fexperimental-assignment-tracking=<disabled|enabled|forced>`

When enabled, Clang gets LLVM to run the pass `declare-to-assign`. The pass
converts conventional debug records to assignment tracking metadata and sets
the module flag `debug-info-assignment-tracking` to the value `i1 true`. To
check whether assignment tracking is enabled for a module, call
`isAssignmentTrackingEnabled(const Module &M)` (from `llvm/IR/DebugInfo.h`).
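
As a rough illustration of the default-on behaviour described above, the C++ sketch below models when the `declare-to-assign` pass would be scheduled. It is not Clang's code. `ATMode`, `BuildConfig` and `shouldRunDeclareToAssign` are hypothetical names. The LTO/ThinLTO and LLDB-tuning conditions are the restrictions listed in the TODO list at the end of this document.

```cpp
#include <cassert>

// Hypothetical mirror of -fexperimental-assignment-tracking=<disabled|enabled|forced>.
enum class ATMode { Disabled, Enabled, Forced };

// Hypothetical summary of the build settings that suppress the default.
struct BuildConfig {
  bool PrepareForLTO = false;     // full LTO pre-link compile
  bool PrepareForThinLTO = false; // ThinLTO pre-link compile
  bool TuneForLLDB = false;       // LLDB debugger tuning selected
};

// Returns true if declare-to-assign should be added to the pipeline.
// `Enabled` is dropped for LTO/ThinLTO and LLDB tuning; `Forced` is not.
bool shouldRunDeclareToAssign(ATMode Mode, const BuildConfig &Cfg) {
  switch (Mode) {
  case ATMode::Disabled:
    return false;
  case ATMode::Forced:
    return true;
  case ATMode::Enabled:
    return !Cfg.PrepareForLTO && !Cfg.PrepareForThinLTO && !Cfg.TuneForLLDB;
  }
  return false; // not reached for valid enumerators
}
```

Under these assumptions, `forced` is the only way to get the pass in an LTO pre-link build or an LLDB-tuned build.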

## Design and implementation

### Assignment markers: `#dbg_assign`

`#dbg_value`, a conventional debug record, marks out a position in the
IR where a variable takes a particular value. Similarly, Assignment Tracking
marks out the position of assignments with a record called `#dbg_assign`.

In order to know where in IR it is appropriate to use a memory location for a
variable, each assignment marker must in some way refer to the store, if any
(or multiple!), that performs the assignment. That way, the position of the
store and marker can be considered together when making that choice. Another
important benefit of referring to the store is that we can then build a two-way
mapping of stores<->markers that can be used to find markers that need to be
updated when stores are modified.

A `#dbg_assign` marker that is not linked to any instruction signals that
the store that performed the assignment has been optimised out, and therefore
the memory location will not be valid for at least some part of the program.

Here's the `#dbg_assign` signature. `Value *` type parameters are first wrapped
in `ValueAsMetadata`:

```
  #dbg_assign(Value *Value,
              DIExpression *ValueExpression,
              DILocalVariable *Variable,
              DIAssignID *ID,
              Value *Address,
              DIExpression *AddressExpression)
```

The first three parameters look and behave like those of a `#dbg_value`. `ID`
is a reference to a store (see the next section). `Address` is the destination
address of the store, and it is modified by `AddressExpression`. An
empty/undef/poison address means the address component has been killed (the
memory address is no longer a valid location). LLVM currently encodes variable
fragment information in `DIExpression`s, so as an implementation quirk the
`FragmentInfo` for `Variable` is contained within `ValueExpression` only.
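
The following toy C++ model of a `#dbg_assign` record makes the operand roles concrete. It also includes a deliberately simplified test for whether the record's memory location may be used. All types and functions are hypothetical stand-ins, not LLVM classes. The test flattens the rule above ("not valid for at least some part of the program") into a single yes/no answer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Toy stand-ins for the operand types in the #dbg_assign signature.
// None of these are the real LLVM classes.
struct ToyFragment {
  std::uint64_t OffsetInBits;
  std::uint64_t SizeInBits;
};

struct ToyExpr {
  // Fragment info is only carried by the ValueExpression operand.
  std::optional<ToyFragment> Fragment;
};

struct ToyDbgAssign {
  std::string Value;       // e.g. "%a" or "undef"
  ToyExpr ValueExpression;
  std::string Variable;    // e.g. "a"
  const void *ID;          // identity of a distinct DIAssignID node
  std::string Address;     // e.g. "%a.addr"; "", "undef" or "poison" once killed
  ToyExpr AddressExpression;
};

// The address component is killed when it is empty, undef or poison.
bool addressKilled(const ToyDbgAssign &M) {
  return M.Address.empty() || M.Address == "undef" || M.Address == "poison";
}

// Simplified: use the memory location only if the address is live and at
// least one instruction still carries the marker's DIAssignID.
bool memoryLocationUsable(const ToyDbgAssign &M, unsigned NumLinkedInsts) {
  return !addressKilled(M) && NumLinkedInsts > 0;
}
```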

### Instruction link: `DIAssignID`

`DIAssignID` metadata is the mechanism that is currently used to encode the
store<->marker link. The metadata node has no operands and all instances are
`distinct`; equality is checked by comparing addresses.

`#dbg_assign` records use a `DIAssignID` metadata node instance as an
operand. This way, a record refers to any store-like instruction that has the
same `DIAssignID` attachment. For example, given this `test.cpp`:

```
int fun(int a) {
  return a;
}
```

compiled without optimisations:

```
$ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking=enabled
```

we get:

```
define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 {
entry:
  %a.addr = alloca i32, align 4, !DIAssignID !13
    #dbg_assign(i1 undef, !14, !DIExpression(), !13, ptr %a.addr, !DIExpression(), !15)
  store i32 %a, ptr %a.addr, align 4, !DIAssignID !16
    #dbg_assign(i32 %a, !14, !DIExpression(), !16, ptr %a.addr, !DIExpression(), !15)
  %0 = load i32, ptr %a.addr, align 4, !dbg !17
  ret i32 %0, !dbg !18
}

...
!13 = distinct !DIAssignID()
!14 = !DILocalVariable(name: "a", ...)
...
!16 = distinct !DIAssignID()
```

The first `#dbg_assign` refers to the `alloca` through `!DIAssignID !13`,
and the second refers to the `store` through `!DIAssignID !16`. The final
operand of each record (`!15`) is its debug location.

### Store-like instructions

In the absence of a linked `#dbg_assign`, a store to an address that is
known to be the backing storage for a variable is considered to represent an
assignment to that variable.

This gives us a safe fall-back in cases where `#dbg_assign` records have
been deleted, the `DIAssignID` attachment on the store has been dropped, or the
optimiser has made a once-indirect store (not tracked with Assignment Tracking)
direct.
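
The two-way stores<->markers mapping and the store-like fall-back can be sketched with a toy model. In it, a `DIAssignID` is an empty object compared by address, mirroring the `distinct`, operand-less node described above. Every name here (`ToyAssignID`, `ToyInst`, `ToyMarker`, `AssignmentMap`, `isUntrackedAssignment`) is hypothetical. Instructions are reduced to a name, a destination and an optional ID.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy DIAssignID: operand-less, compared by address (like a distinct node).
struct ToyAssignID {};

struct ToyInst {
  std::string Name;      // e.g. "store"
  std::string Dest;      // destination address, e.g. "%a.addr"
  const ToyAssignID *ID; // !DIAssignID attachment, or nullptr if none
};

struct ToyMarker {
  std::string Variable;  // e.g. "a"
  const ToyAssignID *ID; // the #dbg_assign's ID operand
};

// Two-way stores<->markers mapping keyed on ID identity. It stores pointers
// into the caller's vectors, which must outlive it.
class AssignmentMap {
public:
  AssignmentMap(const std::vector<ToyInst> &Insts,
                const std::vector<ToyMarker> &Markers) {
    for (const ToyInst &I : Insts)
      if (I.ID)
        InstsFor[I.ID].push_back(&I);
    for (const ToyMarker &M : Markers)
      MarkersFor[M.ID].push_back(&M);
  }

  // Instructions performing marker M's assignment; empty if they were deleted.
  std::vector<const ToyInst *> linkedInsts(const ToyMarker &M) const {
    auto It = InstsFor.find(M.ID);
    if (It == InstsFor.end())
      return {};
    return It->second;
  }

  // Markers to revisit when instruction I is modified.
  std::vector<const ToyMarker *> linkedMarkers(const ToyInst &I) const {
    if (!I.ID)
      return {};
    auto It = MarkersFor.find(I.ID);
    if (It == MarkersFor.end())
      return {};
    return It->second;
  }

private:
  std::map<const ToyAssignID *, std::vector<const ToyInst *>> InstsFor;
  std::map<const ToyAssignID *, std::vector<const ToyMarker *>> MarkersFor;
};

// Store-like fall-back: a store with no linked marker whose destination is
// the variable's known backing storage still counts as an assignment.
bool isUntrackedAssignment(const ToyInst &Store, const AssignmentMap &Map,
                           const std::string &BackingStorage) {
  return Map.linkedMarkers(Store).empty() && Store.Dest == BackingStorage;
}
```

In this model, a marker whose `linkedInsts` comes back empty corresponds to a `#dbg_assign` whose store has been optimised out.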

### Middle-end: Considerations for pass-writers

#### Non-debug instruction updates

**Cloning** an instruction: nothing new to do. Cloning automatically clones a
`DIAssignID` attachment. Multiple instructions may have the same `DIAssignID`
attachment. In this case, the assignment is considered to take place in
multiple positions in the program.

**Moving** a non-debug instruction: nothing new to do. Instructions linked to a
`#dbg_assign` have their initial IR position marked by the position of the
`#dbg_assign`.

**Deleting** a non-debug instruction: nothing new to do. Simple DSE does not
require any change; it's safe to delete an instruction with a `DIAssignID`
attachment. A `#dbg_assign` that uses a `DIAssignID` that is not attached
to any instruction indicates that the memory location isn't valid.

**Merging** stores: In many cases no change is required, as `DIAssignID`
attachments are automatically merged if `combineMetadata` is called. One way or
another, the `DIAssignID` attachments must be merged such that the new store
becomes linked to all the `#dbg_assign` records that the merged stores
were linked to. This can be achieved simply by calling the helper function
`Instruction::mergeDIAssignID`.

**Inlining** stores: As stores are inlined, we generate `#dbg_assign`
records and `DIAssignID` attachments as if the stores represent source
assignments, just like in the frontend. This isn't perfect, as stores may have
been moved, modified or deleted before inlining, but it does at least keep the
information about the variable correct within the non-inlined scope.

**Splitting** stores: SROA and passes that split stores treat `#dbg_assign`
records similarly to `#dbg_declare` records.
Clone the `#dbg_assign` records linked to the store, update the
`FragmentInfo` in the `ValueExpression`, and give each of the split stores (and
cloned records) a new `DIAssignID` attachment. In other words, treat the split
stores as separate assignments. For partial DSE (e.g. shortening a memset), we
do the same, except that the `#dbg_assign` for the dead fragment gets an undef
`Address`.

**Promoting** allocas and store/loads: `#dbg_assign` records implicitly
describe joined values in memory locations at CFG joins, but this is not
necessarily the case after promoting (or partially promoting) the
variable. Passes that promote variables are responsible for inserting
`#dbg_assign` records after the resultant PHIs generated during
promotion. `mem2reg` already has to do this (with `#dbg_value`) for
`#dbg_declare`s. Where a store has no linked record, the store is
assumed to represent an assignment for variables stored at the destination
address.

#### Debug record updates

**Moving** a debug record: avoid moving `#dbg_assign` records where
possible, as they represent a source-level assignment whose position in the
program should not be affected by optimization passes.

**Deleting** a debug record: nothing new to do. Just like for conventional
debug records, unless the record is unreachable, it's almost always incorrect
to delete a `#dbg_assign` record.

### Lowering `#dbg_assign` to MIR

To begin with, only SelectionDAG ISel will be supported. `#dbg_assign`
records are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens,
we need to decide where it is appropriate to use memory locations and where we
must use a non-memory location (or no location) for each variable. To make
those decisions, we run a standard fixed-point dataflow analysis that makes
the choice at each instruction, iteratively joining the results for each block.
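
Below is a minimal sketch of that kind of fixed-point analysis for a single variable. It makes three simplifying assumptions:

- The lattice has three values (`Mem`, `Val`, `None`), and any disagreement at a CFG join yields `None`.
- Each block is summarised by the kind chosen after its last assignment.
- Predecessors not yet visited are ignored.

These assumptions and all names are for illustration only. LLVM's actual analysis tracks considerably more state.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <vector>

// Toy lattice for one variable: in memory, in a non-memory location, or
// no location.
enum class LocKind { Mem, Val, None };

// Assumed join rule: predecessors that disagree give no location.
LocKind join(LocKind A, LocKind B) { return A == B ? A : LocKind::None; }

struct ToyBlock {
  std::vector<int> Preds;
  std::vector<int> Succs;
  // Kind chosen after the block's last assignment to the variable, or
  // std::nullopt if the block does not assign it (the input flows through).
  std::optional<LocKind> Transfer;
};

// Worklist solver: recompute a block's exit state from the join of its
// visited predecessors until nothing changes. Unreachable blocks report None.
std::vector<LocKind> solve(const std::vector<ToyBlock> &Blocks, int Entry,
                           LocKind EntryIn) {
  std::vector<std::optional<LocKind>> Out(Blocks.size());
  std::deque<int> Work{Entry};
  while (!Work.empty()) {
    int B = Work.front();
    Work.pop_front();
    std::optional<LocKind> In;
    if (B == Entry)
      In = EntryIn;
    for (int P : Blocks[B].Preds)
      if (Out[P])
        In = In ? join(*In, *Out[P]) : *Out[P];
    if (!In)
      continue; // no information yet
    LocKind NewOut = Blocks[B].Transfer.value_or(*In);
    if (Out[B] != NewOut) {
      Out[B] = NewOut;
      for (int S : Blocks[B].Succs)
        Work.push_back(S);
    }
  }
  std::vector<LocKind> Result;
  for (const std::optional<LocKind> &O : Out)
    Result.push_back(O.value_or(LocKind::None));
  return Result;
}
```

Consider a diamond CFG. Its entry performs a tracked store (`Mem`), and one arm's store has been deleted, leaving only a value location (`Val`). There, `solve` returns `Mem` for the untouched arm, `Val` for the other arm and `None` at the join block. The join rule is monotone and each block's transfer is either constant or the identity, so the worklist always terminates.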

### TODO list

Outstanding improvements:

* As mentioned in test llvm/test/DebugInfo/assignment-tracking/X86/diamond-3.ll,
  the analysis should treat escaping calls like untagged stores.

* The system expects locals to be backed by a local alloca. This isn't always
  the case; sometimes a pointer to storage is passed into a function
  (e.g. sret, byval). We need to be able to handle those cases. See
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for examples.

* `trackAssignments` doesn't yet work for variables that have their
  `#dbg_declare` location modified by a `DIExpression`, e.g. when the
  address of the variable is itself stored in an `alloca` with the
  `#dbg_declare` using `DIExpression(DW_OP_deref)`. See `indirectReturn` in
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and in
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for an
  example.

* To solve the problem of storage passed into a function (the sret/byval
  bullet point above), we need to be able to specify that a memory location
  is available without using a `DIAssignID`. This is because the storage
  address is not computed by an instruction (it's an argument value), and
  therefore we have nowhere to put the metadata attachment. To solve this, we
  probably need another marker record to denote "the variable's stack home is
  X address". It would be similar to `#dbg_declare`, except that it needs to
  compose with `#dbg_assign` records such that the stack home address is only
  selected as a location for the variable when the `#dbg_assign` records
  agree it should be.

* Given the above (a special "the stack home is X" record), and the fact
  that we can only track assignments with fixed offsets and sizes, I think we
  can probably get rid of the address and address-expression part, since it
  will always be computable with the info we have.

* Assignment tracking is disabled by default for LTO and ThinLTO builds, and
  if LLDB debugger tuning has been specified. We should remove these
  restrictions. See `EmitAssemblyHelper::RunOptimizationPipeline` in
  clang/lib/CodeGen/BackendUtil.cpp.