# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
instruction referencing mode has LLVM refer to the machine instruction and
operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

```llvm
%2 = add i32 %0, %1
  #dbg_value(metadata i32 %2,
```

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection.
Consider the X86 assembly
below and instruction referencing debug info, corresponding to the earlier
LLVM IR:

```text
%2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
```

While the function remains in SSA form, virtual register `%2` is sufficient to
identify the value computed by the instruction -- however the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. Instead, a more consistent way of identifying
the instruction's value is to refer to the `MachineOperand` where the value is
defined: independently of which register is defined by that `MachineOperand`. In
the code above, the `DBG_INSTR_REF` instruction refers to instruction number
one, operand zero, while the `ADD32rr` has a `debug-instr-number` attribute
attached indicating that it is instruction number one.

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions are optimised instead. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand pair -- see
`MachineFunction::substituteDebugValuesForInst`. If debug info maintenance is
not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced, which always need debug info
maintenance.

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e.
`LiveDebugVariables`). Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.

The exception is `PHI` instructions: these become implicit definitions at
control flow merges once regalloc finishes, and any debug numbers attached to
`PHI` instructions are lost. To circumvent this, debug numbers of `PHI`s are
recorded at the start of register allocation (`phi-node-elimination`), then
`DBG_PHI` instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

```text
bb.2:
  %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
```

After:

```text
bb.2:
  DBG_PHI $rax, 1
```

# `LiveDebugValues`

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [`LiveDebugValues` pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place `PHI`s for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary `PHI`s. The two can then be joined up
to map variables to values, then values to locations, for each instruction in
the function.

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.
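
The join between the two mappings can be illustrated with a small toy model.
The sketch below is not LLVM's implementation: `ToyTracker`, `ValueID` and all
member names are hypothetical, it handles only a single straight-line block,
and it ignores `PHI` placement entirely. It only shows the idea of mapping
variables to values, values to locations, and following copies so that a
location survives a clobber of the original register.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy types, not LLVM classes. A value is named by the
// (instruction number, operand index) pair that defines it.
using ValueID = std::pair<unsigned, unsigned>;

struct ToyTracker {
  std::map<std::string, ValueID> VarToValue; // variable name -> value
  std::map<std::string, ValueID> LocToValue; // register or slot -> value

  // A numbered instruction defines a value in a machine location.
  void define(const std::string &Loc, ValueID V) { LocToValue[Loc] = V; }

  // Models a DBG_INSTR_REF: the variable takes on a value, not a register.
  void bind(const std::string &Var, ValueID V) { VarToValue[Var] = V; }

  // Models a copy-like instruction: Dst now holds whatever Src holds.
  void copy(const std::string &Dst, const std::string &Src) {
    auto It = LocToValue.find(Src);
    if (It == LocToValue.end()) {
      LocToValue.erase(Dst); // Src holds nothing we know about
      return;
    }
    ValueID V = It->second;
    LocToValue[Dst] = V;
  }

  // Some instruction overwrote Loc with an untracked value.
  void clobber(const std::string &Loc) { LocToValue.erase(Loc); }

  // Join the two maps: variable -> value -> some location holding it.
  // An empty string stands for "optimised out".
  std::string locate(const std::string &Var) const {
    auto VI = VarToValue.find(Var);
    if (VI == VarToValue.end())
      return "";
    for (const auto &Entry : LocToValue)
      if (Entry.second == VI->second)
        return Entry.first;
    return "";
  }
};
```

In this model, a variable bound to value (1, 0) stays locatable after the
value is copied to a second register and the first register is clobbered, and
it becomes "optimised out" only once no tracked location holds the value.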

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow `LiveDebugValues` to follow values
   through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction
   numbers.

## Target hooks

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like -- `LiveDebugValues` uses this to identify when
values move between registers.

`TargetInstrInfo::isLoadFromStackSlotPostFE` and
`TargetInstrInfo::isStoreToStackSlotPostFE` are needed to identify spill and
restore instructions. Each should return the destination or source register
respectively. `LiveDebugValues` will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill
should have a `MachineMemOperand` attached, so that `LiveDebugValues` can
recognise that a slot has been clobbered.

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a `MachineInstr` to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented -- the relevant question is whether any
register def in any operand will produce a different value, as a result of the
mutation. If the answer is yes, then there is a risk that a `DBG_INSTR_REF`
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call `MachineInstr::dropDebugNumber()` on the
mutated instruction to erase its instruction number.
Any `DBG_INSTR_REF`
referring to it will produce an empty variable location instead, which appears
as "optimised out" in the debugger.

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

```text
%2:gr32 = ADD32rr %0, %1, debug-instr-number 1
```

becomes

```text
%2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
```

with a substitution from "instruction number 1 operand 0" to "instruction number
2 operand 0" recorded in the `MachineFunction`. In `LiveDebugValues`,
`DBG_INSTR_REF`s will be mapped through the substitution table to find the most
recent instruction number / operand number of the value they refer to.

Use `MachineFunction::substituteDebugValuesForInst` to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This holds most of the time, as in the example above.

If operand numbers do not line up between the old and new instruction, use
`MachineInstr::getDebugInstrNum` to acquire the instruction number for the new
instruction, and `MachineFunction::makeDebugValueSubstitution` to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution -- `LiveDebugValues` will safely drop the
now unavailable variable value.

Should your target clone instructions, much the same as the `TailDuplicator`
optimisation pass, do not attempt to preserve the instruction numbers or
record any substitutions.
`MachineFunction::CloneMachineInstr` should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to `LiveDebugValues`. Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.

[LiveDebugValues]: project:SourceLevelDebugging.rst#LiveDebugValues expansion of variable locations
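
To make the substitution bookkeeping from the optimisation instrumentation
section more concrete, here is a minimal self-contained sketch of a
substitution table. It is a toy model and does not use or reproduce LLVM's
data structures: `ToySubstitutions`, `InstrOperand`, `record` and `resolve` are
hypothetical names chosen for illustration. It shows how a reference to an old
instruction number / operand pair could be followed through several recorded
substitutions to the most recent pair.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical toy type: (instruction number, operand index).
using InstrOperand = std::pair<unsigned, unsigned>;

struct ToySubstitutions {
  std::map<InstrOperand, InstrOperand> Table; // old pair -> new pair

  // Record that the value once defined by Old is now defined by New.
  void record(InstrOperand Old, InstrOperand New) {
    assert(Old != New && "a self-substitution would never terminate");
    Table[Old] = New;
  }

  // Follow substitutions until reaching a pair with no further entry.
  // The step bound guards this toy against accidental cycles.
  InstrOperand resolve(InstrOperand Ref) const {
    for (std::size_t Steps = 0; Steps <= Table.size(); ++Steps) {
      auto It = Table.find(Ref);
      if (It == Table.end())
        return Ref;
      Ref = It->second;
    }
    return Ref;
  }
};
```

With substitutions (1, 0) to (2, 0) and (2, 0) to (3, 1) recorded, resolving a
reference to (1, 0) in this model yields (3, 1), while a pair with no recorded
substitution resolves to itself.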