# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

In instruction referencing mode, rather than identifying the virtual register
that a variable value resides in, LLVM refers to the machine instruction and
operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

```llvm
%2 = add i32 %0, %1
  #dbg_value(metadata i32 %2, !123, !DIExpression(), !789)
```

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 MIR
below, with instruction referencing debug info, corresponding to the earlier
LLVM IR:

```text
%2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
```

While the function remains in SSA form, virtual register `%2` is sufficient to
identify the value computed by the instruction -- however, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. Instead, a more consistent way of identifying
the instruction's value is to refer to the `MachineOperand` where the value is
defined, independently of which register is defined by that `MachineOperand`. In
the code above, the `DBG_INSTR_REF` instruction refers to instruction number
one, operand zero, while the `ADD32rr` has a `debug-instr-number` attribute
attached indicating that it is instruction number one.
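
The following C++ sketch is a toy model, not LLVM code: `ToyInstr`, `ToyInstrRef` and `resolveInstrRef` are hypothetical names, and only def operands are modelled, so an operand index here counts defs only. It shows that a reference naming (instruction number, operand) keeps finding the value even when the register written by that operand changes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction. Only the register written
// by each def operand is modelled; InstrNum == 0 means "not numbered".
struct ToyInstr {
  std::vector<std::string> Defs;
  uint64_t InstrNum = 0;
};

// Hypothetical stand-in for the operands of DBG_INSTR_REF: the value is named
// by the defining instruction's number and the def operand's index.
struct ToyInstrRef {
  uint64_t InstrNum;
  unsigned OpIdx;
};

// Return whichever register the referenced def operand currently writes, or
// std::nullopt if that instruction/operand no longer exists.
std::optional<std::string> resolveInstrRef(const std::vector<ToyInstr> &Func,
                                           ToyInstrRef Ref) {
  assert(Ref.InstrNum != 0 && "number 0 is reserved for 'unnumbered'");
  for (const ToyInstr &MI : Func)
    if (MI.InstrNum == Ref.InstrNum && Ref.OpIdx < MI.Defs.size())
      return MI.Defs[Ref.OpIdx];
  return std::nullopt; // shown as "optimised out"
}
```

For instance, if the def of instruction 1 is rewritten from `%2` to `$eax`, as register allocation might do, `resolveInstrRef` with `{1, 0}` returns `$eax` without the reference itself having been updated.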

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but instead requires additional
instrumentation when instructions are optimised. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand pair -- see
`MachineFunction::substituteDebugValuesForInst`. If debug info maintenance is
not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced, which always need debug info
maintenance.
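
To make these outcomes concrete, here is a minimal C++ sketch assuming a simple map from instruction number to the registers each def operand writes. `DefIndex`, `replaceInstr` and `lookup` are hypothetical, and substitutions are deliberately left out: the sketch only contrasts a replacement that preserves the number with one that does not, and with deleting the instruction outright.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index from instruction number to the registers written by that
// instruction's def operands (a stand-in for scanning the real function).
using DefIndex = std::map<uint64_t, std::vector<std::string>>;

// Replace instruction OldNum with one defining NewDefs. If KeepNumber is set,
// the replacement inherits OldNum (the "preserve the instruction number"
// option); otherwise it gets a fresh number from NextNum and, with no
// substitution recorded, references to OldNum become unresolvable.
uint64_t replaceInstr(DefIndex &Idx, uint64_t OldNum,
                      std::vector<std::string> NewDefs, bool KeepNumber,
                      uint64_t &NextNum) {
  assert(Idx.count(OldNum) != 0 && "replacing an unknown instruction");
  Idx.erase(OldNum);
  uint64_t Num = KeepNumber ? OldNum : NextNum++;
  Idx[Num] = std::move(NewDefs);
  return Num;
}

// Resolve (instruction number, def index). std::nullopt stands for a dropped
// location, which a debugger shows as "optimised out".
std::optional<std::string> lookup(const DefIndex &Idx, uint64_t Num,
                                  unsigned Op) {
  auto It = Idx.find(Num);
  if (It == Idx.end() || Op >= It->second.size())
    return std::nullopt;
  return It->second[Op];
}
```

Replacing instruction 1 while keeping its number leaves `lookup(Idx, 1, 0)` resolvable; replacing it under a fresh number, or erasing it as dead code, makes the lookup return `std::nullopt`, which corresponds to the safely dropped location.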

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e. `LiveDebugVariables`). Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.
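
One way to picture the unlink/relink step is the following C++ sketch. It is an assumption-laden toy rather than LLVM's implementation: each debug instruction is parked alongside the identity of the next non-debug instruction and put back in front of it later. `Instr`, `ParkedDebug`, `unlinkDebugInstrs` and `relinkDebugInstrs` are hypothetical, and real code tracks positions differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy instruction: a stable identity plus its printed form.
struct Instr {
  int Id;
  std::string Text;
  bool isDebug() const { return Text.rfind("DBG_", 0) == 0; }
};

// A removed debug instruction and the Id of the non-debug instruction that
// followed it (-1 means "end of block").
struct ParkedDebug {
  Instr DI;
  int AnchorId;
};

// Before register allocation: take every debug instruction out of Block.
std::vector<ParkedDebug> unlinkDebugInstrs(std::vector<Instr> &Block) {
  std::vector<ParkedDebug> Parked;
  std::vector<Instr> Kept;
  std::vector<std::size_t> Waiting; // parked entries still lacking an anchor
  for (const Instr &MI : Block) {
    assert(MI.Id >= 0 && "-1 is reserved for end of block");
    if (MI.isDebug()) {
      Waiting.push_back(Parked.size());
      Parked.push_back({MI, -1});
      continue;
    }
    for (std::size_t W : Waiting)
      Parked[W].AnchorId = MI.Id;
    Waiting.clear();
    Kept.push_back(MI);
  }
  Block = std::move(Kept);
  return Parked;
}

// After register allocation: put each debug instruction back in front of its
// anchor. Instructions the allocator inserted stay where they are; the debug
// instructions themselves were never rewritten.
void relinkDebugInstrs(std::vector<Instr> &Block,
                       const std::vector<ParkedDebug> &Parked) {
  std::vector<Instr> Out;
  for (const Instr &MI : Block) {
    for (const ParkedDebug &P : Parked)
      if (P.AnchorId == MI.Id)
        Out.push_back(P.DI);
    Out.push_back(MI);
  }
  for (const ParkedDebug &P : Parked)
    if (P.AnchorId == -1)
      Out.push_back(P.DI);
  Block = std::move(Out);
}
```

A limitation of this toy: if the allocator deleted a debug instruction's anchor, that debug instruction would be silently lost. It is only meant to show that no debug operands need rewriting while allocation runs.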

The exception is `PHI` instructions: these become implicit definitions at
control flow merges once regalloc finishes, and any debug numbers attached to
`PHI` instructions are lost. To circumvent this, debug numbers of `PHI`s are
recorded at the start of register allocation (`phi-node-elimination`), then
`DBG_PHI` instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

```text
bb.2:
  %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
```

After:

```text
bb.2:
  DBG_PHI $rax, 1
```
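
A minimal C++ sketch of this record-then-reinsert scheme, assuming the allocator can report which physical register holds each recorded virtual register at the entry of the `PHI`'s block. `PhiRecord`, `EntryLocations` and `makeDbgPhis` are hypothetical names; the textual output simply mirrors the `DBG_PHI $rax, 1` example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record taken when PHIs are eliminated, before allocation.
struct PhiRecord {
  std::string Block; // block the PHI was in, e.g. "bb.2"
  std::string VReg;  // virtual register the PHI defined, e.g. "%2"
  uint64_t InstrNum; // the PHI's debug-instr-number
};

// Assumed input: where each (block, vreg) pair lives at block entry once
// allocation has finished, e.g. ("bb.2", "%2") -> "$rax".
using EntryLocations =
    std::map<std::pair<std::string, std::string>, std::string>;

// Build the DBG_PHI instructions to insert at each block entry. A PHI whose
// value has no known location gets no DBG_PHI, so references to it resolve
// to "optimised out".
std::map<std::string, std::vector<std::string>>
makeDbgPhis(const std::vector<PhiRecord> &Records, const EntryLocations &Locs) {
  std::map<std::string, std::vector<std::string>> ByBlock;
  for (const PhiRecord &R : Records) {
    assert(R.InstrNum != 0 && "only numbered PHIs need recording");
    auto It = Locs.find({R.Block, R.VReg});
    if (It == Locs.end())
      continue;
    ByBlock[R.Block].push_back("DBG_PHI " + It->second + ", " +
                               std::to_string(R.InstrNum));
  }
  return ByBlock;
}
```

With a record for `%2` in `bb.2` numbered 1 and an entry location of `$rax`, the sketch yields `DBG_PHI $rax, 1` for `bb.2`, matching the example.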

# `LiveDebugValues`

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [`LiveDebugValues` pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place `PHI`s in each of these two
functions, for variable values and for the values contained in machine
locations, with value propagation eliminating any unnecessary `PHI`s. The two
can then be joined up to map variables to values, then values to locations, for
each instruction in the function.
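
Ignoring control flow entirely (so no `PHI` placement at all), the final join step at a single instruction might be sketched in C++ as below. The maps stand in for the results of the two analyses; `ValueID`, `VarValues`, `LocValues` and `locationOf` are hypothetical names, not `LiveDebugValues` internals.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// A value is named by the (instruction number, operand) that defines it.
using ValueID = std::pair<uint64_t, unsigned>;

// Hypothetical per-instruction state produced by the two analyses: which
// value each variable should have, and which value each machine location
// (register or stack slot) actually holds.
using VarValues = std::map<std::string, ValueID>;
using LocValues = std::map<std::string, ValueID>;

// Join the two maps for one variable: any location holding the variable's
// value will do. std::nullopt means the value is not resident anywhere, so
// the variable has no location at this instruction.
std::optional<std::string> locationOf(const std::string &Var,
                                      const VarValues &Vars,
                                      const LocValues &Locs) {
  auto V = Vars.find(Var);
  if (V == Vars.end())
    return std::nullopt;
  for (const auto &[Loc, Val] : Locs)
    if (Val == V->second)
      return Loc;
  return std::nullopt;
}
```

If `$rax` is later overwritten with another value and no other location holds the variable's value, `locationOf` returns `std::nullopt`: the variable has no location at that point.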

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow `LiveDebugValues` to follow values
   through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction
   numbers.

## Target hooks

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like -- `LiveDebugValues` uses this to identify when
values move between registers.
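
A minimal C++ sketch of why copy recognition matters, assuming a toy state that maps registers to value labels. `CopyInfo`, `RegValues` and `step` are hypothetical and only mimic the kind of information a target's `isCopyInstrImpl` provides. When a copy is recognised, the value survives in the destination after the source is overwritten; when it is not, the destination is treated as holding something unrelated, and the variable loses its location.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical summary of a recognised copy-like instruction (destination
// and source registers), the kind of fact isCopyInstrImpl exposes.
struct CopyInfo {
  std::string Dest;
  std::string Src;
};

// Toy machine-location state: the value (just a label here) each register
// currently holds.
using RegValues = std::map<std::string, std::string>;

// Transfer one instruction that writes WrittenReg. A recognised copy moves
// the source's value into the destination; anything else is treated as
// producing an unrelated value, NewValue.
void step(RegValues &State, const std::optional<CopyInfo> &Copy,
          const std::string &WrittenReg, const std::string &NewValue) {
  if (Copy) {
    assert(Copy->Dest == WrittenReg && "a copy writes its destination");
    auto It = State.find(Copy->Src);
    if (It != State.end())
      State[Copy->Dest] = It->second;
    else
      State.erase(Copy->Dest);
    return;
  }
  State[WrittenReg] = NewValue;
}
```

Copying `$eax` into `$ecx` and then clobbering `$eax` leaves the value findable in `$ecx` only when the copy was recognised.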

`TargetInstrInfo::isLoadFromStackSlotPostFE` and
`TargetInstrInfo::isStoreToStackSlotPostFE` are needed to identify spill and
restore instructions. Each should return the destination or source register
respectively. `LiveDebugValues` will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill
should have a `MachineMemOperand` attached, so that `LiveDebugValues` can
recognise that a slot has been clobbered.
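
As a minimal C++ sketch of spill and restore tracking, assume frame indices are plain integers and values are labels. `Locations`, `spill`, `restore` and `clobberSlot` are hypothetical; they stand for what `LiveDebugValues` can do once the two hooks and the memory operands give it the facts it needs.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy tracking state: the value (a label) held by each register and by each
// stack slot, keyed by frame index.
struct Locations {
  std::map<std::string, std::string> Regs;
  std::map<int, std::string> Slots;
};

// A recognised spill: the kind of store isStoreToStackSlotPostFE identifies,
// reporting the source register and frame index.
void spill(Locations &L, const std::string &SrcReg, int FI) {
  assert(!SrcReg.empty() && "a spill has a source register");
  auto It = L.Regs.find(SrcReg);
  if (It != L.Regs.end())
    L.Slots[FI] = It->second;
  else
    L.Slots.erase(FI);
}

// A recognised restore: the kind of load isLoadFromStackSlotPostFE
// identifies, reporting the destination register and frame index.
void restore(Locations &L, const std::string &DstReg, int FI) {
  auto It = L.Slots.find(FI);
  if (It != L.Slots.end())
    L.Regs[DstReg] = It->second;
  else
    L.Regs.erase(DstReg);
}

// Any other store whose memory operand shows it writes slot FI: whatever the
// slot held is gone. Without the memory operand this would go unnoticed.
void clobberSlot(Locations &L, int FI) { L.Slots.erase(FI); }
```

Spilling `$eax` to slot 0 and restoring into `$ecx` carries the value across even though `$eax` was overwritten in between; once another store clobbers slot 0, a later restore from it no longer yields a known value.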

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a `MachineInstr` to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented -- the relevant question is whether any
register def in any operand will produce a different value as a result of the
mutation. If the answer is yes, then there is a risk that a `DBG_INSTR_REF`
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call `MachineInstr::dropDebugNumber()` on the
mutated instruction to erase its instruction number. Any `DBG_INSTR_REF`
referring to it will then produce an empty variable location instead, which
appears as "optimised out" in the debugger.
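
A minimal C++ sketch of this rule, using a hypothetical `NumberedInstr` type whose `dropDebugNumber` only imitates `MachineInstr::dropDebugNumber()`. `mutateAddToSub` is an invented rewrite, chosen because it clearly changes the defined value, and `isResolvable` is a hypothetical check for whether a reference could still match.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical instruction: an opcode plus a debug instruction number, where
// 0 means "unnumbered".
struct NumberedInstr {
  std::string Opcode;
  uint64_t DebugNum = 0;
  // Toy counterpart of MachineInstr::dropDebugNumber().
  void dropDebugNumber() { DebugNum = 0; }
};

// A hypothetical in-place rewrite that changes the value the def produces.
// Because the def now computes something else, the number is dropped so that
// an existing reference cannot pick up the new value.
void mutateAddToSub(NumberedInstr &MI) {
  assert(MI.Opcode == "ADD32rr" && "toy rewrite only handles ADD32rr");
  MI.Opcode = "SUB32rr";
  MI.dropDebugNumber();
}

// Would a reference to instruction number Num still find a definition?
bool isResolvable(const std::vector<NumberedInstr> &Func, uint64_t Num) {
  if (Num == 0)
    return false;
  for (const NumberedInstr &MI : Func)
    if (MI.DebugNum == Num)
      return true;
  return false;
}
```

After the mutation, a reference to number 1 finds nothing, so the variable reads as optimised out rather than silently showing the subtraction's result.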

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

```text
%2:gr32 = ADD32rr %0, %1, debug-instr-number 1
```

becomes

```text
%2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
```

At the same time, a substitution from "instruction number 1 operand 0" to
"instruction number 2 operand 0" is recorded in the `MachineFunction`. In
`LiveDebugValues`, `DBG_INSTR_REF`s are mapped through the substitution table
to find the most recent instruction number / operand number of the value they
refer to.
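
The lookup through the substitution table can be sketched in C++ as a chain walk, assuming substitutions are stored as a map from old (instruction number, operand) pairs to new ones. `InstrOp`, `SubstTable` and `followSubstitutions` are hypothetical; this is not the code `LiveDebugValues` uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// (instruction number, operand index) naming a value's definition.
using InstrOp = std::pair<uint64_t, unsigned>;

// Hypothetical substitution table: old definition -> replacement definition.
using SubstTable = std::map<InstrOp, InstrOp>;

// Follow substitutions until reaching a definition with no replacement,
// i.e. the most recent instruction/operand for the value. The hop limit
// guards against a malformed (cyclic) table in this toy model.
InstrOp followSubstitutions(const SubstTable &Table, InstrOp Ref) {
  for (std::size_t Hops = 0; Hops <= Table.size(); ++Hops) {
    auto It = Table.find(Ref);
    if (It == Table.end())
      return Ref;
    Ref = It->second;
  }
  assert(false && "cycle in substitution table");
  return Ref;
}
```

If the instruction from the example were replaced again, say by a hypothetical instruction 5, a reference to instruction 1 operand 0 would be followed from 1 to 2 to 5.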

Use `MachineFunction::substituteDebugValuesForInst` to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This holds in most cases, including the example above.
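
The position-by-position assumption can be sketched in C++ as follows. This is a toy, not the real `substituteDebugValuesForInst`: `ToyMI` and `substituteSamePositions` are hypothetical, operands are reduced to a def/non-def flag, and the toy requires the caller to have numbered the new instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using InstrOp = std::pair<uint64_t, unsigned>;

// Hypothetical instruction: its debug number (0 = unnumbered) and, for each
// operand position, whether that operand is a register def.
struct ToyMI {
  uint64_t DebugNum;
  std::vector<bool> IsDef;
};

// Toy version of the assumption substituteDebugValuesForInst makes: every
// def in Old sits at the same operand position in New, so defs are mapped
// position by position. This toy expects the caller to have numbered New.
void substituteSamePositions(std::map<InstrOp, InstrOp> &Table,
                             const ToyMI &Old, const ToyMI &New) {
  if (Old.DebugNum == 0)
    return; // nothing can refer to an unnumbered instruction
  assert(New.DebugNum != 0 && "New must be numbered first");
  for (unsigned I = 0; I < Old.IsDef.size(); ++I) {
    if (!Old.IsDef[I])
      continue;
    assert(I < New.IsDef.size() && New.IsDef[I] && "defs must line up");
    Table[InstrOp(Old.DebugNum, I)] = InstrOp(New.DebugNum, I);
  }
}
```

For the `ADD32rr` example, both instructions have a single def at operand 0, so exactly one substitution, from (1, 0) to (2, 0), is recorded.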

If operand numbers do not line up between the old and new instruction, use
`MachineInstr::getDebugInstrNum` to acquire the instruction number for the new
instruction, and `MachineFunction::makeDebugValueSubstitution` to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution -- `LiveDebugValues` will safely drop the
now unavailable variable value.
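
A minimal C++ sketch of the mismatched case, assuming a hypothetical old instruction (number 1) that defines two values at operands 0 and 1, replaced by a hypothetical instruction (number 2) that computes only the second value, now at operand 0. `DebugState` and its `makeSubstitution` only imitate the role of `MachineFunction::makeDebugValueSubstitution`; they are not its API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <utility>

using InstrOp = std::pair<uint64_t, unsigned>;

// Toy state: recorded substitutions plus the definitions that still exist.
struct DebugState {
  std::map<InstrOp, InstrOp> Subst;
  std::set<InstrOp> LiveDefs;

  // Toy counterpart of MachineFunction::makeDebugValueSubstitution.
  void makeSubstitution(InstrOp From, InstrOp To) {
    assert(From != To && "a self-substitution is meaningless");
    Subst[From] = To;
  }

  // Resolve a reference through at most one substitution (enough here).
  // std::nullopt means the value was dropped: "optimised out".
  std::optional<InstrOp> resolve(InstrOp Ref) const {
    auto It = Subst.find(Ref);
    if (It != Subst.end())
      Ref = It->second;
    if (LiveDefs.count(Ref) != 0)
      return Ref;
    return std::nullopt;
  }
};
```

Only the surviving value gets a substitution, so a reference to (1, 1) resolves to (2, 0), while a reference to (1, 0) resolves to nothing and the variable is shown as optimised out.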

Should your target clone instructions, as the `TailDuplicator` optimisation
pass does, do not attempt to preserve the instruction numbers or record any
substitutions. `MachineFunction::CloneMachineInstr` should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to `LiveDebugValues`. Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.
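
A minimal C++ sketch of the rule, assuming a hypothetical `SimpleMI` type. `cloneInstr` mirrors the described behaviour of `MachineFunction::CloneMachineInstr` rather than its actual signature, and `hasDuplicateNumbers` checks the invariant that dropping the number protects.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical instruction: printed form plus debug number (0 = unnumbered).
struct SimpleMI {
  std::string Text;
  uint64_t DebugNum = 0;
};

// Toy version of the behaviour described for CloneMachineInstr: the clone
// copies everything except the debug instruction number.
SimpleMI cloneInstr(const SimpleMI &Orig) {
  SimpleMI Copy = Orig;
  Copy.DebugNum = 0;
  return Copy;
}

// The invariant that dropping the number protects: no non-zero number is
// shared by two instructions.
bool hasDuplicateNumbers(const std::vector<SimpleMI> &Func) {
  std::set<uint64_t> Seen;
  for (const SimpleMI &MI : Func)
    if (MI.DebugNum != 0 && !Seen.insert(MI.DebugNum).second)
      return true;
  return false;
}
```

Cloning an instruction numbered 1 (for example into a predecessor during tail duplication) leaves only the original carrying number 1; copying the number as well would trip `hasDuplicateNumbers`.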

[LiveDebugValues]: project:SourceLevelDebugging.rst#LiveDebugValues expansion of variable locations