# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc and to instruction
movement.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
LLVM in instruction referencing mode refers to the machine instruction
and operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

```llvm
%2 = add i32 %0, %1
  #dbg_value(metadata i32 %2,
```

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 assembly
below and instruction referencing debug info, corresponding to the earlier
LLVM IR:

```text
%2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
```

While the function remains in SSA form, virtual register `%2` is sufficient to
identify the value computed by the instruction -- however, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more consistent way of identifying
the instruction's value is to refer to the `MachineOperand` where the value is
defined, independently of which register is defined by that `MachineOperand`. In
the code above, the `DBG_INSTR_REF` instruction refers to instruction number
one, operand zero, while the `ADD32rr` has a `debug-instr-number` attribute
attached indicating that it is instruction number one.

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions are optimised instead. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand pair -- see
`MachineFunction::substituteDebugValuesForInst`. If debug info maintenance is
not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced, which always need debug info
maintenance.

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e. `LiveDebugVariables`). Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.

The exception is `PHI` instructions: these become implicit definitions at
control flow merges once regalloc finishes, and any debug numbers attached to
`PHI` instructions are lost. To circumvent this, debug numbers of `PHI`s are
recorded at the start of register allocation (`phi-node-elimination`), then
`DBG_PHI` instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

```text
bb.2:
  %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
```

After:

```text
bb.2:
  DBG_PHI $rax, 1
```

# `LiveDebugValues`

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [`LiveDebugValues` pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place `PHI`s for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary `PHI`s. The two can then be joined up
to map variables to values, then values to locations, for each instruction in
the function.

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.
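
As a rough illustration of joining the two functions, the sketch below models a
single program point in plain C++. It is a toy under stated assumptions, not the
`LiveDebugValues` implementation: the names `ValueID` and `locateVariable`, and
the use of string-keyed maps, are hypothetical, and the real pass works across
whole functions with `PHI` placement.

```cpp
#include <map>
#include <optional>
#include <string>

// Toy model only: hypothetical names, not LLVM's data structures.
using ValueID = unsigned;

// Join "variable -> value" with "machine location -> value" at one program
// point. Returns the first location holding the variable's value, or
// std::nullopt when no location holds it (reported as "optimised out").
std::optional<std::string>
locateVariable(const std::map<std::string, ValueID> &VarToValue,
               const std::map<std::string, ValueID> &LocToValue,
               const std::string &Var) {
  auto VarIt = VarToValue.find(Var);
  if (VarIt == VarToValue.end())
    return std::nullopt; // the variable has no known value here
  for (const auto &Entry : LocToValue)
    if (Entry.second == VarIt->second)
      return Entry.first; // this location currently holds the value
  return std::nullopt;
}
```

In this model, a variable bound to a value that currently sits in `$rax` is
reported at `$rax`, while a variable whose value is in no tracked location has
no result.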

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow `LiveDebugValues` to follow values
   through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction
   numbers.

## Target hooks

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like -- `LiveDebugValues` uses this to identify when
values move between registers.

`TargetInstrInfo::isLoadFromStackSlotPostFE` and
`TargetInstrInfo::isStoreToStackSlotPostFE` are needed to identify restore and
spill instructions. Each should return the destination or source register
respectively. `LiveDebugValues` will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill slot
should have a `MachineMemOperand` attached, so that `LiveDebugValues` can
recognise that a slot has been clobbered.
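
To show why these hooks matter, here is a small self-contained C++ model of
following a value through a copy, a spill, a restore and a clobber. Everything
in it (`LocationTracker`, `define`, `move`, `holds`) is a hypothetical name
chosen for illustration; it does not call or mirror the actual `TargetInstrInfo`
hooks, and only sketches the bookkeeping that recognising such instructions
makes possible.

```cpp
#include <map>
#include <string>

// Toy model only: hypothetical names, not LLVM's data structures.
using ValueID = unsigned;

struct LocationTracker {
  std::map<std::string, ValueID> Contents; // location name -> value it holds
  ValueID NextValue = 1;

  // An ordinary def, or a clobbering write: the location holds a new value.
  ValueID define(const std::string &Loc) {
    ValueID V = NextValue++;
    Contents[Loc] = V;
    return V;
  }

  // A copy, spill, or restore: Dst now holds the same value as Src.
  void move(const std::string &Src, const std::string &Dst) {
    auto It = Contents.find(Src);
    if (It == Contents.end()) {
      define(Dst); // unknown source: treat Dst as holding a fresh value
      return;
    }
    ValueID V = It->second;
    Contents[Dst] = V;
  }

  // Does Loc currently hold value V?
  bool holds(const std::string &Loc, ValueID V) const {
    auto It = Contents.find(Loc);
    return It != Contents.end() && It->second == V;
  }
};
```

If a value defined in `$eax` is spilled with `move("$eax", "slot0")` and `$eax`
is then redefined, the model still finds the value in `slot0`. If the spill were
not recognised as a move, the value would appear lost at the redefinition.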

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a `MachineInstr` to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented -- the relevant question is whether any
register def in any operand will produce a different value, as a result of the
mutation. If the answer is yes, then there is a risk that a `DBG_INSTR_REF`
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call `MachineInstr::dropDebugNumber()` on the
mutated instruction to erase its instruction number. Any `DBG_INSTR_REF`
referring to it will produce an empty variable location instead, that appears
as "optimised out" in the debugger.

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

```text
%2:gr32 = ADD32rr %0, %1, debug-instr-number 1
```

becomes

```text
%2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
```

A substitution from "instruction number 1 operand 0" to "instruction number
2 operand 0" is recorded in the `MachineFunction`. In `LiveDebugValues`,
each `DBG_INSTR_REF` will be mapped through the substitution table to find the
most recent instruction number / operand number of the value it refers to.
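
The lookup through a substitution table can be pictured with the following
self-contained C++ sketch. It is a toy model, not LLVM's implementation: the
names `InstrOperand` and `SubstitutionTable`, and the choice of a `std::map`,
are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Toy model only: hypothetical names, not LLVM's data structures.
// A pair of (instruction number, operand index).
using InstrOperand = std::pair<unsigned, unsigned>;

struct SubstitutionTable {
  std::map<InstrOperand, InstrOperand> Subs; // old pair -> new pair

  void record(InstrOperand Old, InstrOperand New) { Subs[Old] = New; }

  // Follow substitutions until reaching a pair with no replacement.
  InstrOperand resolve(InstrOperand Ref) const {
    // Bound the walk by the table size so that a malformed, cyclic table
    // cannot loop forever.
    for (std::size_t Step = 0; Step < Subs.size(); ++Step) {
      auto It = Subs.find(Ref);
      if (It == Subs.end())
        break;
      Ref = It->second;
    }
    return Ref;
  }
};
```

Recording (1, 0) to (2, 0) and later (2, 0) to (3, 1) makes a reference to
(1, 0) resolve to (3, 1), while a pair with no recorded substitution resolves to
itself.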

Use `MachineFunction::substituteDebugValuesForInst` to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This works most of the time, as in the example
above.

If operand numbers do not line up between the old and new instruction, use
`MachineInstr::getDebugInstrNum` to acquire the instruction number for the new
instruction, and `MachineFunction::makeDebugValueSubstitution` to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution -- `LiveDebugValues` will safely drop the
now unavailable variable value.

Should your target clone instructions, much the same as the `TailDuplicator`
optimisation pass, do not attempt to preserve the instruction numbers or
record any substitutions. `MachineFunction::CloneMachineInstr` should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to `LiveDebugValues`. Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.

[LiveDebugValues]: project:SourceLevelDebugging.rst#LiveDebugValues expansion of variable locations