# Debug Info Assignment Tracking

Assignment Tracking is an alternative technique for tracking variable location
debug info through optimisations in LLVM. It provides accurate variable
locations for assignments where a local variable (or a field of one) is the
LHS. In rare and complicated circumstances indirect assignments might be
optimized away without being tracked, but otherwise we make our best effort to
track all variable locations.

The core idea is to track more information about source assignments in order
and preserve enough information to be able to defer decisions about whether to
use non-memory locations (register, constant) or memory locations until after
middle end optimisations have run. This contrasts with using `#dbg_declare`
and `#dbg_value`, where the decision is made early on for most variables,
which can result in suboptimal variable locations that may be either
incorrect or incomplete.

A secondary goal of assignment tracking is to cause minimal additional work for
LLVM pass writers, and minimal disruption to LLVM in general.

## Status and usage

**Status**: Enabled by default in Clang but disabled under some circumstances
(which can be overridden with the `forced` option, see below). `opt` will not
run the pass unless asked (`-passes=declare-to-assign`).

**Flag**:
`-Xclang -fexperimental-assignment-tracking=<disabled|enabled|forced>`

When enabled, Clang gets LLVM to run the pass `declare-to-assign`. The pass
converts conventional debug records to assignment tracking metadata and sets
the module flag `debug-info-assignment-tracking` to the value `i1 true`. To
check whether assignment tracking is enabled for a module, call
`isAssignmentTrackingEnabled(const Module &M)` (from `llvm/IR/DebugInfo.h`).

## Design and implementation

### Assignment markers: `#dbg_assign`

`#dbg_value`, a conventional debug record, marks out a position in the
IR where a variable takes a particular value. Similarly, Assignment Tracking
marks out the position of assignments with a record called `#dbg_assign`.

In order to know where in IR it is appropriate to use a memory location for a
variable, each assignment marker must in some way refer to the store, if any
(or multiple!), that performs the assignment. That way, the position of the
store and marker can be considered together when making that choice. Another
important benefit of referring to the store is that we can then build a two-way
mapping of stores<->markers that can be used to find markers that need to be
updated when stores are modified.

A `#dbg_assign` marker that is not linked to any instruction signals that
the store that performed the assignment has been optimised out, and therefore
the memory location will not be valid for at least some part of the program.

Here's the `#dbg_assign` signature. `Value *` type parameters are first wrapped
in `ValueAsMetadata`:

```
  #dbg_assign(Value *Value,
              DIExpression *ValueExpression,
              DILocalVariable *Variable,
              DIAssignID *ID,
              Value *Address,
              DIExpression *AddressExpression)
```

The first three parameters look and behave like a `#dbg_value`. `ID` is a
reference to a store (see next section). `Address` is the destination address
of the store and it is modified by `AddressExpression`. An empty/undef/poison
address means the address component has been killed (the memory address is no
longer a valid location). LLVM currently encodes variable fragment information
in `DIExpression`s, so as an implementation quirk the `FragmentInfo` for
`Variable` is contained within `ValueExpression` only.

### Instruction link: `DIAssignID`

`DIAssignID` metadata is the mechanism that is currently used to encode the
store<->marker link. The metadata node has no operands and all instances are
`distinct`; equality is checked for by comparing addresses.

`#dbg_assign` records use a `DIAssignID` metadata node instance as an
operand. This way a record refers to any store-like instruction that has the
same `DIAssignID` attachment. For example, for this test.cpp:

```
int fun(int a) {
  return a;
}
```
compiled without optimisations:
```
$ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking=enabled
```
we get:
```
define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 {
entry:
  %a.addr = alloca i32, align 4, !DIAssignID !13
    #dbg_assign(i1 undef, !14, !DIExpression(), !13, i32* %a.addr, !DIExpression(), !15)
  store i32 %a, i32* %a.addr, align 4, !DIAssignID !16
    #dbg_assign(i32 %a, !14, !DIExpression(), !16, i32* %a.addr, !DIExpression(), !15)
  %0 = load i32, i32* %a.addr, align 4, !dbg !17
  ret i32 %0, !dbg !18
}

...
!13 = distinct !DIAssignID()
!14 = !DILocalVariable(name: "a", ...)
...
!16 = distinct !DIAssignID()
```

The first `#dbg_assign` refers to the `alloca` through `!DIAssignID !13`,
and the second refers to the `store` through `!DIAssignID !16`.

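To make the two-way link concrete, here is a toy C++ model of it. This is only a sketch and not LLVM's implementation: `ToyLinks`, `attach`, `mark`, `eraseInst`, `markersFor`, `isLinked` and `buildExample` are hypothetical names, an `int` stands in for a distinct `DIAssignID` node, and strings stand in for instructions and records.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a distinct !DIAssignID node (hypothetical, not an LLVM type).
using AssignID = int;

// Toy model of the store<->marker link: both sides are keyed by the same ID.
struct ToyLinks {
  std::map<AssignID, std::vector<std::string>> Insts;   // ID attached to instructions
  std::map<AssignID, std::vector<std::string>> Markers; // ID used as a marker operand

  void attach(AssignID ID, const std::string &Inst) { Insts[ID].push_back(Inst); }
  void mark(AssignID ID, const std::string &Marker) { Markers[ID].push_back(Marker); }

  // Deleting an instruction only drops its attachment; markers are untouched.
  void eraseInst(AssignID ID, const std::string &Inst) {
    auto It = Insts.find(ID);
    if (It == Insts.end())
      return;
    auto &V = It->second;
    V.erase(std::remove(V.begin(), V.end(), Inst), V.end());
    if (V.empty())
      Insts.erase(It);
  }

  // Markers that may need updating when an instruction with this ID changes.
  std::vector<std::string> markersFor(AssignID ID) const {
    auto It = Markers.find(ID);
    return It == Markers.end() ? std::vector<std::string>{} : It->second;
  }

  // A marker whose ID is attached to no instruction means the store is gone,
  // so the memory location is not valid for that assignment.
  bool isLinked(AssignID ID) const { return Insts.count(ID) != 0; }
};

// Builds the links from the IR example above: !13 on the alloca, !16 on the store.
ToyLinks buildExample() {
  ToyLinks L;
  L.attach(13, "alloca %a.addr");
  L.mark(13, "dbg_assign #1");
  L.attach(16, "store %a");
  L.mark(16, "dbg_assign #2");
  return L;
}
```

In this model, calling `eraseInst(16, "store %a")` on the result of `buildExample()` leaves the second marker in place but makes `isLinked(16)` return `false`, which mirrors an unlinked `#dbg_assign`.
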
### Store-like instructions

In the absence of a linked `#dbg_assign`, a store to an address that is
known to be the backing storage for a variable is considered to represent an
assignment to that variable.

This gives us a safe fall-back in cases where `#dbg_assign` records have
been deleted, the `DIAssignID` attachment on the store has been dropped, or the
optimiser has made a once-indirect store (not tracked with Assignment Tracking)
direct.

### Middle-end: Considerations for pass-writers

#### Non-debug instruction updates

**Cloning** an instruction: nothing new to do. Cloning automatically clones a
`DIAssignID` attachment. Multiple instructions may have the same `DIAssignID`
attachment. In this case, the assignment is considered to take place in
multiple positions in the program.

**Moving** a non-debug instruction: nothing new to do. Instructions linked to a
`#dbg_assign` have their initial IR position marked by the position of the
`#dbg_assign`.

**Deleting** a non-debug instruction: nothing new to do. Simple DSE does not
require any change; it’s safe to delete an instruction with a `DIAssignID`
attachment. A `#dbg_assign` that uses a `DIAssignID` that is not attached
to any instruction indicates that the memory location isn’t valid.

**Merging** stores: In many cases no change is required as `DIAssignID`
attachments are automatically merged if `combineMetadata` is called. One way or
another, the `DIAssignID` attachments must be merged such that the new store
becomes linked to all the `#dbg_assign` records that the merged stores
were linked to. This can be achieved simply by calling the helper function
`Instruction::mergeDIAssignID`.

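As a rough illustration of the required end state after a merge, consider the toy model below. It is a sketch only: `ToyStore`, `ToyMarker`, `mergeStores` and `countLinked` are hypothetical names, an `int` models a `DIAssignID`, and this is not a description of how `Instruction::mergeDIAssignID` is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not LLVM types). An int models a DIAssignID.
struct ToyStore { std::string Name; int ID; };
struct ToyMarker { std::string Var; int ID; };

// Merge stores A and B into one store. The merged store keeps A's ID, and every
// marker that referred to B's ID is redirected, so the merged store ends up
// linked to all markers that either original store was linked to.
ToyStore mergeStores(const ToyStore &A, const ToyStore &B,
                     std::vector<ToyMarker> &Markers) {
  for (ToyMarker &M : Markers)
    if (M.ID == B.ID)
      M.ID = A.ID;
  return ToyStore{A.Name + "+" + B.Name, A.ID};
}

// Count the markers linked to a given ID.
int countLinked(const std::vector<ToyMarker> &Markers, int ID) {
  int N = 0;
  for (const ToyMarker &M : Markers)
    if (M.ID == ID)
      ++N;
  return N;
}
```
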
**Inlining** stores: As stores are inlined we generate `#dbg_assign`
records and `DIAssignID` attachments as if the stores represent source
assignments, just like in the frontend. This isn’t perfect, as stores may have
been moved, modified or deleted before inlining, but it does at least keep the
information about the variable correct within the non-inlined scope.

**Splitting** stores: SROA and passes that split stores treat `#dbg_assign`
records similarly to `#dbg_declare` records. Clone the
`#dbg_assign` records linked to the store, update the FragmentInfo in
the `ValueExpression`, and give the split stores (and cloned records) new
`DIAssignID` attachments each. In other words, treat the split stores as
separate assignments. For partial DSE (e.g. shortening a memset), we do the
same except that the `#dbg_assign` for the dead fragment gets an `Undef`
`Address`.

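The fragment bookkeeping for a split can be sketched as below. This is a toy model under stated assumptions: `ToyFragment`, `ToyMarker` and `splitMarker` are hypothetical names, an `int` models a `DIAssignID`, and the slices are assumed to be contiguous and to cover the original store exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins (hypothetical, not LLVM types).
struct ToyFragment { unsigned OffsetInBits; unsigned SizeInBits; };
struct ToyMarker {
  ToyFragment Frag;   // models the FragmentInfo held in ValueExpression
  int ID;             // models the DIAssignID operand
  bool AddressKilled; // models an undef Address
};

// Split the marker for one store into one marker per slice. Each clone gets a
// narrowed fragment and a fresh ID, i.e. each split store is its own assignment.
// DeadSlice (or -1 for none) selects a slice removed by partial DSE; the clone
// for that slice has its address killed.
std::vector<ToyMarker> splitMarker(const ToyMarker &M,
                                   const std::vector<unsigned> &SliceSizesInBits,
                                   int &NextID, int DeadSlice = -1) {
  std::vector<ToyMarker> Result;
  unsigned Offset = M.Frag.OffsetInBits;
  for (std::size_t I = 0; I < SliceSizesInBits.size(); ++I) {
    ToyMarker Clone{{Offset, SliceSizesInBits[I]}, NextID++,
                    M.AddressKilled || static_cast<int>(I) == DeadSlice};
    Result.push_back(Clone);
    Offset += SliceSizesInBits[I];
  }
  return Result;
}
```
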
**Promoting** allocas and store/loads: `#dbg_assign` records implicitly
describe joined values in memory locations at CFG joins, but this is not
necessarily the case after promoting (or partially promoting) the
variable. Passes that promote variables are responsible for inserting
`#dbg_assign` records after the resultant PHIs generated during
promotion. `mem2reg` already has to do this (with `#dbg_value`) for
`#dbg_declare`s. Where a store has no linked record, the store is
assumed to represent an assignment for variables stored at the destination
address.

#### Debug record updates

**Moving** a debug record: avoid moving `#dbg_assign` records where
possible, as they represent a source-level assignment, whose position in the
program should not be affected by optimization passes.

**Deleting** a debug record: Nothing new to do. Just like for conventional
debug records, unless it is unreachable, it’s almost always incorrect to
delete a `#dbg_assign` record.

### Lowering `#dbg_assign` to MIR

To begin with only SelectionDAG ISel will be supported. `#dbg_assign`
records are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens
we need to decide where it is appropriate to use memory locations and where we
must use a non-memory location (or no location) for each variable. In order to
make those decisions we run a standard fixed-point dataflow analysis that makes
the choice at each instruction, iteratively joining the results for each block.

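The shape of such a fixed-point analysis can be sketched with a toy model that tracks one variable at block granularity. Everything here is an assumption made for illustration and not LLVM's actual analysis: the names `LocKind`, `ToyBlock`, `join` and `solve` are hypothetical, and the join rule (agreement keeps the kind, disagreement yields no location) is simply the rule chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy location kinds for one variable (hypothetical names for this sketch).
enum class LocKind { Mem, Val, None };

struct ToyBlock {
  std::vector<std::size_t> Preds; // indices of predecessor blocks
  std::optional<LocKind> Def;     // kind chosen inside the block, if any
};

// Assumed join rule for this sketch: agreeing predecessors keep their kind,
// disagreeing predecessors leave no usable location.
LocKind join(LocKind A, LocKind B) { return A == B ? A : LocKind::None; }

// Iterate to a fixed point and return the kind live at the end of each block.
std::vector<LocKind> solve(const std::vector<ToyBlock> &CFG) {
  std::vector<std::optional<LocKind>> Out(CFG.size()); // nullopt = not yet known
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < CFG.size(); ++B) {
      // Join the results of every predecessor computed so far.
      std::optional<LocKind> In;
      for (std::size_t P : CFG[B].Preds)
        if (Out[P])
          In = In ? join(*In, *Out[P]) : *Out[P];
      // Transfer: a definition in the block overrides the incoming kind.
      std::optional<LocKind> NewOut = In;
      if (CFG[B].Def)
        NewOut = CFG[B].Def;
      if (NewOut != Out[B]) {
        Out[B] = NewOut;
        Changed = true;
      }
    }
  }
  std::vector<LocKind> Result;
  for (const std::optional<LocKind> &K : Out)
    Result.push_back(K.value_or(LocKind::None)); // nothing known: no location
  return Result;
}
```

For a diamond-shaped CFG in which the entry block selects `Mem` and only one arm selects `Val`, the two arms disagree at the merge block, so this model reports `None` there. If neither arm overrides the entry block's choice, the merge block keeps `Mem`.
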
### TODO list

Outstanding improvements:

* As mentioned in test llvm/test/DebugInfo/assignment-tracking/X86/diamond-3.ll,
  the analysis should treat escaping calls like untagged stores.

* The system expects locals to be backed by a local alloca. This isn't always
  the case - sometimes a pointer to storage is passed into a function
  (e.g. sret, byval). We need to be able to handle those cases. See
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for examples.

* `trackAssignments` doesn't yet work for variables that have their
  `#dbg_declare` location modified by a `DIExpression`, e.g. when the
  address of the variable is itself stored in an `alloca` with the
  `#dbg_declare` using `DIExpression(DW_OP_deref)`. See `indirectReturn` in
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and in
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for an
  example.

* In order to handle locals whose storage is passed in as an argument (the
  sret/byval bullet point above) we need to be able to specify that a
  memory location is available without using a `DIAssignID`. This is because
  the storage address is not computed by an instruction (it's an argument
  value) and therefore we have nowhere to put the metadata attachment. To solve
  this we probably need another marker record to denote "the variable's
  stack home is X address" - similar to `#dbg_declare` except that it needs
  to compose with `#dbg_assign` records such that the stack home address
  is only selected as a location for the variable when the `#dbg_assign`
  records agree it should be.

* Given the above (a special "the stack home is X" record), and the fact
  that we can only track assignments with fixed offsets and sizes, we can
  probably get rid of the address and address-expression part, since it
  will always be computable with the info we have.

* Assignment tracking is disabled by default for LTO and thinLTO builds, and
  if LLDB debugger tuning has been specified. We should remove these
  restrictions. See EmitAssemblyHelper::RunOptimizationPipeline in
  clang/lib/CodeGen/BackendUtil.cpp.