# Debug Info Assignment Tracking

Assignment Tracking is an alternative technique for tracking variable location
debug info through optimisations in LLVM. It provides accurate variable
locations for assignments where a local variable (or a field of one) is the
LHS. In rare and complicated circumstances indirect assignments might be
optimised away without being tracked, but otherwise we make our best effort to
track all variable locations.

The core idea is to track more information about source assignments and to
preserve enough of it to defer decisions about whether to use non-memory
locations (register, constant) or memory locations until after middle end
optimisations have run. This contrasts with `#dbg_declare` and `#dbg_value`,
where the decision is made early on for most variables, which can result in
suboptimal variable locations that may be either incorrect or incomplete.

A secondary goal of assignment tracking is to cause minimal additional work for
LLVM pass writers, and minimal disruption to LLVM in general.

## Status and usage

**Status**: Enabled by default in Clang but disabled under some circumstances
(which can be overridden with the `forced` option, see below). `opt` will not
run the pass unless asked (`-passes=declare-to-assign`).

**Flag**:
`-Xclang -fexperimental-assignment-tracking=<disabled|enabled|forced>`

When enabled, Clang gets LLVM to run the pass `declare-to-assign`. The pass
converts conventional debug records to assignment tracking metadata and sets
the module flag `debug-info-assignment-tracking` to the value `i1 true`. To
check whether assignment tracking is enabled for a module, call
`isAssignmentTrackingEnabled(const Module &M)` (from `llvm/IR/DebugInfo.h`).

## Design and implementation

### Assignment markers: `#dbg_assign`

`#dbg_value`, a conventional debug record, marks out a position in the
IR where a variable takes a particular value. Similarly, Assignment Tracking
marks out the position of assignments with a record called `#dbg_assign`.

In order to know where in IR it is appropriate to use a memory location for a
variable, each assignment marker must in some way refer to the store, if any
(or multiple!), that performs the assignment. That way, the position of the
store and marker can be considered together when making that choice. Another
important benefit of referring to the store is that we can then build a two-way
mapping of stores<->markers that can be used to find markers that need to be
updated when stores are modified.

A `#dbg_assign` marker that is not linked to any instruction signals that
the store that performed the assignment has been optimised out, and therefore
the memory location will not be valid for at least some part of the program.

Here's the `#dbg_assign` signature. `Value *` type parameters are first wrapped
in `ValueAsMetadata`:

```
  #dbg_assign(Value *Value,
              DIExpression *ValueExpression,
              DILocalVariable *Variable,
              DIAssignID *ID,
              Value *Address,
              DIExpression *AddressExpression)
```

The first three parameters look and behave like a `#dbg_value`. `ID` is a
reference to a store (see next section). `Address` is the destination address
of the store and it is modified by `AddressExpression`. An empty/undef/poison
address means the address component has been killed (the memory address is no
longer a valid location). LLVM currently encodes variable fragment information
in `DIExpression`s, so as an implementation quirk the `FragmentInfo` for
`Variable` is contained within `ValueExpression` only.

### Instruction link: `DIAssignID`

`DIAssignID` metadata is the mechanism that is currently used to encode the
store<->marker link. The metadata node has no operands and all instances are
`distinct`; equality is checked for by comparing addresses.

Each `#dbg_assign` record uses a `DIAssignID` metadata node instance as an
operand. This way it refers to any store-like instruction that has the same
`DIAssignID` attachment. For example, for this test.cpp,

```
int fun(int a) {
  return a;
}
```
compiled without optimisations:
```
$ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking=enabled
```
we get:
```
define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 {
entry:
  %a.addr = alloca i32, align 4, !DIAssignID !13
    #dbg_assign(i1 undef, !14, !DIExpression(), !13, i32* %a.addr, !DIExpression(), !15)
  store i32 %a, i32* %a.addr, align 4, !DIAssignID !16
    #dbg_assign(i32 %a, !14, !DIExpression(), !16, i32* %a.addr, !DIExpression(), !15)
  %0 = load i32, i32* %a.addr, align 4, !dbg !17
  ret i32 %0, !dbg !18
}

...
!13 = distinct !DIAssignID()
!14 = !DILocalVariable(name: "a", ...)
...
!16 = distinct !DIAssignID()
```

The first `#dbg_assign` refers to the `alloca` through `!DIAssignID !13`,
and the second refers to the `store` through `!DIAssignID !16`.
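
To make the two-way link concrete, here is a toy C++ model. It is a sketch for
illustration only and not LLVM code: IDs are assumed to be plain integers
(LLVM uses `distinct` `DIAssignID` nodes compared by address), and the names
`ToyStore`, `ToyMarker`, `ToyFunction` and `makeExample` are invented here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a DIAssignID node (an assumption of this sketch).
using AssignID = int;

struct ToyStore {
  std::string Text; // Printed form of the store-like instruction.
  AssignID ID;      // Models the !DIAssignID attachment.
};

struct ToyMarker {
  std::string Variable;
  AssignID ID; // Models the DIAssignID operand of a #dbg_assign.
};

struct ToyFunction {
  std::vector<ToyStore> Stores;
  std::vector<ToyMarker> Markers;

  // Marker -> stores: every store-like instruction carrying the same ID.
  std::vector<const ToyStore *> storesFor(const ToyMarker &M) const {
    std::vector<const ToyStore *> Out;
    for (const ToyStore &S : Stores)
      if (S.ID == M.ID)
        Out.push_back(&S);
    return Out;
  }

  // Store -> markers: every marker that uses the store's ID.
  std::vector<const ToyMarker *> markersFor(const ToyStore &S) const {
    std::vector<const ToyMarker *> Out;
    for (const ToyMarker &M : Markers)
      if (M.ID == S.ID)
        Out.push_back(&M);
    return Out;
  }

  // A marker whose ID is attached to no instruction signals that the store
  // was optimised out, so the memory location is not valid at that point.
  bool memoryLocationMayBeValid(const ToyMarker &M) const {
    return !storesFor(M).empty();
  }
};

// Mirrors the IR above: the alloca carries ID 13 and the store carries ID 16.
ToyFunction makeExample() {
  ToyFunction F;
  F.Stores = {{"%a.addr = alloca i32", 13}, {"store i32 %a", 16}};
  F.Markers = {{"a", 13}, {"a", 16}};
  return F;
}
```

In this model, removing the `store` entry from `Stores` leaves the second
marker with an ID that no instruction carries, which is how a deleted store is
signalled.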

### Store-like instructions

In the absence of a linked `#dbg_assign`, a store to an address that is
known to be the backing storage for a variable is considered to represent an
assignment to that variable.

This gives us a safe fall-back in cases where `#dbg_assign` records have
been deleted, the `DIAssignID` attachment on the store has been dropped, or the
optimiser has made a once-indirect store (not tracked with Assignment Tracking)
direct.
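
The fall-back rule can be sketched as a small lookup. This is a toy model
under stated assumptions, not LLVM's implementation: addresses and IDs are
plain integers, an ID of 0 stands for a missing `DIAssignID` attachment, and
`ToyStore` and `fallbackAssignedVariable` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy store: destination address plus an optional ID (0 means no attachment).
struct ToyStore {
  int DestAddress;
  int ID;
};

// Returns the variable that an unlinked store is assumed to assign. Returns
// an empty string if the store is linked to a marker (the marker then
// describes the assignment) or if it does not write to known variable storage.
std::string fallbackAssignedVariable(
    const ToyStore &S, const std::set<int> &IDsUsedByMarkers,
    const std::map<int, std::string> &BackingStorage) {
  const bool Linked = S.ID != 0 && IDsUsedByMarkers.count(S.ID) != 0;
  if (Linked)
    return std::string();
  const auto It = BackingStorage.find(S.DestAddress);
  return It == BackingStorage.end() ? std::string() : It->second;
}
```

A store whose ID is still used by a marker is left to that marker; a store
with a dropped attachment, or with an ID no marker uses, falls back to the
variable that lives at its destination address.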

### Middle-end: Considerations for pass-writers

#### Non-debug instruction updates

**Cloning** an instruction: nothing new to do. Cloning automatically clones a
`DIAssignID` attachment. Multiple instructions may have the same `DIAssignID`
attachment. In this case, the assignment is considered to take place in
multiple positions in the program.

**Moving** a non-debug instruction: nothing new to do. Instructions linked to a
`#dbg_assign` have their initial IR position marked by the position of the
`#dbg_assign`.

**Deleting** a non-debug instruction: nothing new to do. Simple DSE does not
require any change; it’s safe to delete an instruction with a `DIAssignID`
attachment. A `#dbg_assign` that uses a `DIAssignID` that is not attached
to any instruction indicates that the memory location isn’t valid.

**Merging** stores: In many cases no change is required as `DIAssignID`
attachments are automatically merged if `combineMetadata` is called. One way or
another, the `DIAssignID` attachments must be merged such that the new store
becomes linked to all the `#dbg_assign` records that the merged stores
were linked to. This can be achieved simply by calling a helper function
`Instruction::mergeDIAssignID`.

**Inlining** stores: As stores are inlined we generate `#dbg_assign`
records and `DIAssignID` attachments as if the stores represent source
assignments, just like in the frontend. This isn’t perfect, as stores may have
been moved, modified or deleted before inlining, but it does at least keep the
information about the variable correct within the non-inlined scope.

**Splitting** stores: SROA and passes that split stores treat `#dbg_assign`
records similarly to `#dbg_declare` records. Clone the
`#dbg_assign` records linked to the store, update the FragmentInfo in
the `ValueExpression`, and give the split stores (and cloned records) new
`DIAssignID` attachments each. In other words, treat the split stores as
separate assignments. For partial DSE (e.g. shortening a memset), we do the
same except that the `#dbg_assign` for the dead fragment gets an `Undef`
`Address`.

**Promoting** allocas and store/loads: `#dbg_assign` records implicitly
describe joined values in memory locations at CFG joins, but this is not
necessarily the case after promoting (or partially promoting) the
variable. Passes that promote variables are responsible for inserting
`#dbg_assign` records after the resultant PHIs generated during
promotion. `mem2reg` already has to do this (with `#dbg_value`) for
`#dbg_declare`s. Where a store has no linked record, the store is
assumed to represent an assignment for variables stored at the destination
address.
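
The merging and splitting rules above can be illustrated with a toy C++ model.
This is a sketch and not the LLVM implementation: IDs are assumed to be
integers, fragments are plain bit ranges, and `ToyMarker`, `mergeAssignIDs` and
`splitMarker` are hypothetical names. In particular, `mergeAssignIDs` only
models the required outcome (the merged store ends up linked to every marker
the source stores were linked to), not how `Instruction::mergeDIAssignID`
achieves it.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy fragment: the bit range of the variable that a marker describes.
struct ToyFragment {
  unsigned OffsetInBits;
  unsigned SizeInBits;
};

// Toy #dbg_assign: only the parts needed for this sketch.
struct ToyMarker {
  int ID;
  ToyFragment Fragment;
  bool AddressKilled; // Models an undef Address operand.
};

// Merging: choose one source ID for the merged store and relink every marker
// that used any source ID, so the new store is linked to all of them.
int mergeAssignIDs(const std::vector<int> &SourceIDs,
                   std::vector<ToyMarker> &Markers) {
  assert(!SourceIDs.empty() && "merging needs at least one source store");
  const int Chosen = SourceIDs.front();
  const std::set<int> Sources(SourceIDs.begin(), SourceIDs.end());
  for (ToyMarker &M : Markers)
    if (Sources.count(M.ID) != 0)
      M.ID = Chosen;
  return Chosen;
}

// Splitting: clone the marker once per half, narrow each clone's fragment,
// and give each clone (and its split store) a fresh ID.
std::vector<ToyMarker> splitMarker(const ToyMarker &Whole,
                                   unsigned SplitAtBits, int &NextFreshID) {
  assert(SplitAtBits > 0 && SplitAtBits < Whole.Fragment.SizeInBits);
  ToyMarker Low = Whole;
  Low.ID = NextFreshID++;
  Low.Fragment.SizeInBits = SplitAtBits;
  ToyMarker High = Whole;
  High.ID = NextFreshID++;
  High.Fragment.OffsetInBits += SplitAtBits;
  High.Fragment.SizeInBits -= SplitAtBits;
  return {Low, High};
}
```

For partial DSE in this model, the clone covering the dead fragment would
additionally have `AddressKilled` set to `true`.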

#### Debug record updates

**Moving** a debug record: avoid moving `#dbg_assign` records where
possible, as they represent a source-level assignment, whose position in the
program should not be affected by optimization passes.

**Deleting** a debug record: Nothing new to do. Just like for conventional
debug records, unless it is unreachable, it’s almost always incorrect to
delete a `#dbg_assign` record.

### Lowering `#dbg_assign` to MIR

To begin with only SelectionDAG ISel will be supported. `#dbg_assign`
records are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens
we need to decide where it is appropriate to use memory locations and where we
must use a non-memory location (or no location) for each variable. In order to
make those decisions we run a standard fixed-point dataflow analysis that makes
the choice at each instruction, iteratively joining the results for each block.
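
As a rough illustration of such a fixed-point analysis, the toy C++ sketch
below tracks a single variable over a small CFG. It is not the analysis LLVM
runs. The three-valued `LocKind`, the join rule (agreeing predecessors keep
their kind, any disagreement yields no location), the per-block granularity,
and the names `ToyBlock`, `solve` and `makeDiamond` are all assumptions made
for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed location kinds for one variable: memory, non-memory value, or none.
enum class LocKind { Mem, Val, None };

// Assumed join rule: keep the kind if both sides agree, otherwise give up.
LocKind joinKind(LocKind A, LocKind B) { return A == B ? A : LocKind::None; }

struct ToyBlock {
  std::vector<std::size_t> Preds; // Indices of predecessor blocks.
  bool HasAssign;                 // Block contains an assignment.
  LocKind AssignKind;             // Kind chosen at that assignment.
};

// Computes the location kind at the end of every block, iterating until
// nothing changes. Blocks without predecessors are treated as entry blocks.
std::vector<LocKind> solve(const std::vector<ToyBlock> &Blocks) {
  std::vector<LocKind> Out(Blocks.size(), LocKind::None);
  std::vector<bool> Visited(Blocks.size(), false);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Blocks.size(); ++I) {
      const ToyBlock &B = Blocks[I];
      bool Reachable = B.Preds.empty();
      bool First = true;
      LocKind In = LocKind::None;
      for (std::size_t P : B.Preds) {
        if (!Visited[P])
          continue; // Ignore predecessors that have no result yet.
        In = First ? Out[P] : joinKind(In, Out[P]);
        First = false;
        Reachable = true;
      }
      if (!Reachable)
        continue;
      // An assignment in the block overrides whatever flowed in.
      const LocKind NewOut = B.HasAssign ? B.AssignKind : In;
      if (!Visited[I] || NewOut != Out[I]) {
        Out[I] = NewOut;
        Visited[I] = true;
        Changed = true;
      }
    }
  }
  return Out;
}

// Diamond CFG: 0 -> {1, 2} -> 3. Block 0 assigns with a memory location and
// block 2 reassigns with RightKind.
std::vector<ToyBlock> makeDiamond(LocKind RightKind) {
  return {{{}, true, LocKind::Mem},
          {{0}, false, LocKind::None},
          {{0}, true, RightKind},
          {{1, 2}, false, LocKind::None}};
}
```

With `makeDiamond(LocKind::Val)` the two paths disagree at block 3, so its
result is `LocKind::None`; with `makeDiamond(LocKind::Mem)` they agree and
block 3 keeps `LocKind::Mem`.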

### TODO list

Outstanding improvements:

* As mentioned in test llvm/test/DebugInfo/assignment-tracking/X86/diamond-3.ll,
  the analysis should treat escaping calls like untagged stores.

* The system expects locals to be backed by a local alloca. This isn't always
  the case - sometimes a pointer to storage is passed into a function
  (e.g. sret, byval). We need to be able to handle those cases. See
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for examples.

* `trackAssignments` doesn't yet work for variables that have their
  `#dbg_declare` location modified by a `DIExpression`, e.g. when the
  address of the variable is itself stored in an `alloca` with the
  `#dbg_declare` using `DIExpression(DW_OP_deref)`. See `indirectReturn` in
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and in
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for an
  example.

* In order to handle storage that is passed into a function (the sret/byval
  bullet-point above) we need to be able to specify that a
  memory location is available without using a `DIAssignID`. This is because
  the storage address is not computed by an instruction (it's an argument
  value) and therefore we have nowhere to put the metadata attachment. To solve
  this we probably need another marker record to denote "the variable's
  stack home is X address" - similar to `#dbg_declare` except that it needs
  to compose with `#dbg_assign` records such that the stack home address
  is only selected as a location for the variable when the `#dbg_assign`
  records agree it should be.

* Given the above (a special "the stack home is X" record), and the fact
  that we can only track assignments with fixed offsets and sizes, I think we
  can probably get rid of the address and address-expression part, since it
  will always be computable with the info we have.

* Assignment tracking is disabled by default for LTO and thinLTO builds, and
  if LLDB debugger tuning has been specified. We should remove these
  restrictions. See EmitAssemblyHelper::RunOptimizationPipeline in
  clang/lib/CodeGen/BackendUtil.cpp.