# Debug Info Assignment Tracking

Assignment Tracking is an alternative technique for tracking variable location
debug info through optimisations in LLVM. It provides accurate variable
locations for assignments where a local variable (or a field of one) is the
LHS. In rare and complicated circumstances, indirect assignments might be
optimised away without being tracked, but otherwise we make our best effort to
track all variable locations.

The core idea is to track more information about source assignments, preserving
enough of it to defer the decision about whether to use non-memory locations
(register, constant) or memory locations until after middle-end optimisations
have run. In contrast, `#dbg_declare` and `#dbg_value` make that decision for
most variables early on, which can result in suboptimal variable locations that
may be either incorrect or incomplete.

A secondary goal of assignment tracking is to cause minimal additional work for
LLVM pass writers, and minimal disruption to LLVM in general.

## Status and usage

**Status**: Enabled by default in Clang but disabled under some circumstances
(which can be overridden with the `forced` option, see below). `opt` will not
run the pass unless asked (`-passes=declare-to-assign`).

**Flag**:
`-Xclang -fexperimental-assignment-tracking=<disabled|enabled|forced>`

When enabled, Clang gets LLVM to run the pass `declare-to-assign`. The pass
converts conventional debug records to assignment tracking metadata and sets
the module flag `debug-info-assignment-tracking` to the value `i1 true`. To
check whether assignment tracking is enabled for a module, call
`isAssignmentTrackingEnabled(const Module &M)` (from `llvm/IR/DebugInfo.h`).

## Design and implementation

### Assignment markers: `#dbg_assign`

`#dbg_value`, a conventional debug record, marks out a position in the
IR where a variable takes a particular value. Similarly, Assignment Tracking
marks out the position of assignments with a record called `#dbg_assign`.

In order to know where in IR it is appropriate to use a memory location for a
variable, each assignment marker must in some way refer to the store, if any
(or multiple!), that performs the assignment. That way, the position of the
store and marker can be considered together when making that choice. Another
important benefit of referring to the store is that we can then build a two-way
mapping of stores<->markers that can be used to find markers that need to be
updated when stores are modified.
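
The two-way stores<->markers mapping described above can be pictured with a
small self-contained sketch. This is a toy model, not LLVM code: `LinkKey`,
`StoreId`, `MarkerId` and `AssignmentLinks` are hypothetical names chosen for
illustration, and plain integers stand in for instructions and markers.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model only: these aliases and this class are hypothetical, not LLVM types.
using LinkKey = int;  // shared key that ties stores and markers together
using StoreId = int;  // stands in for a store instruction
using MarkerId = int; // stands in for an assignment marker

class AssignmentLinks {
  std::map<LinkKey, std::set<StoreId>> Stores;
  std::map<LinkKey, std::set<MarkerId>> Markers;
  std::map<StoreId, LinkKey> KeyOfStore;
  std::map<MarkerId, LinkKey> KeyOfMarker;

public:
  void linkStore(StoreId S, LinkKey K) {
    Stores[K].insert(S);
    KeyOfStore[S] = K;
  }

  void linkMarker(MarkerId M, LinkKey K) {
    Markers[K].insert(M);
    KeyOfMarker[M] = K;
  }

  // Models a store being optimised away; its markers stay in place.
  void eraseStore(StoreId S) {
    auto It = KeyOfStore.find(S);
    if (It == KeyOfStore.end())
      return;
    Stores[It->second].erase(S);
    KeyOfStore.erase(It);
  }

  // Store -> markers: which markers may need updating if S is modified?
  std::set<MarkerId> markersFor(StoreId S) const {
    auto It = KeyOfStore.find(S);
    if (It == KeyOfStore.end())
      return {};
    auto MIt = Markers.find(It->second);
    if (MIt == Markers.end())
      return {};
    return MIt->second;
  }

  // Marker -> stores: which stores perform the assignment M describes?
  std::set<StoreId> storesFor(MarkerId M) const {
    auto It = KeyOfMarker.find(M);
    if (It == KeyOfMarker.end())
      return {};
    auto SIt = Stores.find(It->second);
    if (SIt == Stores.end())
      return {};
    return SIt->second;
  }
};
```

In this model, a marker whose `storesFor` result is empty corresponds to an
assignment whose store no longer exists.
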

A `#dbg_assign` marker that is not linked to any instruction signals that
the store that performed the assignment has been optimised out, and therefore
the memory location will not be valid for at least some part of the program.

Here's the `#dbg_assign` signature. `Value *` type parameters are first wrapped
in `ValueAsMetadata`:

```
  #dbg_assign(Value *Value,
              DILocalVariable *Variable,
              DIExpression *ValueExpression,
              DIAssignID *ID,
              Value *Address,
              DIExpression *AddressExpression)
```

The first three parameters look and behave like those of a `#dbg_value`. `ID`
is a reference to a store (see next section). `Address` is the destination
address of the store and it is modified by `AddressExpression`. An
empty/undef/poison address means the address component has been killed (the
memory address is no longer a valid location). LLVM currently encodes variable
fragment information in `DIExpression`s, so as an implementation quirk the
`FragmentInfo` for `Variable` is contained within `ValueExpression` only.

### Instruction link: `DIAssignID`

`DIAssignID` metadata is the mechanism that is currently used to encode the
store<->marker link. The metadata node has no operands and all instances are
`distinct`; equality is checked for by comparing addresses.

`#dbg_assign` records use a `DIAssignID` metadata node instance as an
operand.
This way, each record refers to any store-like instruction that has the same
`DIAssignID` attachment. For example, for this test.cpp,

```
int fun(int a) {
  return a;
}
```
compiled without optimisations:
```
$ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking=enabled
```
we get:
```
define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 {
entry:
  %a.addr = alloca i32, align 4, !DIAssignID !13
    #dbg_assign(i1 undef, !14, !DIExpression(), !13, i32* %a.addr, !DIExpression(), !15)
  store i32 %a, i32* %a.addr, align 4, !DIAssignID !16
    #dbg_assign(i32 %a, !14, !DIExpression(), !16, i32* %a.addr, !DIExpression(), !15)
  %0 = load i32, i32* %a.addr, align 4, !dbg !17
  ret i32 %0, !dbg !18
}

...
!13 = distinct !DIAssignID()
!14 = !DILocalVariable(name: "a", ...)
...
!16 = distinct !DIAssignID()
```

The first `#dbg_assign` refers to the `alloca` through `!DIAssignID !13`,
and the second refers to the `store` through `!DIAssignID !16`.

### Store-like instructions

In the absence of a linked `#dbg_assign`, a store to an address that is
known to be the backing storage for a variable is considered to represent an
assignment to that variable.
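
As a rough illustration of that rule, the sketch below decides which variables
a store is treated as assigning. It is a toy model under stated assumptions:
`Store`, `assignedVars` and the `BackingStorage` map are hypothetical names,
and strings stand in for addresses and variables; none of this is LLVM's API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not LLVM's.
struct Store {
  std::string Dest;                    // destination address, e.g. "%a.addr"
  std::vector<std::string> LinkedVars; // variables of markers linked to this store
};

// Returns the variables that the store is treated as assigning.
// BackingStorage maps an address known to back a variable to that variable.
std::vector<std::string>
assignedVars(const Store &S,
             const std::map<std::string, std::string> &BackingStorage) {
  // Tracked case: linked markers already describe the assignment.
  if (!S.LinkedVars.empty())
    return S.LinkedVars;
  // No linked marker: a store to known backing storage still counts as an
  // assignment to the variable that lives there.
  auto It = BackingStorage.find(S.Dest);
  if (It != BackingStorage.end())
    return {It->second};
  // Otherwise the store is not an assignment to any tracked variable.
  return {};
}
```
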

This gives us a safe fall-back in cases where `#dbg_assign` records have
been deleted, the `DIAssignID` attachment on the store has been dropped, or the
optimiser has made a once-indirect store (not tracked with Assignment Tracking)
direct.

### Middle-end: Considerations for pass-writers

#### Non-debug instruction updates

**Cloning** an instruction: nothing new to do. Cloning automatically clones a
`DIAssignID` attachment. Multiple instructions may have the same `DIAssignID`
attachment. In this case, the assignment is considered to take place in
multiple positions in the program.

**Moving** a non-debug instruction: nothing new to do. Instructions linked to a
`#dbg_assign` have their initial IR position marked by the position of the
`#dbg_assign`.

**Deleting** a non-debug instruction: nothing new to do. Simple DSE does not
require any change; it’s safe to delete an instruction with a `DIAssignID`
attachment. A `#dbg_assign` that uses a `DIAssignID` that is not attached
to any instruction indicates that the memory location isn’t valid.

**Merging** stores: In many cases no change is required as `DIAssignID`
attachments are automatically merged if `combineMetadata` is called. One way or
another, the `DIAssignID` attachments must be merged such that the new store
becomes linked to all the `#dbg_assign` records that the merged stores
were linked to.
This can be achieved simply by calling a helper function
`Instruction::mergeDIAssignID`.

**Inlining** stores: As stores are inlined we generate `#dbg_assign`
records and `DIAssignID` attachments as if the stores represent source
assignments, just like in the frontend. This isn’t perfect, as stores may have
been moved, modified or deleted before inlining, but it does at least keep the
information about the variable correct within the non-inlined scope.

**Splitting** stores: SROA and passes that split stores treat `#dbg_assign`
records similarly to `#dbg_declare` records. Clone the
`#dbg_assign` records linked to the store, update the `FragmentInfo` in
the `ValueExpression`, and give the split stores (and cloned records) new
`DIAssignID` attachments each. In other words, treat the split stores as
separate assignments. For partial DSE (e.g. shortening a memset), we do the
same except that the `#dbg_assign` for the dead fragment gets an `Undef`
`Address`.

**Promoting** allocas and store/loads: `#dbg_assign` records implicitly
describe joined values in memory locations at CFG joins, but this is not
necessarily the case after promoting (or partially promoting) the
variable. Passes that promote variables are responsible for inserting
`#dbg_assign` records after the resultant PHIs generated during
promotion. `mem2reg` already has to do this (with `#dbg_value`) for
`#dbg_declare`s.
Where a store has no linked record, the store is
assumed to represent an assignment for variables stored at the destination
address.

#### Debug record updates

**Moving** a debug record: avoid moving `#dbg_assign` records where
possible, as they represent a source-level assignment, whose position in the
program should not be affected by optimisation passes.

**Deleting** a debug record: Nothing new to do. Just like for conventional
debug records, unless it is unreachable, it’s almost always incorrect to
delete a `#dbg_assign` record.

### Lowering `#dbg_assign` to MIR

To begin with, only SelectionDAG ISel will be supported. `#dbg_assign`
records are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens
we need to decide where it is appropriate to use memory locations and where we
must use a non-memory location (or no location) for each variable. In order to
make those decisions we run a standard fixed-point dataflow analysis that makes
the choice at each instruction, iteratively joining the results for each block.

### TODO list

Outstanding improvements:

* As mentioned in test llvm/test/DebugInfo/assignment-tracking/X86/diamond-3.ll,
  the analysis should treat escaping calls like untagged stores.

* The system expects locals to be backed by a local alloca.
  This isn't always the case - sometimes a pointer to storage is passed into a
  function (e.g. sret, byval). We need to be able to handle those cases. See
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for examples.

* `trackAssignments` doesn't yet work for variables that have their
  `#dbg_declare` location modified by a `DIExpression`, e.g. when the
  address of the variable is itself stored in an `alloca` with the
  `#dbg_declare` using `DIExpression(DW_OP_deref)`. See `indirectReturn` in
  llvm/test/DebugInfo/Generic/assignment-tracking/track-assignments.ll and in
  clang/test/CodeGen/assignment-tracking/assignment-tracking.cpp for an
  example.

* In order to handle locals whose storage is passed in as an argument (the
  sret/byval case above), we need to be able to specify that a memory location
  is available without using a `DIAssignID`. This is because the storage
  address is not computed by an instruction (it's an argument value) and
  therefore we have nowhere to put the metadata attachment. To solve this we
  probably need another marker record to denote "the variable's stack home is
  X address" - similar to `#dbg_declare` except that it needs to compose with
  `#dbg_assign` records such that the stack home address is only selected as a
  location for the variable when the `#dbg_assign` records agree it should be.

* Given the above (a special "the stack home is X" record), and the fact
  that we can only track assignments with fixed offsets and sizes, I think we
  can probably get rid of the address and address-expression part, since it
  will always be computable with the info we have.

* Assignment tracking is disabled by default for LTO and thinLTO builds, and
  if LLDB debugger tuning has been specified. We should remove these
  restrictions. See `EmitAssemblyHelper::RunOptimizationPipeline` in
  clang/lib/CodeGen/BackendUtil.cpp.
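
The default-enablement rules in the last item, combined with the
`disabled|enabled|forced` flag values, can be summarised with a small sketch.
This is a toy model of the behaviour this document describes, not the logic in
`BackendUtil.cpp`: `TrackingMode`, `BuildConfig` and `usesAssignmentTracking`
are hypothetical names, and Clang may apply further conditions that are not
modelled here.

```cpp
#include <cassert>

// Toy model only: hypothetical names mirroring the flag values in this document.
enum class TrackingMode { Disabled, Enabled, Forced };

struct BuildConfig {
  bool IsLTO = false;       // full LTO build
  bool IsThinLTO = false;   // thinLTO build
  bool TuneForLLDB = false; // LLDB debugger tuning requested
};

// True if assignment tracking would be used under the rules stated above.
bool usesAssignmentTracking(TrackingMode Mode, const BuildConfig &Cfg) {
  if (Mode == TrackingMode::Disabled)
    return false;
  // `forced` overrides the restrictions below.
  if (Mode == TrackingMode::Forced)
    return true;
  // `enabled`: on, except for the restricted configurations.
  return !(Cfg.IsLTO || Cfg.IsThinLTO || Cfg.TuneForLLDB);
}
```

Removing the restrictions, as the TODO item proposes, would correspond to the
`Enabled` case returning `true` unconditionally in this model.
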