Symbolication
=============

LLDB is separated into a shared library that contains the core of the debugger,
and a driver that implements debugging and a command interpreter. LLDB can be
used to symbolicate your crash logs and can often provide more information than
other symbolication programs:

- Inlined functions
- Variables that are in scope for an address, along with their locations

The simplest form of symbolication is to load an executable:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out

We use the ``--no-dependents`` flag with the ``target create`` command so that
we don't load all of the dependent shared libraries from the current system.
When we symbolicate, we are often symbolicating a binary that was running on
another system, and even though the main executable might reference shared
libraries in ``/usr/lib``, we often don't want to load the versions on the
current computer.

Using the ``image list`` command will show us a list of all shared libraries
associated with the current target. As expected, we currently only have a
single binary:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out

Now we can look up an address:

.. code-block:: text

   (lldb) image lookup --address 0x100000aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Since we haven't specified a slide or any load addresses for individual
sections in the binary, the address that we use here is a file address. A file
address refers to a virtual address as defined by each object file.

If we didn't use the ``--no-dependents`` option with ``target create``, we
would have loaded all dependent shared libraries:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out
   [  1] 8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B 0x0000000000000000 /usr/lib/system/libsystem_c.dylib
   [  2] 62AA0B84-188A-348B-8F9E-3E2DB08DB93C 0x0000000000000000 /usr/lib/system/libsystem_dnssd.dylib
   [  3] C0535565-35D1-31A7-A744-63D9F10F12A4 0x0000000000000000 /usr/lib/system/libsystem_kernel.dylib
   ...

Now if we do a lookup using a file address, this can result in multiple matches
since most shared libraries have a virtual address space that starts at zero:

.. code-block:: text

   (lldb) image lookup -a 0x1000
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

         Address: libsystem_c.dylib[0x0000000000001000] (libsystem_c.dylib.__TEXT.__text + 928)
         Summary: libsystem_c.dylib`mcount + 9

         Address: libsystem_dnssd.dylib[0x0000000000001000] (libsystem_dnssd.dylib.__TEXT.__text + 456)
         Summary: libsystem_dnssd.dylib`ConvertHeaderBytes + 38

         Address: libsystem_kernel.dylib[0x0000000000001000] (libsystem_kernel.dylib.__TEXT.__text + 1116)
         Summary: libsystem_kernel.dylib`clock_get_time + 102
   ...

To avoid getting multiple file address matches, you can specify the name of the
shared library to limit the search:

.. code-block:: text

   (lldb) image lookup -a 0x1000 a.out
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

Defining Load Addresses for Sections
------------------------------------

When symbolicating your crash logs, it can be tedious if you always have to
convert your crash log addresses into file addresses. To avoid having to do any
conversion, you can set the load address for the sections of the modules in
your target. Once you set any section load address, lookups will switch to
using load addresses.
You can slide all sections in the executable by the same
amount, or set the load address for individual sections. The ``target modules
load --slide`` command allows us to set the load address for all sections.

Below is an example of sliding all sections in a.out by adding 0x123000 to each
section's file address:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out --slide 0x123000

It is often much easier to specify the actual load location of each section by
name. Crash logs on macOS have a Binary Images section that specifies the
address of the ``__TEXT`` segment for each binary. Specifying a slide
requires that you first find the original (file) address for the ``__TEXT``
segment, and subtract the two values. If you specify the address of the
``__TEXT`` segment with ``target modules load section address``, you don't need
to do any calculations. To specify the load addresses of sections we can
specify one or more section name + address pairs in the ``target modules load``
command:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out __TEXT 0x100123000

We specified that the ``__TEXT`` section is loaded at 0x100123000. Now that we
have defined where sections have been loaded in our target, any lookups we do
will use load addresses, so we don't have to do any math on the addresses in
the crash log backtraces; we can just use the raw addresses:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Loading Multiple Executables
----------------------------

You often have more than one executable involved when you need to symbolicate a
crash log.
When this happens, you create a target for the main executable or
one of the shared libraries, then add more modules to the target using the
``target modules add`` command.

Let's say we have a Darwin crash log that contains the following images:

.. code-block:: text

   Binary Images:
       0x100000000 -    0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
    0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
    0x7fff883db000 - 0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
    0x7fff8c0dc000 - 0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib

First we create the target using the main executable and then add any extra
shared libraries we want:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib

If you have debug symbols in standalone files, such as dSYM files on macOS,
you can specify their paths using the ``--symfile`` option for the
``target create`` (recent LLDB releases only) and ``target modules add``
commands:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out --symfile /tmp/a.out.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib --symfile /build/server/a/libsystem_c.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib --symfile /build/server/b/libsystem_dnssd.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib --symfile /build/server/c/libsystem_kernel.dylib.dSYM

Then we set the load address for each ``__TEXT`` section using the first
address from the Binary Images section for each image:

.. code-block:: text

   (lldb) target modules load --file a.out __TEXT 0x100000000
   (lldb) target modules load --file libsystem_c.dylib __TEXT 0x7fff83f32000
   (lldb) target modules load --file libsystem_dnssd.dylib __TEXT 0x7fff883db000
   (lldb) target modules load --file libsystem_kernel.dylib __TEXT 0x7fff8c0dc000

Now any stack backtraces that haven't been symbolicated can be symbolicated
using ``image lookup`` with the raw backtrace addresses.

Given the following raw backtrace:

.. code-block:: text

   Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
   0   libsystem_kernel.dylib          0x00007fff8a1e6d46 __kill + 10
   1   libsystem_c.dylib               0x00007fff84597df0 abort + 177
   2   libsystem_c.dylib               0x00007fff84598e2a __assert_rtn + 146
   3   a.out                           0x0000000100000f46 main + 70
   4   libdyld.dylib                   0x00007fff8c4197e1 start + 1

We can now symbolicate the load addresses:

.. code-block:: text

   (lldb) image lookup -a 0x00007fff8a1e6d46
   (lldb) image lookup -a 0x00007fff84597df0
   (lldb) image lookup -a 0x00007fff84598e2a
   (lldb) image lookup -a 0x0000000100000f46

Getting Variable Information
----------------------------

If you add the ``--verbose`` flag to the ``image lookup --address`` command,
you can get verbose information which can often include the locations of some
of your local variables:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3 --verbose
          Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 110)
          Summary: a.out`main + 50 at main.c:13
           Module: file = "/tmp/a.out", arch = "x86_64"
      CompileUnit: id = {0x00000000}, file = "/tmp/main.c", language = "ISO C:1999"
         Function: id = {0x0000004f}, name = "main", range = [0x0000000100000bc0-0x0000000100000dc9)
         FuncType: id = {0x0000004f}, decl = main.c:9, compiler_type = "int (int, const char **, const char **, const char **)"
           Blocks: id = {0x0000004f}, range = [0x100000bc0-0x100000dc9)
                   id = {0x000000ae}, range = [0x100000bf2-0x100000dc4)
        LineEntry: [0x0000000100000bf2-0x0000000100000bfa): /tmp/main.c:13:23
           Symbol: id = {0x00000004}, range = [0x0000000100000bc0-0x0000000100000dc9), name="main"
         Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28
         Variable: id = {0x00000072}, name = "argc", type= "int", location = r13, decl = main.c:8
         Variable: id = {0x00000081}, name = "argv", type= "const char **", location = r12, decl = main.c:8
         Variable: id = {0x00000090}, name = "envp", type= "const char **", location = r15, decl = main.c:8
         Variable: id = {0x0000009f}, name = "aapl", type= "const char **", location = rbx, decl = main.c:8

The interesting part is the variables that are listed. The variables are the
parameters and local variables that are in scope for the address that was
specified.
Each variable entry includes a ``location`` field, as shown in the output above.
Crash logs often have register information for the first frame in each stack,
and being able to reconstruct one or more local variables can often help you
decipher more information from a crash log than you normally would be able to.
Note that this is really only useful for the first frame, and only if your
crash logs have register information for your threads.

Using Python API to Symbolicate
-------------------------------

All of the commands above can be done through the Python script bridge. The
code below will recreate the target and add the three shared libraries that we
added in the Darwin crash log example above:

.. code-block:: python

   import lldb

   triple = "x86_64-apple-macosx"
   platform_name = None
   add_dependents = False
   target = lldb.debugger.CreateTarget("/tmp/a.out", triple, platform_name, add_dependents, lldb.SBError())
   if target:
       # Get the executable module
       module = target.GetModuleAtIndex(0)
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x100000000)
       module = target.AddModule("/usr/lib/system/libsystem_c.dylib", triple, None, "/build/server/a/libsystem_c.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff83f32000)
       module = target.AddModule("/usr/lib/system/libsystem_dnssd.dylib", triple, None, "/build/server/b/libsystem_dnssd.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff883db000)
       module = target.AddModule("/usr/lib/system/libsystem_kernel.dylib", triple, None, "/build/server/c/libsystem_kernel.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff8c0dc000)

       load_addr = 0x00007fff8a1e6d46
       # so_addr is a section offset address, or a lldb.SBAddress object
       so_addr = target.ResolveLoadAddress(load_addr)
       # Get a symbol context for the section offset address which includes
       # a module, compile unit, function, block, line entry, and symbol
       sym_ctx = so_addr.GetSymbolContext(lldb.eSymbolContextEverything)
       print(sym_ctx)

Use Builtin Python Module to Symbolicate
----------------------------------------

LLDB includes a module in the lldb package named lldb.utils.symbolication. This
module contains helper classes that simplify the symbolication process by
letting you create objects that represent the pieces involved in
symbolication:

- lldb.utils.symbolication.Address
- lldb.utils.symbolication.Section
- lldb.utils.symbolication.Image
- lldb.utils.symbolication.Symbolicator

**lldb.utils.symbolication.Address**

This class represents an address that will be symbolicated. It will cache any
information that has been looked up: module, compile unit, function, block,
line entry, symbol. It does this by having a lldb.SBSymbolContext as a member
variable.

**lldb.utils.symbolication.Section**

This class represents a section that might get loaded in a
lldb.utils.symbolication.Image. It has helper functions that allow you to set
it from text that might have been extracted from a crash log file.

**lldb.utils.symbolication.Image**

This class represents a module that might get loaded into the target we use for
symbolication. This class contains the executable path, optional symbol file
path, the triple, and the list of sections that will need to be loaded if we
choose to ask the target to load this image. Many of these objects will never
be loaded into the target unless they are needed by symbolication. You often
have a crash log that has 100 to 200 different shared libraries loaded, but
your crash log stack backtraces only use a few of these shared libraries. Only
the images that contain stack backtrace addresses need to be loaded in the
target in order to symbolicate.

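
As a rough illustration of this load-on-demand idea (not the actual
``lldb.utils.symbolication`` implementation), the sketch below picks out the
images whose address ranges contain at least one backtrace address. The helper
names ``parse_binary_images`` and ``images_needed`` are hypothetical, the
parser only handles the simplified Binary Images layout used in this document,
the end address of each range is assumed to be inclusive, and the second frame
address is made up for the example:

.. code-block:: python

   import re

   # Matches one line of the simplified "Binary Images" listing, e.g.
   #   0x100000000 - 0x100000ff7 <UUID> /tmp/a.out
   IMAGE_RE = re.compile(
       r"^\s*(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)\s+<([0-9A-Fa-f-]+)>\s+(\S.*)$"
   )

   def parse_binary_images(text):
       """Return a list of (start, end, uuid, path) tuples; end is inclusive."""
       images = []
       for line in text.splitlines():
           match = IMAGE_RE.match(line)
           if match:
               start, end, uuid, path = match.groups()
               images.append((int(start, 16), int(end, 16), uuid, path.strip()))
       return images

   def images_needed(images, addresses):
       """Return the paths of images that contain at least one address."""
       needed = []
       for start, end, _uuid, path in images:
           if any(start <= addr <= end for addr in addresses):
               needed.append(path)
       return needed

   BINARY_IMAGES = """\
   0x100000000 - 0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
   0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
   0x7fff883db000 - 0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
   0x7fff8c0dc000 - 0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib
   """

   # Illustrative frame addresses: the first is the a.out frame from the
   # backtrace above, the second is invented to fall inside libsystem_c.dylib.
   frame_addresses = [0x100000F46, 0x7FFF83F33000]

   needed = images_needed(parse_binary_images(BINARY_IMAGES), frame_addresses)
   print(needed)

With these inputs only ``/tmp/a.out`` and ``libsystem_c.dylib`` are selected,
so the other two images would not have to be loaded into the target.
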

Subclasses of ``lldb.utils.symbolication.Image`` will want to override the
``locate_module_and_debug_symbols`` method:

.. code-block:: python

   class CustomImage(lldb.utils.symbolication.Image):
       def locate_module_and_debug_symbols(self):
           # Locate the module and symbol given the info found in the crash log
           pass

Overriding this function allows clients to find the correct executable module
and symbol files as they might reside on a build server.

**lldb.utils.symbolication.Symbolicator**

This class coordinates the symbolication process by loading only the
lldb.utils.symbolication.Image instances that need to be loaded in order to
symbolicate a supplied address.

**lldb.macosx.crashlog**

lldb.macosx.crashlog is a package that is distributed on macOS builds that
subclasses the above classes. This module parses the information in the Darwin
crash logs and creates symbolication objects that represent the images, the
sections and the thread frames for the backtraces. It then uses the functions
in the lldb.utils.symbolication to symbolicate the crash logs.

This module installs a new ``crashlog`` command into the lldb command
interpreter so that you can use it to parse and symbolicate macOS crash
logs:

.. code-block:: text

   (lldb) command script import lldb.macosx.crashlog
   "crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help
   (lldb) crashlog /tmp/crash.log
   ...

The command that is installed has built-in help that shows the options that can
be used when symbolicating:

.. code-block:: text

   (lldb) crashlog --help
   Usage: crashlog [options] [FILE ...]

Symbolicate one or more Darwin crash log files to provide source file and line
information, inlined stack frames back to the concrete functions, and
disassemble the location of the crash for the first frame of the crashed
thread. If this script is imported into the LLDB command interpreter, a
``crashlog`` command will be added to the interpreter for use at the LLDB
command line. After a crash log has been parsed and symbolicated, a target will
have been created that has all of the shared libraries loaded at the load
addresses found in the crash log file. This allows you to explore the program
as if it were stopped at the locations described in the crash log and functions
can be disassembled and lookups can be performed using the addresses found in
the crash log.

.. code-block:: text

   Options:
     -h, --help            show this help message and exit
     -v, --verbose         display verbose debug info
     -g, --debug           display verbose debug logging
     -a, --load-all        load all executable images, not just the images found
                           in the crashed stack frames
     --images              show image list
     --debug-delay=NSEC    pause for NSEC seconds for debugger
     -c, --crashed-only    only symbolicate the crashed thread
     -d DISASSEMBLE_DEPTH, --disasm-depth=DISASSEMBLE_DEPTH
                           set the depth in stack frames that should be
                           disassembled (default is 1)
     -D, --disasm-all      enabled disassembly of frames on all threads (not just
                           the crashed thread)
     -B DISASSEMBLE_BEFORE, --disasm-before=DISASSEMBLE_BEFORE
                           the number of instructions to disassemble before the
                           frame PC
     -A DISASSEMBLE_AFTER, --disasm-after=DISASSEMBLE_AFTER
                           the number of instructions to disassemble after the
                           frame PC
     -C NLINES, --source-context=NLINES
                           show NLINES source lines of source context (default =
                           4)
     --source-frames=NFRAMES
                           show source for NFRAMES (default = 4)
     --source-all          show source for all threads, not just the crashed
                           thread
     -i, --interactive     parse all crash logs and enter interactive mode

The source for the "symbolication" and "crashlog" modules is available in git.
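
As a recap of the address arithmetic described under "Defining Load Addresses
for Sections", here is a minimal Python sketch that uses the numbers from the
``a.out`` examples in this document. The function names ``slide_for`` and
``load_to_file`` are hypothetical and are not part of LLDB; the sketch assumes
every section of the module is slid by the same amount:

.. code-block:: python

   def slide_for(file_addr, load_addr):
       """Return the slide: how far a segment moved from its file address."""
       return load_addr - file_addr

   def load_to_file(load_addr, slide):
       """Convert a load address from a crash log back into a file address."""
       return load_addr - slide

   # File address of a.out's __TEXT segment, as reported by "image list".
   text_file_addr = 0x100000000
   # Load address of __TEXT, as it would appear in a Binary Images section.
   text_load_addr = 0x100123000

   slide = slide_for(text_file_addr, text_load_addr)
   file_addr = load_to_file(0x100123AA3, slide)

   print(hex(slide))      # 0x123000
   print(hex(file_addr))  # 0x100000aa3

The resulting slide matches the value passed to ``target modules load --slide``
earlier, and the file address matches the one reported by ``image lookup`` for
the load address ``0x100123aa3``.
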