Symbolication
=============

LLDB is separated into a shared library that contains the core of the debugger,
and a driver that implements debugging and a command interpreter. LLDB can be
used to symbolicate your crash logs and can often provide more information than
other symbolication programs:

- Inlined functions
- Variables that are in scope for an address, along with their locations

The simplest form of symbolication is to load an executable:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out

We use the ``--no-dependents`` flag with the ``target create`` command so that
we don't load all of the dependent shared libraries from the current system.
When we symbolicate, we are often symbolicating a binary that was running on
another system, and even though the main executable might reference shared
libraries in ``/usr/lib``, we often don't want to load the versions on the
current computer.

Using the ``image list`` command will show us a list of all shared libraries
associated with the current target. As expected, we currently only have a
single binary:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out

Now we can look up an address:

.. code-block:: text

   (lldb) image lookup --address 0x100000aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13

Since we haven't specified a slide or any load addresses for individual
sections in the binary, the address that we use here is a file address. A file
address refers to a virtual address as defined by each object file.

If we didn't use the ``--no-dependents`` option with ``target create``, we
would have loaded all dependent shared libraries:

.. code-block:: text

   (lldb) image list
   [  0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out
         /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out
   [  1] 8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B 0x0000000000000000 /usr/lib/system/libsystem_c.dylib
   [  2] 62AA0B84-188A-348B-8F9E-3E2DB08DB93C 0x0000000000000000 /usr/lib/system/libsystem_dnssd.dylib
   [  3] C0535565-35D1-31A7-A744-63D9F10F12A4 0x0000000000000000 /usr/lib/system/libsystem_kernel.dylib
   ...

Now if we do a lookup using a file address, this can result in multiple matches
since most shared libraries have a virtual address space that starts at zero:

.. code-block:: text

   (lldb) image lookup -a 0x1000
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

         Address: libsystem_c.dylib[0x0000000000001000] (libsystem_c.dylib.__TEXT.__text + 928)
         Summary: libsystem_c.dylib`mcount + 9

         Address: libsystem_dnssd.dylib[0x0000000000001000] (libsystem_dnssd.dylib.__TEXT.__text + 456)
         Summary: libsystem_dnssd.dylib`ConvertHeaderBytes + 38

         Address: libsystem_kernel.dylib[0x0000000000001000] (libsystem_kernel.dylib.__TEXT.__text + 1116)
         Summary: libsystem_kernel.dylib`clock_get_time + 102
   ...

To avoid getting multiple file address matches, you can specify the name of the
shared library to limit the search:

.. code-block:: text

   (lldb) image lookup -a 0x1000 a.out
         Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096)

Defining Load Addresses for Sections
------------------------------------

When symbolicating your crash logs, it can be tedious if you always have to
adjust your crash log addresses into file addresses. To avoid having to do any
conversion, you can set the load address for the sections of the modules in
your target. Once you set any section load address, lookups will switch to
using load addresses. You can slide all sections in the executable by the same
amount, or set the load address for individual sections. The ``target modules
load --slide`` command allows us to set the load address for all sections.

Below is an example of sliding all sections in a.out by adding 0x123000 to each
section's file address:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out --slide 0x123000

It is often much easier to specify the actual load location of each section by
name. Crash logs on macOS have a Binary Images section that specifies the
address of the __TEXT segment for each binary. Specifying a slide requires
that you first find the original (file) address for the __TEXT segment and
subtract it from the load address. If you specify the address of the __TEXT
segment with ``target modules load section address``, you don't need to do any
calculations. To specify the load addresses of sections we can specify one or
more section name + address pairs in the ``target modules load`` command:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules load --file a.out __TEXT 0x100123000

We specified that the __TEXT section is loaded at 0x100123000. Now that we have
defined where sections have been loaded in our target, any lookups we do will
use load addresses, so we don't have to do any math on the addresses in the
crash log backtraces. We can just use the raw addresses:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131)
         Summary: a.out`main + 67 at main.c:13
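
The relationship between a slide and a segment load address is plain
arithmetic, and can be sketched in a few lines of Python. This is an
illustration only, not part of LLDB: ``slide_for`` and ``to_file_address`` are
names made up for this example, and the __TEXT file address of 0x100000000 is
read off the ``image list`` output shown earlier.

.. code-block:: python

   def slide_for(text_file_addr, text_load_addr):
       """Return the slide implied by where __TEXT was actually loaded."""
       return text_load_addr - text_file_addr


   def to_file_address(load_addr, slide):
       """Convert a crash log (load) address back into a file address."""
       return load_addr - slide


   # __TEXT has file address 0x100000000 and was loaded at 0x100123000.
   slide = slide_for(0x100000000, 0x100123000)
   print(hex(slide))                                # 0x123000
   print(hex(to_file_address(0x100123AA3, slide)))  # 0x100000aa3

Setting the load address of __TEXT with ``target modules load`` lets LLDB do
this subtraction for you.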

Loading Multiple Executables
----------------------------

You often have more than one executable involved when you need to symbolicate a
crash log. When this happens, you create a target for the main executable or
one of the shared libraries, then add more modules to the target using the
``target modules add`` command.

Let's say we have a Darwin crash log that contains the following images:

.. code-block:: text

   Binary Images:
      0x100000000 -    0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
   0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
   0x7fff883db000 - 0x7fff883e3ff7 <62AA0B84-188A-348B-8F9E-3E2DB08DB93C> /usr/lib/system/libsystem_dnssd.dylib
   0x7fff8c0dc000 - 0x7fff8c0f7ff7 <C0535565-35D1-31A7-A744-63D9F10F12A4> /usr/lib/system/libsystem_kernel.dylib
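
If you want to drive the steps below from a script, each line of the listing
can be split into its load range, UUID, and path. The following is a minimal
sketch that handles only the simplified layout shown above;
``parse_binary_image`` is a hypothetical helper, and real crash logs may carry
extra fields on each line that this pattern does not account for.

.. code-block:: python

   import re

   # Matches lines such as:
   #   0x100000000 - 0x100000ff7 <UUID> /tmp/a.out
   IMAGE_RE = re.compile(
       r"^\s*(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)\s+<([0-9A-Fa-f-]+)>\s+(/.*)$"
   )


   def parse_binary_image(line):
       """Return (start, end, uuid, path) for one line, or None if no match."""
       match = IMAGE_RE.match(line)
       if match is None:
           return None
       start, end, uuid, path = match.groups()
       return int(start, 16), int(end, 16), uuid, path.rstrip()


   line = (
       "   0x7fff83f32000 - 0x7fff83ffefe7 "
       "<8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib"
   )
   start, end, uuid, path = parse_binary_image(line)
   print(hex(start), hex(end), uuid, path)

Lines that do not describe an image, such as the ``Binary Images:`` header,
return ``None`` and can simply be skipped.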

First we create the target using the main executable and then add any extra
shared libraries we want:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib

If you have debug symbols in standalone files, such as dSYM files on macOS,
you can specify their paths using the ``--symfile`` option for the
``target create`` (recent LLDB releases only) and ``target modules add``
commands:

.. code-block:: text

   (lldb) target create --no-dependents --arch x86_64 /tmp/a.out --symfile /tmp/a.out.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_c.dylib --symfile /build/server/a/libsystem_c.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_dnssd.dylib --symfile /build/server/b/libsystem_dnssd.dylib.dSYM
   (lldb) target modules add /usr/lib/system/libsystem_kernel.dylib --symfile /build/server/c/libsystem_kernel.dylib.dSYM

Then we set the load addresses for each __TEXT section using the first address
from the Binary Images section for each image:

.. code-block:: text

   (lldb) target modules load --file a.out __TEXT 0x100000000
   (lldb) target modules load --file libsystem_c.dylib __TEXT 0x7fff83f32000
   (lldb) target modules load --file libsystem_dnssd.dylib __TEXT 0x7fff883db000
   (lldb) target modules load --file libsystem_kernel.dylib __TEXT 0x7fff8c0dc000
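
For a crash log with many images, the commands above can be generated rather
than typed. The sketch below assumes you already have a list of (path, __TEXT
load address) pairs with the main executable first; ``lldb_commands`` is a
hypothetical helper, and it uses the section name + address form of
``target modules load`` described earlier.

.. code-block:: python

   import os


   def lldb_commands(images, arch="x86_64"):
       """Build target-setup commands for (path, text_load_addr) pairs.

       The first pair is treated as the main executable.
       """
       main_path = images[0][0]
       commands = ["target create --no-dependents --arch %s %s" % (arch, main_path)]
       # Add every remaining image as an extra module.
       for path, _ in images[1:]:
           commands.append("target modules add %s" % path)
       # Then tell LLDB where each image's __TEXT segment was loaded.
       for path, load_addr in images:
           commands.append(
               "target modules load --file %s __TEXT 0x%x"
               % (os.path.basename(path), load_addr)
           )
       return commands


   images = [
       ("/tmp/a.out", 0x100000000),
       ("/usr/lib/system/libsystem_c.dylib", 0x7FFF83F32000),
   ]
   for command in lldb_commands(images):
       print(command)

The resulting lines can be pasted at the ``(lldb)`` prompt or saved to a file
and run with ``command source``.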

Now any stack backtraces that haven't been symbolicated can be symbolicated
using ``image lookup`` with the raw backtrace addresses.

Given the following raw backtrace:

.. code-block:: text

   Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
   0   libsystem_kernel.dylib        	0x00007fff8a1e6d46 __kill + 10
   1   libsystem_c.dylib             	0x00007fff84597df0 abort + 177
   2   libsystem_c.dylib             	0x00007fff84598e2a __assert_rtn + 146
   3   a.out                         	0x0000000100000f46 main + 70
   4   libdyld.dylib                 	0x00007fff8c4197e1 start + 1

We can now symbolicate the load addresses:

.. code-block:: text

   (lldb) image lookup -a 0x00007fff8a1e6d46
   (lldb) image lookup -a 0x00007fff84597df0
   (lldb) image lookup -a 0x00007fff84598e2a
   (lldb) image lookup -a 0x0000000100000f46
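
Typing one lookup per frame gets tedious for long backtraces. As a rough
sketch, the frame lines can be turned into ``image lookup`` commands with a
small parser. ``parse_frame`` and ``lookup_commands`` are hypothetical names,
and the pattern assumes the column layout shown above with image names that
contain no spaces.

.. code-block:: python

   import re

   # Frame index, image name, load address, then the "symbol + offset" text.
   FRAME_RE = re.compile(r"^(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.*)$")


   def parse_frame(line):
       """Return (index, image_name, load_addr, description) or None."""
       match = FRAME_RE.match(line.strip())
       if match is None:
           return None
       index, image, addr, description = match.groups()
       return int(index), image, int(addr, 16), description


   def lookup_commands(backtrace_lines):
       """Return one 'image lookup' command per parsable frame line."""
       frames = (parse_frame(line) for line in backtrace_lines)
       return ["image lookup -a 0x%016x" % f[2] for f in frames if f is not None]


   backtrace = [
       "Thread 0 Crashed:: Dispatch queue: com.apple.main-thread",
       "3   a.out                         \t0x0000000100000f46 main + 70",
   ]
   for command in lookup_commands(backtrace):
       print(command)

Header lines such as ``Thread 0 Crashed::`` do not start with a frame index, so
they are skipped rather than treated as frames.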

Getting Variable Information
----------------------------

If you add the ``--verbose`` flag to the ``image lookup --address`` command,
you can get verbose information, which can often include the locations of some
of your local variables:

.. code-block:: text

   (lldb) image lookup --address 0x100123aa3 --verbose
         Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 110)
         Summary: a.out`main + 50 at main.c:13
         Module: file = "/tmp/a.out", arch = "x86_64"
   CompileUnit: id = {0x00000000}, file = "/tmp/main.c", language = "ISO C:1999"
      Function: id = {0x0000004f}, name = "main", range = [0x0000000100000bc0-0x0000000100000dc9)
      FuncType: id = {0x0000004f}, decl = main.c:9, compiler_type = "int (int, const char **, const char **, const char **)"
        Blocks: id = {0x0000004f}, range = [0x100000bc0-0x100000dc9)
                id = {0x000000ae}, range = [0x100000bf2-0x100000dc4)
      LineEntry: [0x0000000100000bf2-0x0000000100000bfa): /tmp/main.c:13:23
        Symbol: id = {0x00000004}, range = [0x0000000100000bc0-0x0000000100000dc9), name="main"
      Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28
      Variable: id = {0x00000072}, name = "argc", type= "int", location = r13, decl = main.c:8
      Variable: id = {0x00000081}, name = "argv", type= "const char **", location = r12, decl = main.c:8
      Variable: id = {0x00000090}, name = "envp", type= "const char **", location = r15, decl = main.c:8
      Variable: id = {0x0000009f}, name = "aapl", type= "const char **", location = rbx, decl = main.c:8

The interesting part is the variables that are listed. The variables are the
parameters and local variables that are in scope for the address that was
specified. Each variable entry has a location, shown in its ``location =``
field above. Crash logs often have register information for the first frame in
each stack, and being able to reconstruct one or more local variables can often
help you decipher more information from a crash log than you normally would be
able to. Note that this is really only useful for the first frame, and only if
your crash logs have register information for your threads.
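
As an illustration of how the ``location`` fields combine with crash log
registers, the sketch below pulls the variable entries out of the verbose
output and pairs register-located variables with register values.
``register_variables`` is a hypothetical helper, the pattern matches only the
output format shown above, and the register values in the example are invented
placeholders, not data from a real crash log.

.. code-block:: python

   import re

   # Captures the name, type, and location fields of a "Variable:" line.
   VARIABLE_RE = re.compile(
       r'name = "(?P<name>[^"]+)",\s*type\s*=\s*"(?P<type>[^"]+)",'
       r"\s*location = (?P<location>.*?),\s*decl = "
   )


   def register_variables(verbose_lines, registers):
       """Map variable name -> register value for register-located variables."""
       values = {}
       for line in verbose_lines:
           match = VARIABLE_RE.search(line)
           if match and match.group("location") in registers:
               values[match.group("name")] = registers[match.group("location")]
       return values


   lines = [
       'Variable: id = {0x000000bf}, name = "path", type= "char [1024]", '
       "location = DW_OP_fbreg(-1072), decl = main.c:28",
       'Variable: id = {0x00000072}, name = "argc", type= "int", '
       "location = r13, decl = main.c:8",
   ]
   # Placeholder register values, made up for this example.
   print(register_variables(lines, {"r13": 0x1, "r12": 0x7FFF5FBFFA40}))

Variables whose location is an expression such as ``DW_OP_fbreg(-1072)`` are
skipped here, since a register value alone is not enough to recover them.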

Using Python API to Symbolicate
-------------------------------

All of the commands above can be done through the Python script bridge. The
code below will recreate the target and add the three shared libraries that we
added in the Darwin crash log example above:

.. code-block:: python

   import lldb

   # lldb.debugger is only set when this runs inside LLDB's script interpreter
   triple = "x86_64-apple-macosx"
   platform_name = None
   add_dependents = False
   target = lldb.debugger.CreateTarget("/tmp/a.out", triple, platform_name, add_dependents, lldb.SBError())
   if target:
       # Get the executable module
       module = target.GetModuleAtIndex(0)
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x100000000)
       module = target.AddModule("/usr/lib/system/libsystem_c.dylib", triple, None, "/build/server/a/libsystem_c.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff83f32000)
       module = target.AddModule("/usr/lib/system/libsystem_dnssd.dylib", triple, None, "/build/server/b/libsystem_dnssd.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff883db000)
       module = target.AddModule("/usr/lib/system/libsystem_kernel.dylib", triple, None, "/build/server/c/libsystem_kernel.dylib.dSYM")
       target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff8c0dc000)

       load_addr = 0x00007fff8a1e6d46
       # so_addr is a section offset address, or a lldb.SBAddress object
       so_addr = target.ResolveLoadAddress(load_addr)
       # Get a symbol context for the section offset address which includes
       # a module, compile unit, function, block, line entry, and symbol
       sym_ctx = so_addr.GetSymbolContext(lldb.eSymbolContextEverything)
       print(sym_ctx)

Use Builtin Python Module to Symbolicate
----------------------------------------

LLDB includes a module in the lldb package named lldb.utils.symbolication. This
module contains classes and functions that simplify the symbolication process
by letting you create objects that represent the entities involved, such as:

- lldb.utils.symbolication.Address
- lldb.utils.symbolication.Section
- lldb.utils.symbolication.Image
- lldb.utils.symbolication.Symbolicator

**lldb.utils.symbolication.Address**

This class represents an address that will be symbolicated. It will cache any
information that has been looked up: module, compile unit, function, block,
line entry, symbol. It does this by having a lldb.SBSymbolContext as a member
variable.

**lldb.utils.symbolication.Section**

This class represents a section that might get loaded in a
lldb.utils.symbolication.Image. It has helper functions that allow you to set
it from text that might have been extracted from a crash log file.

**lldb.utils.symbolication.Image**

This class represents a module that might get loaded into the target we use for
symbolication. This class contains the executable path, optional symbol file
path, the triple, and the list of sections that will need to be loaded if we
choose to ask the target to load this image. Many of these objects will never
be loaded into the target unless they are needed by symbolication. You often
have a crash log that has 100 to 200 different shared libraries loaded, but
your crash log stack backtraces only use a few of these shared libraries. Only
the images that contain stack backtrace addresses need to be loaded in the
target in order to symbolicate.
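
The selection described above can be sketched without LLDB. This is not the
code used by lldb.utils.symbolication; ``images_needed`` is a hypothetical
helper, and it assumes each image is given as a (start, end, path) tuple whose
end address is inclusive, as the ranges in a Binary Images listing appear to
be. The second address in the example is made up so that it falls inside
libsystem_c.dylib.

.. code-block:: python

   def images_needed(images, addresses):
       """Return the paths of images whose [start, end] range holds an address."""
       needed = []
       for start, end, path in images:
           if any(start <= addr <= end for addr in addresses):
               needed.append(path)
       return needed


   images = [
       (0x100000000, 0x100000FF7, "/tmp/a.out"),
       (0x7FFF83F32000, 0x7FFF83FFEFE7, "/usr/lib/system/libsystem_c.dylib"),
       (0x7FFF883DB000, 0x7FFF883E3FF7, "/usr/lib/system/libsystem_dnssd.dylib"),
   ]
   # 0x100000f46 comes from the a.out frame; 0x7fff83f32010 is a made-up address.
   print(images_needed(images, [0x100000F46, 0x7FFF83F32010]))

Images with no matching address, such as libsystem_dnssd.dylib here, never
need to be added to the target.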

Subclasses of this class will want to override the
locate_module_and_debug_symbols method:

.. code-block:: python

   import lldb.utils.symbolication

   class CustomImage(lldb.utils.symbolication.Image):
       def locate_module_and_debug_symbols(self):
           # Locate the module and symbol given the info found in the crash log
           pass

Overriding this function allows clients to find the correct executable module
and symbol files as they might reside on a build server.
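
What such an override searches for depends entirely on how your build server
is laid out. Purely as an illustration, the sketch below looks for a
``<name>.dSYM`` bundle in a list of candidate directories. ``find_dsym`` and
its ``exists`` parameter are hypothetical, the directory names are borrowed
from the examples earlier in this document, and matching by file name alone is
a simplification (a production locator would more likely match on the UUID).

.. code-block:: python

   import os


   def find_dsym(image_path, search_dirs, exists=os.path.isdir):
       """Return the first '<dir>/<basename>.dSYM' that exists, else None.

       'exists' is injectable so the lookup can be tried without real files.
       """
       dsym_name = os.path.basename(image_path) + ".dSYM"
       for directory in search_dirs:
           candidate = os.path.join(directory, dsym_name)
           if exists(candidate):
               return candidate
       return None


   # Pretend only this one bundle is present on the build server.
   present = {"/build/server/a/libsystem_c.dylib.dSYM"}
   print(
       find_dsym(
           "/usr/lib/system/libsystem_c.dylib",
           ["/build/server/a", "/build/server/b"],
           exists=present.__contains__,
       )
   )

How the result is stored on the image object is left out here, because the
attribute names of lldb.utils.symbolication.Image are not covered in this
document.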

**lldb.utils.symbolication.Symbolicator**

This class coordinates the symbolication process by loading only the
lldb.utils.symbolication.Image instances that need to be loaded in order to
symbolicate a supplied address.

**lldb.macosx.crashlog**

lldb.macosx.crashlog is a module distributed with macOS builds that subclasses
the above classes. This module parses the information in the Darwin crash logs
and creates symbolication objects that represent the images, the sections, and
the thread frames for the backtraces. It then uses the functions in
lldb.utils.symbolication to symbolicate the crash logs.

This module installs a new ``crashlog`` command into the lldb command
interpreter so that you can use it to parse and symbolicate macOS crash
logs:

.. code-block:: text

   (lldb) command script import lldb.macosx.crashlog
   "crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help
   (lldb) crashlog /tmp/crash.log
   ...

The command that is installed has built-in help that shows the options that can
be used when symbolicating:

.. code-block:: text

   (lldb) crashlog --help
   Usage: crashlog [options]  [FILE ...]

Symbolicate one or more Darwin crash log files to provide source file and line
information, inlined stack frames back to the concrete functions, and
disassemble the location of the crash for the first frame of the crashed
thread. If this script is imported into the LLDB command interpreter, a
``crashlog`` command will be added to the interpreter for use at the LLDB
command line. After a crash log has been parsed and symbolicated, a target will
have been created that has all of the shared libraries loaded at the load
addresses found in the crash log file. This allows you to explore the program
as if it were stopped at the locations described in the crash log and functions
can be disassembled and lookups can be performed using the addresses found in
the crash log.

.. code-block:: text

   Options:
   -h, --help            show this help message and exit
   -v, --verbose         display verbose debug info
   -g, --debug           display verbose debug logging
   -a, --load-all        load all executable images, not just the images found
                           in the crashed stack frames
   --images              show image list
   --debug-delay=NSEC    pause for NSEC seconds for debugger
   -c, --crashed-only    only symbolicate the crashed thread
   -d DISASSEMBLE_DEPTH, --disasm-depth=DISASSEMBLE_DEPTH
                           set the depth in stack frames that should be
                           disassembled (default is 1)
   -D, --disasm-all      enabled disassembly of frames on all threads (not just
                           the crashed thread)
   -B DISASSEMBLE_BEFORE, --disasm-before=DISASSEMBLE_BEFORE
                           the number of instructions to disassemble before the
                           frame PC
   -A DISASSEMBLE_AFTER, --disasm-after=DISASSEMBLE_AFTER
                           the number of instructions to disassemble after the
                           frame PC
   -C NLINES, --source-context=NLINES
                           show NLINES source lines of source context (default =
                           4)
   --source-frames=NFRAMES
                           show source for NFRAMES (default = 4)
   --source-all          show source for all threads, not just the crashed
                           thread
   -i, --interactive     parse all crash logs and enter interactive mode

The source for the "symbolication" and "crashlog" modules is available in git.