Debugging
=========

This page details various ways to debug LLDB itself and other LLDB tools. If
you want to know how to use LLDB in general, please refer to
:doc:`/use/tutorial`.

As LLDB is generally split into 2 tools, ``lldb`` and ``lldb-server``
(``debugserver`` on macOS), the techniques shown here will not always apply to
both. With some knowledge of them all, you can mix and match as needed.

In this document we refer to the initial ``lldb`` as the "debugger" and the
program being debugged as the "inferior".

Building For Debugging
----------------------

To build LLDB with debugging information add the following to your CMake
configuration:

::

  -DCMAKE_BUILD_TYPE=Debug \
  -DLLDB_EXPORT_ALL_SYMBOLS=ON

Note that the ``lldb`` you will use to do the debugging does not itself need to
have debug information.

Then build as you normally would according to :doc:`/resources/build`.

If you are going to debug in a way that doesn't need debug info (printf, strace,
etc.) we recommend adding ``LLVM_ENABLE_ASSERTIONS=ON`` to Release build
configurations. This will make LLDB fail earlier instead of continuing with
invalid state (assertions are enabled by default for Debug builds).
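
As a sketch, a Release configuration with assertions enabled could look like
the following. The generator, the source and build directory names and the
project list are assumptions, so adjust them to match your checkout:

::

  $ cmake -G Ninja -S llvm -B build-release \
      -DLLVM_ENABLE_PROJECTS="clang;lldb" \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_ASSERTIONS=ON
  $ ninja -C build-release lldb lldb-server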

Debugging ``lldb``
------------------

The simplest scenario is where we want to debug a local execution of ``lldb``
like this one:

::

  ./bin/lldb test_program

LLDB is like any other program, so you can debug it the same way you would
debug anything else:

::

  ./bin/lldb -- ./bin/lldb /tmp/test.o

That's it. At least, that's the minimum. There's nothing special about LLDB
being a debugger that means you can't attach another debugger to it like any
other program.

What can be an issue is that both debuggers have command line interfaces, which
makes it very confusing which one is which:

::

  (the debugger)
  (lldb) run
  Process 1741640 launched: '<...>/bin/lldb' (aarch64)
  Process 1741640 stopped and restarted: thread 1 received signal: SIGCHLD

  (the inferior)
  (lldb) target create "/tmp/test.o"
  Current executable set to '/tmp/test.o' (aarch64).

Another issue is that when you resume the inferior, it will not print the
``(lldb)`` prompt because as far as it knows it hasn't changed state. A quick
way around that is to type something that is clearly not a command and hit
enter.

::

  (lldb) Process 1742266 stopped and restarted: thread 1 received signal: SIGCHLD
  Process 1742266 stopped
  * thread #1, name = 'lldb', stop reason = signal SIGSTOP
      frame #0: 0x0000ffffed5bfbf0 libc.so.6`__GI___libc_read at read.c:26:10
  (lldb) c
  Process 1742266 resuming
  notacommand
  error: 'notacommand' is not a valid command.
  (lldb)

You could just remember whether you are in the debugger or the inferior, but
that is one more thing to keep track of, and for interrupt based events you may
simply not be able to know.

Here are some better approaches. First, you could use another debugger like GDB
to debug LLDB. Or use an IDE like Xcode or Visual Studio Code, which runs LLDB
under the hood so you don't have to type commands into the debugger yourself.

Or you could change the prompt text for the debugger and/or inferior.

::

  $ ./bin/lldb -o "settings set prompt \"(lldb debugger) \"" -- \
    ./bin/lldb -o "settings set prompt \"(lldb inferior) \"" /tmp/test.o
  <...>
  (lldb) settings set prompt "(lldb debugger) "
  (lldb debugger) run
  <...>
  (lldb) settings set prompt "(lldb inferior) "
  (lldb inferior)

If you want spatial separation you can run the inferior in one terminal then
attach to it in another. Remember that while paused in the debugger, the inferior
will not respond to input so you will have to ``continue`` in the debugger
first.

::

  (in terminal A)
  $ ./bin/lldb /tmp/test.o

  (in terminal B)
  $ ./bin/lldb ./bin/lldb --attach-pid $(pidof lldb)

Placing Breakpoints
*******************

Generally you will want to hit some breakpoint in the inferior ``lldb``. To place
that breakpoint you must first stop the inferior.

If you're debugging from another window this is done with ``process interrupt``.
The inferior will stop, you place the breakpoint and then ``continue``. Go back
to the inferior and input the command that should trigger the breakpoint.

If you are running debugger and inferior in the same window, input ``ctrl+c``
instead of ``process interrupt`` and then follow the rest of the steps.
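
Put together, and assuming the debugger's prompt has been changed as shown
earlier, the sequence typed into the debugger might look like this.
``<function>`` stands in for whatever function you want to stop in, and the
output of each command is omitted:

::

  (lldb debugger) process interrupt
  (lldb debugger) breakpoint set --name <function>
  (lldb debugger) continue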

If you are doing this with ``lldb-server`` and find your breakpoint is never
hit, check that you are breaking in code that is actually run by
``lldb-server``. There are cases where code only used by ``lldb`` ends up
linked into ``lldb-server``, so the debugger can break there but the breakpoint
will never be hit.

Debugging ``lldb-server``
-------------------------

Note: If you are on macOS you are likely using ``debugserver`` instead of
``lldb-server``. The spirit of these instructions applies but the specifics will
be different.

We suggest you read :doc:`/use/remote` before attempting to debug ``lldb-server``
as working out exactly what you want to debug requires that you understand its
various modes and behaviour. While you may not be literally debugging on a
remote target, think of your host machine as the "remote" in this scenario.

The ``lldb-server`` options for your situation will depend on what part of it
or mode you are interested in. To work out what those are, recreate the scenario
first without any extra debugging layers. Let's say we want to debug
``lldb-server`` during the following command:

::

  $ ./bin/lldb /tmp/test.o

We can treat ``lldb-server`` as we treated ``lldb`` before, running it under
``lldb``. The equivalent to having ``lldb`` launch the ``lldb-server`` for us is
to start ``lldb-server`` in the ``gdbserver`` mode.

The following commands recreate that, while debugging ``lldb-server``:

::

  $ ./bin/lldb -- ./bin/lldb-server gdbserver :1234 /tmp/test.o
  (lldb) target create "./bin/lldb-server"
  Current executable set to '<...>/bin/lldb-server' (aarch64).
  <...>
  Process 1742485 launched: '<...>/bin/lldb-server' (aarch64)
  Launched '/tmp/test.o' as process 1742586...

  (in another terminal)
  $ ./bin/lldb /tmp/test.o -o "gdb-remote 1234"

Note that the first ``lldb`` is the one debugging ``lldb-server``. The second
``lldb`` is debugging ``/tmp/test.o`` and is only used to trigger the
interesting code path in ``lldb-server``.

This is another case where you may want to lay out your terminals in a
predictable way, or change the prompt of one or both copies of ``lldb``.

If you are debugging a scenario where the ``lldb-server`` starts in ``platform``
mode, but you want to debug the ``gdbserver`` mode, you'll have to work out what
subprocess it's starting for the ``gdbserver`` part. One way is to look at the
list of running processes and take the command line from there.
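
For example, assuming a Linux system where ``pgrep`` from procps is available,
the following lists every running ``lldb-server`` along with its full command
line. The output is omitted here as it depends entirely on your session:

::

  $ pgrep -a lldb-server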

In theory it should be possible to use LLDB's
``target.process.follow-fork-mode`` or GDB's ``follow-fork-mode`` to
automatically debug the ``gdbserver`` process as it's created. However, this
author has not been able to get either to work in this scenario so we suggest
making a more specific command wherever possible instead.
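
For reference, a sketch of how the LLDB setting would be applied is below.
``child`` is one of the values the setting accepts and the ``platform`` mode
arguments are only an example. As noted above, this is not known to work for
this scenario:

::

  $ ./bin/lldb -o "settings set target.process.follow-fork-mode child" -- \
    ./bin/lldb-server platform --listen "*:1234" --server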

Another option is to let ``lldb-server`` start up, then attach to the process
that's interesting to you. It's less automated and won't work if the bug occurs
during startup. However it is a good way to know you've found the right one,
then you can take its command line and run that directly.

Output From ``lldb-server``
***************************

As ``lldb-server`` often launches subprocesses, output messages may be hidden
if they are emitted from the child processes.

You can tell it to enable logging using the ``--log-channels`` option. For
example ``--log-channels "posix ptrace"``. However that is not passed on to the
child processes.

The same goes for ``printf``. If it's called in a child process you won't see
the output.

In these cases consider interactive debugging ``lldb-server`` or
working out a more specific command such that it does not have to spawn a
subprocess. For example if you start with ``platform`` mode, work out what
``gdbserver`` mode process it spawns and run that command instead.

Another option if you have ``strace`` available is to trace the whole process
tree and inspect the logs after the session has ended. ::

  $ strace -ff -o log -p $(pidof lldb-server)

This will log all syscalls made by ``lldb-server`` and processes that it forks.
``-ff`` tells ``strace`` to trace child processes and write the results to a
separate file for each process, named using the prefix given by ``-o``.

Search the log files for specific terms to find the process you're interested
in. For example, to find a process that acted as a ``gdbserver`` instance::

  $ grep "gdbserver" log.*
  log.<N>:execve("<...>/lldb-server", [<...> "gdbserver", <...>) = 0

Remote Debugging
----------------

If you want to debug part of LLDB running on a remote machine, the principles
are the same but we will have to start debug servers, then attach debuggers to
those servers.

In the example below we're debugging an ``lldb-server`` ``gdbserver`` mode
command running on a remote machine.

For simplicity we'll use the same ``lldb-server`` as the debug server
and the inferior, but it doesn't need to be that way. You can use ``gdbserver``
(as in, GDB's debug server program) or a system installed ``lldb-server`` if you
suspect your local copy is not stable, as is the case in many of these
scenarios.

::

  $ <...>/bin/lldb-server gdbserver 0.0.0.0:54322 -- \
    <...>/bin/lldb-server gdbserver 0.0.0.0:54321 -- /tmp/test.o

Now we have a debug server listening on port 54322 of our remote (``0.0.0.0``
means it's listening for external connections). This is where we will connect
``lldb`` to, to debug the second ``lldb-server``.

To trigger behaviour in the second ``lldb-server``, we will connect a second
``lldb`` to port 54321 of the remote.

This is the final configuration:

::

  Host                                        | Remote
  --------------------------------------------|--------------------
  lldb A debugs lldb-server on port 54322 ->  | lldb-server A
                                              |  (which runs)
  lldb B debugs /tmp/test.o on port 54321 ->  |    lldb-server B
                                              |      (which runs)
                                              |        /tmp/test.o

You would use ``lldb A`` to place a breakpoint in the code you're interested in,
then ``lldb B`` to trigger ``lldb-server B`` to go into that code and hit the
breakpoint. ``lldb-server A`` is only here to let us debug ``lldb-server B``
remotely.
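
On the host, the two connections could be made roughly as follows.
``remote.example`` is a placeholder for the address of your remote machine and
``<function>`` for the code you are interested in. ``lldb A`` has to
``continue`` before ``lldb B`` connects, so that ``lldb-server B`` is running
and listening:

::

  (in terminal A)
  $ ./bin/lldb
  (lldb) gdb-remote remote.example:54322
  (lldb) breakpoint set --name <function>
  (lldb) continue

  (in terminal B)
  $ ./bin/lldb
  (lldb) gdb-remote remote.example:54321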

Debugging The Remote Protocol
-----------------------------

LLDB mostly follows the `GDB Remote Protocol <https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html>`_.
Where there are differences it tries to handle both LLDB and GDB behaviour.

LLDB does have extensions to the protocol which are documented in
`lldb-gdb-remote.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-gdb-remote.txt>`_
and `lldb-platform-packets.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-platform-packets.txt>`_.

Logging Packets
***************

If you just want to observe packets, you can enable the ``gdb-remote packets``
log channel.

::

  (lldb) log enable gdb-remote packets
  (lldb) run
  lldb             <   1> send packet: +
  lldb             history[1] tid=0x264bfd <   1> send packet: +
  lldb             <  19> send packet: $QStartNoAckMode#b0
  lldb             <   1> read packet: +

You can do this on the ``lldb-server`` end as well by passing the option
``--log-channels "gdb-remote packets"``. Then you'll see both sides of the
connection.

Some packets may be printed in a nicer way than others. For example, XML packets
will print the literal XML and some binary packets may be decoded, while others
will just be printed unmodified. So do check what format you expect; a common
one is hex encoded bytes.

You can enable this logging even when you are connecting to an ``lldb-server``
in platform mode, as this protocol is used for that too.
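
A sketch of capturing both sides is shown below. On the client, ``log enable``
accepts ``-f`` to write the log to a file instead of the terminal. The log file
path is an arbitrary choice:

::

  (server side)
  $ ./bin/lldb-server gdbserver :1234 --log-channels "gdb-remote packets" -- /tmp/test.o

  (client side)
  $ ./bin/lldb /tmp/test.o
  (lldb) log enable -f /tmp/packets.log gdb-remote packets
  (lldb) gdb-remote 1234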

Debugging Packet Exchanges
**************************

Say you want to make ``lldb`` send a packet to ``lldb-server``, then debug
how the latter builds its response. Maybe even see how ``lldb`` handles it once
it's sent back.

That all takes time, so LLDB will likely time out and think the remote has gone
away. You can change the ``plugin.process.gdb-remote.packet-timeout`` setting
to prevent this.

Here's an example. First we'll start an ``lldb-server`` being debugged by
``lldb``, placing a breakpoint on a packet handler we know will be hit once
another ``lldb`` connects.

::

  $ lldb -- lldb-server gdbserver :1234 -- /tmp/test.o
  <...>
  (lldb) b GDBRemoteCommunicationServerCommon::Handle_qSupported
  Breakpoint 1: where = <...>
  (lldb) run
  <...>

Next we connect another ``lldb`` to this, with a timeout of 5 minutes:

::

  $ lldb /tmp/test.o
  <...>
  (lldb) settings set plugin.process.gdb-remote.packet-timeout 300
  (lldb) gdb-remote 1234

Doing so triggers the breakpoint in ``lldb-server``, bringing us back into
``lldb``. Now we've got 5 minutes to do whatever we need before LLDB decides
the connection has failed.

::

  * thread #1, name = 'lldb-server', stop reason = breakpoint 1.1
      frame #0: 0x0000aaaaaacc6848 lldb-server<...>
  lldb-server`lldb_private::process_gdb_remote::GDBRemoteCommunicationServerCommon::Handle_qSupported:
  ->  0xaaaaaacc6848 <+0>:  sub    sp, sp, #0xc0
  <...>
  (lldb)

Once you're done simply ``continue`` the ``lldb-server``. Back in the other
``lldb``, the connection process will continue as normal.

::

  Process 2510266 stopped
  * thread #1, name = 'test.o', stop reason = signal SIGSTOP
      frame #0: 0x0000fffff7fcd100 ld-2.31.so`_start
  ld-2.31.so`_start:
  ->  0xfffff7fcd100 <+0>: mov    x0, sp
  <...>
  (lldb)

Reducing Bugs
-------------

This section covers reducing a bug that happens in LLDB itself, or where you
suspect that LLDB causes something else to behave abnormally.

Since bugs vary wildly, the advice here is general and incomplete. Let your
instincts guide you and don't feel the need to try everything before reporting
an issue or asking for help. This is simply inspiration.

Reduction
*********

The first step is to reduce unneeded complexity where it is cheap to do so. If
something is easily removed or frozen to a certain value, do so. The goal is to
keep the failure mode the same, with fewer dependencies.

This includes, but is not limited to:

* Removing test cases that don't crash.
* Replacing dynamic lookups with constant values.
* Replacing supporting functions with stubs that do nothing.
* Moving the test case to a less unique system. If your machine has an exotic
  extension, try it on a readily available commodity machine.
* Removing irrelevant parts of the test program.
* Reproducing the issue without using the LLDB test runner.
* Converting a remote debugging scenario into a local one.

Now we hopefully have a smaller reproducer than we started with. Next we need to
find out what components of the software stack might be failing.

Some examples are listed below with suggestions for how to investigate them.

* Debugger

  * Use a `released version of LLDB <https://github.com/llvm/llvm-project/releases>`_.

  * If on macOS, try the system ``lldb``.

  * Try GDB or any other system debugger you might have e.g. Microsoft Visual
    Studio.

* Kernel

  * Start a virtual machine running a different version. ``qemu-system`` is
    useful here.

  * Try a different physical system running a different version.

  * Remember that for most kernels, userspace crashing the kernel is always a
    kernel bug, even if the userspace program is doing something unconventional.
    So there could be a bug in both the application and the kernel.

* Compiler and compiler options

  * Try other versions of the same compiler or your system compiler.

  * Emit older versions of DWARF info, particularly DWARFv4 instead of v5, as
    some tools did/do not understand the new constructs.

  * Reduce optimisation options as much as possible.

  * Try all the language modes e.g. C++17/20 for C++.

  * Link against LLVM's libcxx if you suspect a bug involving the system C++
    library.

  * For languages other than C/C++ e.g. Rust, try making an equivalent program
    in C/C++. LLDB tends to try to fit other languages into a C/C++ mould, so
    porting the program can make triage and reporting much easier.

* Operating system

  * Use docker to try various versions of Linux.

  * Use ``qemu-system`` to emulate other operating systems e.g. FreeBSD.

* Architecture

  * Use `QEMU user space emulation <https://www.qemu.org/docs/master/user/main.html>`_
    to quickly test other architectures. Note that ``lldb-server`` cannot be used
    with this as the ptrace APIs are not emulated.

  * If you need to test a big endian system use QEMU to emulate s390x (user
    space emulation for just ``lldb``, ``qemu-system`` for testing
    ``lldb-server``).
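
As an illustration of the compiler suggestions above, the commands below build
one test program in a few different ways using Clang. The file names are
placeholders and the flags are only a starting point:

::

  $ clang++ -g -gdwarf-4 -O0 -std=c++17 test.cpp -o test.dwarf4
  $ clang++ -g -gdwarf-5 -O0 -std=c++17 test.cpp -o test.dwarf5
  $ clang++ -g -O0 -std=c++20 -stdlib=libc++ test.cpp -o test.libcxx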

.. note:: When using QEMU you may need to use the built in GDB stub, instead of
          ``lldb-server``. For example if you wanted to debug ``lldb`` running
          inside ``qemu-user-s390x`` you would connect to the GDB stub provided
          by QEMU.

          The same applies if you want to see how ``lldb`` would debug a test
          program that is running on s390x. It's not totally accurate because
          you're not using ``lldb-server``, but this is fine for features that
          are mostly implemented in ``lldb``.

          If you are running a full system using ``qemu-system``, you likely
          want to connect to the ``lldb-server`` running within the userspace
          of that system.

          If your test program is bare metal (meaning it requires no supporting
          operating system) then connect to the built in GDB stub. This can be
          useful when testing embedded systems or kernel debugging.
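
For example, assuming a QEMU user space emulator for s390x is installed as
``qemu-s390x``, its ``-g`` option makes it wait for a debugger to connect on
the given port before running the program. The port number and program path
here are arbitrary:

::

  (in terminal A)
  $ qemu-s390x -g 1234 /tmp/test.o

  (in terminal B)
  $ ./bin/lldb /tmp/test.o
  (lldb) gdb-remote 1234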

Reducing Ptrace Related Bugs
****************************

This section is written for Linux, but the same can likely be done on
other Unix or Unix-like operating systems.

Sometimes you will find ``lldb-server`` doing something with ptrace that causes
a problem. If your reproducer involves running ``lldb`` as well, it is not going
to go over well with kernel developers and is generally more difficult to
explain if you want to get help with it.

If you think you can get your point across without reducing it further, there
is no need. If you're pretty sure you have, for example, found a Linux Kernel
bug, doing this greatly increases the chances it'll get fixed.

We'll remove the LLDB dependency by making a smaller standalone program that
does the same actions, starting with a skeleton program that forks and debugs
the inferior process.

The program presented `here <https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1>`_
(`source <https://github.com/eliben/code-for-blog/blob/master/2011/simple_tracer.c>`_)
is a great starting point. There is also an AArch64 specific example in
`the LLDB examples folder <https://github.com/llvm/llvm-project/tree/main/lldb/examples/ptrace_example.c>`_.

Whichever you choose, you'll need to modify it to fit your architecture. A tip
for this is to take any constants used in it, find in which function(s) they are
used in LLDB and then you'll find the equivalent constants in the same LLDB
functions for your architecture.

Once that is running as expected we can convert ``lldb-server``'s ptrace calls
into calls in this program. To get a log of those, run ``lldb-server`` with
``--log-channels "posix ptrace"``. You'll see output like:

::

  $ lldb-server gdbserver :1234 --log-channels "posix ptrace" -- /tmp/test.o
  1694099878.829990864 <...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0
  1694099878.830722332 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF7CC, 0x0000FFFFD14BF7D0, 16)=0x0
  1694099878.831967115 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF66C, 0x0000FFFFD14BF630, 16)=0xffffffffffffffff
  1694099878.831982136 <...> ptrace() failed: Invalid argument
  Launched '/tmp/test.o' as process 2659963...

Each call is logged with its parameters, and its result after the ``=`` at the
end.

From here you will need to use a combination of the `ptrace documentation <https://man7.org/linux/man-pages/man2/ptrace.2.html>`_
and Linux Kernel headers (``uapi/linux/ptrace.h`` mainly) to figure out what
the calls are.

The most important parameter is the first, which is the request number. In the
example above ``16896``, which is hex ``0x4200``, is ``PTRACE_SETOPTIONS``.
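
The log prints request numbers in decimal while the kernel headers use hex, so
a quick conversion helps when looking them up. This is a small sketch for a
POSIX shell, and ``ptrace_request_hex`` is a helper name made up for this
example:

.. code-block:: shell

  # Hypothetical helper: print a decimal ptrace request number as hex.
  ptrace_request_hex() {
    printf '0x%x\n' "$1"
  }

  # The two request numbers seen in the example log above.
  ptrace_request_hex 16896
  ptrace_request_hex 16900

This prints ``0x4200`` and ``0x4204``, which you can then search for in
``uapi/linux/ptrace.h``.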

Luckily, you don't usually have to figure out all those early calls. Our
skeleton program will be doing all that, successfully we hope.

What you should do is record just the bit that is interesting to you. Let's say
something odd is happening when you read the ``tpidr`` register (this is an
AArch64 register, just for example purposes).

First, go to the ``lldb-server`` terminal and press enter a few times to put
some blank lines after the last logging output.

Then go to your ``lldb`` and:

::

  (lldb) register read tpidr
  tpidr = 0x0000fffff7fef320

You'll see this from ``lldb-server``:

::

  <...> ptrace(16900, 2659963, 0x0000FFFFD14BF6CC, 0x0000FFFFD14BF710, 8)=0x0

If you don't see that, it may be because ``lldb`` has cached it. The easiest way
to clear that cache is to step. Remember that some registers are read every
step, so you'll have to adjust depending on the situation.

Assuming you've got that line, you would look up what ``16900`` is. This is
``0x4204`` in hex, which is ``PTRACE_GETREGSET``. As we expected.

The following parameters are not as we might expect because what we log is a bit
different from the literal ptrace call. See your platform's definition of
``PtraceWrapper`` for the exact form.

The point of all this is that by doing a single action you can get a few
isolated ptrace calls and you can then fill in the blanks and write
equivalent calls in the skeleton program.

The final piece of this is likely breakpoints. Assuming your bug does not
require a hardware breakpoint, you can get software breakpoints by inserting
a break instruction into the inferior's code at compile time. Usually by using
an architecture specific assembly statement, as you will need to know exactly
how many instructions to overwrite later.

Doing it this way instead of exactly copying what LLDB does will save a few
ptrace calls. The AArch64 example program shows how to do this.

* The inferior contains ``BRK #0`` then ``NOP``.
* Two 4-byte instructions means 8 bytes of data to replace, which matches the
  minimum size you can write with ``PTRACE_POKETEXT``.
* The inferior runs to the ``BRK``, which brings us into the debugger.
* The debugger reads ``PC`` and writes ``NOP`` then ``NOP`` to the location
  pointed to by ``PC``.
* The debugger then single steps the inferior to the next instruction
  (this is not required in this specific scenario, you could just continue but
  it is included because this more closely matches what ``lldb`` does).
* The debugger then continues the inferior.
* The inferior exits, and the whole program exits.

Using this technique you can emulate the usual "run to main, do a thing" type
reproduction steps.

Finally, that "thing" is the ptrace calls you got from the ``lldb-server`` logs.
Add those to the debugger function and you now have a reproducer that doesn't
need any part of LLDB.

Debugging Tests
---------------

See :doc:`/resources/test`.
602See :doc:`/resources/test`.