Debugging
=========

This page details various ways to debug LLDB itself and other LLDB tools. If
you want to know how to use LLDB in general, please refer to
:doc:`/use/tutorial`.

As LLDB is generally split into two tools, ``lldb`` and ``lldb-server``
(``debugserver`` on macOS), the techniques shown here will not always apply to
both. With some knowledge of them all, you can mix and match as needed.

In this document we refer to the initial ``lldb`` as the "debugger" and the
program being debugged as the "inferior".

Building For Debugging
----------------------

To build LLDB with debugging information add the following to your CMake
configuration:

::

  -DCMAKE_BUILD_TYPE=Debug \
  -DLLDB_EXPORT_ALL_SYMBOLS=ON

Note that the ``lldb`` you will use to do the debugging does not itself need to
have debug information.

Then build as you normally would according to :doc:`/resources/build`.

If you are going to debug in a way that doesn't need debug info (printf, strace,
etc.) we recommend adding ``LLVM_ENABLE_ASSERTIONS=ON`` to Release build
configurations. This will make LLDB fail earlier instead of continuing with
invalid state (assertions are enabled by default for Debug builds).

Debugging ``lldb``
------------------

The simplest scenario is where we want to debug a local execution of ``lldb``
like this one:

::

  ./bin/lldb /tmp/test.o

LLDB is like any other program, so you can debug it the same way, by running
it under a debugger.

::

  ./bin/lldb -- ./bin/lldb /tmp/test.o

That's it. At least, that's the minimum. There's nothing special about LLDB
being a debugger that means you can't attach another debugger to it like any
other program.

What can be an issue is that both debuggers have command line interfaces,
which makes it very confusing which one is which:

::

  (the debugger)
  (lldb) run
  Process 1741640 launched: '<...>/bin/lldb' (aarch64)
  Process 1741640 stopped and restarted: thread 1 received signal: SIGCHLD

  (the inferior)
  (lldb) target create "/tmp/test.o"
  Current executable set to '/tmp/test.o' (aarch64).

Another issue is that when you resume the inferior, it will not print the
``(lldb)`` prompt because as far as it knows it hasn't changed state. A quick
way around that is to type something that is clearly not a command and hit
enter.

::

  (lldb) Process 1742266 stopped and restarted: thread 1 received signal: SIGCHLD
  Process 1742266 stopped
  * thread #1, name = 'lldb', stop reason = signal SIGSTOP
      frame #0: 0x0000ffffed5bfbf0 libc.so.6`__GI___libc_read at read.c:26:10
  (lldb) c
  Process 1742266 resuming
  notacommand
  error: 'notacommand' is not a valid command.
  (lldb)

You could just remember whether you are in the debugger or the inferior but
it's more for you to remember, and for interrupt based events you simply may not
be able to know.

Here are some better approaches. First, you could use another debugger like GDB
to debug LLDB, or an IDE like Xcode or Visual Studio Code that runs LLDB under
the hood, so you don't have to type commands into the debugger yourself.

Or you could change the prompt text for the debugger and/or inferior.

::

  $ ./bin/lldb -o "settings set prompt \"(lldb debugger) \"" -- \
    ./bin/lldb -o "settings set prompt \"(lldb inferior) \"" /tmp/test.o
  <...>
  (lldb) settings set prompt "(lldb debugger) "
  (lldb debugger) run
  <...>
  (lldb) settings set prompt "(lldb inferior) "
  (lldb inferior)

If you want spatial separation you can run the inferior in one terminal then
attach to it in another.
Remember that while paused in the debugger, the inferior
will not respond to input so you will have to ``continue`` in the debugger
first.

::

  (in terminal A)
  $ ./bin/lldb /tmp/test.o

  (in terminal B)
  $ ./bin/lldb ./bin/lldb --attach-pid $(pidof lldb)

Placing Breakpoints
*******************

Generally you will want to hit some breakpoint in the inferior ``lldb``. To place
that breakpoint you must first stop the inferior.

If you're debugging from another window this is done with ``process interrupt``.
The inferior will stop, you place the breakpoint and then ``continue``. Go back
to the inferior and input the command that should trigger the breakpoint.

If you are running debugger and inferior in the same window, input ``ctrl+c``
instead of ``process interrupt`` and then follow the rest of the steps.

If you are doing this with ``lldb-server`` and find your breakpoint is never
hit, check that you are breaking in code that is actually run by
``lldb-server``. There are cases where code only used by ``lldb`` ends up
linked into ``lldb-server``, so the debugger can break there but the breakpoint
will never be hit.

Debugging ``lldb-server``
-------------------------

Note: If you are on macOS you are likely using ``debugserver`` instead of
``lldb-server``. The spirit of these instructions applies but the specifics will
be different.

We suggest you read :doc:`/use/remote` before attempting to debug ``lldb-server``
as working out exactly what you want to debug requires that you understand its
various modes and behaviour. While you may not be literally debugging on a
remote target, think of your host machine as the "remote" in this scenario.

The ``lldb-server`` options for your situation will depend on what part of it
or mode you are interested in.
To work out what those are, recreate the scenario
first without any extra debugging layers. Let's say we want to debug
``lldb-server`` during the following command:

::

  $ ./bin/lldb /tmp/test.o

We can treat ``lldb-server`` as we treated ``lldb`` before, running it under
``lldb``. The equivalent to having ``lldb`` launch the ``lldb-server`` for us is
to start ``lldb-server`` in the ``gdbserver`` mode.

The following commands recreate that, while debugging ``lldb-server``:

::

  $ ./bin/lldb -- ./bin/lldb-server gdbserver :1234 /tmp/test.o
  (lldb) target create "./bin/lldb-server"
  Current executable set to '<...>/bin/lldb-server' (aarch64).
  <...>
  Process 1742485 launched: '<...>/bin/lldb-server' (aarch64)
  Launched '/tmp/test.o' as process 1742586...

  (in another terminal)
  $ ./bin/lldb /tmp/test.o -o "gdb-remote 1234"

Note that the first ``lldb`` is the one debugging ``lldb-server``. The second
``lldb`` is debugging ``/tmp/test.o`` and is only used to trigger the
interesting code path in ``lldb-server``.

This is another case where you may want to lay out your terminals in a
predictable way, or change the prompt of one or both copies of ``lldb``.

If you are debugging a scenario where the ``lldb-server`` starts in ``platform``
mode, but you want to debug the ``gdbserver`` mode you'll have to work out what
subprocess it's starting for the ``gdbserver`` part. One way is to look at the
list of running processes and take the command line from there.

In theory it should be possible to use LLDB's
``target.process.follow-fork-mode`` or GDB's ``follow-fork-mode`` to
automatically debug the ``gdbserver`` process as it's created. However this
author has not been able to get either to work in this scenario so we suggest
making a more specific command wherever possible instead.
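
If you want to script the "look at the list of running processes" step, the
following is a minimal Python sketch, assuming a Linux host where ``/proc`` is
available. The helper names ``parse_cmdline`` and ``find_processes`` are
hypothetical and are not part of LLDB. It prints the command line of every
running process whose executable is called ``lldb-server``, quoted so that it
can be pasted back into a shell.

```python
import os
import shlex


def parse_cmdline(raw):
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    if not raw:
        return []  # Kernel threads have an empty command line.
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # Drop the terminator that follows the last argument.
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def find_processes(name, proc_root="/proc"):
    """Return {pid: argv} for processes whose executable basename is `name`."""
    found = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # No /proc here (assumption: Linux only).
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = parse_cmdline(f.read())
        except OSError:
            continue  # The process exited, or we may not read it.
        if argv and os.path.basename(argv[0]) == name:
            found[int(entry)] = argv
    return found


if __name__ == "__main__":
    for pid, argv in sorted(find_processes("lldb-server").items()):
        # shlex.join quotes each argument for safe reuse in a shell.
        print(pid, shlex.join(argv))
```

Run it while the scenario is active, pick the entry that contains
``gdbserver``, and use that command line as the starting point for a more
specific command.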

Another option is to let ``lldb-server`` start up, then attach to the process
that's interesting to you. It's less automated and won't work if the bug occurs
during startup. However it is a good way to know you've found the right one,
then you can take its command line and run that directly.

Output From ``lldb-server``
***************************

As ``lldb-server`` often launches subprocesses, output messages may be hidden
if they are emitted from the child processes.

You can tell it to enable logging using the ``--log-channels`` option. For
example ``--log-channels "posix ptrace"``. However that is not passed on to the
child processes.

The same goes for ``printf``. If it's called in a child process you won't see
the output.

In these cases consider interactively debugging ``lldb-server`` or
working out a more specific command such that it does not have to spawn a
subprocess. For example if you start with ``platform`` mode, work out what
``gdbserver`` mode process it spawns and run that command instead.

Another option if you have ``strace`` available is to trace the whole process
tree and inspect the logs after the session has ended. ::

  $ strace -ff -o log -p $(pidof lldb-server)

This will log all syscalls made by ``lldb-server`` and processes that it forks.
``-ff`` tells ``strace`` to trace child processes and write the results to a
separate file for each process, named using the prefix given by ``-o``.

Search the log files for specific terms to find the process you're interested
in.
For example, to find a process that acted as a ``gdbserver`` instance::

  $ grep "gdbserver" log.*
  log.<N>:execve("<...>/lldb-server", [<...> "gdbserver", <...>) = 0

Remote Debugging
----------------

If you want to debug part of LLDB running on a remote machine, the principles
are the same but we will have to start debug servers, then attach debuggers to
those servers.

In the example below we're debugging an ``lldb-server`` ``gdbserver`` mode
command running on a remote machine.

For simplicity we'll use the same ``lldb-server`` as the debug server
and the inferior, but it doesn't need to be that way. You can use ``gdbserver``
(as in, GDB's debug server program) or a system installed ``lldb-server`` if you
suspect your local copy is not stable, as is the case in many of these
scenarios.

::

  $ <...>/bin/lldb-server gdbserver 0.0.0.0:54322 -- \
    <...>/bin/lldb-server gdbserver 0.0.0.0:54321 -- /tmp/test.o

Now we have a debug server listening on port 54322 of our remote (``0.0.0.0``
means it's listening on all network interfaces, so it accepts external
connections). This is where we will connect ``lldb`` to, to debug the second
``lldb-server``.

To trigger behaviour in the second ``lldb-server``, we will connect a second
``lldb`` to port 54321 of the remote.

This is the final configuration:

::

  Host                                        | Remote
  --------------------------------------------|--------------------
  lldb A debugs lldb-server on port 54322 ->  | lldb-server A
                                              |  (which runs)
  lldb B debugs /tmp/test.o on port 54321 ->  | lldb-server B
                                              |  (which runs)
                                              | /tmp/test.o

You would use ``lldb A`` to place a breakpoint in the code you're interested in,
then ``lldb B`` to trigger ``lldb-server B`` to go into that code and hit the
breakpoint. ``lldb-server A`` is only here to let us debug ``lldb-server B``
remotely.

Debugging The Remote Protocol
-----------------------------

LLDB mostly follows the `GDB Remote Protocol <https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html>`_.
Where there are differences it tries to handle both LLDB and GDB behaviour.

LLDB does have extensions to the protocol which are documented in
`lldb-gdb-remote.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-gdb-remote.txt>`_
and `lldb-platform-packets.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-platform-packets.txt>`_.

Logging Packets
***************

If you just want to observe packets, you can enable the ``gdb-remote packets``
log channel.

::

  (lldb) log enable gdb-remote packets
  (lldb) run
  lldb < 1> send packet: +
  lldb history[1] tid=0x264bfd < 1> send packet: +
  lldb < 19> send packet: $QStartNoAckMode#b0
  lldb < 1> read packet: +

You can do this on the ``lldb-server`` end as well by passing the option
``--log-channels "gdb-remote packets"``. Then you'll see both sides of the
connection.

Some packets are printed in a nicer way than others. For example XML packets
are printed as literal XML and some binary packets are decoded, while others
are printed unmodified. Check what format to expect for the packet you are
interested in. A common one is hex encoded bytes.

You can enable this logging even when you are connecting to an ``lldb-server``
in platform mode, because this protocol is used for that too.

Debugging Packet Exchanges
**************************

Say you want to make ``lldb`` send a packet to ``lldb-server``, then debug
how the latter builds its response. Maybe even see how ``lldb`` handles it once
it's sent back.

That all takes time, so LLDB will likely time out and think the remote has gone
away. You can change the ``plugin.process.gdb-remote.packet-timeout`` setting
to prevent this.
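
Before stepping through an exchange it helps to recognise packet framing in the
logs. In the GDB Remote Protocol a packet is ``$<payload>#<checksum>``, where
the checksum is the sum of the payload bytes modulo 256, written as two hex
digits. The Python sketch below illustrates that framing for the
``$QStartNoAckMode#b0`` packet shown in the log earlier. The function names are
hypothetical, and the sketch ignores escaping and run-length encoding, so treat
it as an aid for reading simple packets and not as a protocol implementation.

```python
def checksum(payload):
    """Modulo-256 sum of the payload bytes, as two lowercase hex digits."""
    return format(sum(payload.encode("latin-1")) % 256, "02x")


def frame(payload):
    """Wrap a payload as $<payload>#<checksum>."""
    return "$" + payload + "#" + checksum(payload)


def check(packet):
    """True if `packet` is $<payload>#<checksum> with a matching checksum."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        return False
    return checksum(packet[1:-3]) == packet[-2:].lower()


print(frame("QStartNoAckMode"))  # prints $QStartNoAckMode#b0
```

The framed packet is 19 characters long, which matches the ``< 19>`` printed
next to it in the log above.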

Here's an example of debugging a packet exchange with a longer timeout. First
we'll start an ``lldb-server`` being debugged by ``lldb``, and place a
breakpoint on a packet handler we know will be hit once another ``lldb``
connects.

::

  $ lldb -- lldb-server gdbserver :1234 -- /tmp/test.o
  <...>
  (lldb) b GDBRemoteCommunicationServerCommon::Handle_qSupported
  Breakpoint 1: where = <...>
  (lldb) run
  <...>

Next we connect another ``lldb`` to this, with a timeout of 5 minutes (the
setting is in seconds):

::

  $ lldb /tmp/test.o
  <...>
  (lldb) settings set plugin.process.gdb-remote.packet-timeout 300
  (lldb) gdb-remote 1234

Doing so triggers the breakpoint in ``lldb-server``, bringing us back into
the ``lldb`` that is debugging it. Now we've got 5 minutes to do whatever we
need before the other ``lldb`` decides the connection has failed.

::

  * thread #1, name = 'lldb-server', stop reason = breakpoint 1.1
      frame #0: 0x0000aaaaaacc6848 lldb-server<...>
  lldb-server`lldb_private::process_gdb_remote::GDBRemoteCommunicationServerCommon::Handle_qSupported:
  ->  0xaaaaaacc6848 <+0>:  sub    sp, sp, #0xc0
  <...>
  (lldb)

Once you're done simply ``continue`` the ``lldb-server``. Back in the other
``lldb``, the connection process will continue as normal.

::

  Process 2510266 stopped
  * thread #1, name = 'test.o', stop reason = signal SIGSTOP
      frame #0: 0x0000fffff7fcd100 ld-2.31.so`_start
  ld-2.31.so`_start:
  ->  0xfffff7fcd100 <+0>: mov    x0, sp
  <...>
  (lldb)

Reducing Bugs
-------------

This section covers reducing a bug that happens in LLDB itself, or where you
suspect that LLDB causes something else to behave abnormally.

Since bugs vary wildly, the advice here is general and incomplete. Let your
instincts guide you and don't feel the need to try everything before reporting
an issue or asking for help. This is simply inspiration.

Reduction
*********

The first step is to reduce unneeded complexity where it is cheap to do so. If
something is easily removed or frozen to a certain value, do so. The goal is to
keep the failure mode the same, with fewer dependencies.

This includes, but is not limited to:

* Removing test cases that don't crash.
* Replacing dynamic lookups with constant values.
* Replacing supporting functions with stubs that do nothing.
* Moving the test case to a less unique system. If your machine has an exotic
  extension, try it on a readily available commodity machine.
* Removing irrelevant parts of the test program.
* Reproducing the issue without using the LLDB test runner.
* Converting a remote debugging scenario into a local one.

Now we hopefully have a smaller reproducer than we started with. Next we need to
find out what components of the software stack might be failing.

Some examples are listed below with suggestions for how to investigate them.

* Debugger

  * Use a `released version of LLDB <https://github.com/llvm/llvm-project/releases>`_.

  * If on macOS, try the system ``lldb``.

  * Try GDB or any other system debugger you might have e.g. Microsoft Visual
    Studio.

* Kernel

  * Start a virtual machine running a different version. ``qemu-system`` is
    useful here.

  * Try a different physical system running a different version.

  * Remember that for most kernels, userspace crashing the kernel is always a
    kernel bug, even if the userspace program is doing something
    unconventional. So it could be a bug in both the application and the
    kernel.

* Compiler and compiler options

  * Try other versions of the same compiler or your system compiler.

  * Emit older versions of DWARF info, for example DWARFv4 instead of DWARFv5,
    as some tools did not (or still do not) understand the newer constructs.

  * Reduce optimisation options as much as possible.

  * Try all the language modes e.g. C++17/20 for C++.

  * Link against LLVM's libcxx if you suspect a bug involving the system C++
    library.

  * For languages other than C/C++ e.g. Rust, try making an equivalent program
    in C/C++. LLDB tends to try to fit other languages into a C/C++ mould, so
    porting the program can make triage and reporting much easier.

* Operating system

  * Use Docker to try various versions of Linux. Containers share the host's
    kernel, so this varies the userspace only.

  * Use ``qemu-system`` to emulate other operating systems e.g. FreeBSD.

* Architecture

  * Use `QEMU user space emulation <https://www.qemu.org/docs/master/user/main.html>`_
    to quickly test other architectures. Note that ``lldb-server`` cannot be used
    with this as the ptrace APIs are not emulated.

  * If you need to test a big endian system use QEMU to emulate s390x (user
    space emulation for just ``lldb``, ``qemu-system`` for testing
    ``lldb-server``).

.. note:: When using QEMU you may need to use the built in GDB stub, instead of
          ``lldb-server``. For example if you wanted to debug ``lldb`` running
          inside ``qemu-user-s390x`` you would connect to the GDB stub provided
          by QEMU.

          The same applies if you want to see how ``lldb`` would debug a test
          program that is running on s390x. It's not totally accurate because
          you're not using ``lldb-server``, but this is fine for features that
          are mostly implemented in ``lldb``.

          If you are running a full system using ``qemu-system``, you likely
          want to connect to the ``lldb-server`` running within the userspace
          of that system.

          If your test program is bare metal (meaning it requires no supporting
          operating system) then connect to the built in GDB stub. This can be
          useful when testing embedded systems or kernel debugging.

Reducing Ptrace Related Bugs
****************************

This section is written for Linux, but the same can likely be done on
other Unix or Unix like operating systems.

Sometimes you will find ``lldb-server`` doing something with ptrace that causes
a problem. Your reproducer involves running ``lldb`` as well, which is not
going to go over well with kernel developers and is generally more difficult
to explain if you want to get help with it.

If you think you can get your point across without this, no need. If you're
pretty sure you have for example found a Linux Kernel bug, doing this greatly
increases the chances it'll get fixed.

We'll remove the LLDB dependency by making a smaller standalone program that
does the same actions, starting with a skeleton program that forks and debugs
the inferior process.

The program presented `here <https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1>`_
(`source <https://github.com/eliben/code-for-blog/blob/master/2011/simple_tracer.c>`_)
is a great starting point. There is also an AArch64 specific example in
`the LLDB examples folder <https://github.com/llvm/llvm-project/tree/main/lldb/examples/ptrace_example.c>`_.

For either, you'll need to modify the program to fit your architecture. A tip
for this is to take any constants used in it and find the function(s) in LLDB
where they are used. You'll then find the equivalent constants for your
architecture in the same LLDB functions.

Once that is running as expected we can convert ``lldb-server``'s ptrace calls
into calls in this program. To get a log of those, run ``lldb-server`` with
``--log-channels "posix ptrace"``.
You'll see output like:

::

  $ lldb-server gdbserver :1234 --log-channels "posix ptrace" -- /tmp/test.o
  1694099878.829990864 <...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0
  1694099878.830722332 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF7CC, 0x0000FFFFD14BF7D0, 16)=0x0
  1694099878.831967115 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF66C, 0x0000FFFFD14BF630, 16)=0xffffffffffffffff
  1694099878.831982136 <...> ptrace() failed: Invalid argument
  Launched '/tmp/test.o' as process 2659963...

Each call is logged with its parameters, followed by its result after the
``=`` at the end.

From here you will need to use a combination of the `ptrace documentation <https://man7.org/linux/man-pages/man2/ptrace.2.html>`_
and Linux Kernel headers (``uapi/linux/ptrace.h`` mainly) to figure out what
the calls are.

The most important parameter is the first, which is the request number. In the
example above ``16896``, which is hex ``0x4200``, is ``PTRACE_SETOPTIONS``.

Luckily, you don't usually have to figure out all those early calls. Our
skeleton program will be doing all that, successfully we hope.

What you should do is record just the part that is interesting to you. Let's
say something odd is happening when you read the ``tpidr`` register (this is an
AArch64 register, just for example purposes).

First, go to the ``lldb-server`` terminal and press enter a few times to put
some blank lines after the last logging output.

Then go to your ``lldb`` and:

::

  (lldb) register read tpidr
      tpidr = 0x0000fffff7fef320

You'll see this from ``lldb-server``:

::

  <...> ptrace(16900, 2659963, 0x0000FFFFD14BF6CC, 0x0000FFFFD14BF710, 8)=0x0

If you don't see that, it may be because ``lldb`` has cached it. The easiest way
to clear that cache is to step.
Remember that some registers are read every
step, so you'll have to adjust depending on the situation.

Assuming you've got that line, you would look up what ``16900`` is. This is
``0x4204`` in hex, which is ``PTRACE_GETREGSET``. As we expected.

The remaining parameters are not as we might expect because what we log is a bit
different from the literal ptrace call. See your platform's definition of
``PtraceWrapper`` for the exact form.

The point of all this is that by doing a single action you can get a few
isolated ptrace calls and you can then fill in the blanks and write
equivalent calls in the skeleton program.

The final piece of this is likely breakpoints. Assuming your bug does not
require a hardware breakpoint, you can get software breakpoints by inserting
a break instruction into the inferior's code at compile time. This is usually
done with an architecture specific assembly statement, as you will need to know
exactly how many instructions to overwrite later.

Doing it this way instead of exactly copying what LLDB does will save a few
ptrace calls. The AArch64 example program shows how to do this.

* The inferior contains ``BRK #0`` then ``NOP``.
* Two 4 byte instructions mean 8 bytes of data to replace, which matches the
  minimum size you can write with ``PTRACE_POKETEXT``.
* The inferior runs to the ``BRK``, which brings us into the debugger.
* The debugger reads ``PC`` and writes ``NOP`` then ``NOP`` to the location
  pointed to by ``PC``.
* The debugger then single steps the inferior to the next instruction
  (this is not required in this specific scenario, you could just continue but
  it is included because this more closely matches what ``lldb`` does).
* The debugger then continues the inferior.
* The inferior exits, and the whole program exits.

Using this technique you can emulate the usual "run to main, do a thing" type
reproduction steps.
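
Looking up each logged request number by hand gets tedious. As a rough aid, the
Python sketch below turns a ``ptrace(...)`` log line into a request name. The
table holds a subset of the request numbers from Linux's
``uapi/linux/ptrace.h``. The ``decode`` helper and its regular expression are
hypothetical, and they assume the log format shown earlier, with the second
parameter being the process ID. Architecture specific requests are not covered,
so check the headers for anything reported as unknown.

```python
import re

# Subset of Linux ptrace request numbers (see uapi/linux/ptrace.h).
PTRACE_REQUESTS = {
    0: "PTRACE_TRACEME",
    1: "PTRACE_PEEKTEXT",
    2: "PTRACE_PEEKDATA",
    4: "PTRACE_POKETEXT",
    5: "PTRACE_POKEDATA",
    7: "PTRACE_CONT",
    8: "PTRACE_KILL",
    9: "PTRACE_SINGLESTEP",
    16: "PTRACE_ATTACH",
    17: "PTRACE_DETACH",
    0x4200: "PTRACE_SETOPTIONS",
    0x4201: "PTRACE_GETEVENTMSG",
    0x4202: "PTRACE_GETSIGINFO",
    0x4203: "PTRACE_SETSIGINFO",
    0x4204: "PTRACE_GETREGSET",
    0x4205: "PTRACE_SETREGSET",
    0x4206: "PTRACE_SEIZE",
    0x4207: "PTRACE_INTERRUPT",
}

# Matches e.g. "ptrace(16900, 2659963, 0x..., 0x..., 8)=0x0".
CALL = re.compile(r"ptrace\((\d+), (\d+), .*\)=(0x[0-9a-fA-F]+)")


def decode(line):
    """Describe one logged ptrace call, or return None for other lines."""
    match = CALL.search(line)
    if match is None:
        return None
    request = int(match.group(1))
    name = PTRACE_REQUESTS.get(request, "unknown request")
    return f"{name} ({request:#x}) pid={match.group(2)} result={match.group(3)}"


line = "<...> ptrace(16900, 2659963, 0x0000FFFFD14BF6CC, 0x0000FFFFD14BF710, 8)=0x0"
print(decode(line))  # PTRACE_GETREGSET (0x4204) pid=2659963 result=0x0
```

Lines that are not ptrace calls, such as ``ptrace() failed: Invalid argument``,
return ``None`` and can be skipped.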

Finally, that "thing" is the ptrace calls you got from the ``lldb-server`` logs.
Add those to the debugger function and you now have a reproducer that doesn't
need any part of LLDB.

Debugging Tests
---------------

See :doc:`/resources/test`.