.\"	$OpenBSD: crash.8,v 1.16 2002/01/02 06:07:41 nordin Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" from: @(#)crash.8 6.5 (Berkeley) 4/20/91
.\"
.Dd February 23, 2000
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Pp
.Bd -literal
    panic: why i gave up the ghost
.Ed
.Pp
on the console and enters the kernel debugger,
.Xr ddb 4 .
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
then the system dumps the contents of physical memory
onto a mass storage peripheral device.
The particular device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
.Pp
After the dump has been written, the system then
invokes the automatic reboot procedure as
described in
.Xr reboot 8 .
If auto-reboot is disabled (in a machine dependent way) the system
will simply halt at this point.
.Pp
Upon rebooting, and
unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will copy the previously written dump into
.Pa /var/crash
using
.Xr savecore 8 ,
before resuming multi-user operations.
.Ss Causes of system failure
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory
which
can reflect itself in different ways.
Here are the messages which are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates filesystem problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are
machine dependent and can be found listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and the pc is the program counter at the
time of the fault.
Hardware flakiness will sometimes generate this panic, but if the cause
is a kernel bug,
the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
corresponding
to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
usually can produce an explanation.
.It init died
The system initialization process has exited.
This is bad news, as no new users will then be able to log in.
Rebooting is the only fix, so the system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
.Ss Analyzing a dump
When the system crashes it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
you should run
.Xr gdb 1 ,
loading in the images with the following commands:
.Pp
.Bd -literal -offset indent
# gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.
Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd2.8".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kcore /var/crash/bsd.0.core
.Ed
.Pp
After this, you can use the
.Ic where
command to show a trace of the procedure calls that led to the crash.
.Pp
For custom-built kernels, it is helpful if you had previously
configured your kernel to include debugging symbols with
.Sq makeoptions DEBUG=-ggdb
.Pq see Xr options 4
(though you will not be able to boot an unstripped kernel since it uses too
much memory).
In this case, you should use
.Pa bsd.gdb
instead of
.Pa bsd.0 ,
thus allowing
.Xr gdb 1
to show symbolic names for addresses and line numbers from the source.
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images; most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl N Ar system
Takes a kernel
.Ar system
image as an argument.
This is where the symbolic information is taken from,
which means the image cannot be stripped.
In some cases, using a
.Pa bsd.gdb
version of the kernel can assist even more.
.It Fl M Ar core
Normally this
.Ar core
is an image produced by
.Xr savecore 8
but it can be
.Pa /dev/mem
too, if you are looking at the live system.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr systat 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr slstats 8 ,
.Xr trpt 8 ,
.Xr trsp 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
has renamed the
.Fl M
argument to be
.Fl C
instead.
.Pp
Examples of use:
.Pp
.Bd -literal
    # ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option prints each process'
.Li struct proc
address, but with the value of KERNBASE masked off.
This is very useful information if you are analyzing process contexts in
.Xr gdb 1 .
You need to add KERNBASE back, though; that value can be found in
.Pa /usr/include/$ARCH/param.h .
.Pp
.Bd -literal
    # vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This analyzes memory allocations at the time of the crash.
Perhaps some resource was starving the system?
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4
find the function that caused the crash.
It is either the function at the top of the traceback or the function
under the call to
.Fn panic
or
.Fn uvm_fault .
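.Pp
As a minimal sketch, assuming the machine is still sitting at the
.Xr ddb 4
prompt after the panic, the traceback can be printed with the
.Ic trace
command.
The output is omitted here because it depends on the machine and on the
crash itself:
.Pp
.Bd -literal
    ddb> trace
.Ed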
.Pp
The point of the crash usually looks something like this: "function+0x4711".
.Pp
Find the function in the sources; let's say that the function is in "foo.c".
.Pp
Go to the kernel build directory, i.e.,
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Do the following:
.Bd -literal
    # rm foo.o
    # make -n foo.o | sed 's,-c,-g -c,' | sh
    # objdump -S foo.o | less
.Ed
.Pp
Find the function in the output.
The function will look something like this:
.Pp
.Bd -literal
    0: 17 47 11 42 foo %x, bar, %y
    4: foo bar allan %kaka
    8: XXXX boink %bloyt
    etc.
.Ed
.Pp
The first number is the offset.
Find the offset that you got in the ddb trace
(in this case it's 4711).
.Pp
When reporting data collected in this way, include ~20 lines before and ~10
lines after the offset from the objdump output in the crash report, as well
as the output of
.Xr ddb 4 Ns 's
"show registers" command.
It's important that the output from objdump includes at least two or
three lines of C code.
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel,
and need help in further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description including the entire session
from
.Xr gdb 1 .
.Sh SEE ALSO
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8