.\"	$OpenBSD: crash.8,v 1.12 2001/08/03 15:21:17 mpech Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	from: @(#)crash.8	6.5 (Berkeley) 4/20/91
.\"
.Dd February 23, 2000
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Pp
.Bd -literal
    panic: why i gave up the ghost
.Ed
.Pp
on the console and enters the kernel debugger,
.Xr ddb 4 .
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
then the system dumps the contents of physical memory
onto a mass storage peripheral device.
The particular device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
.Pp
After the dump has been written, the system then
invokes the automatic reboot procedure as
described in
.Xr reboot 8 .
If auto-reboot is disabled (in a machine dependent way) the system
will simply halt at this point.
.Pp
Upon rebooting, and
unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will copy the previously written dump into
.Pa /var/crash
using
.Xr savecore 8 ,
before resuming multi-user operations.
.Ss Causes of system failure
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory
which
can reflect itself in different ways.
Here are the messages which are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates filesystem problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It timeout table overflow
.ns
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are
machine dependent and can be found listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and the pc is the program counter at the
time of the fault.
Hardware flakiness will sometimes generate this panic, but if the cause
is a kernel bug,
the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
corresponding
to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
usually can produce an explanation.
.It init died
The system initialization process has exited.
This is bad news, as no new users will then be able to log in.
Rebooting is the only fix, so the system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
.Ss Analyzing a dump
When the system crashes it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
you should run
.Xr gdb 1 ,
loading in the images with the following commands:
.Pp
.Bd -literal -offset indent
# gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd2.8".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kcore /var/crash/bsd.0.core
.Ed
.Pp
After this, you can use the
.Ic where
command to show a trace of the procedure calls that led to the crash.
.Pp
For custom-built kernels, it is helpful if you had previously
configured your kernel to include debugging symbols with
.Sq makeoptions DEBUG=-ggdb
.Pq see Xr options 4
(though you will not be able to boot an unstripped kernel since it uses too
much memory).
In this case, you should use
.Pa bsd.gdb
instead of
.Pa bsd.0 ,
thus allowing
.Xr gdb 1
to show symbolic names for addresses and line numbers from the source.
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images; most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl N Ar system
Takes a kernel
.Ar system
image as an argument.
This is where the symbolic information is obtained from,
which means the image cannot be stripped.
In some cases, using a
.Pa bsd.gdb
version of the kernel can assist even more.
.It Fl M Ar core
Normally this
.Ar core
is an image produced by
.Xr savecore 8
but it can be
.Pa /dev/mem
too, if you are looking at the live system.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr systat 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr slstats 8 ,
.Xr trpt 8 ,
.Xr trsp 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
has renamed the
.Fl M
argument to be
.Fl C
instead.
.Pp
Examples of use:
.Pp
.Bd -literal
    # ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option gives the last 6 hexadecimal digits of the
.Li struct proc
pointer for each process.
This is very useful information if you are analyzing process contexts in
.Xr gdb 1 .
The preceding digits have to be guessed, but that is not very difficult;
they are the start of the kvm space, which is defined by machine-dependent
sizes given in
.Pa /usr/include/$ARCH/vmparam.h .
.Pp
.Bd -literal
    # vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This analyzes memory allocations at the time of the crash.
Perhaps some resource was starving the system?
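.Pp
As a minimal sketch of the guessing step described above for
.Fl O Ar paddr ,
the following shell function prepends an assumed prefix to the six digits
printed by
.Xr ps 1 .
The function name
.Li full_paddr
and the default prefix
.Li e0
are hypothetical and chosen only for illustration; the real prefix depends on
the machine-dependent values in
.Pa vmparam.h
and may have to be tried more than once.
.Bd -literal -offset indent
# Hypothetical helper, not part of the system.
# $1: the six hexadecimal digits shown in the PADDR column
# $2: assumed leading digits of the kvm space (default e0 is
#     an illustrative guess, not a documented value)
full_paddr() {
    echo "0x${2:-e0}$1"
}

full_paddr 123456
full_paddr 123456 e1
.Ed
.Pp
Each candidate address can then be examined in
.Xr gdb 1
until one of them yields a plausible
.Li struct proc .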
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4
find the function that caused the crash.
It is either the function at the top of the traceback or the function
under the call to
.Fn panic
or
.Fn uvm_fault .
.Pp
The point of the crash usually looks something like this: "function+0x4711".
.Pp
Find the function in the sources; let's say that the function is in "foo.c".
.Pp
Go to the kernel build directory, i.e.,
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Do the following:
.Bd -literal
    # rm foo.o
    # make -n foo.o | sed 's,-c,-g -c,' | sh
    # objdump -S foo.o | less
.Ed
.Pp
Find the function in the output.
The function will look something like this:
.Pp
.Bd -literal
    0:  17 47 11 42    foo %x, bar, %y
    4:  foo bar        allan %kaka
    8:  XXXX           boink %bloyt
    etc.
.Ed
.Pp
The first number is the offset.
Find the offset that you got in the ddb trace
(in this case it's 4711).
.Pp
When reporting data collected in this way, include ~20 lines before and ~10 lines
after the offset from the objdump output in the crash report, as well as the output
of
.Xr ddb 4 Ns 's
"show registers" command.
It's important that the output from objdump includes at least two or
three lines of C code.
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel,
and need help in further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description including the entire session
from
.Xr gdb 1 .
.Sh "SEE ALSO"
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8