.\"	$OpenBSD: crash.8,v 1.19 2003/06/28 14:27:20 jmc Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	from: @(#)crash.8	6.5 (Berkeley) 4/20/91
.\"
.Dd February 23, 2000
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Pp
.Bd -literal
        panic: why i gave up the ghost
.Ed
.Pp
on the console and enters the kernel debugger,
.Xr ddb 4 .
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
then the system dumps the contents of physical memory
onto a mass storage peripheral device.
The particular device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
.Pp
After the dump has been written, the system then
invokes the automatic reboot procedure as
described in
.Xr reboot 8 .
If auto-reboot is disabled (in a machine dependent way) the system
will simply halt at this point.
.Pp
Upon rebooting, and
unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will copy the previously written dump into
.Pa /var/crash
using
.Xr savecore 8 ,
before resuming multi-user operations.
.Ss Causes of system failure
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory
which
can manifest itself in different ways.
Here are the messages which are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates filesystem problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are
machine dependent and can be found listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and the pc is the program counter at the
time of the fault.
Hardware flakiness will sometimes generate this panic, but if the cause
is a kernel bug,
the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
corresponding
to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
can usually produce an explanation.
.It init died
The system initialization process has exited.
This is bad news, as no new users will then be able to log in.
Rebooting is the only fix, so the system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
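.Pp
When a trap panic has been copied from the console into a log or a bug
report, the pc value is the field most often needed for later lookup.
The following is only a sketch using plain
.Xr sh 1
parameter expansion; the
.Li panic_pc
function name and the sample message are hypothetical, and it assumes the
.Sq pc=
field is printed as bare hexadecimal digits, as the format string above
suggests.
.Bd -literal -offset indent
```shell
# Print the pc field of a "trap type %d, code=%x, pc=%x" message.
# The message text is passed as the first argument.
panic_pc() {
    case $1 in
    *pc=*)
        pc=${1##*pc=}                   # drop everything up to "pc="
        echo "${pc%%[!0-9a-fA-F]*}"     # keep the leading hex digits
        ;;
    *)
        return 1                        # no pc field in the message
        ;;
    esac
}

# Hypothetical panic line, for illustration only.
panic_pc "panic: trap type 6, code=0, pc=d01a2b3c"
```
.Ed
.Pp
With the sample line this prints d01a2b3c; a message without a pc field
makes the function return a non-zero status.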
.Ss Analyzing a dump
When the system crashes it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
you should run
.Xr gdb 1 ,
loading in the images with the following commands:
.Pp
.Bd -literal -offset indent
# gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.
Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd2.8".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kcore /var/crash/bsd.0.core
.Ed
.Pp
After this, you can use the
.Ic where
command to show a trace of the procedure calls that led to the crash.
.Pp
For custom-built kernels, it is helpful if you had previously
configured your kernel to include debugging symbols with
.Sq makeoptions DEBUG=-ggdb
.Pq see Xr options 4
(though you will not be able to boot an unstripped kernel since it uses too
much memory).
In this case, you should use
.Pa bsd.gdb
instead of
.Pa bsd.0 ,
thus allowing
.Xr gdb 1
to show symbolic names for addresses and line numbers from the source.
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images; most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl N Ar system
Takes a kernel
.Ar system
image as an argument.
This is where the symbolic information is taken from,
which means the image cannot be stripped.
In some cases, using a
.Pa bsd.gdb
version of the kernel can assist even more.
.It Fl M Ar core
Normally this
.Ar core
is an image produced by
.Xr savecore 8
but it can be
.Pa /dev/mem
too, if you are looking at the live system.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr systat 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr slstats 8 ,
.Xr trpt 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
has renamed the
.Fl M
argument to be
.Fl C
instead.
.Pp
Examples of use:
.Pp
.Bd -literal
    # ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option prints each process'
.Li struct proc
address, but with the value of KERNBASE masked off.
This is very useful information if you are analyzing process contexts in
.Xr gdb 1 .
You need to add KERNBASE back, though; its value can be found in
.Pa /usr/include/$ARCH/param.h .
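.Pp
The addition is simple enough to do in
.Xr sh 1 .
The following is a minimal sketch; the
.Li kva
function name and both values are made up for illustration and must be
replaced with the KERNBASE from
.Pa param.h
for your architecture and an address taken from the
.Xr ps 1
output (assumed here to be written in hexadecimal with a 0x prefix).
.Bd -literal -offset indent
```shell
# Hypothetical values, for illustration only.
KERNBASE=0xd0000000
paddr=0x00b4c000

# Add KERNBASE back to a masked address and print the sum in hex.
kva() {
    printf '0x%x' $(( $1 + $2 ))
}

kva "$KERNBASE" "$paddr"; echo
```
.Ed
.Pp
With the sample values this prints 0xd0b4c000, which is the kind of
address that can then be examined in
.Xr gdb 1 .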
.Pp
.Bd -literal
    # vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This analyzes memory allocations at the time of the crash.
Perhaps some resource was starving the system?
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4
find the function that caused the crash.
It is either the function at the top of the traceback or the function
under the call to
.Fn panic
or
.Fn uvm_fault .
.Pp
The point of the crash usually looks something like "function+0x4711".
.Pp
Find the function in the sources; let's say that the function is in "foo.c".
.Pp
Go to the kernel build directory, i.e.,
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Do the following:
.Bd -literal
    # rm foo.o
    # make -n foo.o | sed 's,-c,-g -c,' | sh
    # objdump -S foo.o | less
.Ed
.Pp
Find the function in the output.
The function will look something like this:
.Pp
.Bd -literal
     0: 17 47 11 42         foo %x, bar, %y
     4: foo bar             allan %kaka
     8: XXXX                boink %bloyt
    etc.
.Ed
.Pp
The first number is the offset.
Find the offset that you got in the ddb trace
(in this case it's 4711).
.Pp
When reporting data collected in this way, include ~20 lines before and ~10
lines after the offset from the objdump output in the crash report, as well
as the output of
.Xr ddb 4 Ns 's
"show registers" command.
It's important that the output from objdump includes at least two or
three lines of C code.
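.Pp
The lookup and the surrounding context can be collected in one step.
This is only a sketch: the
.Li show_offset
function name and the location string are hypothetical, and it assumes
that the function starts at offset 0 in the objdump listing, as in the
simplified example above.
If it does not, add the function's start address to the offset first.
.Bd -literal -offset indent
```shell
# Hypothetical location string copied from a ddb trace.
loc="function+0x4711"

func=${loc%%+*}      # text before the "+": function
off=${loc#*+0x}      # hex digits after "+0x": 4711

# Read "objdump -S" output on stdin and print the line for the given
# hex offset with 20 lines of leading and 10 lines of trailing context.
show_offset() {
    grep -B 20 -A 10 "^ *$1:"
}

echo "$func $off"
# Typical use, from the kernel build directory:
#   objdump -S foo.o | show_offset "$off"
```
.Ed
.Pp
The context sizes match the ~20 and ~10 lines requested above; widen them
if the result does not include any C source lines.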
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel,
and need help in further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description including the entire session
from
.Xr gdb 1 .
.Sh SEE ALSO
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8