.\" Copyright (c) 1990, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" %sccs.include.redist.man%
.\"
.\"     @(#)crash.8	8.2 (Berkeley) 01/12/94
.\"
.Dd January 12, 1994
.Dt CRASH 8 hp300
.Os
.Sh NAME
.Nm crash
.Nd UNIX system failures
.Sh DESCRIPTION
This section explains a bit about system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily, it prints a message of the form
.Bd -ragged -offset indent
panic: why i gave up the ghost
.Ed
.Pp
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.Xr reboot 8 .
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.Pp
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure, which
can manifest itself in different ways.
Here are the messages which
are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that hardware or software
error produced the message in some unexpected way.
.Pp
.Bl -tag -width Ds -compact
.It Sy iinit
This cryptic panic message results from a failure to mount the root filesystem
during the bootstrap process.
Either the root filesystem has been corrupted,
or the system is attempting to use the wrong device as root filesystem.
Usually, an alternate copy of the system binary or an alternate root
filesystem can be used to bring up the system to investigate.
.Pp
.It Sy "Can't exec /etc/init"
This is not a panic message, as reboots are likely to be futile.
Late in the bootstrap procedure, the system was unable to locate
and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /etc/init
forbids execution.
.Pp
.It Sy "IO err in push"
.It Sy "hard IO err in swap"
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
The offending disk should be fixed if it is broken or unreliable.
.Pp
.It Sy "realloccg: bad optim"
.It Sy "ialloc: dup alloc"
.It Sy "alloccgblk:cyl groups corrupted"
.It Sy "ialloccg: map corrupted"
.It Sy "free: freeing free block"
.It Sy "free: freeing free frag"
.It Sy "ifree: freeing free inode"
.It Sy "alloccg: map corrupted"
These panic messages are among those that may be produced
when filesystem inconsistencies are detected.
The problem generally results from a failure to repair damaged filesystems
after a crash, hardware failures, or some other condition that should not
normally occur.
A filesystem check will normally correct the problem.
.Pp
.It Sy "timeout table overflow"
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.Pp
.It Sy "trap type %d, code = %x, v = %x"
An unexpected trap has occurred within the system; the trap types are:
.Bl -column xxxx -offset indent
0	bus error
1	address error
2	illegal instruction
3	divide by zero
.No 4\t Em chk No instruction
.No 5\t Em trapv No instruction
6	privileged instruction
7	trace trap
8	MMU fault
9	simulated software interrupt
10	format error
11	FP coprocessor fault
12	coprocessor fault
13	simulated AST
.El
.Pp
The favorite trap type in system crashes is trap type 8,
indicating a wild reference.
``code'' (hex) is the concatenation of the
MMU
status register
(see <hp300/cpu.h>)
in the high 16 bits and the 68020 special status word
(see the 68020 manual, page 6-17)
in the low 16.
``v'' (hex) is the virtual address which caused the fault.
Additionally, the kernel will dump about a screenful of semi-useful
information.
``pid'' (decimal) is the process id of the process running at the
time of the exception.
Note that if we panic in an interrupt routine,
this process may not be related to the panic.
``ps'' (hex) is the 68020 processor status register ``ps''.
``pc'' (hex) is the value of the program counter saved
on the hardware exception frame.
It may
.Em not
be the PC of the instruction causing the fault.
``sfc'' and ``dfc'' (hex) are the 68020 source/destination function codes.
They should always be one.
``p0'' and ``p1'' are the
VAX-like
region registers.
They are of the form:
.Pp
.Bd -ragged -offset indent
<length> '@' <kernel VA>
.Ed
.Pp
where both are in hex.
Following these values is a dump of the processor registers (hex).
Finally, there is a dump of the stack (user/kernel) at the time of the offense.
.Pp
.It Sy "init died"
The system initialization process has exited.
This is bad news, as no new
users will then be able to log in.
Rebooting is the only fix, so the
system just does it right away.
.Pp
.It Sy "out of mbufs: map full"
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
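.Pp
As an illustration of the
.Sy "trap type"
fields described above, the following C fragment is a minimal sketch
of a user-level decoder for that message.
It is not kernel code: the names
.Li trap_info ,
.Li trap_name ,
and
.Li parse_trap
are hypothetical, and the sketch assumes that the console line begins
with the words ``trap type'' exactly as in the format shown in the list.
.Bd -literal -offset indent
#include <stdint.h>
#include <stdio.h>

/* Trap type names, indexed by the trap type number in the message. */
static const char *const trap_names[] = {
    "bus error", "address error", "illegal instruction",
    "divide by zero", "chk instruction", "trapv instruction",
    "privileged instruction", "trace trap", "MMU fault",
    "simulated software interrupt", "format error",
    "FP coprocessor fault", "coprocessor fault", "simulated AST",
};

/* Hypothetical holder for the decoded fields; not a kernel structure. */
struct trap_info {
    int type;            /* trap type number (decimal in the message) */
    uint16_t mmu_status; /* high 16 bits of code: MMU status register */
    uint16_t ssw;        /* low 16 bits of code: 68020 special status word */
    uint32_t va;         /* v: virtual address which caused the fault */
};

/* Return the name for a trap type, or "unknown" if out of range. */
static const char *
trap_name(int type)
{
    size_t n = sizeof(trap_names) / sizeof(trap_names[0]);

    if (type < 0 || (size_t)type >= n)
        return "unknown";
    return trap_names[type];
}

/*
 * Parse a line of the form "trap type %d, code = %x, v = %x".
 * Returns 0 on success, -1 if the line does not match (assumption:
 * the line starts with "trap type", with no leading text).
 */
static int
parse_trap(const char *line, struct trap_info *ti)
{
    unsigned int code, v;
    int type;

    if (sscanf(line, "trap type %d, code = %x, v = %x",
        &type, &code, &v) != 3)
        return -1;
    ti->type = type;
    ti->mmu_status = (uint16_t)(code >> 16);
    ti->ssw = (uint16_t)(code & 0xffff);
    ti->va = (uint32_t)v;
    return 0;
}
.Ed
.Pp
For example, a line reading ``trap type 8, code = 1f0145, v = 4''
(a made-up value) would decode to an MMU fault with an MMU status
register of 0x001f, a special status word of 0x0145,
and a faulting virtual address of 0x4.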
.Pp
When the system crashes it writes (or at least attempts to write)
an image of memory into the back end of the dump device,
which is usually the same as the primary swap area.
After the system is rebooted, the program
.Xr savecore 8
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal.
See
.Xr savecore 8
for details.
.Pp
To analyze a dump you should begin by running
.Xr adb 1
with the
.Fl k
flag on the system load image and core dump.
If the core image is the result of a panic,
the panic message is printed.
Normally the command
``$c''
will provide a stack trace from the point of
the crash and this will provide a clue as to
what went wrong.
For more details consult
.%T "Using ADB to Debug the UNIX Kernel" .
.Sh SEE ALSO
.Xr adb 1 ,
.Xr reboot 8
.Rs
.%T "MC68020 32-bit Microprocessor User's Manual"
.Re
.Rs
.%T "Using ADB to Debug the UNIX Kernel"
.Re
.Rs
.%T "4.3BSD for the HP300"
.Re
.Sh HISTORY
A
.Nm
man page appeared in Version 6 AT&T UNIX.