1.\"     $NetBSD: raid.4,v 1.21 2002/09/04 00:26:08 wiz Exp $
2.\"
3.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
4.\" All rights reserved.
5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Greg Oster
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\" 3. All advertising materials mentioning features or use of this software
18.\"    must display the following acknowledgement:
19.\"        This product includes software developed by the NetBSD
20.\"        Foundation, Inc. and its contributors.
21.\" 4. Neither the name of The NetBSD Foundation nor the names of its
22.\"    contributors may be used to endorse or promote products derived
23.\"    from this software without specific prior written permission.
24.\"
25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
28.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
35.\" POSSIBILITY OF SUCH DAMAGE.
36.\"
37.\"
38.\" Copyright (c) 1995 Carnegie-Mellon University.
39.\" All rights reserved.
40.\"
41.\" Author: Mark Holland
42.\"
43.\" Permission to use, copy, modify and distribute this software and
44.\" its documentation is hereby granted, provided that both the copyright
45.\" notice and this permission notice appear in all copies of the
46.\" software, derivative works or modified versions, and any portions
47.\" thereof, and that both notices appear in supporting documentation.
48.\"
49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
50.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
52.\"
53.\" Carnegie Mellon requests users of this software to return to
54.\"
55.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
56.\"  School of Computer Science
57.\"  Carnegie Mellon University
58.\"  Pittsburgh PA 15213-3890
59.\"
60.\" any improvements or extensions that they make and grant Carnegie the
61.\" rights to redistribute these changes.
62.\"
.Dd November 9, 1998
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Nx .
This
document assumes that the reader has at least some familiarity with RAID
and RAID concepts.  The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
There is a wide variety of other RAID levels supported by RAIDframe,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.  The reader is referred
to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.  The number of failures
allowed depends on the parity level selected.  If the driver is able
to handle drive failures, and a drive does fail, then the system is
operating in
.Sq degraded mode .
In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.  This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.  If the
driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
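.Pp
As a brief illustration (the serial number and device name here are
arbitrary examples), the component labels of a configured set can be
stamped with a user-chosen serial number using the
.Fl I
option of
.Xr raidctl 8 :
.Bd -unfilled -offset indent
# raidctl -I 120298 raid0
.Ed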
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.  A RAID set can be flagged as
auto-configurable, in which case it will be configured automatically
during the kernel boot process.  File systems on RAID sets which are
automatically configured are also eligible to be the root file system.
There is currently only limited support (alpha and pmax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.  To use a RAID set as the root
file system, a kernel is usually obtained from a small non-RAID
partition, after which any auto-configuring RAID set can be used for the
root file system.  See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
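.Pp
For example, to flag an existing set as auto-configurable (the device
name here is illustrative):
.Bd -unfilled -offset indent
# raidctl -A yes raid0
.Ed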
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not
actively used in an existing file system.  Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.  The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.  Hot spares can also be hot-added using
.Xr raidctl 8 .
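.Pp
For example, to hot-add a spare disk to a configured set (the component
and device names are illustrative):
.Bd -unfilled -offset indent
# raidctl -a /dev/sd3e raid0
.Ed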
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for performing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.  In particular, this
initialization includes re-building the parity data.  This rebuilding
of parity data is required both when a new RAID device is
brought up for the first time and after an unclean shutdown of a
RAID device.  By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.  It bears
repeating that parity recomputation is
.Em required
before any file systems are created or used on the RAID device.  If the
parity is not correct, then missing data cannot be correctly recovered.
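.Pp
For example, to initialize the parity on a newly created set, or to
check (and rewrite, if necessary) the parity after an unclean shutdown
(the device name is illustrative):
.Bd -unfilled -offset indent
# raidctl -i raid0
# raidctl -P raid0
.Ed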
.Pp
RAID levels may be combined in a hierarchical fashion.  For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
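.Pp
As a minimal sketch of such a hierarchy, the following
.Xr raidctl 8
configuration file describes a two-component RAID 0 set built from two
existing RAID 5 devices (all device names and layout parameters here
are illustrative, and assume
.Pa /dev/raid0e
and
.Pa /dev/raid1e
are already-configured RAID 5 sets):
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/raid0e
/dev/raid1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 0

START queue
fifo 100
.Ed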
.Pp
For well-behaved functioning of the RAID device, it is important that
drives be hard-coded at their respective addresses (i.e. not left
free-floating, where a drive with a SCSI ID of 4 can end up as
.Pa /dev/sd0c ) .
This is true for all
types of drives, including IDE, HP-IB, etc.  For normal SCSI drives, for
example, the following can be used to fix the device addresses:
.Bd -unfilled -offset indent
sd0     at scsibus0 target 0 lun ?      # SCSI disk drives
sd1     at scsibus0 target 1 lun ?      # SCSI disk drives
sd2     at scsibus0 target 2 lun ?      # SCSI disk drives
sd3     at scsibus0 target 3 lun ?      # SCSI disk drives
sd4     at scsibus0 target 4 lun ?      # SCSI disk drives
sd5     at scsibus0 target 5 lun ?      # SCSI disk drives
sd6     at scsibus0 target 6 lun ?      # SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.  The rationale for fixing the device addresses
is as follows: Consider a system with three SCSI drives at SCSI IDs
4, 5, and 6, which map to components
.Pa /dev/sd0e ,
.Pa /dev/sd1e ,
and
.Pa /dev/sd2e
of a RAID 5 set.  If the drive with SCSI ID 5 fails, and the
system reboots, the old
.Pa /dev/sd2e
will show up as
.Pa /dev/sd1e .
The RAID driver is able to detect that component positions have changed, and
will not allow normal configuration.  If the device addresses are
hard-coded, however, the RAID driver would detect that the middle component
is unavailable, and bring the RAID 5 set up in degraded mode.  Note
that the auto-detection and auto-configuration code does not care
about where the components live.  The auto-configuration code will
correctly configure a device even after any number of the components
have been re-arranged.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.  This is
done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.  The
.Sq count
argument
.Pq Sq 4 , in this case
specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options    RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.  Since RAIDframe leaves
room for disklabels, RAID components can simply be raw disks, or
partitions which use an entire disk.
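.Pp
In the output of
.Xr disklabel 8 ,
a component partition of type
.Dv FS_RAID
shows up with the fstype
.Sq RAID .
A sketch of such a partition line (the size and offset values here are
illustrative):
.Bd -unfilled -offset indent
#        size    offset     fstype
 e:   4194304   1048576       RAID
.Ed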
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity be well understood by the system administrator(s)
.Em before
a component failure.  Doing the wrong thing when a component fails may
result in data loss.
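.Pp
As an illustrative sequence (the component and device names here are
examples only): fail a component out to an available hot spare, later
copy the reconstructed data back to the replaced disk, and check the
status of the set:
.Bd -unfilled -offset indent
# raidctl -F /dev/sd1e raid0
# raidctl -B raid0
# raidctl -s raid0
.Ed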
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options    RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.  However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all file systems on
that RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.  This includes after system crashes, or before a RAID
device has been used for the first time.  Failure to keep parity
correct will be catastrophic should a component ever fail -- it is
better to use RAID 0 and get the additional space and speed than it
is to use parity but not keep the parity correct.  At least with RAID
0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).  RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, as well as a user-level device driver and a kernel device
driver for Digital Unix.  The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed