.\"     $NetBSD: raid.4,v 1.15 2000/05/13 15:22:18 mycroft Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd November 9, 1998
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Nx .
This document assumes that the reader has at least some familiarity with
RAID and RAID concepts.  The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
RAIDframe also supports a wide variety of other RAID levels,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.  The reader is referred
to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.  The number of failures
allowed depends on the parity level selected.  If the driver is able
to handle drive failures, and a drive does fail, then the system
operates in
.Dq degraded mode .
In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.  This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
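.Pp
As a worked illustration of such reconstruction, assume the simple
exclusive-or (XOR) parity used by RAID 4 and RAID 5, and a
hypothetical stripe holding one byte on each of three data components
(D0, D1, D2) plus one parity byte (P).
The byte values are invented for the example:
.Bd -unfilled -offset indent
D0 = 10100101
D1 = 00111100
D2 = 00001111
P  = D0 xor D1 xor D2 = 10010110
.Ed
.Pp
If the component holding D1 fails, its contents can be recovered from
the surviving components:
.Bd -unfilled -offset indent
D1 = D0 xor D2 xor P = 00111100
.Ed
.Pp
Under this scheme, every read of data from the missing component
requires reading the rest of the stripe from the remaining components,
which is why degraded-mode accesses are slower.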
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.  If the
driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.  A RAID set can be flagged as
auto-configurable, in which case it will be configured automatically
during the kernel boot process.  Filesystems on automatically
configured RAID sets are also eligible to be the root filesystem.
While there is no support for booting directly from a RAID set, it is
possible to boot from a small partition which contains a kernel, and
have the root filesystem on a RAID set.  See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
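.Pp
As a brief sketch, assuming an already configured RAID set named
.Pa raid0
(the device name is only an example), auto-configuration might be
enabled with the
.Fl A
option of
.Xr raidctl 8 :
.Bd -unfilled -offset indent
raidctl -A yes raid0     # configure raid0 automatically at boot
raidctl -A root raid0    # as above, and make raid0 eligible to be root
.Ed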
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not
actively used in an existing filesystem.  Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.  The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.  Hot spares can also be hot-added using
.Xr raidctl 8 .
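.Pp
The sequence below sketches how these operations might be driven with
.Xr raidctl 8 .
The device
.Pa raid0 ,
the spare
.Pa /dev/sd3e ,
and the failed component
.Pa /dev/sd1e
are assumed names chosen for illustration:
.Bd -unfilled -offset indent
raidctl -a /dev/sd3e raid0    # hot-add /dev/sd3e as a spare
raidctl -F /dev/sd1e raid0    # fail sd1e and reconstruct onto the spare
raidctl -S raid0              # check progress of the reconstruction
raidctl -B raid0              # after replacing the disk: copyback
.Ed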
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.  In particular, this
initialization includes re-building the parity data.  This rebuilding
of parity data is also required either a) when a new RAID device is
brought up for the first time or b) after an unclean shutdown of a
RAID device.  By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
filesystem integrity and parity integrity can be ensured.  It bears
repeating that parity recomputation is
.Em required
before any filesystems are created or used on the RAID device.  If the
parity is not correct, then missing data cannot be correctly recovered.
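.Pp
A minimal sketch of bringing up a new RAID 5 set follows.
The configuration file name
.Pa raid0.conf ,
the component names, the serial number, and the partition
.Pa /dev/rraid0e
are all assumptions made for the example;
.Xr raidctl 8
documents the configuration file format and options in full.
.Bd -unfilled -offset indent
# raid0.conf: one row, three components, no spares
START array
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

# 32 sectors per stripe unit, RAID level 5
START layout
32 1 1 5

START queue
fifo 100
.Ed
.Pp
With that file in place, and once the RAID device has been labeled
with
.Xr disklabel 8 ,
the set might be initialized as follows:
.Bd -unfilled -offset indent
raidctl -C raid0.conf raid0    # force the first-time configuration
raidctl -I 112341 raid0        # write component labels (serial number)
raidctl -i raid0               # initialize (re-build) the parity
newfs /dev/rraid0e             # only after the parity is correct
.Ed
.Pp
After an unclean shutdown, parity would be checked and, if necessary,
re-written before the filesystem is checked:
.Bd -unfilled -offset indent
raidctl -P raid0
fsck /dev/rraid0e
.Ed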
.Pp
RAID levels may be combined in a hierarchical fashion.  For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
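.Pp
For instance, assuming two RAID 5 sets have already been configured as
.Pa raid1
and
.Pa raid2 ,
each with a partition
.Sq e
of type
.Dv FS_RAID
(all of these names are hypothetical), a
.Xr raidctl 8
configuration file for a RAID 0 set striped across them might look
like:
.Bd -unfilled -offset indent
START array
1 2 0

START disks
/dev/raid1e
/dev/raid2e

# 64 sectors per stripe unit, RAID level 0
START layout
64 1 1 0

START queue
fifo 100
.Ed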
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e. not left free-floating, where a drive with SCSI ID of
4 can end up as /dev/sd0c) for well-behaved functioning of the RAID
device.  This is true for all types of drives, including IDE, HP-IB,
etc.  For normal SCSI drives, for example, the following can be used
to fix the device addresses:
.Bd -unfilled -offset indent
sd0     at scsibus0 target 0 lun ?      # SCSI disk drives
sd1     at scsibus0 target 1 lun ?      # SCSI disk drives
sd2     at scsibus0 target 2 lun ?      # SCSI disk drives
sd3     at scsibus0 target 3 lun ?      # SCSI disk drives
sd4     at scsibus0 target 4 lun ?      # SCSI disk drives
sd5     at scsibus0 target 5 lun ?      # SCSI disk drives
sd6     at scsibus0 target 6 lun ?      # SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.  The rationale for fixing the device addresses
is as follows: Consider a system with three SCSI drives at SCSI IDs
4, 5, and 6, which map to components /dev/sd0e, /dev/sd1e, and
/dev/sd2e of a RAID 5 set.  If the drive with SCSI ID 5 fails, and the
system reboots, the old /dev/sd2e will show up as /dev/sd1e.  The RAID
driver is able to detect that component positions have changed, and
will not allow normal configuration.  If the device addresses are
hard-coded, however, the RAID driver would detect that the middle component
is unavailable, and bring the RAID 5 set up in degraded mode.  Note
that the auto-detection and auto-configuration code does not care
about where the components live.  The auto-configuration code will
correctly configure a device even after any number of the components
have been re-arranged.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.  This is
done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.  The
.Sq count
argument
.Po
.Sq 4
in this case
.Pc
specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options    RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.  Since RAIDframe leaves
room for disklabels, RAID components can be simply raw disks, or
partitions which use an entire disk.
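.Pp
As an illustration, a component partition of type
.Dv FS_RAID
is shown by
.Xr disklabel 8
with the fstype
.Sq RAID .
The partition letter, size, and offset below are invented for the
example, and the exact column layout may differ:
.Bd -unfilled -offset indent
#        size   offset    fstype   [fsize bsize   cpg]
  e:  4194304       63      RAID
.Ed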
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.  Doing the wrong thing when a component fails may
result in data loss.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.  However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all filesystems on that
RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.  This includes after system crashes, or before a RAID
device has been used for the first time.  Failure to keep parity
correct will be catastrophic should a component ever fail.  It is
better to use RAID 0 and get the additional space and speed than it
is to use parity but not keep the parity correct.  At least with RAID
0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr MAKEDEV 8 ,
.Xr raidctl 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr sd 4
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).  RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.  The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed