1.\"     $NetBSD: raid.4,v 1.7 1999/10/16 20:17:29 kristerw Exp $
2.\"
3.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
4.\" All rights reserved.
5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Greg Oster
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\" 3. All advertising materials mentioning features or use of this software
18.\"    must display the following acknowledgement:
19.\"        This product includes software developed by the NetBSD
20.\"        Foundation, Inc. and its contributors.
21.\" 4. Neither the name of The NetBSD Foundation nor the names of its
22.\"    contributors may be used to endorse or promote products derived
23.\"    from this software without specific prior written permission.
24.\"
25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
28.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
35.\" POSSIBILITY OF SUCH DAMAGE.
36.\"
37.\"
38.\" Copyright (c) 1995 Carnegie-Mellon University.
39.\" All rights reserved.
40.\"
41.\" Author: Mark Holland
42.\"
43.\" Permission to use, copy, modify and distribute this software and
44.\" its documentation is hereby granted, provided that both the copyright
45.\" notice and this permission notice appear in all copies of the
46.\" software, derivative works or modified versions, and any portions
47.\" thereof, and that both notices appear in supporting documentation.
48.\"
49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
50.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
52.\"
53.\" Carnegie Mellon requests users of this software to return to
54.\"
55.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
56.\"  School of Computer Science
57.\"  Carnegie Mellon University
58.\"  Pittsburgh PA 15213-3890
59.\"
60.\" any improvements or extensions that they make and grant Carnegie the
61.\" rights to redistribute these changes.
62.\"
.Dd November 9, 1998
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe Disk Driver
.Sh SYNOPSIS
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to NetBSD.  This
document assumes that the reader has at least some familiarity with RAID
and RAID concepts.  The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
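.Pp
As a rough guide to usable capacity: given N components of equal size,
RAID 0 provides the capacity of all N components, RAID 1 provides the
capacity of half of them, and RAID 4 and RAID 5 provide the capacity of
N-1 components, with the equivalent of one component consumed by parity.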
.Pp
There is a wide variety of other RAID levels supported by RAIDframe,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.  The reader is referred
to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.  The number of failures
allowed depends on the parity level selected.  If the driver is able
to handle drive failures, and a drive does fail, then the system is
operating in "degraded mode".  In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.  This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in the RAID
set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.  If the
driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not
actively used in an existing filesystem.  Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot-swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.  The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.  Hot spares can also be hot-added using
.Xr raidctl 8 .
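.Pp
As an illustrative sketch of these operations (the device and component
names are examples only; see
.Xr raidctl 8
for the authoritative set of options): a hot spare might be added with
.Sq raidctl -a /dev/sd3e raid0 ,
a failed component reconstructed onto the spare with
.Sq raidctl -F /dev/sd1e raid0 ,
and the reconstructed data later copied back to the replaced disk with
.Sq raidctl -B raid0 .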
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
For any of the RAID flavours which have parity data,
.Xr raidctl 8
must be used with the
.Fl i
option to re-write the parity when either a) a new RAID device is
brought up for the first time or b) after an unclean shutdown of a
RAID device.  By performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
filesystem integrity and parity integrity can be ensured.  It bears
repeating that parity recomputation is
.Ar required
before any filesystems are created or used on the RAID device.  If the
parity is not correct, then missing data cannot be correctly recovered.
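.Pp
For example, for a newly configured
.Sq raid0
(the device and partition names here are illustrative), the first-time
sequence might look like:
.Bd -unfilled -offset indent
raidctl -i raid0        # recompute and rewrite all parity
newfs /dev/rraid0e      # only then create the filesystem
.Ed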
.Pp
RAID levels may be combined in a hierarchical fashion.  For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
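In such a configuration, the components named in the upper-level device's
configuration are simply partitions of other
.Nm
devices (e.g. /dev/raid0e and /dev/raid1e) rather than partitions of
physical disks.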
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e. not left free-floating, where a drive with SCSI ID of
4 can end up as /dev/sd0c) for well-behaved functioning of the RAID
device.  For normal SCSI drives, for example, the following can be
used to fix the device addresses:
.Bd -unfilled -offset indent
sd0     at scsibus0 target 0 lun ?      # SCSI disk drives
sd1     at scsibus0 target 1 lun ?      # SCSI disk drives
sd2     at scsibus0 target 2 lun ?      # SCSI disk drives
sd3     at scsibus0 target 3 lun ?      # SCSI disk drives
sd4     at scsibus0 target 4 lun ?      # SCSI disk drives
sd5     at scsibus0 target 5 lun ?      # SCSI disk drives
sd6     at scsibus0 target 6 lun ?      # SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.  The rationale for fixing the device addresses
is as follows: Consider a system with three SCSI drives at SCSI IDs
4, 5, and 6, which map to components /dev/sd0e, /dev/sd1e, and
/dev/sd2e of a RAID 5 set.  If the drive with SCSI ID 5 fails, and the
system reboots, the old /dev/sd2e will show up as /dev/sd1e.  The RAID
driver is able to detect that component positions have changed, and
will not allow normal configuration.  If the device addresses are
hard-coded, however, the RAID driver would detect that the middle component
is unavailable, and bring the RAID 5 set up in degraded mode.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.  This is
done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.  The
.Sq count
argument (
.Sq 4 ,
in this case) specifies the number of RAIDframe drivers to configure.
At the time of this writing, 4 is the maximum number of
.Nm
devices supported.  This will change as soon as kernel threads
are available.
.Pp
In all cases the
.Sq raw
partitions of the disks
.Ar must not
be combined.  Rather, each component partition should be offset by at least one
cylinder from the beginning of that component disk.  This ensures that
the disklabels for the component disks do not conflict with the
disklabel for the
.Nm
device.
As well, all component partitions must be of the type
.Dv FS_BSDFFS .
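.Pp
In a
.Xr disklabel 8
listing, a suitable component partition might look like the following
sketch (the size and offset values are illustrative; the important points
are the non-zero offset and the 4.2BSD fstype, which corresponds to
.Dv FS_BSDFFS ) :
.Bd -unfilled -offset indent
#        size   offset    fstype
  e:  4188257     2106    4.2BSD      # RAID component
.Ed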
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity be well understood by the system administrator(s)
.Ar before
a component failure.  Doing the wrong thing when a component fails may
result in data loss.
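.Pp
As a purely illustrative sketch, a configuration file describing a
three-component RAID 5 set with no spares might look like the following;
the precise format, and the commands that use it, are documented in
.Xr raidctl 8 :
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5

START queue
fifo 100
.Ed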
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.  However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all filesystems on that
RAID device.
RAID is
.Ar NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Ar MUST
be performed whenever there is a chance that it may have been
compromised.  This includes after system crashes, or before a RAID
device has been used for the first time.  Failure to keep parity
correct will be catastrophic should a component ever fail -- it is
better to use RAID 0 and get the additional space and speed than it
is to use parity but not keep the parity correct.  At least with RAID
0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr MAKEDEV 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the Parallel Data Laboratory at
Carnegie Mellon University (CMU).  RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.  The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled

The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.

.Ed