.\"     $NetBSD: raid.4,v 1.25 2003/04/13 01:45:06 wiz Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd April 9, 2003
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd options RAID_AUTOCONFIG
.Cd options RAID_DIAGNOSTIC
.Cd options RF_INCLUDE_EVENODD=n
.Cd options RF_INCLUDE_RAID5_RS=n
.Cd options RF_INCLUDE_PARITYLOGGING=n
.Cd options RF_INCLUDE_CHAINDECLUSTER=n
.Cd options RF_INCLUDE_INTERDECLUSTER=n
.Cd options RF_INCLUDE_PARITY_DECLUSTERING=n
.Cd options RF_INCLUDE_PARITY_DECLUSTERING_DS=n
.Pp
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Nx .
This
document assumes that the reader has at least some familiarity with RAID
and RAID concepts.
The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
RAIDframe supports a wide variety of other RAID levels.
The configuration file options to enable them are briefly outlined
at the end of this section.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures
allowed depends on the parity level selected.
If the driver is able
to handle drive failures, and a drive does fail, then the system
operates in
.Dq degraded mode .
In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.
This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
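.Pp
As a rough illustration of degraded-mode reconstruction, the following
.Xr sh 1
sketch assumes a single-parity layout in which parity is the bitwise XOR
of the data units in a stripe.
The byte values and variable names are hypothetical, and the driver
itself operates on whole stripe units rather than single bytes.
.Bd -literal -offset indent
# Three hypothetical data bytes, one per data component.
d0=90
d1=60
d2=240

# Parity as stored on the parity component: XOR of all data.
parity=$(( d0 ^ d1 ^ d2 ))

# Suppose the component holding d1 fails.  XOR of the surviving
# data and the parity yields the missing value again.
rebuilt=$(( d0 ^ d2 ^ parity ))

echo "parity=$parity rebuilt=$rebuilt"
.Ed
.Pp
Every read of the failed component needs this extra work across all
surviving components, which is why degraded-mode accesses are slower.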
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.
If the
driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
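.Pp
Component labels can be written and inspected with
.Xr raidctl 8 .
The following
.Xr sh 1
sketch wraps its
.Fl I
(initialize labels) and
.Fl g
(get label) options; the function names, serial number, and device
names are hypothetical.
.Bd -literal -offset indent
# Write fresh component labels carrying the given serial number.
# usage: label_init serial raiddev
label_init() {
    raidctl -I "$1" "$2"
}

# Print the label stored on one component of a RAID set.
# usage: label_show component raiddev
label_show() {
    raidctl -g "$1" "$2"
}
.Ed
.Pp
For example,
.Ql label_init 2003041301 raid0
followed by
.Ql label_show /dev/sd1e raid0 .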
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.
A RAID set can be flagged as
auto-configurable, in which case it will be configured automatically
during the kernel boot process.
RAID file systems which are
automatically configured are also eligible to be the root file system.
There is currently only limited support (alpha and pmax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.
To use a RAID set as the root
file system, a kernel is usually obtained from a small non-RAID
partition, after which any auto-configuring RAID set can be used for the
root file system.
See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
Note that with auto-configuration of RAID sets, it is no longer
necessary to hard-code SCSI IDs of drives.
The auto-configuration code will
correctly configure a device even after any number of the components
have had their device IDs changed or device names changed.
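.Pp
A sketch of flagging a set for auto-configuration, using the
.Fl A
option of
.Xr raidctl 8 ;
the function names and the device name
.Pa raid0
are hypothetical:
.Bd -literal -offset indent
# Have the kernel configure the set automatically at boot.
autoconf_on() {
    raidctl -A yes "$1"
}

# As above, and also make the set eligible to hold the root file system.
autoconf_root() {
    raidctl -A root "$1"
}

# Return the set to manual configuration.
autoconf_off() {
    raidctl -A no "$1"
}
.Ed
.Pp
For example,
.Ql autoconf_root raid0 .
Either setting only takes effect with a kernel built with
.Dv RAID_AUTOCONFIG
and with components of type
.Dv FS_RAID .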
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not
actively used in an existing file system.
Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
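.Pp
The recovery sequence described above might be scripted roughly as
follows.
This is a sketch only: the function names and arguments are
hypothetical, while
.Fl a ,
.Fl F ,
and
.Fl B
are the
.Xr raidctl 8
options for adding a spare, failing a component with immediate
reconstruction, and copyback.
.Bd -literal -offset indent
# Hot-add a spare to a configured set.
# usage: spare_add spare raiddev
spare_add() {
    raidctl -a "$1" "$2"
}

# Mark a component as failed and reconstruct it onto a hot spare.
# usage: rebuild_to_spare failed_component raiddev
rebuild_to_spare() {
    raidctl -F "$1" "$2"
}

# After the failed disk has been physically replaced, copy the
# reconstructed data from the spare back to it.
# usage: copy_back raiddev
copy_back() {
    raidctl -B "$1"
}
.Ed
.Pp
For example,
.Ql spare_add /dev/sd4e raid0 ,
then
.Ql rebuild_to_spare /dev/sd2e raid0 ,
and, once the disk has been swapped,
.Ql copy_back raid0 .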
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this
initialization includes re-building the parity data.
This rebuilding
of parity data is also required either a) when a new RAID device is
brought up for the first time or b) after an un-clean shutdown of a
RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Em required
before any file systems are created or used on the RAID device.
If the
parity is not correct, then missing data cannot be correctly recovered.
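.Pp
One way to follow this rule in a script is sketched below.
The function names are hypothetical, and the partition passed to
.Xr fsck 8
is only an example.
.Bd -literal -offset indent
# First use of a new set: initialize the set, writing its parity.
# usage: parity_init raiddev
parity_init() {
    raidctl -i "$1"
}

# After an unclean shutdown: bring parity up to date, and only
# then check the file system on the given partition.
# usage: parity_then_fsck raiddev partition
parity_then_fsck() {
    raidctl -P "$1" && fsck "$2"
}
.Ed
.Pp
For example,
.Ql parity_then_fsck raid0 /dev/rraid0a .
Because of the
.Ql &&
the file system check is skipped if the parity step reports an error.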
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
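.Pp
As a sketch of such a hierarchy, the helper below writes a configuration
file, in the format described in
.Xr raidctl 8 ,
for a RAID 0 set striped across two existing RAID 5 sets.
The function name, component partitions, and stripe unit size are
hypothetical.
.Bd -literal -offset indent
# usage: write_stripe_conf file
write_stripe_conf() {
    cat > "$1" <<EOF
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/raid0e
/dev/raid1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 0

START queue
fifo 100
EOF
}
.Ed
.Pp
The resulting file would then be passed to the
.Fl C
or
.Fl c
option of
.Xr raidctl 8
to configure a third device, for example
.Pa raid2 .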
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is
done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument
.Pf ( Sq 4 ,
in this case) specifies the number of RAID devices to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options    RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.
Since RAIDframe leaves
room for disklabels, RAID components can be simply raw disks, or
partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity be well understood by the system administrator(s)
.Em before
a component failure.
Doing the wrong thing when a component fails may
result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options    RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Pp
There are a number of less commonly used RAID levels supported by
RAIDframe.
These additional RAID types should be considered experimental, and
may not be ready for production use.
The various types and the options to enable them are shown here:
.Pp
For Even-Odd parity:
.Bd -unfilled -offset indent
options RF_INCLUDE_EVENODD=1
.Ed
.Pp
For RAID level 5 with rotated sparing:
.Bd -unfilled -offset indent
options RF_INCLUDE_RAID5_RS=1
.Ed
.Pp
For Parity Logging (highly experimental):
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITYLOGGING=1
.Ed
.Pp
For Chain Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_CHAINDECLUSTER=1
.Ed
.Pp
For Interleaved Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_INTERDECLUSTER=1
.Ed
.Pp
For Parity Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING=1
.Ed
.Pp
For Parity Declustering with Distributed Spares:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING_DS=1
.Ed
.Pp
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all file systems on that
RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity
correct will be catastrophic should a component ever fail.
It is
better to use RAID 0 and get the additional space and speed than it
is to use parity but not keep the parity correct.
At least with RAID
0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed