1.\"     $NetBSD: raid.4,v 1.33 2007/08/06 19:44:16 oster Exp $
2.\"
3.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
4.\" All rights reserved.
5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Greg Oster
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\" 3. All advertising materials mentioning features or use of this software
18.\"    must display the following acknowledgement:
19.\"        This product includes software developed by the NetBSD
20.\"        Foundation, Inc. and its contributors.
21.\" 4. Neither the name of The NetBSD Foundation nor the names of its
22.\"    contributors may be used to endorse or promote products derived
23.\"    from this software without specific prior written permission.
24.\"
25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
28.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
35.\" POSSIBILITY OF SUCH DAMAGE.
36.\"
37.\"
38.\" Copyright (c) 1995 Carnegie-Mellon University.
39.\" All rights reserved.
40.\"
41.\" Author: Mark Holland
42.\"
43.\" Permission to use, copy, modify and distribute this software and
44.\" its documentation is hereby granted, provided that both the copyright
45.\" notice and this permission notice appear in all copies of the
46.\" software, derivative works or modified versions, and any portions
47.\" thereof, and that both notices appear in supporting documentation.
48.\"
49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
50.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
52.\"
53.\" Carnegie Mellon requests users of this software to return to
54.\"
55.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
56.\"  School of Computer Science
57.\"  Carnegie Mellon University
58.\"  Pittsburgh PA 15213-3890
59.\"
60.\" any improvements or extensions that they make and grant Carnegie the
61.\" rights to redistribute these changes.
62.\"
.Dd August 6, 2007
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd options RAID_AUTOCONFIG
.Cd options RAID_DIAGNOSTIC
.Cd options RF_ACC_TRACE=n
.Cd options RF_DEBUG_MAP=n
.Cd options RF_DEBUG_PSS=n
.Cd options RF_DEBUG_QUEUE=n
.Cd options RF_DEBUG_QUIESCE=n
.Cd options RF_DEBUG_RECON=n
.Cd options RF_DEBUG_STRIPELOCK=n
.Cd options RF_DEBUG_VALIDATE_DAG=n
.Cd options RF_DEBUG_VERIFYPARITY=n
.Cd options RF_INCLUDE_CHAINDECLUSTER=n
.Cd options RF_INCLUDE_EVENODD=n
.Cd options RF_INCLUDE_INTERDECLUSTER=n
.Cd options RF_INCLUDE_PARITY_DECLUSTERING=n
.Cd options RF_INCLUDE_PARITY_DECLUSTERING_DS=n
.Cd options RF_INCLUDE_PARITYLOGGING=n
.Cd options RF_INCLUDE_RAID5_RS=n
.Pp
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Nx .
This
document assumes that the reader has at least some familiarity with RAID
and RAID concepts.  The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
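.Pp
As a concrete illustration, a minimal
.Xr raidctl 8
configuration file describing a three-component RAID 5 set might look
like the following sketch; the device names and layout parameters here
are purely illustrative, and
.Xr raidctl 8
remains the authoritative reference for the file format:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
64 1 1 5

START queue
fifo 100
.Ed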
.Pp
There are a wide variety of other RAID levels supported by RAIDframe.
The configuration file options to enable them are briefly outlined
at the end of this section.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.  The number of failures
allowed depends on the parity level selected.  If the driver is able
to handle drive failures, and a drive does fail, then the system is
operating in
.Sq degraded mode .
In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.  This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
The component label currently lives at the half-way point of a
.Sq reserved section
located at the beginning of each component.
This
.Sq reserved section
is RF_PROTECTED_SECTORS in length (64 blocks or 32 Kbytes) and the
component label is currently 1 Kbyte in size.
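.Pp
Once a set is configured, the label of an individual component can be
inspected with
.Xr raidctl 8 .
For example, assuming a configured
.Sq raid0
with a component on
.Pa /dev/sd1e :
.Bd -unfilled -offset indent
raidctl -g /dev/sd1e raid0
.Ed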
.Pp
If the driver determines that the component labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set), then the device will fail to configure.  If the
driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
autoconfiguration of RAID sets.  A RAID set can be flagged as
autoconfigurable, in which case it will be configured automatically
during the kernel boot process.  RAID file systems which are
automatically configured are also eligible to be the root file system.
There is currently only limited support (alpha, amd64, i386, pmax,
sparc, sparc64, and vax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.  To use a RAID set as the root
file system, a kernel is usually obtained from a small non-RAID
partition, after which any autoconfiguring RAID set can be used for the
root file system.  See
.Xr raidctl 8
for more information on autoconfiguration of RAID sets.
Note that with autoconfiguration of RAID sets, it is no longer
necessary to hard-code SCSI IDs of drives.
The autoconfiguration code will
correctly configure a device even after any number of the components
have had their device IDs or device names changed.
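.Pp
For example, an existing set can be flagged as autoconfigurable, or
additionally as eligible to contain the root file system, with commands
such as the following (where
.Sq raid0
is simply an illustrative device):
.Bd -unfilled -offset indent
raidctl -A yes raid0
raidctl -A root raid0
.Ed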
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not
actively used in an existing file system.  Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.  The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.  Hot spares can also be hot-added using
.Xr raidctl 8 .
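.Pp
A typical recovery sequence might look like the following sketch, where
the device and component names are purely illustrative:
.Bd -unfilled -offset indent
raidctl -a /dev/sd3e raid0    # add /dev/sd3e as a hot spare
raidctl -F /dev/sd1e raid0    # fail /dev/sd1e; reconstruct to the spare
raidctl -B raid0              # copy the data back once replaced
.Ed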
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.  In particular, this
initialization includes re-building the parity data.  This rebuilding
of parity data is also required a) when a new RAID device is
brought up for the first time, or b) after an un-clean shutdown of a
RAID device.  By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.  It bears
repeating that parity recomputation is
.Em required
before any file systems are created or used on the RAID device.  If the
parity is not correct, then missing data cannot be correctly recovered.
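.Pp
For example, the following sketch (for an illustrative
.Sq raid0 )
initializes the parity and then checks its status before a file system
is created:
.Bd -unfilled -offset indent
raidctl -i raid0      # initialize the set, rewriting all parity
raidctl -p raid0      # check that the parity is now clean
newfs /dev/rraid0e    # only then create the file system
.Ed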
.Pp
RAID levels may be combined in a hierarchical fashion.  For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.  This is
done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.  The
.Sq count
argument (4 in this case) specifies the number of RAIDframe drivers
to configure.
To turn on component auto-detection and autoconfiguration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
autoconfiguration of the RAID set is desired.  Since RAIDframe leaves
room for disklabels, RAID components can simply be raw disks, or
partitions which use an entire disk.
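.Pp
For example, a component partition of type
.Dv FS_RAID
appears in the output of
.Xr disklabel 8
as in the following fragment, where the size and offset values are
purely illustrative:
.Bd -unfilled -offset indent
#        size    offset     fstype
 e:   4194304        63       RAID
.Ed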
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.  Doing the wrong thing when a component fails may
result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Pp
RAIDframe supports an access tracing facility for tracking both the
requests made and the performance of various parts of the RAID system
as each request is processed.
To enable this tracing, the following option may be specified:
.Bd -unfilled -offset indent
options RF_ACC_TRACE=1
.Ed
.Pp
For extensive debugging there are a number of kernel options which
will aid in performing extra diagnosis of various parts of the
RAIDframe sub-systems.
Note that in order to make full use of these options it is often
necessary to enable one or more debugging options as listed in
.Pa src/sys/dev/raidframe/rf_options.h .
These options are typically only useful for people who wish to debug
various parts of RAIDframe.
The options include:
.Pp
For debugging the code which maps RAID addresses to physical
addresses:
.Bd -unfilled -offset indent
options RF_DEBUG_MAP=1
.Ed
.Pp
Parity stripe status debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_PSS=1
.Ed
.Pp
Additional debugging for queuing is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_QUEUE=1
.Ed
.Pp
Problems with non-quiescent file systems should be easier to debug if
the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_QUIESCE=1
.Ed
.Pp
Stripelock debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_STRIPELOCK=1
.Ed
.Pp
Additional diagnostic checks during reconstruction are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_RECON=1
.Ed
.Pp
Validation of the DAGs (Directed Acyclic Graphs) used to describe an
I/O access can be performed when the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_VALIDATE_DAG=1
.Ed
.Pp
Additional diagnostics during parity verification are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_VERIFYPARITY=1
.Ed
.Pp
There are a number of less commonly used RAID levels supported by
RAIDframe.
These additional RAID types should be considered experimental, and
may not be ready for production use.
The various types and the options to enable them are shown here:
.Pp
For Even-Odd parity:
.Bd -unfilled -offset indent
options RF_INCLUDE_EVENODD=1
.Ed
.Pp
For RAID level 5 with rotated sparing:
.Bd -unfilled -offset indent
options RF_INCLUDE_RAID5_RS=1
.Ed
.Pp
For Parity Logging (highly experimental):
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITYLOGGING=1
.Ed
.Pp
For Chain Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_CHAINDECLUSTER=1
.Ed
.Pp
For Interleaved Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_INTERDECLUSTER=1
.Ed
.Pp
For Parity Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING=1
.Ed
.Pp
For Parity Declustering with Distributed Spares:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING_DS=1
.Ed
.Pp
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.  However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all file systems on
that RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.  This includes after system crashes, or before a RAID
device has been used for the first time.  Failure to keep parity
correct will be catastrophic should a component ever fail -- it is
better to use RAID 0 and get the additional space and speed, than it
is to use parity, but not keep the parity correct.  At least with RAID
0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr config 1 ,
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the Parallel Data Laboratory at
Carnegie Mellon University (CMU).  RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.  The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed