1.\"     $NetBSD: raid.4,v 1.35 2008/05/02 18:11:05 martin Exp $
2.\"
3.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
4.\" All rights reserved.
5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Greg Oster
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\"
18.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
19.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
20.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
21.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
22.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28.\" POSSIBILITY OF SUCH DAMAGE.
29.\"
30.\"
31.\" Copyright (c) 1995 Carnegie-Mellon University.
32.\" All rights reserved.
33.\"
34.\" Author: Mark Holland
35.\"
36.\" Permission to use, copy, modify and distribute this software and
37.\" its documentation is hereby granted, provided that both the copyright
38.\" notice and this permission notice appear in all copies of the
39.\" software, derivative works or modified versions, and any portions
40.\" thereof, and that both notices appear in supporting documentation.
41.\"
42.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
43.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
44.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
45.\"
46.\" Carnegie Mellon requests users of this software to return to
47.\"
48.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
49.\"  School of Computer Science
50.\"  Carnegie Mellon University
51.\"  Pittsburgh PA 15213-3890
52.\"
53.\" any improvements or extensions that they make and grant Carnegie the
54.\" rights to redistribute these changes.
55.\"
56.Dd August 6, 2007
57.Dt RAID 4
58.Os
59.Sh NAME
60.Nm raid
61.Nd RAIDframe disk driver
62.Sh SYNOPSIS
63.Cd options RAID_AUTOCONFIG
64.Cd options RAID_DIAGNOSTIC
65.Cd options RF_ACC_TRACE=n
66.Cd options RF_DEBUG_MAP=n
67.Cd options RF_DEBUG_PSS=n
68.Cd options RF_DEBUG_QUEUE=n
69.Cd options RF_DEBUG_QUIESCE=n
70.Cd options RF_DEBUG_RECON=n
71.Cd options RF_DEBUG_STRIPELOCK=n
72.Cd options RF_DEBUG_VALIDATE_DAG=n
73.Cd options RF_DEBUG_VERIFYPARITY=n
74.Cd options RF_INCLUDE_CHAINDECLUSTER=n
75.Cd options RF_INCLUDE_EVENODD=n
76.Cd options RF_INCLUDE_INTERDECLUSTER=n
77.Cd options RF_INCLUDE_PARITY_DECLUSTERING=n
78.Cd options RF_INCLUDE_PARITY_DECLUSTERING_DS=n
79.Cd options RF_INCLUDE_PARITYLOGGING=n
80.Cd options RF_INCLUDE_RAID5_RS=n
81.Pp
82.Cd "pseudo-device raid" Op Ar count
83.Sh DESCRIPTION
84The
85.Nm
86driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
87.Nx .
88This
89document assumes that the reader has at least some familiarity with RAID
90and RAID concepts.  The reader is also assumed to know how to configure
91disks and pseudo-devices into kernels, how to generate kernels, and how
92to partition disks.
93.Pp
94RAIDframe provides a number of different RAID levels including:
95.Bl -tag -width indent
96.It RAID 0
97provides simple data striping across the components.
98.It RAID 1
99provides mirroring.
100.It RAID 4
101provides data striping across the components, with parity
102stored on a dedicated drive (in this case, the last component).
103.It RAID 5
104provides data striping across the components, with parity
105distributed across all the components.
106.El
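.Pp
As an illustration, the layout of a set is described to
.Xr raidctl 8
in a configuration file; a minimal sketch for a three-component RAID 5
set might look similar to the following, where the component names are
examples only (see
.Xr raidctl 8
for the authoritative file format):
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5

START queue
fifo 100
.Ed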
.Pp
There are a wide variety of other RAID levels supported by RAIDframe.
The configuration file options to enable them are briefly outlined
at the end of this section.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.  The number of failures
allowed depends on the parity level selected.  If the driver is able
to handle drive failures, and a drive does fail, then the system is
operating in
.Sq degraded mode .
In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.  This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
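.Pp
Whether a configured set is currently operating in degraded mode can
be checked by querying its status with
.Xr raidctl 8 ,
as in the following sketch (where
.Sq raid0
is an example device name):
.Bd -unfilled -offset indent
# raidctl -s raid0
.Ed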
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
The component label currently lives at the half-way point of the
.Sq reserved section
located at the beginning of each component.
This
.Sq reserved section
is RF_PROTECTED_SECTORS in length (64 blocks or 32 Kbytes) and the
component label is currently 1 Kbyte in size.
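.Pp
The contents of a component label can be examined with the
.Fl g
option of
.Xr raidctl 8 ;
the component and device names below are examples only:
.Bd -unfilled -offset indent
# raidctl -g /dev/sd0e raid0
.Ed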
.Pp
If the driver determines that the component labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.  If the
driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
autoconfiguration of RAID sets.  A RAID set can be flagged as
autoconfigurable, in which case it will be configured automatically
during the kernel boot process.  RAID file systems which are
automatically configured are also eligible to be the root file system.
There is currently only limited support (alpha, amd64, i386, pmax,
sparc, sparc64, and vax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.  To use a RAID set as the root
file system, a kernel is usually obtained from a small non-RAID
partition, after which any autoconfiguring RAID set can be used for the
root file system.  See
.Xr raidctl 8
for more information on autoconfiguration of RAID sets.
Note that with autoconfiguration of RAID sets, it is no longer
necessary to hard-code SCSI IDs of drives.
The autoconfiguration code will
correctly configure a device even after any number of the components
have had their device IDs or device names changed.
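.Pp
For example, an existing set can be flagged as autoconfigurable, or as
eligible to contain the root file system, with the
.Fl A
option of
.Xr raidctl 8
(again,
.Sq raid0
is an example device name):
.Bd -unfilled -offset indent
# raidctl -A yes raid0
# raidctl -A root raid0
.Ed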
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line but not
actively used in an existing file system.  Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.  The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.  Hot spares can also be hot-added using
.Xr raidctl 8 .
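.Pp
A typical sequence, sketched here with example device names, adds a
hot spare, fails a component to force reconstruction onto that spare,
and then copies the data back once the failed disk has been replaced:
.Bd -unfilled -offset indent
# raidctl -a /dev/sd3e raid0
# raidctl -F /dev/sd1e raid0
# raidctl -B raid0
.Ed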
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.  In particular, this
initialization includes re-building the parity data.  This rebuilding
of parity data is also required either a) when a new RAID device is
brought up for the first time, or b) after an unclean shutdown of a
RAID device.  By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.  It bears
repeating that parity recomputation is
.Em required
before any file systems are created or used on the RAID device.  If the
parity is not correct, then missing data cannot be correctly recovered.
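.Pp
For example, a newly created set might be given component labels,
have its parity initialized, and only then receive a file system (the
serial number and device names are arbitrary examples):
.Bd -unfilled -offset indent
# raidctl -I 2008050201 raid0
# raidctl -i raid0
# newfs /dev/rraid0e
.Ed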
.Pp
RAID levels may be combined in a hierarchical fashion.  For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
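.Pp
In such a layered configuration the components of the outer device
are simply other RAID devices; the disks section of a configuration
file for a RAID 0 device striped over two RAID 5 devices might
contain (example names only):
.Bd -unfilled -offset indent
START disks
/dev/raid1e
/dev/raid2e
.Ed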
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.  This is
done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device   raid   4       # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.  The
.Sq count
argument,
.Sq 4
in this case, specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and autoconfiguration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
autoconfiguration of the RAID set is desired.  Since RAIDframe leaves
room for disklabels, RAID components can be simply raw disks, or
partitions which use an entire disk.
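.Pp
For example, a partition intended for use as a component might appear
in the
.Xr disklabel 8
output with a file system type of RAID, similar to the following
(the size and offset are illustrative only):
.Bd -unfilled -offset indent
 e:   4194304   1024   RAID
.Ed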
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.  Doing the wrong thing when a component fails may
result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Pp
RAIDframe supports an access tracing facility for tracking both
the requests made and the performance of various parts of the RAID
system as each request is processed.
To enable this tracing the following option may be specified:
.Bd -unfilled -offset indent
options RF_ACC_TRACE=1
.Ed
.Pp
For extensive debugging there are a number of kernel options which
will aid in performing extra diagnosis of various parts of the
RAIDframe sub-systems.
Note that in order to make full use of these options it is often
necessary to enable one or more debugging options as listed in
.Pa src/sys/dev/raidframe/rf_options.h .
As well, these options are typically only useful for people who wish
to debug various parts of RAIDframe.
The options include:
.Pp
For debugging the code which maps RAID addresses to physical
addresses:
.Bd -unfilled -offset indent
options RF_DEBUG_MAP=1
.Ed
.Pp
Parity stripe status debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_PSS=1
.Ed
.Pp
Additional debugging for queuing is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_QUEUE=1
.Ed
.Pp
Problems with non-quiescent file systems should be easier to debug if
the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_QUIESCE=1
.Ed
.Pp
Stripelock debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_STRIPELOCK=1
.Ed
.Pp
Additional diagnostic checks during reconstruction are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_RECON=1
.Ed
.Pp
Validation of the DAGs (Directed Acyclic Graphs) used to describe an
I/O access can be performed when the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_VALIDATE_DAG=1
.Ed
.Pp
Additional diagnostics during parity verification are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_VERIFYPARITY=1
.Ed
.Pp
There are a number of less commonly used RAID levels supported by
RAIDframe.
These additional RAID types should be considered experimental, and
may not be ready for production use.
The various types and the options to enable them are shown here:
.Pp
For Even-Odd parity:
.Bd -unfilled -offset indent
options RF_INCLUDE_EVENODD=1
.Ed
.Pp
For RAID level 5 with rotated sparing:
.Bd -unfilled -offset indent
options RF_INCLUDE_RAID5_RS=1
.Ed
.Pp
For Parity Logging (highly experimental):
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITYLOGGING=1
.Ed
.Pp
For Chain Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_CHAINDECLUSTER=1
.Ed
.Pp
For Interleaved Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_INTERDECLUSTER=1
.Ed
.Pp
For Parity Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING=1
.Ed
.Pp
For Parity Declustering with Distributed Spares:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING_DS=1
.Ed
.Pp
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.  However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all file systems on
that RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.  This includes after system crashes, or before a RAID
device has been used for the first time.  Failure to keep parity
correct will be catastrophic should a component ever fail; it is
better to use RAID 0 and get the additional space and speed than it
is to use parity but not keep the parity correct.  At least with RAID
0 there is no perception of increased data security.
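.Pp
The parity status of a set can be checked with the
.Fl p
option of
.Xr raidctl 8 ,
and the parity rewritten, if it is not known to be clean, with the
.Fl P
option (the device name is an example):
.Bd -unfilled -offset indent
# raidctl -p raid0
# raidctl -P raid0
.Ed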
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr config 1 ,
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).  RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.  The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed