1.\" $NetBSD: raid.4,v 1.20 2001/09/22 16:03:58 wiz Exp $ 2.\" 3.\" Copyright (c) 1998 The NetBSD Foundation, Inc. 4.\" All rights reserved. 5.\" 6.\" This code is derived from software contributed to The NetBSD Foundation 7.\" by Greg Oster 8.\" 9.\" Redistribution and use in source and binary forms, with or without 10.\" modification, are permitted provided that the following conditions 11.\" are met: 12.\" 1. Redistributions of source code must retain the above copyright 13.\" notice, this list of conditions and the following disclaimer. 14.\" 2. Redistributions in binary form must reproduce the above copyright 15.\" notice, this list of conditions and the following disclaimer in the 16.\" documentation and/or other materials provided with the distribution. 17.\" 3. All advertising materials mentioning features or use of this software 18.\" must display the following acknowledgement: 19.\" This product includes software developed by the NetBSD 20.\" Foundation, Inc. and its contributors. 21.\" 4. Neither the name of The NetBSD Foundation nor the names of its 22.\" contributors may be used to endorse or promote products derived 23.\" from this software without specific prior written permission. 24.\" 25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS 26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED 27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 28.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS 29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 35.\" POSSIBILITY OF SUCH DAMAGE. 36.\" 37.\" 38.\" Copyright (c) 1995 Carnegie-Mellon University. 39.\" All rights reserved. 40.\" 41.\" Author: Mark Holland 42.\" 43.\" Permission to use, copy, modify and distribute this software and 44.\" its documentation is hereby granted, provided that both the copyright 45.\" notice and this permission notice appear in all copies of the 46.\" software, derivative works or modified versions, and any portions 47.\" thereof, and that both notices appear in supporting documentation. 48.\" 49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" 50.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND 51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. 52.\" 53.\" Carnegie Mellon requests users of this software to return to 54.\" 55.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU 56.\" School of Computer Science 57.\" Carnegie Mellon University 58.\" Pittsburgh PA 15213-3890 59.\" 60.\" any improvements or extensions that they make and grant Carnegie the 61.\" rights to redistribute these changes. 62.\" 63.Dd November 9, 1998 64.Dt RAID 4 65.Os 66.Sh NAME 67.Nm raid 68.Nd RAIDframe disk driver 69.Sh SYNOPSIS 70.Cd "pseudo-device raid" Op Ar count 71.Sh DESCRIPTION 72The 73.Nm 74driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to 75.Nx . 76This 77document assumes that the reader has at least some familiarity with RAID 78and RAID concepts. 
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures allowed depends on the parity level selected.
If the driver is able to handle drive failures, and a drive does fail,
then the system is operating in
.Sq degraded mode .
In this mode, all missing data must be reconstructed from the data and
parity present on the other components.
This results in much slower data accesses, but does mean that a
failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match),
or that a component label is not consistent with its assigned place in
the set (e.g. the component label claims the component should be the
3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set), then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.
A RAID set can be flagged as auto-configurable, in which case it will
be configured automatically during the kernel boot process.
RAID filesystems which are automatically configured are also eligible
to be the root filesystem.
There is currently only limited support (alpha and pmax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID level.
To use a RAID set as the root filesystem, a kernel is usually obtained
from a small non-RAID partition, after which any auto-configuring RAID
set can be used for the root filesystem.
See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
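.Pp
For example, an existing set can be flagged as auto-configurable, and
optionally as eligible to contain the root filesystem, roughly as
follows (the device name is illustrative):
.Bd -unfilled -offset indent
raidctl -A yes raid0	# auto-configure raid0 at boot
raidctl -A root raid0	# also mark raid0 as root-eligible
.Ed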
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing
filesystem.
Should a disk fail, the driver is capable of reconstructing the failed
disk onto a hot spare or back onto a replacement drive.
If the components are hot-swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
The copyback operation, as its name indicates, will copy the
reconstructed data from the hot spare to the previously failed (and
now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this initialization includes re-building the parity
data.
Rebuilding the parity data is required both when a new RAID device is
brought up for the first time and after an unclean shutdown of a RAID
device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity before doing
a
.Xr fsck 8
or a
.Xr newfs 8 ,
filesystem integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Em required
before any filesystems are created or used on the RAID device.
If the parity is not correct, then missing data cannot be correctly
recovered.
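.Pp
As a sketch of a typical sequence (device and partition names are
illustrative), a new set would be initialized, its parity verified,
and only then given a filesystem:
.Bd -unfilled -offset indent
raidctl -i raid0	# initialize the set and re-write the parity
raidctl -p raid0	# check the parity status
raidctl -P raid0	# re-write the parity if it is dirty
newfs /dev/rraid0e	# only now create a filesystem
.Ed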
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0 device can be constructed out of a number of
RAID 5 devices (which, in turn, may be constructed out of the physical
disks, or of other RAID devices).
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e. not left free-floating, where a drive with SCSI ID of
4 can end up as /dev/sd0c) for well-behaved functioning of the RAID
device.
This is true for all types of drives, including IDE, HP-IB, etc.
For normal SCSI drives, for example, the following can be used to fix
the device addresses:
.Bd -unfilled -offset indent
sd0 at scsibus0 target 0 lun ?	# SCSI disk drives
sd1 at scsibus0 target 1 lun ?	# SCSI disk drives
sd2 at scsibus0 target 2 lun ?	# SCSI disk drives
sd3 at scsibus0 target 3 lun ?	# SCSI disk drives
sd4 at scsibus0 target 4 lun ?	# SCSI disk drives
sd5 at scsibus0 target 5 lun ?	# SCSI disk drives
sd6 at scsibus0 target 6 lun ?	# SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.
The rationale for fixing the device addresses is as follows: consider
a system with three SCSI drives at SCSI IDs 4, 5, and 6, which map to
components /dev/sd0e, /dev/sd1e, and /dev/sd2e of a RAID 5 set.
If the drive with SCSI ID 5 fails, and the system reboots, the old
/dev/sd2e will show up as /dev/sd1e.
The RAID driver is able to detect that component positions have
changed, and will not allow normal configuration.
If the device addresses are hard-coded, however, the RAID driver will
detect that the middle component is unavailable, and bring the RAID 5
set up in degraded mode.
Note that the auto-detection and auto-configuration code does not care
about where the components live: the auto-configuration code will
correctly configure a device even after any number of the components
have been re-arranged.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device raid 4	# RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument (4 in this case) specifies the number of RAIDframe drivers to
configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.
Since RAIDframe leaves room for disklabels, RAID components can simply
be raw disks, or partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.
Doing the wrong thing when a component fails may result in data loss.
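.Pp
As an illustrative sketch of such a recovery (all device names are
hypothetical), a failed component might be reconstructed onto a hot
spare, and the data later copied back to the replaced disk:
.Bd -unfilled -offset indent
raidctl -a /dev/sd3e raid0	# add /dev/sd3e as a hot spare
raidctl -F /dev/sd1e raid0	# fail sd1e, reconstruct onto the spare
raidctl -s raid0	# check the status of the reconstruction
raidctl -B raid0	# copy data back to the replaced sd1e
.Ed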
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the
loss of a single component of a RAID 0 system, will result in the loss
of all filesystems on that RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID device has been
used for the first time.
Failure to keep parity correct will be catastrophic should a component
ever fail -- it is better to use RAID 0 and get the additional space
and speed than it is to use parity but not keep the parity correct.
At least with RAID 0 there is no perception of increased data
security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed by CMU, provides a RAID simulator
for a number of different architectures, and a user-level device
driver and a kernel device driver for Digital Unix.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, published by the Parallel
Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed