.\" $NetBSD: raid.4,v 1.21 2002/09/04 00:26:08 wiz Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd November 9, 1998
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Nx .
This document assumes that the reader has at least some familiarity
with RAID and RAID concepts.
The reader is also assumed to know how to configure disks and
pseudo-devices into kernels, how to generate kernels, and how to
partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
There are a wide variety of other RAID levels supported by RAIDframe,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures allowed depends on the parity level selected.
If the driver is able to handle drive failures, and a drive does fail,
then the system is operating in
.Dq degraded mode .
In this mode, all missing data must be reconstructed from the data and
parity present on the other components.
This results in much slower data accesses, but does mean that a
failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g.
two or more serial numbers do not match) or that the component label
is not consistent with its assigned place in the set (e.g. the
component label claims the component should be the 3rd one of a
6-disk set, but the RAID set has it as the 3rd component in a 5-disk
set), then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.
A RAID set can be flagged as auto-configurable, in which case it will
be configured automatically during the kernel boot process.
RAID file systems which are automatically configured are also eligible
to be the root file system.
There is currently only limited support (alpha and pmax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.
To use a RAID set as the root file system, a kernel is usually
obtained from a small non-RAID partition, after which any
auto-configuring RAID set can be used for the root file system.
See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing
file system.
Should a disk fail, the driver is capable of reconstructing the failed
disk onto a hot spare or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
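.Pp
These operations are all driven from userland with
.Xr raidctl 8 .
As an illustrative sketch only (the device and component names here
are hypothetical, and the exact options are described in
.Xr raidctl 8 ) ,
adding a hot spare, failing a component with immediate reconstruction
onto that spare, and then copying the data back might look like:
.Bd -unfilled -offset indent
raidctl -a /dev/sd4e raid0	# add /dev/sd4e as a hot spare
raidctl -F /dev/sd2e raid0	# fail /dev/sd2e, reconstruct to spare
raidctl -B raid0		# copy data back to the replaced disk
.Ed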
The copyback operation, as its name indicates, will copy the
reconstructed data from the hot spare to the previously failed (and
now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The userland utility for doing all
.Nm
configuration and other operations is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this initialization includes re-building the parity
data.
This rebuilding of parity data is also required when a new RAID device
is brought up for the first time, and after an unclean shutdown of a
RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity before doing
a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Em required
before any file systems are created or used on the RAID device.
If the parity is not correct, then missing data cannot be correctly
recovered.
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0 device can be constructed out of a number of
RAID 5 devices (which, in turn, may be constructed out of the physical
disks, or of other RAID devices).
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e. not left free-floating, where a drive with SCSI ID of
4 can end up as
.Pa /dev/sd0c )
for well-behaved functioning of the RAID device.
This is true for all types of drives, including IDE, HP-IB, etc.
For normal SCSI drives, for example, the following can be used to fix
the device addresses:
.Bd -unfilled -offset indent
sd0 at scsibus0 target 0 lun ?
	# SCSI disk drives
sd1 at scsibus0 target 1 lun ?	# SCSI disk drives
sd2 at scsibus0 target 2 lun ?	# SCSI disk drives
sd3 at scsibus0 target 3 lun ?	# SCSI disk drives
sd4 at scsibus0 target 4 lun ?	# SCSI disk drives
sd5 at scsibus0 target 5 lun ?	# SCSI disk drives
sd6 at scsibus0 target 6 lun ?	# SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.
The rationale for fixing the device addresses is as follows:
consider a system with three SCSI drives at SCSI IDs 4, 5, and 6,
which map to components
.Pa /dev/sd0e ,
.Pa /dev/sd1e ,
and
.Pa /dev/sd2e
of a RAID 5 set.
If the drive with SCSI ID 5 fails, and the system reboots, the old
.Pa /dev/sd2e
will show up as
.Pa /dev/sd1e .
The RAID driver is able to detect that component positions have
changed, and will not allow normal configuration.
If the device addresses are hard coded, however, the RAID driver would
detect that the middle component is unavailable, and bring the RAID 5
set up in degraded mode.
Note that the auto-detection and auto-configuration code does not care
about where the components live.
The auto-configuration code will correctly configure a device even
after any number of the components have been re-arranged.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device raid 4	# RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument (4 in this case) specifies the number of RAIDframe drivers
to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g.
4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.
Since RAIDframe leaves room for disklabels, RAID components can simply
be raw disks, or partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.
Doing the wrong thing when a component fails may result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the
loss of a single component of a RAID 0 system, will result in the loss
of all file systems on that RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID device has been
used for the first time.
Failure to keep parity correct will be catastrophic should a component
ever fail; it is better to use RAID 0 and get the additional space and
speed than it is to use parity but not keep the parity correct.
At least with RAID 0 there is no perception of increased data
security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed by CMU, provides a RAID simulator
for a number of different architectures, and a user-level device
driver and a kernel device driver for Digital Unix.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed