1.\" $NetBSD: raid.4,v 1.16 2000/11/02 03:34:08 oster Exp $ 2.\" 3.\" Copyright (c) 1998 The NetBSD Foundation, Inc. 4.\" All rights reserved. 5.\" 6.\" This code is derived from software contributed to The NetBSD Foundation 7.\" by Greg Oster 8.\" 9.\" Redistribution and use in source and binary forms, with or without 10.\" modification, are permitted provided that the following conditions 11.\" are met: 12.\" 1. Redistributions of source code must retain the above copyright 13.\" notice, this list of conditions and the following disclaimer. 14.\" 2. Redistributions in binary form must reproduce the above copyright 15.\" notice, this list of conditions and the following disclaimer in the 16.\" documentation and/or other materials provided with the distribution. 17.\" 3. All advertising materials mentioning features or use of this software 18.\" must display the following acknowledgement: 19.\" This product includes software developed by the NetBSD 20.\" Foundation, Inc. and its contributors. 21.\" 4. Neither the name of The NetBSD Foundation nor the names of its 22.\" contributors may be used to endorse or promote products derived 23.\" from this software without specific prior written permission. 24.\" 25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS 26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED 27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 28.\" PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS 29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 35.\" POSSIBILITY OF SUCH DAMAGE. 36.\" 37.\" 38.\" Copyright (c) 1995 Carnegie-Mellon University. 39.\" All rights reserved. 40.\" 41.\" Author: Mark Holland 42.\" 43.\" Permission to use, copy, modify and distribute this software and 44.\" its documentation is hereby granted, provided that both the copyright 45.\" notice and this permission notice appear in all copies of the 46.\" software, derivative works or modified versions, and any portions 47.\" thereof, and that both notices appear in supporting documentation. 48.\" 49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" 50.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND 51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. 52.\" 53.\" Carnegie Mellon requests users of this software to return to 54.\" 55.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU 56.\" School of Computer Science 57.\" Carnegie Mellon University 58.\" Pittsburgh PA 15213-3890 59.\" 60.\" any improvements or extensions that they make and grant Carnegie the 61.\" rights to redistribute these changes. 62.\" 63.Dd November 9, 1998 64.Dt RAID 4 65.Os 66.Sh NAME 67.Nm raid 68.Nd RAIDframe disk driver 69.Sh SYNOPSIS 70.Cd "pseudo-device raid" Op Ar count 71.Sh DESCRIPTION 72The 73.Nm 74driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to NetBSD. This 75document assumes that the reader has at least some familiarity with RAID 76and RAID concepts. 
The reader is also assumed to know how to configure disks and
pseudo-devices into kernels, how to generate kernels, and how to
partition disks.
.Pp
RAIDframe provides a number of different RAID levels, including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
There are a wide variety of other RAID levels supported by RAIDframe,
including Even-Odd parity, RAID level 5 with rotated sparing, Chained
declustering, and Interleaved declustering.  The reader is referred
to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.  The number of failures
allowed depends on the parity level selected.  If the driver is able
to handle drive failures, and a drive does fail, then the system is
operating in "degraded mode".  In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.  This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd
component in a 5-disk set) then the device will fail to configure.  If
the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
auto-configuration of RAID sets.  A RAID set can be flagged as
auto-configurable, in which case it will be configured automatically
during the kernel boot process.  RAID filesystems which are
automatically configured are also eligible to be the root filesystem.
There is currently only limited support (alpha and pmax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.  To use a RAID set as the root
filesystem, a kernel is usually obtained from a small non-RAID
partition, after which any auto-configuring RAID set can be used for
the root filesystem.  See
.Xr raidctl 8
for more information on auto-configuration of RAID sets.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing
filesystem.  Should a disk fail, the driver is capable of
reconstructing the failed disk onto a hot spare or back onto a
replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
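The degraded-mode reconstruction described above depends on correct
parity.  As a minimal illustration only (this is not RAIDframe code,
and the component contents are made up), the following Python sketch
shows how XOR parity lets any single missing component be rebuilt from
the surviving components plus the parity:

```python
# Illustrative sketch of RAID 4/5-style parity (NOT RAIDframe code).
# Parity is the bytewise XOR of all data components, so any single
# missing component equals the XOR of all surviving components and
# the parity.

from functools import reduce

def compute_parity(components):
    """XOR corresponding bytes of each component together."""
    return bytes(reduce(lambda a, b: a ^ b, stripe)
                 for stripe in zip(*components))

def reconstruct(surviving, parity):
    """Rebuild one failed component from the survivors plus parity."""
    return compute_parity(surviving + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]      # three hypothetical data components
parity = compute_parity(data)

# Component 1 fails; rebuild it (e.g. onto a hot spare) from the rest.
rebuilt = reconstruct([data[0], data[2]], parity)
assert rebuilt == data[1]
```

This is also why stale parity is catastrophic: if the stored parity no
longer matches the data, the reconstructed bytes are silently wrong.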
The copyback operation, as its name indicates, will copy the
reconstructed data from the hot spare to the previously failed (and
now replaced) disk.  Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.  In particular, this
initialization includes re-building the parity data.  This rebuilding
of parity data is also required either a) when a new RAID device is
brought up for the first time or b) after an un-clean shutdown of a
RAID device.  By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
filesystem integrity and parity integrity can be ensured.  It bears
repeating again that parity recomputation is
.Ar required
before any filesystems are created or used on the RAID device.  If the
parity is not correct, then missing data cannot be correctly recovered.
.Pp
RAID levels may be combined in a hierarchical fashion.  For example, a
RAID 0 device can be constructed out of a number of RAID 5 devices
(which, in turn, may be constructed out of the physical disks, or of
other RAID devices).
.Pp
It is important that drives be hard-coded at their respective
addresses (i.e. not left free-floating, where a drive with SCSI ID of
4 can end up as /dev/sd0c) for well-behaved functioning of the RAID
device.  This is true for all types of drives, including IDE, HP-IB,
etc.  For normal SCSI drives, for example, the following can be used
to fix the device addresses:
.Bd -unfilled -offset indent
sd0 at scsibus0 target 0 lun ?	# SCSI disk drives
sd1 at scsibus0 target 1 lun ?	# SCSI disk drives
sd2 at scsibus0 target 2 lun ?	# SCSI disk drives
sd3 at scsibus0 target 3 lun ?	# SCSI disk drives
sd4 at scsibus0 target 4 lun ?	# SCSI disk drives
sd5 at scsibus0 target 5 lun ?	# SCSI disk drives
sd6 at scsibus0 target 6 lun ?	# SCSI disk drives
.Ed
.Pp
See
.Xr sd 4
for more information.  The rationale for fixing the device addresses
is as follows: consider a system with three SCSI drives at SCSI IDs
4, 5, and 6, which map to components /dev/sd0e, /dev/sd1e, and
/dev/sd2e of a RAID 5 set.  If the drive with SCSI ID 5 fails, and the
system reboots, the old /dev/sd2e will show up as /dev/sd1e.  The RAID
driver is able to detect that component positions have changed, and
will not allow normal configuration.  If the device addresses are hard
coded, however, the RAID driver would detect that the middle component
is unavailable, and bring the RAID 5 set up in degraded mode.  Note
that the auto-detection and auto-configuration code does not care
about where the components live.  The auto-configuration code will
correctly configure a device even after any number of the components
have been re-arranged.
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device raid 4	# RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.  The
.Sq count
argument (
.Sq 4 ,
in this case) specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and auto-configuration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
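The block-level striping mentioned earlier (RAID 0, and the striped
layers of hierarchical sets) can be sketched as a simple address
mapping.  This is an illustration only, not RAIDframe's layout code,
and the stripe unit size used here is an arbitrary assumption:

```python
# Illustrative sketch (NOT RAIDframe code): mapping a logical block of
# a RAID 0 device to a (component, offset-within-component) pair.

STRIPE_UNIT = 64   # blocks per stripe unit; hypothetical value

def raid0_map(logical_block, ncomponents):
    """Return (component index, block offset within that component)."""
    stripe_unit, within = divmod(logical_block, STRIPE_UNIT)
    component = stripe_unit % ncomponents          # rotate across components
    offset = (stripe_unit // ncomponents) * STRIPE_UNIT + within
    return component, offset

# With 4 components, consecutive stripe units land on successive
# components, which is what spreads sequential I/O across all drives.
assert raid0_map(0, 4) == (0, 0)
assert raid0_map(64, 4) == (1, 0)
assert raid0_map(256, 4) == (0, 64)
```

Losing any one component of such a set therefore destroys a slice of
every large file, which is why RAID 0 offers no redundancy.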
The use of the latter is strongly encouraged, and is required if
auto-configuration of the RAID set is desired.  Since RAIDframe leaves
room for disklabels, RAID components can simply be raw disks, or
partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Ar before
a component fails.  Doing the wrong thing when a component fails may
result in data loss.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.  However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all filesystems on that
RAID device.
RAID is
.Ar NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Ar MUST
be performed whenever there is a chance that it may have been
compromised.  This includes after system crashes, or before a RAID
device has been used for the first time.  Failure to keep parity
correct will be catastrophic should a component ever fail -- it is
better to use RAID 0 and get the additional space and speed, than it
is to use parity, but not keep the parity correct.  At least with RAID
0 there is no false perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).  RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.  The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed