1.\" $NetBSD: raid.4,v 1.32 2005/10/08 18:30:27 oster Exp $ 2.\" 3.\" Copyright (c) 1998 The NetBSD Foundation, Inc. 4.\" All rights reserved. 5.\" 6.\" This code is derived from software contributed to The NetBSD Foundation 7.\" by Greg Oster 8.\" 9.\" Redistribution and use in source and binary forms, with or without 10.\" modification, are permitted provided that the following conditions 11.\" are met: 12.\" 1. Redistributions of source code must retain the above copyright 13.\" notice, this list of conditions and the following disclaimer. 14.\" 2. Redistributions in binary form must reproduce the above copyright 15.\" notice, this list of conditions and the following disclaimer in the 16.\" documentation and/or other materials provided with the distribution. 17.\" 3. All advertising materials mentioning features or use of this software 18.\" must display the following acknowledgement: 19.\" This product includes software developed by the NetBSD 20.\" Foundation, Inc. and its contributors. 21.\" 4. Neither the name of The NetBSD Foundation nor the names of its 22.\" contributors may be used to endorse or promote products derived 23.\" from this software without specific prior written permission. 24.\" 25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS 26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED 27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 28.\" PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS 29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 35.\" POSSIBILITY OF SUCH DAMAGE. 36.\" 37.\" 38.\" Copyright (c) 1995 Carnegie-Mellon University. 39.\" All rights reserved. 40.\" 41.\" Author: Mark Holland 42.\" 43.\" Permission to use, copy, modify and distribute this software and 44.\" its documentation is hereby granted, provided that both the copyright 45.\" notice and this permission notice appear in all copies of the 46.\" software, derivative works or modified versions, and any portions 47.\" thereof, and that both notices appear in supporting documentation. 48.\" 49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" 50.\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND 51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. 52.\" 53.\" Carnegie Mellon requests users of this software to return to 54.\" 55.\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU 56.\" School of Computer Science 57.\" Carnegie Mellon University 58.\" Pittsburgh PA 15213-3890 59.\" 60.\" any improvements or extensions that they make and grant Carnegie the 61.\" rights to redistribute these changes. 
62.\" 63.Dd October 8, 2005 64.Dt RAID 4 65.Os 66.Sh NAME 67.Nm raid 68.Nd RAIDframe disk driver 69.Sh SYNOPSIS 70.Cd options RAID_AUTOCONFIG 71.Cd options RAID_DIAGNOSTIC 72.Cd options RF_ACC_TRACE=n 73.Cd options RF_DEBUG_MAP=n 74.Cd options RF_DEBUG_PSS=n 75.Cd options RF_DEBUG_QUEUE=n 76.Cd options RF_DEBUG_QUIESCE=n 77.Cd options RF_DEBUG_RECON=n 78.Cd options RF_DEBUG_STRIPELOCK=n 79.Cd options RF_DEBUG_VALIDATE_DAG=n 80.Cd options RF_DEBUG_VERIFYPARITY=n 81.Cd options RF_INCLUDE_CHAINDECLUSTER=n 82.Cd options RF_INCLUDE_EVENODD=n 83.Cd options RF_INCLUDE_INTERDECLUSTER=n 84.Cd options RF_INCLUDE_PARITY_DECLUSTERING=n 85.Cd options RF_INCLUDE_PARITY_DECLUSTERING_DS=n 86.Cd options RF_INCLUDE_PARITYLOGGING=n 87.Cd options RF_INCLUDE_RAID5_RS=n 88.Pp 89.Cd "pseudo-device raid" Op Ar count 90.Sh DESCRIPTION 91The 92.Nm 93driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to 94.Nx . 95This 96document assumes that the reader has at least some familiarity with RAID 97and RAID concepts. The reader is also assumed to know how to configure 98disks and pseudo-devices into kernels, how to generate kernels, and how 99to partition disks. 100.Pp 101RAIDframe provides a number of different RAID levels including: 102.Bl -tag -width indent 103.It RAID 0 104provides simple data striping across the components. 105.It RAID 1 106provides mirroring. 107.It RAID 4 108provides data striping across the components, with parity 109stored on a dedicated drive (in this case, the last component). 110.It RAID 5 111provides data striping across the components, with parity 112distributed across all the components. 113.El 114.Pp 115There are a wide variety of other RAID levels supported by RAIDframe. 116The configuration file options to enable them are briefly outlined 117at the end of this section. 118.Pp 119Depending on the parity level configured, the device driver can 120support the failure of component drives. 
The number of failures allowed depends on the parity level selected.
If the driver is able to handle drive failures, and a drive does fail,
then the system is operating in
.Sq degraded mode .
In this mode, all missing data must be reconstructed from the data and
parity present on the other components.
This results in much slower data accesses, but does mean that a
failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
If the driver determines that the labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match),
or that a component label is not consistent with its assigned place in
the set (e.g. the component label claims the component should be the
3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set), then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
autoconfiguration of RAID sets.
A RAID set can be flagged as autoconfigurable, in which case it will
be configured automatically during the kernel boot process.
RAID file systems which are automatically configured are also eligible
to be the root file system.
There is currently only limited support (alpha, amd64, i386, pmax,
sparc, sparc64, and vax architectures) for booting a kernel directly
from a RAID 1 set, and no support for booting from any other RAID
sets.
To use a RAID set as the root file system, a kernel is usually
obtained from a small non-RAID partition, after which any
autoconfiguring RAID set can be used for the root file system.
See
.Xr raidctl 8
for more information on autoconfiguration of RAID sets.
Note that with autoconfiguration of RAID sets, it is no longer
necessary to hard-code SCSI IDs of drives.
The autoconfiguration code will correctly configure a device even
after any number of the components have had their device IDs or
device names changed.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not actively used in an existing file
system.
Should a disk fail, the driver is capable of reconstructing the failed
disk onto a hot spare or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
The copyback operation, as its name indicates, will copy the
reconstructed data from the hot spare to the previously failed (and
now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this initialization includes re-building the parity
data.
This rebuilding of parity data is also required both a) when a new
RAID device is brought up for the first time, and b) after an un-clean
shutdown of a RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity before doing
a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Em required
before any file systems are created or used on the RAID device.
If the parity is not correct, then missing data cannot be correctly
recovered.
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0 device can be constructed out of a number of
RAID 5 devices (which, in turn, may be constructed out of the physical
disks, or of other RAID devices).
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device raid 4 # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument,
.Sq 4
in this case, specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and autoconfiguration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
autoconfiguration of the RAID set is desired.
Since RAIDframe leaves room for disklabels, RAID components can be
simply raw disks, or partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
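.Pp
As a brief sketch only (the component names, configuration file path,
and serial number below are illustrative;
.Xr raidctl 8
remains the authoritative reference), a first-time setup of a
three-component RAID 5 set might start from a configuration file such
as:
.Bd -unfilled -offset indent
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
followed by configuring the device, initializing the component labels
with a serial number, and initializing (re-writing) the parity:
.Bd -unfilled -offset indent
raidctl -C /var/tmp/raid0.conf raid0
raidctl -I 2005100801 raid0
raidctl -iv raid0
.Ed
.Pp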
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Em before
a component failure.
Doing the wrong thing when a component fails may result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Pp
RAIDframe supports an access tracing facility for tracking both the
requests made and the performance of various parts of the RAID system
as each request is processed.
To enable this tracing the following option may be specified:
.Bd -unfilled -offset indent
options RF_ACC_TRACE=1
.Ed
.Pp
For extensive debugging there are a number of kernel options which
will aid in performing extra diagnosis of various parts of the
RAIDframe sub-systems.
Note that in order to make full use of these options it is often
necessary to enable one or more debugging options as listed in
.Pa src/sys/dev/raidframe/rf_options.h .
These options are typically useful only to those who wish to debug
various parts of RAIDframe.
The options include:
.Pp
For debugging the code which maps RAID addresses to physical
addresses:
.Bd -unfilled -offset indent
options RF_DEBUG_MAP=1
.Ed
.Pp
Parity stripe status debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_PSS=1
.Ed
.Pp
Additional debugging for queuing is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_QUEUE=1
.Ed
.Pp
Problems with non-quiescent file systems should be easier to debug if
the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_QUIESCE=1
.Ed
.Pp
Stripelock debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_STRIPELOCK=1
.Ed
.Pp
Additional diagnostic checks during reconstruction are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_RECON=1
.Ed
.Pp
Validation of the DAGs (Directed Acyclic Graphs) used to describe an
I/O access can be performed when the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_VALIDATE_DAG=1
.Ed
.Pp
Additional diagnostics during parity verification are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_VERIFYPARITY=1
.Ed
.Pp
There are a number of less commonly used RAID levels supported by
RAIDframe.
These additional RAID types should be considered experimental, and may
not be ready for production use.
The various types and the options to enable them are shown here:
.Pp
For Even-Odd parity:
.Bd -unfilled -offset indent
options RF_INCLUDE_EVENODD=1
.Ed
.Pp
For RAID level 5 with rotated sparing:
.Bd -unfilled -offset indent
options RF_INCLUDE_RAID5_RS=1
.Ed
.Pp
For Parity Logging (highly experimental):
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITYLOGGING=1
.Ed
.Pp
For Chain Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_CHAINDECLUSTER=1
.Ed
.Pp
For Interleaved Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_INTERDECLUSTER=1
.Ed
.Pp
For Parity Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING=1
.Ed
.Pp
For Parity Declustering with Distributed Spares:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING_DS=1
.Ed
.Pp
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system, or the
loss of a single component of a RAID 0 system, will result in the loss
of all file systems on that RAID device.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID device has been
used for the first time.
Failure to keep parity correct will be catastrophic should a component
ever fail: it is better to use RAID 0 and get the additional space and
speed than it is to use parity but not keep the parity correct.
At least with RAID 0 there is no perception of increased data
security.
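.Pp
For example (the device name here is illustrative), after an unclean
shutdown the parity status can be checked, and the parity re-written
if necessary, with:
.Bd -unfilled -offset indent
raidctl -p raid0
raidctl -P raid0
.Ed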
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr config 1 ,
.Xr sd 4 ,
.Xr MAKEDEV 8 ,
.Xr fsck 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed by CMU, provides a RAID simulator
for a number of different architectures, and a user-level device
driver and a kernel device driver for Digital Unix.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed