.\"     $NetBSD: raidctl.8,v 1.56 2008/08/28 21:24:30 wiz Exp $
.\"
.\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd August 26, 2008
.Dt RAIDCTL 8
.Os
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm
.Op Fl v
.Fl a Ar component Ar dev
.Nm
.Op Fl v
.Fl A Op yes | no | root
.Ar dev
.Nm
.Op Fl v
.Fl B Ar dev
.Nm
.Op Fl v
.Fl c Ar config_file Ar dev
.Nm
.Op Fl v
.Fl C Ar config_file Ar dev
.Nm
.Op Fl v
.Fl f Ar component Ar dev
.Nm
.Op Fl v
.Fl F Ar component Ar dev
.Nm
.Op Fl v
.Fl g Ar component Ar dev
.Nm
.Op Fl v
.Fl G Ar dev
.Nm
.Op Fl v
.Fl i Ar dev
.Nm
.Op Fl v
.Fl I Ar serial_number Ar dev
.Nm
.Op Fl v
.Fl p Ar dev
.Nm
.Op Fl v
.Fl P Ar dev
.Nm
.Op Fl v
.Fl r Ar component Ar dev
.Nm
.Op Fl v
.Fl R Ar component Ar dev
.Nm
.Op Fl v
.Fl s Ar dev
.Nm
.Op Fl v
.Fl S Ar dev
.Nm
.Op Fl v
.Fl u Ar dev
.Sh DESCRIPTION
.Nm
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices.
For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
Component labels (which identify the location of a given
component within a particular RAID set) are automatically added to the
hot spare after it has been used and are not required for
.Ar component
before it is used.
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable.
The RAID set will be automatically configured at boot
.Em before
the root file system is mounted.
Note that all components of the set must be of type
.Dv RAID
in the disklabel.
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic root Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to be the root partition.
A RAID set configured this way will
.Em override
the use of the boot disk as the root device.
All components of the set must be of type
.Dv RAID
in the disklabel.
Note that only certain architectures
.Pq currently alpha, i386, pmax, sparc, sparc64, and vax
support booting a kernel directly from a RAID set.
.It Fl B Ar dev
Initiate a copyback of reconstructed data from a spare disk to
its original disk.
This is performed after a component has failed,
and the failed drive has been reconstructed onto a spare drive.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given later.
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place.
This is required the first time a RAID set is configured.
.It Fl f Ar component Ar dev
This marks the specified
.Ar component
as having failed, but does not initiate a reconstruction of that component.
.It Fl F Ar component Ar dev
Fails the specified
.Ar component
of the device, and immediately begins a reconstruction of the failed
disk onto an available hot spare.
This is one of the mechanisms used to start
the reconstruction process if a component does have a hardware failure.
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with the
.Fl c
or
.Fl C
options.
.It Fl i Ar dev
Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belongs to the same RAID set.
While not strictly enforced, different serial numbers should be used for
different RAID sets.
This step
.Em MUST
be performed when a new RAID set is created.
.It Fl p Ar dev
Check the status of the parity on the RAID set.
Displays a status message,
and returns successfully if the parity is up-to-date.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl r Ar component Ar dev
Remove the spare disk specified by
.Ar component
from the set of available spare components.
.It Fl R Ar component Ar dev
Fails the specified
.Ar component ,
if necessary, and immediately begins a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl S Ar dev
Check the status of parity re-writing, component reconstruction, and
component copyback.
The output indicates the amount of progress
achieved in each of these areas.
.It Fl u Ar dev
Unconfigure the RAIDframe device.
This does not remove any component labels or change any configuration
settings (e.g. auto-configuration settings) for the RAID set.
.It Fl v
Be more verbose.
For operations such as reconstructions, parity
re-writing, and copybacks, provide a progress indicator.
.El
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g.,
.Pa /dev/rraid0d ,
for the i386 architecture, or
.Pa /dev/rraid0c
for many others, or just simply
.Pa raid0
(for
.Pa /dev/rraid0[cd] ) .
It is recommended that the partitions used to represent the
RAID device are not used for file systems.
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here.
In the configuration files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections.
Each section begins with a
.Sq START ,
followed by the section name,
and the configuration parameters associated with that section.
The first section is the
.Sq array
section, and it specifies
the number of rows, columns, and spare disks in the RAID set.
For example:
.Bd -literal -offset indent
START array
1 3 0
.Ed
.Pp
indicates an array with 1 row, 3 columns, and 0 spare disks.
Note that although multi-dimensional arrays may be specified, they are
.Em NOT
supported in the driver.
.Pp
The second section, the
.Sq disks
section, specifies the actual components of the device.
For example:
.Bd -literal -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device.
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will operate in degraded mode.
Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device.
Changing the order of the components will result in data loss
if the set is configured with the
.Fl C
option.
In normal circumstances, the RAID set will not configure if only
.Fl c
is specified, and the components are out-of-order.
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if present, specifies the devices to be used as
.Sq hot spares
\(em devices which are on-line,
but are not actively used by the RAID driver unless
one of the main components fails.
A simple
.Sq spare
section might be:
.Bd -literal -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component.
If no spare drives are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section.
This section describes the general layout parameters for the RAID device,
and provides such information as
sectors per stripe unit,
stripe units per parity unit,
stripe units per reconstruction unit,
and the parity configuration to use.
This section might look like:
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit specifies, in blocks, the interleave
factor; i.e., the number of contiguous sectors to be written to each
component for a single stripe.
Appropriate selection of this value (32 in this example)
is the subject of much research in RAID architectures.
The stripe units per parity unit and
stripe units per reconstruction unit are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
the scope of this document.
The last value in this section (5 in this example)
indicates the parity configuration desired.
Valid entries include:
.Bl -tag -width inde
.It 0
RAID level 0.
No parity, only simple striping.
.It 1
RAID level 1.
Mirroring.
The parity is the mirror.
.It 4
RAID level 4.
Striping across components, with parity stored on the last component.
.It 5
RAID level 5.
Striping across components, parity distributed across all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Nx .
.Pp
The next required section is the
.Sq queue
section.
This is most often specified as:
.Bd -literal -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as fifo (first-in, first-out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional.
For more details on this the reader is referred to
the RAIDframe documentation discussed in the
.Sx HISTORY
section.
.Pp
See
.Sx EXAMPLES
for a more complete configuration file example.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
with the use of
.Nm ,
and that they understand how the component reconstruction process works.
The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device.
Depending on the architecture,
.Pa /dev/rraid0c
or
.Pa /dev/rraid0d
may be used in place of
.Pa raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set.
All components should be the same size.
Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component might look like:
.Bd -literal -offset indent
f:  1800000  200495      RAID              # (Cyl.  405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
will also work as the component type, the type
.Dv FS_RAID
is preferred for RAIDframe use, as it is required for features such as
auto-configuration.
As part of the initial configuration of each RAID set,
each component will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct.
Component labels are an integral part of the RAID set,
since they are used to ensure that components
are configured in the correct order, and used to keep track of other
vital information about the RAID set.
Component labels are also required for the auto-detection
and auto-configuration of RAID sets at boot time.
For a component label to be considered valid, that
particular component label must be in agreement with the other
component labels in the set.
For example, the serial number,
.Sq modification counter ,
number of rows and number of columns must all be in agreement.
If any of these are different, then the component is
not considered to be part of the set.
See
.Xr raid 4
for more information about component labels.
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm
is then used to configure the
.Xr raid 4
device.
To configure the device, a configuration file which looks something like:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
is created in a file.
The above configuration file specifies a RAID 5
set consisting of the components
.Pa /dev/sd1e ,
.Pa /dev/sd2e ,
and
.Pa /dev/sd3e ,
with
.Pa /dev/sd4e
available as a
.Sq hot spare
in case one of the three main drives should fail.
A RAID 0 set would be specified in a similar way:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
.Ed
.Pp
In this case, devices
.Pa /dev/sd10e ,
.Pa /dev/sd11e ,
.Pa /dev/sd12e ,
and
.Pa /dev/sd13e
are the components that make up this RAID set.
Note that there are no hot spares for a RAID 0 set,
since there is no way to recover data if any of the components fail.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
In this case,
.Pa /dev/sd20e
and
.Pa /dev/sd21e
are the two components of the mirror set.
While no hot spares have been specified in this
configuration, they easily could be, just as they were specified in
the RAID 5 case above.
Note as well that RAID 1 sets are currently limited to only 2 components.
At present, n-way mirroring is not possible.
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Pa raid0.conf
is the name of the RAID configuration file.
The
.Fl C
forces the configuration to succeed, even if any of the component
labels are incorrect.
The
.Fl C
option should not be used lightly in
situations other than initial configurations: if
the system is refusing to configure a RAID set, there is probably a
very good reason for it.
After the initial configuration is done (and
appropriate component labels are added with the
.Fl I
option) then raid0 can be configured normally with:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set.
Initializing the component labels is done with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set.
This initialization step is
.Em required
for all RAID sets.
As well, using different serial numbers between RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
.Pp
Initializing the RAID set is done via the
.Fl i
option.
This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it verifies that the parity (if
any) on the RAID set is correct.
Since this initialization may be quite time-consuming, the
.Fl v
option may also be used in conjunction with
.Fl i :
.Bd -literal -offset indent
raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -literal -offset indent
Initiating re-write of parity
Parity Re-write status:
 10% |****                                   | ETA:    06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity is correct as much as possible.
If the parity is not correct, then there is no
guarantee that data will not be lost if a component fails.
.Pp
Once the parity is known to be correct, it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
.Pp
Under certain circumstances (e.g., the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component) it may be desirable to configure a RAID 1 set with only
a single component.
This can be achieved by using the word
.Dq absent
to indicate that a particular component is not present.
In the following:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
absent
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
.Pa /dev/sd0e
is the real component, and will be the second disk of a RAID 1 set.
The first component is simply marked as being absent.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present.
After configuration, this set can be used normally, but will be operating
in degraded mode.
Once a second physical component is obtained, it can be hot-added,
the existing data mirrored, and normal operation resumed.
.Pp
The size of the resulting RAID set will depend on the number of data
components in the set.
Space is automatically reserved for the component labels, and
the actual amount of space used
for data on a component will be rounded down to the largest possible
multiple of the sectors per stripe unit (sectPerSU) value.
Thus, the amount of space provided by the RAID set will be less
than the sum of the size of the components.
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -literal -offset indent
raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example,
after an unclean shutdown) the command:
.Bd -literal -offset indent
raidctl -P raid0
.Ed
.Pp
is used.
Note that re-writing the parity can be done while
other operations on the RAID set are taking place (e.g., while doing a
.Xr fsck 8
on a file system on the RAID set).
However, for maximum effectiveness of the RAID set, the parity should be
known to be correct before any data on the set is modified.
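.Pp
Because
.Fl p
returns successfully only when the parity is up-to-date, its exit status
can drive a small wrapper script.
The following is a minimal sketch and not part of raidctl itself: the
function name check_parity and the RAIDCTL override variable are
assumptions introduced here for illustration, and the sketch assumes that
a non-zero exit status from
.Fl p
means the parity is not known to be clean.

```shell
#!/bin/sh
# Hypothetical helper: report the parity state of a RAID device and
# re-write the parity only when it is not known to be up-to-date.
# RAIDCTL may be overridden (e.g. for testing); it defaults to raidctl.
check_parity() {
    dev=$1
    if "${RAIDCTL:-raidctl}" -p "$dev" >/dev/null 2>&1; then
        echo "$dev: parity clean"
    else
        echo "$dev: parity not known to be clean, re-writing"
        "${RAIDCTL:-raidctl}" -P "$dev"
    fi
}
```

It would be invoked as check_parity raid0, for example from a local
startup script run before file systems on the set are checked.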
.Pp
To see how the RAID set is doing, the following command can be used to
show the RAID set's status:
.Bd -literal -offset indent
raidctl -s raid0
.Ed
.Pp
The output will look something like:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd2e:
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd3e:
   Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set.
Of importance here are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line.
.Sq Parity status: clean
indicates that the parity is up-to-date for this RAID set,
regardless of whether the RAID set is in redundant or degraded mode.
.Sq Parity status: DIRTY
indicates that it is not known if the parity information is
consistent with the data, and that the parity information needs
to be checked.
Note that if there are file systems open on the RAID set,
the individual components will not be
.Sq clean
but the set as a whole can still be clean.
.Pp
To check the component label of
.Pa /dev/sd1e ,
the following is used:
.Bd -literal -offset indent
raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -literal -offset indent
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -literal -offset indent
raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f
a reconstruction has not been started.
To both fail the disk and start a reconstruction, the
.Fl F
option must be used:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress.
To find out how the reconstruction is progressing the
.Fl S
option may be used.
This will indicate the progress in terms of the
percentage of the reconstruction that is completed.
When the reconstruction is finished the
.Fl s
option will show:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: spared
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
At this point there are at least two options.
First, if
.Pa /dev/sd2e
is known to be good (i.e., the failure was either caused by
.Fl f
or
.Fl F ,
or the failed disk was replaced), then a copyback of the data can
be initiated with the
.Fl B
option.
In this example, this would copy the entire contents of
.Pa /dev/sd4e
to
.Pa /dev/sd2e .
Once the copyback procedure is complete, the
status of the device would be (in part):
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
and the system is back to normal operation.
.Pp
The second option after the reconstruction is to simply use
.Pa /dev/sd4e
in place of
.Pa /dev/sd2e
in the configuration file.
For example, the configuration file (in part) might now look like:
.Bd -literal -offset indent
START array
1 3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
.Ed
.Pp
This can be done as
.Pa /dev/sd4e
is completely interchangeable with
.Pa /dev/sd2e
at this point.
Note that extreme care must be taken when
changing the order of the drives in a configuration.
This is one of the few instances where the devices and/or
their orderings can be changed without loss of data!
In general, the ordering of components in a configuration file should
.Em never
be changed.
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options.
The first option is to add a hot spare using:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
A second option is to rebuild directly onto
.Pa /dev/sd2e .
Once the disk containing
.Pa /dev/sd2e
has been replaced, one can simply use:
.Bd -literal -offset indent
raidctl -R /dev/sd2e raid0
.Ed
.Pp
to rebuild the
.Pa /dev/sd2e
component.
As the rebuilding is in progress, the status will be:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
and when completed, will be:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
In circumstances where a particular component is completely
unavailable after a reboot, a special component name will be used to
indicate the missing component.
For example:
.Bd -literal -offset indent
Components:
           /dev/sd2e: optimal
          component1: failed
No spares.
.Ed
.Pp
indicates that the second component of this RAID set was not detected
at all by the auto-configuration code.
The name
.Sq component1
can be used anywhere a normal component name would be used.
For example, to add a hot spare to the above set, and rebuild to that hot
spare, the following could be done:
.Bd -literal -offset indent
raidctl -a /dev/sd3e raid0
raidctl -F component1 raid0
.Ed
.Pp
at which point the data missing from
.Sq component1
would be reconstructed onto
.Pa /dev/sd3e .
.Pp
When more than one component is marked as
.Sq failed
due to a non-component hardware failure (e.g., loss of power to two
components, adapter problems, termination problems, or cabling issues) it
is quite possible to recover the data on the RAID set.
The first thing to be aware of is that the first disk to fail will
almost certainly be out-of-sync with the remainder of the array.
If any I/O was performed between the time the first component is considered
.Sq failed
and when the second component is considered
.Sq failed ,
then the first component to fail will
.Em not
contain correct data, and should be ignored.
When the second component is marked as failed, however, the RAID device will
(currently) panic the system.
At this point the data on the RAID set
(not including the first failed component) is still self-consistent,
and will be in no worse state of repair than had the power gone out in
the middle of a write to a file system on a non-RAID device.
The problem, however, is that the component labels may now have 3 different
.Sq modification counters
(one value on the first component that failed, one value on the second
component that failed, and a third value on the remaining components).
In such a situation, the RAID set will not autoconfigure,
and can only be forcibly re-configured
with the
.Fl C
option.
To recover the RAID set, one must first remedy whatever physical
problem caused the multiple-component failure.
After that is done, the RAID set can be restored by forcibly
configuring the RAID set
.Em without
the component that failed first.
For example, if
.Pa /dev/sd1e
and
.Pa /dev/sd2e
fail (in that order) in a RAID set of the following configuration:
.Bd -literal -offset indent
START array
1 4 0

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
then the following configuration (say "recover_raid0.conf")
.Bd -literal -offset indent
START array
1 4 0

START disks
absent
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
can be used with
.Bd -literal -offset indent
raidctl -C recover_raid0.conf raid0
.Ed
.Pp
to force the configuration of raid0.
A
.Bd -literal -offset indent
raidctl -I 12345 raid0
.Ed
.Pp
will be required in order to synchronize the component labels.
At this point the file systems on the RAID set can then be checked and
corrected.
To complete the re-construction of the RAID set,
.Pa /dev/sd1e
is simply hot-added back into the array, and reconstructed
as described earlier.
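.Pp
As a rough sketch only, the recovery steps above might be run in the
following order.
The name
.Sq component0
is an assumption: it is what the absent first component would be called
if the naming shown earlier for
.Sq component1
also applies here, so check the output of
.Fl s
before relying on it.
The
.Xr fsck 8
target
.Pa /dev/rraid0e
is likewise only an example partition.
.Bd -literal -offset indent
# force configuration without the component that failed first
raidctl -C recover_raid0.conf raid0
# synchronize the component labels
raidctl -I 12345 raid0
# check and correct the file systems (example partition)
fsck /dev/rraid0e
# hot-add the repaired disk, then reconstruct onto it
raidctl -a /dev/sd1e raid0
raidctl -F component0 raid0
# confirm the state of each component
raidctl -s raid0
.Ed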
.Ss RAID on RAID
RAID sets can be layered to create more complex and much larger RAID sets.
A RAID 0 set, for example, could be constructed from four RAID 5 sets.
The following configuration file shows such a setup:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0

START queue
fifo 100
.Ed
.Pp
A similar configuration file might be used for a RAID 0 set
constructed from components on RAID 1 sets.
In such a configuration, the mirroring provides a high degree
of redundancy, while the striping provides additional speed benefits.
.Ss Auto-configuration and Root on RAID
RAID sets can also be auto-configured at boot.
To make a set auto-configurable,
simply prepare the RAID set as above, and then do a:
.Bd -literal -offset indent
raidctl -A yes raid0
.Ed
.Pp
to turn on auto-configuration for that set.
To turn off auto-configuration, use:
.Bd -literal -offset indent
raidctl -A no raid0
.Ed
.Pp
RAID sets which are auto-configurable will be configured before the
root file system is mounted.
These RAID sets are thus available for
use as a root file system, or for any other file system.
A primary advantage of using the auto-configuration is that RAID components
become more independent of the disks they reside on.
For example, SCSI ID's can change, but auto-configured sets will always be
configured correctly, even if the SCSI ID's of the component disks
have become scrambled.
.Pp
Having a system's root file system
.Pq Pa /
on a RAID set is also allowed, with the
.Sq a
partition of such a RAID set being used for
.Pa / .
To use raid0a as the root file system, simply use:
.Bd -literal -offset indent
raidctl -A root raid0
.Ed
.Pp
To return raid0a to be just an auto-configuring set simply use the
.Fl A Ar yes
arguments.
.Pp
Note that kernels can only be directly read from RAID 1 components on
architectures that support that
.Pq currently alpha, i386, pmax, sparc, sparc64, and vax .
On those architectures, the
.Dv FS_RAID
file system is recognized by the bootblocks, and will properly load the
kernel directly from a RAID 1 component.
For other architectures, or to support the root file system
on other RAID sets, some other mechanism must be used to get a kernel booting.
For example, a small partition containing only the secondary boot-blocks
and an alternate kernel (or two) could be used.
Once a kernel is booting however, and an auto-configuring RAID set is
found that is eligible to be root, then that RAID set will be
auto-configured and used as the root device.
If two or more RAID sets claim to be root devices, then the
user will be prompted to select the root device.
At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
.Pp
A typical RAID 1 setup with root on RAID might be as follows:
.Bl -enum
.It
wd0a - a small partition, which contains a complete, bootable, basic
.Nx
installation.
.It
wd1a - also contains a complete, bootable, basic
.Nx
installation.
.It
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
.It
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
swap space.
.It
wd0g and wd1g - a RAID 1 set, raid2, used for
.Pa /usr ,
.Pa /home ,
or other data, if desired.
.It
wd0h and wd1h - a RAID 1 set, raid3, if desired.
.El
.Pp
RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
raid0 is marked as being a root file system.
When new kernels are installed, the kernel is not only copied to
.Pa / ,
but also to wd0a and wd1a.
The kernel on wd0a is required, since that
is the kernel the system boots from.
The kernel on wd1a is also
required, since that will be the kernel used should wd0 fail.
The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
.Pp
There is no requirement that the root file system be on the same disk
as the kernel.
For example, obtaining the kernel from wd0a, and using
sd0e and sd1e for raid0, and the root file system, is fine.
It
.Em is
critical, however, that there be multiple kernels available, in the
event of media failure.
.Pp
Multi-layered RAID devices (such as a RAID 0 set made
up of RAID 1 sets) are
.Em not
supported as root devices or auto-configurable devices at this point.
(Multi-layered RAID devices
.Em are
supported in general, however, as mentioned earlier.)
Note that in order to enable component auto-detection and
auto-configuration of RAID devices, the line:
.Bd -literal -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
must be in the kernel configuration file.
See
.Xr raid 4
for more details.
.Ss Swapping on RAID
A RAID device can be used as a swap device.
In order to ensure that a RAID device used as a swap device
is correctly unconfigured when the system is shut down or rebooted,
it is recommended that the line
.Bd -literal -offset indent
swapoff=YES
.Ed
.Pp
be added to
.Pa /etc/rc.conf .
.Ss Unconfiguration
The final operation performed by
.Nm
is to unconfigure a
.Xr raid 4
device.
This is accomplished via a simple:
.Bd -literal -offset indent
raidctl -u raid0
.Ed
.Pp
at which point the device is ready to be reconfigured.
.Ss Performance Tuning
Selection of the various parameter values which result in the best
performance can be quite tricky, and often requires a bit of
trial-and-error to get those values most appropriate for a given system.
A whole range of factors come into play, including:
.Bl -enum
.It
Types of components (e.g., SCSI vs. IDE) and their bandwidth
.It
Types of controller cards and their bandwidth
.It
Distribution of components among controllers
.It
IO bandwidth
.It
File system access patterns
.It
CPU speed
.El
.Pp
As with most performance tuning, benchmarking under real-life loads
may be the only way to measure expected performance.
Understanding some of the underlying technology is also useful in tuning.
The goal of this section is to provide pointers to those parameters which may
make significant differences in performance.
.Pp
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
Since data in a RAID 1 set is arranged in a linear
fashion on each component, selecting an appropriate stripe size is
somewhat less critical than it is for a RAID 5 set.
However, a stripe size that is too small will cause large IO's to be
broken up into a number of smaller ones, hurting performance.
At the same time, a large stripe size may cause problems with
concurrent accesses to stripes, which may also affect performance.
Thus values in the range of 32 to 128 are often the most effective.
.Pp
Tuning RAID 5 sets is trickier.
In the best case, IO is presented to the RAID set one stripe at a time.
Since the entire stripe is available at the beginning of the IO,
the parity of that stripe can be calculated before the stripe is written,
and then the stripe data and parity can be written in parallel.
When the amount of data being written is less than a full stripe worth, the
.Sq small write
problem occurs.
Since a
.Sq small write
means only a portion of the stripe on the components is going to
change, the data (and parity) on the components must be updated
slightly differently.
First, the
.Sq old parity
and
.Sq old data
must be read from the components.
Then the new parity is constructed,
using the new data to be written, and the old data and old parity.
Finally, the new data and new parity are written.
All this extra data shuffling results in a serious loss of performance,
and is typically 2 to 4 times slower than a full stripe write (or read).
To combat this problem in the real world, it may be useful
to ensure that stripe sizes are small enough that a
.Sq large IO
from the system will use exactly one large stripe write.
As is seen later, there are some file system dependencies
which may come into play here as well.
.Pp
Since the size of a
.Sq large IO
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
be desirable to select a SectPerSU value of 16 blocks (8K) or 32
blocks (16K).
Since there are 4 data sectors per stripe, the maximum
data per stripe is 64 blocks (32K) or 128 blocks (64K).
Again, empirical measurement will provide the best indicators of which
values will yield better performance.
.Pp
The parameters used for the file system are also critical to good performance.
For
.Xr newfs 8 ,
for example, increasing the block size to 32K or 64K may improve
performance dramatically.
As well, changing the cylinders-per-group
parameter from 16 to 32 or higher is often not only necessary for
larger file systems, but may also have positive performance implications.
.Ss Summary
Despite the length of this man-page, configuring a RAID set is a
relatively straight-forward process.
All that needs to be done is the following steps:
.Bl -enum
.It
Use
.Xr disklabel 8
to create the components (of type RAID).
.It
Construct a RAID configuration file: e.g.,
.Pa raid0.conf
.It
Configure the RAID set with:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
.It
Initialize the component labels with:
.Bd -literal -offset indent
raidctl -I 123456 raid0
.Ed
.Pp
.It
Initialize other important parts of the set with:
.Bd -literal -offset indent
raidctl -i raid0
.Ed
.Pp
.It
Get the default label for the RAID set:
.Bd -literal -offset indent
disklabel raid0 \*[Gt] /tmp/label
.Ed
.Pp
.It
Edit the label:
.Bd -literal -offset indent
vi /tmp/label
.Ed
.Pp
.It
Put the new label on the RAID set:
.Bd -literal -offset indent
disklabel -R -r raid0 /tmp/label
.Ed
.Pp
.It
Create the file system:
.Bd -literal -offset indent
newfs /dev/rraid0e
.Ed
.Pp
.It
Mount the file system:
.Bd -literal -offset indent
mount /dev/raid0e /mnt
.Ed
.Pp
.It
Use:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
to re-configure the RAID set the next time it is needed, or put
.Pa raid0.conf
into
.Pa /etc
where it will automatically be started by the
.Pa /etc/rc.d
scripts.
.El
.Sh SEE ALSO
.Xr ccd 4 ,
.Xr raid 4 ,
.Xr rc 8
.Sh HISTORY
RAIDframe is a framework for rapid prototyping of RAID structures
developed by the folks at the Parallel Data Laboratory at Carnegie
Mellon University (CMU).
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
.Pp
The
.Nm
command first appeared as a program in CMU's RAIDframe v1.1 distribution.
This version of
.Nm
is a complete re-write, and first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -literal
The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system,
or the loss of a single component of a RAID 0 system will
result in the entire file system being lost.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a
component ever fail \(em it is better to use RAID 0 and get the
additional space and speed, than it is to use parity, but
not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Sh BUGS
Hot-spare removal is currently not available.