1.\"     $NetBSD: raidctl.8,v 1.35 2003/02/25 10:35:08 wiz Exp $
2.\"
3.\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc.
4.\" All rights reserved.
5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Greg Oster
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\" 3. All advertising materials mentioning features or use of this software
18.\"    must display the following acknowledgement:
19.\"        This product includes software developed by the NetBSD
20.\"        Foundation, Inc. and its contributors.
21.\" 4. Neither the name of The NetBSD Foundation nor the names of its
22.\"    contributors may be used to endorse or promote products derived
23.\"    from this software without specific prior written permission.
24.\"
25.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
26.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
27.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
28.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
29.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
30.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
31.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
32.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
33.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
34.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
35.\" POSSIBILITY OF SUCH DAMAGE.
36.\"
37.\"
38.\" Copyright (c) 1995 Carnegie-Mellon University.
39.\" All rights reserved.
40.\"
41.\" Author: Mark Holland
42.\"
43.\" Permission to use, copy, modify and distribute this software and
44.\" its documentation is hereby granted, provided that both the copyright
45.\" notice and this permission notice appear in all copies of the
46.\" software, derivative works or modified versions, and any portions
47.\" thereof, and that both notices appear in supporting documentation.
48.\"
49.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
50.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
51.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
52.\"
53.\" Carnegie Mellon requests users of this software to return to
54.\"
55.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
56.\"  School of Computer Science
57.\"  Carnegie Mellon University
58.\"  Pittsburgh PA 15213-3890
59.\"
60.\" any improvements or extensions that they make and grant Carnegie the
61.\" rights to redistribute these changes.
62.\"
63.Dd July 10, 2001
64.Dt RAIDCTL 8
65.Os
66.Sh NAME
67.Nm raidctl
68.Nd configuration utility for the RAIDframe disk driver
69.Sh SYNOPSIS
70.Nm
71.Op Fl v
72.Fl a Ar component Ar dev
73.Nm
74.Op Fl v
75.Fl A Op yes | no | root
76.Ar dev
77.Nm
78.Op Fl v
79.Fl B Ar dev
80.Nm
81.Op Fl v
82.Fl c Ar config_file Ar dev
83.Nm
84.Op Fl v
85.Fl C Ar config_file Ar dev
86.Nm
87.Op Fl v
88.Fl f Ar component Ar dev
89.Nm
90.Op Fl v
91.Fl F Ar component Ar dev
92.Nm
93.Op Fl v
94.Fl g Ar component Ar dev
95.Nm
96.Op Fl v
97.Fl G Ar dev
98.Nm
99.Op Fl v
100.Fl i Ar dev
101.Nm
102.Op Fl v
103.Fl I Ar serial_number Ar dev
104.Nm
105.Op Fl v
106.Fl p Ar dev
107.Nm
108.Op Fl v
109.Fl P Ar dev
110.Nm
111.Op Fl v
112.Fl r Ar component Ar dev
113.Nm
114.Op Fl v
115.Fl R Ar component Ar dev
116.Nm
117.Op Fl v
118.Fl s Ar dev
119.Nm
120.Op Fl v
121.Fl S Ar dev
122.Nm
123.Op Fl v
124.Fl u Ar dev
125.Sh DESCRIPTION
126.Nm
127is the user-land control program for
128.Xr raid 4 ,
129the RAIDframe disk device.
130.Nm
131is primarily used to dynamically configure and unconfigure RAIDframe disk
132devices.
133For more information about the RAIDframe disk device, see
134.Xr raid 4 .
135.Pp
136This document assumes the reader has at least rudimentary knowledge of
137RAID and RAID concepts.
138.Pp
139The command-line options for
140.Nm
141are as follows:
142.Bl -tag -width indent
143.It Fl a Ar component Ar dev
144Add
145.Ar component
146as a hot spare for the device
147.Ar dev .
148.It Fl A Ic yes Ar dev
149Make the RAID set auto-configurable.
150The RAID set will be automatically configured at boot
151.Ar before
152the root file system is mounted.
153Note that all components of the set must be of type
154.Dv RAID
155in the disklabel.
156.It Fl A Ic no Ar dev
157Turn off auto-configuration for the RAID set.
158.It Fl A Ic root Ar dev
159Make the RAID set auto-configurable, and also mark the set as being
160eligible to be the root partition.
161A RAID set configured this way will
162.Ar override
163the use of the boot disk as the root device.
164All components of the set must be of type
165.Dv RAID
166in the disklabel.
167Note that the kernel being booted must currently reside on a non-RAID set.
168.It Fl B Ar dev
169Initiate a copyback of reconstructed data from a spare disk to
170its original disk.
171This is performed after a component has failed,
172and the failed drive has been reconstructed onto a spare drive.
173.It Fl c Ar config_file Ar dev
174Configure the RAIDframe device
175.Ar dev
176according to the configuration given in
177.Ar config_file .
178A description of the contents of
179.Ar config_file
180is given later.
181.It Fl C Ar config_file Ar dev
182As for
183.Fl c ,
184but forces the configuration to take place.
185This is required the first time a RAID set is configured.
186.It Fl f Ar component Ar dev
187This marks the specified
188.Ar component
189as having failed, but does not initiate a reconstruction of that component.
190.It Fl F Ar component Ar dev
191Fails the specified
192.Ar component
of the device, and immediately begins a reconstruction of the failed
194disk onto an available hot spare.
195This is one of the mechanisms used to start
196the reconstruction process if a component does have a hardware failure.
197.It Fl g Ar component Ar dev
198Get the component label for the specified component.
199.It Fl G Ar dev
200Generate the configuration of the RAIDframe device in a format suitable for
201use with the
202.Fl c
203or
204.Fl C
options (see the example following this list).
206.It Fl i Ar dev
207Initialize the RAID device.
208In particular, (re-)write the parity on the selected device.
209This
210.Em MUST
211be done for
212.Em all
213RAID sets before the RAID device is labeled and before
214file systems are created on the RAID device.
215.It Fl I Ar serial_number Ar dev
216Initialize the component labels on each component of the device.
217.Ar serial_number
218is used as one of the keys in determining whether a
particular set of components belongs to the same RAID set.
220While not strictly enforced, different serial numbers should be used for
221different RAID sets.
222This step
223.Em MUST
224be performed when a new RAID set is created.
225.It Fl p Ar dev
226Check the status of the parity on the RAID set.
227Displays a status message,
228and returns successfully if the parity is up-to-date.
229.It Fl P Ar dev
230Check the status of the parity on the RAID set, and initialize
231(re-write) the parity if the parity is not known to be up-to-date.
232This is normally used after a system crash (and before a
233.Xr fsck 8 )
234to ensure the integrity of the parity.
235.It Fl r Ar component Ar dev
236Remove the spare disk specified by
237.Ar component
238from the set of available spare components.
239.It Fl R Ar component Ar dev
240Fails the specified
241.Ar component ,
242if necessary, and immediately begins a reconstruction back to
243.Ar component .
244This is useful for reconstructing back onto a component after
245it has been replaced following a failure.
246.It Fl s Ar dev
247Display the status of the RAIDframe device for each of the components
248and spares.
249.It Fl S Ar dev
250Check the status of parity re-writing, component reconstruction, and
251component copyback.
252The output indicates the amount of progress
253achieved in each of these areas.
254.It Fl u Ar dev
255Unconfigure the RAIDframe device.
256.It Fl v
257Be more verbose.
258For operations such as reconstructions, parity
259re-writing, and copybacks, provide a progress indicator.
260.El
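.Pp
For example (a sketch; the output file name is arbitrary), the current
configuration of raid0 can be saved for later reuse with
.Fl c
or
.Fl C
by capturing the output of
.Fl G :
.Bd -literal -offset indent
raidctl -G raid0 \*[Gt] /root/raid0.conf
.Ed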
261.Pp
262The device used by
263.Nm
264is specified by
265.Ar dev .
266.Ar dev
267may be either the full name of the device, e.g.,
268.Pa /dev/rraid0d ,
269for the i386 architecture, or
270.Pa /dev/rraid0c
for many others, or simply
272.Pa raid0
273(for
274.Pa /dev/rraid0[cd] ) .
275It is recommended that the partitions used to represent the
276RAID device are not used for file systems.
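.Pp
For instance (an illustrative sketch; as noted above, the partition letter
depends on the architecture), the following two invocations are equivalent:
.Bd -literal -offset indent
raidctl -s raid0
raidctl -s /dev/rraid0c
.Ed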
277.Ss Configuration file
278The format of the configuration file is complex, and
279only an abbreviated treatment is given here.
280In the configuration files, a
281.Sq #
282indicates the beginning of a comment.
283.Pp
284There are 4 required sections of a configuration file, and 2
285optional sections.
286Each section begins with a
287.Sq START ,
288followed by the section name,
289and the configuration parameters associated with that section.
290The first section is the
291.Sq array
292section, and it specifies
293the number of rows, columns, and spare disks in the RAID set.
294For example:
295.Bd -literal -offset indent
296START array
2971 3 0
298.Ed
299.Pp
300indicates an array with 1 row, 3 columns, and 0 spare disks.
301Note that although multi-dimensional arrays may be specified, they are
302.Em NOT
303supported in the driver.
304.Pp
305The second section, the
306.Sq disks
307section, specifies the actual components of the device.
308For example:
309.Bd -literal -offset indent
310START disks
311/dev/sd0e
312/dev/sd1e
313/dev/sd2e
314.Ed
315.Pp
316specifies the three component disks to be used in the RAID device.
317If any of the specified drives cannot be found when the RAID device is
318configured, then they will be marked as
319.Sq failed ,
320and the system will operate in degraded mode.
321Note that it is
322.Em imperative
323that the order of the components in the configuration file does not
324change between configurations of a RAID device.
325Changing the order of the components will result in data loss
326if the set is configured with the
327.Fl C
328option.
329In normal circumstances, the RAID set will not configure if only
330.Fl c
331is specified, and the components are out-of-order.
332.Pp
333The next section, which is the
334.Sq spare
335section, is optional, and, if present, specifies the devices to be used as
336.Sq hot spares
337\(em devices which are on-line,
338but are not actively used by the RAID driver unless
one of the main components fails.
340A simple
341.Sq spare
342section might be:
343.Bd -literal -offset indent
344START spare
345/dev/sd3e
346.Ed
347.Pp
348for a configuration with a single spare component.
349If no spare drives are to be used in the configuration, then the
350.Sq spare
351section may be omitted.
352.Pp
353The next section is the
354.Sq layout
355section.
356This section describes the general layout parameters for the RAID device,
357and provides such information as
358sectors per stripe unit,
359stripe units per parity unit,
360stripe units per reconstruction unit,
361and the parity configuration to use.
362This section might look like:
363.Bd -literal -offset indent
364START layout
365# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
36632 1 1 5
367.Ed
368.Pp
369The sectors per stripe unit specifies, in blocks, the interleave
370factor; i.e., the number of contiguous sectors to be written to each
371component for a single stripe.
372Appropriate selection of this value (32 in this example)
373is the subject of much research in RAID architectures.
374The stripe units per parity unit and
375stripe units per reconstruction unit are normally each set to 1.
376While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 is outside
378the scope of this document.
379The last value in this section (5 in this example)
380indicates the parity configuration desired.
381Valid entries include:
382.Bl -tag -width inde
383.It 0
384RAID level 0.
385No parity, only simple striping.
386.It 1
387RAID level 1.
388Mirroring.
389The parity is the mirror.
390.It 4
391RAID level 4.
392Striping across components, with parity stored on the last component.
393.It 5
394RAID level 5.
395Striping across components, parity distributed across all components.
396.El
397.Pp
398There are other valid entries here, including those for Even-Odd
399parity, RAID level 5 with rotated sparing, Chained declustering,
400and Interleaved declustering, but as of this writing the code for
401those parity operations has not been tested with
402.Nx .
403.Pp
404The next required section is the
405.Sq queue
406section.
407This is most often specified as:
408.Bd -literal -offset indent
409START queue
410fifo 100
411.Ed
412.Pp
413where the queuing method is specified as fifo (first-in, first-out),
414and the size of the per-component queue is limited to 100 requests.
415Other queuing methods may also be specified, but a discussion of them
416is beyond the scope of this document.
417.Pp
418The final section, the
419.Sq debug
420section, is optional.
421For more details on this the reader is referred to
422the RAIDframe documentation discussed in the
423.Sx HISTORY
424section.
425.Pp
426See
427.Sx EXAMPLES
428for a more complete configuration file example.
429.Sh FILES
430.Bl -tag -width /dev/XXrXraidX -compact
431.It Pa /dev/{,r}raid*
432.Cm raid
433device special files.
434.El
435.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
438with the use of
439.Nm ,
440and that they understand how the component reconstruction process works.
441The examples in this section will focus on configuring a
442number of different RAID sets of varying degrees of redundancy.
443By working through these examples, administrators should be able to
444develop a good feel for how to configure a RAID set, and how to
445initiate reconstruction of failed components.
446.Pp
447In the following examples
448.Sq raid0
449will be used to denote the RAID device.
450Depending on the architecture,
451.Pa /dev/rraid0c
452or
453.Pa /dev/rraid0d
454may be used in place of
455.Pa raid0 .
456.Ss Initialization and Configuration
457The initial step in configuring a RAID set is to identify the components
458that will be used in the RAID set.
459All components should be the same size.
460Each component should have a disklabel type of
461.Dv FS_RAID ,
462and a typical disklabel entry for a RAID component might look like:
463.Bd -literal -offset indent
464f:  1800000  200495     RAID              # (Cyl.  405*- 4041*)
465.Ed
466.Pp
467While
468.Dv FS_BSDFFS
469will also work as the component type, the type
470.Dv FS_RAID
471is preferred for RAIDframe use, as it is required for features such as
472auto-configuration.
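One way to set the partition type (a sketch; the disk name is hypothetical,
and any method of editing the label will do) is to edit the label directly:
.Bd -literal -offset indent
disklabel -e sd1
.Ed
.Pp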
473As part of the initial configuration of each RAID set,
474each component will be given a
475.Sq component label .
476A
477.Sq component label
478contains important information about the component, including a
479user-specified serial number, the row and column of that component in
480the RAID set, the redundancy level of the RAID set, a
481.Sq modification counter ,
482and whether the parity information (if any) on that
483component is known to be correct.
484Component labels are an integral part of the RAID set,
485since they are used to ensure that components
486are configured in the correct order, and used to keep track of other
487vital information about the RAID set.
488Component labels are also required for the auto-detection
489and auto-configuration of RAID sets at boot time.
490For a component label to be considered valid, that
491particular component label must be in agreement with the other
492component labels in the set.
493For example, the serial number,
494.Sq modification counter ,
495number of rows and number of columns must all be in agreement.
496If any of these are different, then the component is
497not considered to be part of the set.
498See
499.Xr raid 4
500for more information about component labels.
501.Pp
502Once the components have been identified, and the disks have
503appropriate labels,
504.Nm
505is then used to configure the
506.Xr raid 4
507device.
508To configure the device, a configuration file which looks something like:
509.Bd -literal -offset indent
510START array
511# numRow numCol numSpare
5121 3 1
513
514START disks
515/dev/sd1e
516/dev/sd2e
517/dev/sd3e
518
519START spare
520/dev/sd4e
521
522START layout
523# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
52432 1 1 5
525
526START queue
527fifo 100
528.Ed
529.Pp
is created.
531The above configuration file specifies a RAID 5
532set consisting of the components
533.Pa /dev/sd1e ,
534.Pa /dev/sd2e ,
535and
536.Pa /dev/sd3e ,
537with
538.Pa /dev/sd4e
539available as a
540.Sq hot spare
541in case one of the three main drives should fail.
542A RAID 0 set would be specified in a similar way:
543.Bd -literal -offset indent
544START array
545# numRow numCol numSpare
5461 4 0
547
548START disks
549/dev/sd10e
550/dev/sd11e
551/dev/sd12e
552/dev/sd13e
553
554START layout
555# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
55664 1 1 0
557
558START queue
559fifo 100
560.Ed
561.Pp
562In this case, devices
563.Pa /dev/sd10e ,
564.Pa /dev/sd11e ,
565.Pa /dev/sd12e ,
566and
567.Pa /dev/sd13e
568are the components that make up this RAID set.
569Note that there are no hot spares for a RAID 0 set,
570since there is no way to recover data if any of the components fail.
571.Pp
572For a RAID 1 (mirror) set, the following configuration might be used:
573.Bd -literal -offset indent
574START array
575# numRow numCol numSpare
5761 2 0
577
578START disks
579/dev/sd20e
580/dev/sd21e
581
582START layout
583# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
584128 1 1 1
585
586START queue
587fifo 100
588.Ed
589.Pp
590In this case,
591.Pa /dev/sd20e
592and
593.Pa /dev/sd21e
594are the two components of the mirror set.
595While no hot spares have been specified in this
596configuration, they easily could be, just as they were specified in
597the RAID 5 case above.
598Note as well that RAID 1 sets are currently limited to only 2 components.
599At present, n-way mirroring is not possible.
600.Pp
601The first time a RAID set is configured, the
602.Fl C
603option must be used:
604.Bd -literal -offset indent
605raidctl -C raid0.conf raid0
606.Ed
607.Pp
608where
609.Pa raid0.conf
610is the name of the RAID configuration file.
611The
612.Fl C
option forces the configuration to succeed, even if any of the component
614labels are incorrect.
615The
616.Fl C
617option should not be used lightly in
situations other than initial configurations, since if
619the system is refusing to configure a RAID set, there is probably a
620very good reason for it.
621After the initial configuration is done (and
622appropriate component labels are added with the
623.Fl I
624option) then raid0 can be configured normally with:
625.Bd -literal -offset indent
626raidctl -c raid0.conf raid0
627.Ed
628.Pp
629When the RAID set is configured for the first time, it is
630necessary to initialize the component labels, and to initialize the
631parity on the RAID set.
632Initializing the component labels is done with:
633.Bd -literal -offset indent
634raidctl -I 112341 raid0
635.Ed
636.Pp
637where
638.Sq 112341
639is a user-specified serial number for the RAID set.
640This initialization step is
641.Em required
642for all RAID sets.
643As well, using different serial numbers between RAID sets is
644.Em strongly encouraged ,
645as using the same serial number for all RAID sets will only serve to
646decrease the usefulness of the component label checking.
647.Pp
648Initializing the RAID set is done via the
649.Fl i
650option.
651This initialization
652.Em MUST
653be done for
654.Em all
655RAID sets, since among other things it verifies that the parity (if
656any) on the RAID set is correct.
657Since this initialization may be quite time-consuming, the
658.Fl v
option may also be used in conjunction with
660.Fl i :
661.Bd -literal -offset indent
662raidctl -iv raid0
663.Ed
664.Pp
665This will give more verbose output on the
666status of the initialization:
667.Bd -literal -offset indent
668Initiating re-write of parity
669Parity Re-write status:
670 10% |****                                   | ETA:    06:03 /
671.Ed
672.Pp
673The output provides a
674.Sq Percent Complete
675in both a numeric and graphical format, as well as an estimated time
676to completion of the operation.
677.Pp
678Since it is the parity that provides the
679.Sq redundancy
part of RAID, it is critical that the parity be kept correct.
681If the parity is not correct, then there is no
682guarantee that data will not be lost if a component fails.
683.Pp
684Once the parity is known to be correct, it is then safe to perform
685.Xr disklabel 8 ,
686.Xr newfs 8 ,
687or
688.Xr fsck 8
689on the device or its file systems, and then to mount the file systems
690for use.
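.Pp
A typical sequence (a sketch only; the partition letter and mount point are
examples, and the label should be edited to suit the set) might be:
.Bd -literal -offset indent
disklabel raid0 \*[Gt] /tmp/label
vi /tmp/label
disklabel -R -r raid0 /tmp/label
newfs /dev/rraid0e
mount /dev/raid0e /mnt
.Ed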
691.Pp
692Under certain circumstances (e.g., the additional component has not
693arrived, or data is being migrated off of a disk destined to become a
694component) it may be desirable to configure a RAID 1 set with only
695a single component.
696This can be achieved by configuring the set with a physically existing
697component (as either the first or second component) and with a
698.Sq fake
699component.
700In the following:
701.Bd -literal -offset indent
702START array
703# numRow numCol numSpare
7041 2 0
705
706START disks
707/dev/sd6e
708/dev/sd0e
709
710START layout
711# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
712128 1 1 1
713
714START queue
715fifo 100
716.Ed
717.Pp
718.Pa /dev/sd0e
719is the real component, and will be the second disk of a RAID 1 set.
720The component
721.Pa /dev/sd6e ,
which must exist, but which has no physical device associated with it,
723is simply used as a placeholder.
724Configuration (using
725.Fl C
726and
727.Fl I Ar 12345
728as above) proceeds normally, but initialization of the RAID set will
729have to wait until all physical components are present.
730After configuration, this set can be used normally, but will be operating
731in degraded mode.
732Once a second physical component is obtained, it can be hot-added,
733the existing data mirrored, and normal operation resumed.
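.Pp
Once the second disk is available, that last step might look like the
following sketch (the new disk name is hypothetical, and the failed
placeholder component should be identified from the
.Fl s
output):
.Bd -literal -offset indent
raidctl -a /dev/sd5e raid0
raidctl -F /dev/sd6e raid0
.Ed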
734.Ss Maintenance of the RAID set
735After the parity has been initialized for the first time, the command:
736.Bd -literal -offset indent
737raidctl -p raid0
738.Ed
739.Pp
740can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example,
742after an unclean shutdown) the command:
743.Bd -literal -offset indent
744raidctl -P raid0
745.Ed
746.Pp
747is used.
748Note that re-writing the parity can be done while
749other operations on the RAID set are taking place (e.g., while doing a
750.Xr fsck 8
751on a file system on the RAID set).
752However: for maximum effectiveness of the RAID set, the parity should be
753known to be correct before any data on the set is modified.
754.Pp
755To see how the RAID set is doing, the following command can be used to
756show the RAID set's status:
757.Bd -literal -offset indent
758raidctl -s raid0
759.Ed
760.Pp
761The output will look something like:
762.Bd -literal -offset indent
763Components:
764           /dev/sd1e: optimal
765           /dev/sd2e: optimal
766           /dev/sd3e: optimal
767Spares:
768           /dev/sd4e: spare
769Component label for /dev/sd1e:
770   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
771   Version: 2 Serial Number: 13432 Mod Counter: 65
772   Clean: No Status: 0
773   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
774   RAID Level: 5  blocksize: 512 numBlocks: 1799936
775   Autoconfig: No
776   Last configured as: raid0
777Component label for /dev/sd2e:
778   Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
779   Version: 2 Serial Number: 13432 Mod Counter: 65
780   Clean: No Status: 0
781   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
782   RAID Level: 5  blocksize: 512 numBlocks: 1799936
783   Autoconfig: No
784   Last configured as: raid0
785Component label for /dev/sd3e:
786   Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
787   Version: 2 Serial Number: 13432 Mod Counter: 65
788   Clean: No Status: 0
789   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
790   RAID Level: 5  blocksize: 512 numBlocks: 1799936
791   Autoconfig: No
792   Last configured as: raid0
793Parity status: clean
794Reconstruction is 100% complete.
795Parity Re-write is 100% complete.
796Copyback is 100% complete.
797.Ed
798.Pp
799This indicates that all is well with the RAID set.
800Of importance here are the component lines which read
801.Sq optimal ,
802and the
803.Sq Parity status
804line which indicates that the parity is up-to-date.
805Note that if there are file systems open on the RAID set,
806the individual components will not be
807.Sq clean
808but the set as a whole can still be clean.
809.Pp
810To check the component label of
811.Pa /dev/sd1e ,
812the following is used:
813.Bd -literal -offset indent
814raidctl -g /dev/sd1e raid0
815.Ed
816.Pp
817The output of this command will look something like:
818.Bd -literal -offset indent
819Component label for /dev/sd1e:
820   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
821   Version: 2 Serial Number: 13432 Mod Counter: 65
822   Clean: No Status: 0
823   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
824   RAID Level: 5  blocksize: 512 numBlocks: 1799936
825   Autoconfig: No
826   Last configured as: raid0
827.Ed
828.Ss Dealing with Component Failures
829If for some reason
830(perhaps to test reconstruction) it is necessary to pretend a drive
831has failed, the following will perform that function:
832.Bd -literal -offset indent
833raidctl -f /dev/sd2e raid0
834.Ed
835.Pp
836The system will then be performing all operations in degraded mode,
837where missing data is re-computed from existing data and the parity.
838In this case, obtaining the status of raid0 will return (in part):
839.Bd -literal -offset indent
840Components:
841           /dev/sd1e: optimal
842           /dev/sd2e: failed
843           /dev/sd3e: optimal
844Spares:
845           /dev/sd4e: spare
846.Ed
847.Pp
848Note that with the use of
849.Fl f
850a reconstruction has not been started.
851To both fail the disk and start a reconstruction, the
852.Fl F
853option must be used:
854.Bd -literal -offset indent
855raidctl -F /dev/sd2e raid0
856.Ed
857.Pp
858The
859.Fl f
860option may be used first, and then the
861.Fl F
862option used later, on the same disk, if desired.
863Immediately after the reconstruction is started, the status will report:
864.Bd -literal -offset indent
865Components:
866           /dev/sd1e: optimal
867           /dev/sd2e: reconstructing
868           /dev/sd3e: optimal
869Spares:
870           /dev/sd4e: used_spare
871[...]
872Parity status: clean
873Reconstruction is 10% complete.
874Parity Re-write is 100% complete.
875Copyback is 100% complete.
876.Ed
877.Pp
878This indicates that a reconstruction is in progress.
879To find out how the reconstruction is progressing the
880.Fl S
881option may be used.
882This will indicate the progress in terms of the
883percentage of the reconstruction that is completed.
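For example:
.Bd -literal -offset indent
raidctl -S raid0
.Ed
.Pp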
884When the reconstruction is finished the
885.Fl s
886option will show:
887.Bd -literal -offset indent
888Components:
889           /dev/sd1e: optimal
890           /dev/sd2e: spared
891           /dev/sd3e: optimal
892Spares:
893           /dev/sd4e: used_spare
894[...]
895Parity status: clean
896Reconstruction is 100% complete.
897Parity Re-write is 100% complete.
898Copyback is 100% complete.
899.Ed
900.Pp
901At this point there are at least two options.
902First, if
903.Pa /dev/sd2e
904is known to be good (i.e., the failure was either caused by
905.Fl f
906or
907.Fl F ,
908or the failed disk was replaced), then a copyback of the data can
909be initiated with the
910.Fl B
911option.
912In this example, this would copy the entire contents of
913.Pa /dev/sd4e
914to
915.Pa /dev/sd2e .
916Once the copyback procedure is complete, the
917status of the device would be (in part):
918.Bd -literal -offset indent
919Components:
920           /dev/sd1e: optimal
921           /dev/sd2e: optimal
922           /dev/sd3e: optimal
923Spares:
924           /dev/sd4e: spare
925.Ed
926.Pp
927and the system is back to normal operation.
928.Pp
929The second option after the reconstruction is to simply use
930.Pa /dev/sd4e
931in place of
932.Pa /dev/sd2e
933in the configuration file.
934For example, the configuration file (in part) might now look like:
935.Bd -literal -offset indent
936START array
9371 3 0
938
START disks
940/dev/sd1e
941/dev/sd4e
942/dev/sd3e
943.Ed
944.Pp
945This can be done as
946.Pa /dev/sd4e
947is completely interchangeable with
948.Pa /dev/sd2e
949at this point.
950Note that extreme care must be taken when
951changing the order of the drives in a configuration.
952This is one of the few instances where the devices and/or
953their orderings can be changed without loss of data!
954In general, the ordering of components in a configuration file should
955.Em never
956be changed.
957.Pp
958If a component fails and there are no hot spares
959available on-line, the status of the RAID set might (in part) look like:
960.Bd -literal -offset indent
961Components:
962           /dev/sd1e: optimal
963           /dev/sd2e: failed
964           /dev/sd3e: optimal
965No spares.
966.Ed
967.Pp
968In this case there are a number of options.
969The first option is to add a hot spare using:
970.Bd -literal -offset indent
971raidctl -a /dev/sd4e raid0
972.Ed
973.Pp
974After the hot add, the status would then be:
975.Bd -literal -offset indent
976Components:
977           /dev/sd1e: optimal
978           /dev/sd2e: failed
979           /dev/sd3e: optimal
980Spares:
981           /dev/sd4e: spare
982.Ed
983.Pp
984Reconstruction could then take place using
985.Fl F
as described above.
987.Pp
988A second option is to rebuild directly onto
989.Pa /dev/sd2e .
990Once the disk containing
991.Pa /dev/sd2e
992has been replaced, one can simply use:
993.Bd -literal -offset indent
994raidctl -R /dev/sd2e raid0
995.Ed
996.Pp
997to rebuild the
998.Pa /dev/sd2e
999component.
1000As the rebuilding is in progress, the status will be:
1001.Bd -literal -offset indent
1002Components:
1003           /dev/sd1e: optimal
1004           /dev/sd2e: reconstructing
1005           /dev/sd3e: optimal
1006No spares.
1007.Ed
1008.Pp
1009and when completed, will be:
1010.Bd -literal -offset indent
1011Components:
1012           /dev/sd1e: optimal
1013           /dev/sd2e: optimal
1014           /dev/sd3e: optimal
1015No spares.
1016.Ed
1017.Pp
1018In circumstances where a particular component is completely
1019unavailable after a reboot, a special component name will be used to
1020indicate the missing component.
1021For example:
1022.Bd -literal -offset indent
1023Components:
1024           /dev/sd2e: optimal
1025          component1: failed
1026No spares.
1027.Ed
1028.Pp
1029indicates that the second component of this RAID set was not detected
1030at all by the auto-configuration code.
1031The name
1032.Sq component1
1033can be used anywhere a normal component name would be used.
1034For example, to add a hot spare to the above set, and rebuild to that hot
1035spare, the following could be done:
1036.Bd -literal -offset indent
1037raidctl -a /dev/sd3e raid0
1038raidctl -F component1 raid0
1039.Ed
1040.Pp
1041at which point the data missing from
1042.Sq component1
1043would be reconstructed onto
1044.Pa /dev/sd3e .
1045.Pp
1046When more than one component is marked as
1047.Sq failed
1048due to a non-component hardware failure (e.g., loss of power to two
1049components, adapter problems, termination problems, or cabling issues) it
1050is quite possible to recover the data on the RAID set.
1051The first thing to be aware of is that the first disk to fail will
1052almost certainly be out-of-sync with the remainder of the array.
1053If any IO was performed between the time the first component is considered
1054.Sq failed
1055and when the second component is considered
1056.Sq failed ,
1057then the first component to fail will
1058.Em not
1059contain correct data, and should be ignored.
1060When the second component is marked as failed, however, the RAID device will
1061(currently) panic the system.
1062At this point the data on the RAID set
1063(not including the first failed component) is still self consistent,
1064and will be in no worse state of repair than had the power gone out in
1065the middle of a write to a file system on a non-RAID device.
1066The problem, however, is that the component labels may now have 3 different
1067.Sq modification counters
1068(one value on the first component that failed, one value on the second
1069component that failed, and a third value on the remaining components).
1070In such a situation, the RAID set will not autoconfigure,
1071and can only be forcibly re-configured
1072with the
1073.Fl C
1074option.
1075To recover the RAID set, one must first remedy whatever physical
1076problem caused the multiple-component failure.
1077After that is done, the RAID set can be restored by forcibly
configuring the RAID set
1079.Em without
1080the component that failed first.
1081For example, if
1082.Pa /dev/sd1e
1083and
1084.Pa /dev/sd2e
1085fail (in that order) in a RAID set of the following configuration:
1086.Bd -literal -offset indent
1087START array
10881 4 0
1089
START disks
1091/dev/sd1e
1092/dev/sd2e
1093/dev/sd3e
1094/dev/sd4e
1095
1096START layout
1097# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
109864 1 1 5
1099
1100START queue
1101fifo 100
1102
1103.Ed
1104.Pp
1105then the following configuration (say "recover_raid0.conf")
1106.Bd -literal -offset indent
1107START array
11081 4 0
1109
START disks
1111/dev/sd6e
1112/dev/sd2e
1113/dev/sd3e
1114/dev/sd4e
1115
1116START layout
1117# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
111864 1 1 5
1119
1120START queue
1121fifo 100
1122.Ed
1123.Pp
1124(where
1125.Pa /dev/sd6e
1126has no physical device) can be used with
1127.Bd -literal -offset indent
1128raidctl -C recover_raid0.conf raid0
1129.Ed
1130.Pp
1131to force the configuration of raid0.
1132A
1133.Bd -literal -offset indent
1134raidctl -I 12345 raid0
1135.Ed
1136.Pp
1137will be required in order to synchronize the component labels.
1138At this point the file systems on the RAID set can then be checked and
1139corrected.
1140To complete the re-construction of the RAID set,
1141.Pa /dev/sd1e
1142is simply hot-added back into the array, and reconstructed
1143as described earlier.
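.Pp
Concretely, that final step might look like the following sketch (the failed
placeholder component, shown here as
.Pa /dev/sd6e ,
should be taken from the
.Fl s
output):
.Bd -literal -offset indent
raidctl -a /dev/sd1e raid0
raidctl -F /dev/sd6e raid0
.Ed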
1144.Ss RAID on RAID
1145RAID sets can be layered to create more complex and much larger RAID sets.
1146A RAID 0 set, for example, could be constructed from four RAID 5 sets.
1147The following configuration file shows such a setup:
1148.Bd -literal -offset indent
1149START array
1150# numRow numCol numSpare
11511 4 0
1152
1153START disks
1154/dev/raid1e
1155/dev/raid2e
1156/dev/raid3e
1157/dev/raid4e
1158
1159START layout
1160# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
1161128 1 1 0
1162
1163START queue
1164fifo 100
1165.Ed
1166.Pp
1167A similar configuration file might be used for a RAID 0 set
1168constructed from components on RAID 1 sets.
1169In such a configuration, the mirroring provides a high degree
1170of redundancy, while the striping provides additional speed benefits.
1171.Ss Auto-configuration and Root on RAID
1172RAID sets can also be auto-configured at boot.
1173To make a set auto-configurable,
1174simply prepare the RAID set as above, and then do a:
1175.Bd -literal -offset indent
1176raidctl -A yes raid0
1177.Ed
1178.Pp
1179to turn on auto-configuration for that set.
1180To turn off auto-configuration, use:
1181.Bd -literal -offset indent
1182raidctl -A no raid0
1183.Ed
1184.Pp
1185RAID sets which are auto-configurable will be configured before the
1186root file system is mounted.
1187These RAID sets are thus available for
1188use as a root file system, or for any other file system.
1189A primary advantage of using the auto-configuration is that RAID components
1190become more independent of the disks they reside on.
1191For example, SCSI ID's can change, but auto-configured sets will always be
1192configured correctly, even if the SCSI ID's of the component disks
1193have become scrambled.
1194.Pp
1195Having a system's root file system
1196.Pq Pa /
1197on a RAID set is also allowed, with the
1198.Sq a
1199partition of such a RAID set being used for
1200.Pa / .
1201To use raid0a as the root file system, simply use:
1202.Bd -literal -offset indent
1203raidctl -A root raid0
1204.Ed
1205.Pp
To return raid0a to being just an auto-configuring set, simply use the
.Fl A Ic yes
1208arguments.
1209.Pp
1210Note that kernels can only be directly read from RAID 1 components on
1211alpha and pmax architectures.
1212On those architectures, the
1213.Dv FS_RAID
1214file system is recognized by the bootblocks, and will properly load the
1215kernel directly from a RAID 1 component.
1216For other architectures, or to support the root file system
1217on other RAID sets, some other mechanism must be used to get a kernel booting.
1218For example, a small partition containing only the secondary boot-blocks
1219and an alternate kernel (or two) could be used.
Once a kernel is booting, however, and an auto-configuring RAID set is
1221found that is eligible to be root, then that RAID set will be
1222auto-configured and used as the root device.
1223If two or more RAID sets claim to be root devices, then the
1224user will be prompted to select the root device.
1225At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
1226.Pp
1227A typical RAID 1 setup with root on RAID might be as follows:
1228.Bl -enum
1229.It
1230wd0a - a small partition, which contains a complete, bootable, basic
1231.Nx
1232installation.
1233.It
1234wd1a - also contains a complete, bootable, basic
1235.Nx
1236installation.
1237.It
1238wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
1239.It
1240wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
1241swap space.
1242.It
1243wd0g and wd1g - a RAID 1 set, raid2, used for
1244.Pa /usr ,
1245.Pa /home ,
1246or other data, if desired.
1247.It
wd0h and wd1h - a RAID 1 set, raid3, if desired.
1249.El
1250.Pp
1251RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
1252raid0 is marked as being a root file system.
1253When new kernels are installed, the kernel is not only copied to
1254.Pa / ,
1255but also to wd0a and wd1a.
1256The kernel on wd0a is required, since that
1257is the kernel the system boots from.
1258The kernel on wd1a is also
1259required, since that will be the kernel used should wd0 fail.
1260The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
1262.Pp
1263There is no requirement that the root file system be on the same disk
1264as the kernel.
For example, obtaining the kernel from wd0a, and using
sd0e and sd1e for raid0 (and the root file system), is fine.
1267It
1268.Em is
1269critical, however, that there be multiple kernels available, in the
1270event of media failure.
1271.Pp
1272Multi-layered RAID devices (such as a RAID 0 set made
1273up of RAID 1 sets) are
1274.Em not
1275supported as root devices or auto-configurable devices at this point.
1276(Multi-layered RAID devices
1277.Em are
1278supported in general, however, as mentioned earlier.)
1279Note that in order to enable component auto-detection and
1280auto-configuration of RAID devices, the line:
1281.Bd -literal -offset indent
1282options    RAID_AUTOCONFIG
1283.Ed
1284.Pp
1285must be in the kernel configuration file.
1286See
1287.Xr raid 4
1288for more details.
1289.Ss Swapping on RAID
1290A RAID device can be used as a swap device.
1291In order to ensure that a RAID device used as a swap device
is correctly unconfigured when the system is shut down or rebooted,
1293it is recommended that the line
1294.Bd -literal -offset indent
1295swapoff=YES
1296.Ed
1297.Pp
1298be added to
1299.Pa /etc/rc.conf .
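.Pp
The swap device itself is listed in
.Xr fstab 5
as usual; a hypothetical entry for swapping on raid1 might look like:
.Bd -literal -offset indent
/dev/raid1b none swap sw 0 0
.Ed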
1300.Ss Unconfiguration
1301The final operation performed by
1302.Nm
1303is to unconfigure a
1304.Xr raid 4
1305device.
1306This is accomplished via a simple:
1307.Bd -literal -offset indent
1308raidctl -u raid0
1309.Ed
1310.Pp
1311at which point the device is ready to be reconfigured.
1312.Ss Performance Tuning
1313Selection of the various parameter values which result in the best
1314performance can be quite tricky, and often requires a bit of
1315trial-and-error to get those values most appropriate for a given system.
1316A whole range of factors come into play, including:
1317.Bl -enum
1318.It
1319Types of components (e.g., SCSI vs. IDE) and their bandwidth
1320.It
1321Types of controller cards and their bandwidth
1322.It
1323Distribution of components among controllers
1324.It
1325IO bandwidth
1326.It
File system access patterns
1328.It
1329CPU speed
1330.El
1331.Pp
1332As with most performance tuning, benchmarking under real-life loads
1333may be the only way to measure expected performance.
1334Understanding some of the underlying technology is also useful in tuning.
1335The goal of this section is to provide pointers to those parameters which may
1336make significant differences in performance.
1337.Pp
1338For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
1339Since data in a RAID 1 set is arranged in a linear
1340fashion on each component, selecting an appropriate stripe size is
1341somewhat less critical than it is for a RAID 5 set.
1342However: a stripe size that is too small will cause large IO's to be
1343broken up into a number of smaller ones, hurting performance.
1344At the same time, a large stripe size may cause problems with
1345concurrent accesses to stripes, which may also affect performance.
1346Thus values in the range of 32 to 128 are often the most effective.
1347.Pp
1348Tuning RAID 5 sets is trickier.
1349In the best case, IO is presented to the RAID set one stripe at a time.
1350Since the entire stripe is available at the beginning of the IO,
1351the parity of that stripe can be calculated before the stripe is written,
1352and then the stripe data and parity can be written in parallel.
1353When the amount of data being written is less than a full stripe worth, the
1354.Sq small write
1355problem occurs.
1356Since a
1357.Sq small write
1358means only a portion of the stripe on the components is going to
1359change, the data (and parity) on the components must be updated
1360slightly differently.
1361First, the
1362.Sq old parity
1363and
1364.Sq old data
1365must be read from the components.
1366Then the new parity is constructed,
1367using the new data to be written, and the old data and old parity.
1368Finally, the new data and new parity are written.
1369All this extra data shuffling results in a serious loss of performance,
1370and is typically 2 to 4 times slower than a full stripe write (or read).
1371To combat this problem in the real world, it may be useful
1372to ensure that stripe sizes are small enough that a
1373.Sq large IO
1374from the system will use exactly one large stripe write.
1375As is seen later, there are some file system dependencies
1376which may come into play here as well.
1377.Pp
1378Since the size of a
1379.Sq large IO
1380is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
1381be desirable to select a SectPerSU value of 16 blocks (8K) or 32
1382blocks (16K).
Since there are 4 data stripe units per stripe, the maximum
1384data per stripe is 64 blocks (32K) or 128 blocks (64K).
1385Again, empirical measurement will provide the best indicators of which
values will yield better performance.
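.Pp
For example, a
.Sq layout
section selecting the 8K stripe unit size discussed above for a RAID 5 set
would be (a sketch; the value should be tuned empirically):
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
16 1 1 5
.Ed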
1387.Pp
1388The parameters used for the file system are also critical to good performance.
1389For
1390.Xr newfs 8 ,
1391for example, increasing the block size to 32K or 64K may improve
1392performance dramatically.
1393As well, changing the cylinders-per-group
1394parameter from 16 to 32 or higher is often not only necessary for
1395larger file systems, but may also have positive performance implications.
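.Pp
For example (a sketch only; option letters and sensible values should be
checked against
.Xr newfs 8
on the system at hand), a 32K-block file system with a larger
cylinders-per-group value might be created with:
.Bd -literal -offset indent
newfs -b 32768 -f 4096 -c 32 /dev/rraid0e
.Ed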
1396.Ss Summary
Despite the length of this man page, configuring a RAID set is a
relatively straightforward process.
Only the following steps are needed:
1400.Bl -enum
1401.It
1402Use
1403.Xr disklabel 8
1404to create the components (of type RAID).
1405.It
1406Construct a RAID configuration file: e.g.,
1407.Pa raid0.conf
1408.It
1409Configure the RAID set with:
1410.Bd -literal -offset indent
1411raidctl -C raid0.conf raid0
1412.Ed
1413.Pp
1414.It
1415Initialize the component labels with:
1416.Bd -literal -offset indent
1417raidctl -I 123456 raid0
1418.Ed
1419.Pp
1420.It
1421Initialize other important parts of the set with:
1422.Bd -literal -offset indent
1423raidctl -i raid0
1424.Ed
1425.Pp
1426.It
1427Get the default label for the RAID set:
1428.Bd -literal -offset indent
1429disklabel raid0 \*[Gt] /tmp/label
1430.Ed
1431.Pp
1432.It
1433Edit the label:
1434.Bd -literal -offset indent
1435vi /tmp/label
1436.Ed
1437.Pp
1438.It
1439Put the new label on the RAID set:
1440.Bd -literal -offset indent
1441disklabel -R -r raid0 /tmp/label
1442.Ed
1443.Pp
1444.It
1445Create the file system:
1446.Bd -literal -offset indent
1447newfs /dev/rraid0e
1448.Ed
1449.Pp
1450.It
1451Mount the file system:
1452.Bd -literal -offset indent
1453mount /dev/raid0e /mnt
1454.Ed
1455.Pp
1456.It
1457Use:
1458.Bd -literal -offset indent
1459raidctl -c raid0.conf raid0
1460.Ed
1461.Pp
to re-configure the RAID set the next time it is needed, or put
1463.Pa raid0.conf
1464into
1465.Pa /etc
1466where it will automatically be started by the
1467.Pa /etc/rc.d
1468scripts.
1469.El
1470.Sh SEE ALSO
1471.Xr ccd 4 ,
1472.Xr raid 4 ,
1473.Xr rc 8
1474.Sh HISTORY
1475RAIDframe is a framework for rapid prototyping of RAID structures
1476developed by the folks at the Parallel Data Laboratory at Carnegie
1477Mellon University (CMU).
1478A more complete description of the internals and functionality of
1479RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
1480for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
1481Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
1482Parallel Data Laboratory of Carnegie Mellon University.
1483.Pp
1484The
1485.Nm
1486command first appeared as a program in CMU's RAIDframe v1.1 distribution.
1487This version of
1488.Nm
1489is a complete re-write, and first appeared in
1490.Nx 1.4 .
1491.Sh COPYRIGHT
1492.Bd -literal
1493The RAIDframe Copyright is as follows:
1494
1495Copyright (c) 1994-1996 Carnegie-Mellon University.
1496All rights reserved.
1497
1498Permission to use, copy, modify and distribute this software and
1499its documentation is hereby granted, provided that both the copyright
1500notice and this permission notice appear in all copies of the
1501software, derivative works or modified versions, and any portions
1502thereof, and that both notices appear in supporting documentation.
1503
1504CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
1505CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
1506FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
1507
1508Carnegie Mellon requests users of this software to return to
1509
1510 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
1511 School of Computer Science
1512 Carnegie Mellon University
1513 Pittsburgh PA 15213-3890
1514
1515any improvements or extensions that they make and grant Carnegie the
1516rights to redistribute these changes.
1517.Ed
1518.Sh WARNINGS
1519Certain RAID levels (1, 4, 5, 6, and others) can protect against some
1520data loss due to component failure.
However, the loss of two components of a RAID 4 or 5 system,
1522or the loss of a single component of a RAID 0 system will
1523result in the entire file system being lost.
1524RAID is
1525.Em NOT
1526a substitute for good backup practices.
1527.Pp
1528Recomputation of parity
1529.Em MUST
1530be performed whenever there is a chance that it may have been compromised.
1531This includes after system crashes, or before a RAID
1532device has been used for the first time.
1533Failure to keep parity correct will be catastrophic should a
1534component ever fail \(em it is better to use RAID 0 and get the
1535additional space and speed, than it is to use parity, but
1536not keep the parity correct.
1537At least with RAID 0 there is no perception of increased data security.
1538.Sh BUGS
1539Hot-spare removal is currently not available.
1540