.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 2, 2021
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system on which it resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data.
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
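.Pp
For example, assuming illustrative device names, a pool backed by a single
double-parity raidz group of six disks might be created with:
.Dl # Nm zpool Cm create Ar pool Sy raidz2 Ar sda sdb sdc sdd sde sdf
.Pp
Such a group of six disks of size
.Em X No holds approximately Em (6-2)*X No bytes and can withstand any two
devices failing without losing data.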
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices.
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In terms of I/O, performance is similar to raidz, since for any read all
.Em D No data disks must be accessed.
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
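.Pp
For example, in a hypothetical dRAID with
.Em N=11 No children, Em D=4 , No parity Em P=2 , No and Em S=1 No distributed spare,
delivered random IOPS is roughly
.Sy floor((11-1)/(4+2))*single_drive_IOPS ,
i.e. that of a single drive.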
.Pp
Like raidz, a dRAID can have single-, double-, or triple-parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group, Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
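.Pp
Continuing the hypothetical layout above
.Pq Em N=11 , D=4 , P=2 , S=1 ,
such a dRAID holds approximately
.Em (11-1)*(4/(4+2))*X ,
or about
.Em 6.7*X No bytes.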
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword, as in the example after this list:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
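.Pp
For example, the hypothetical layout above could be requested explicitly,
with the child count serving as a cross-check on the device list:
.Dl # Nm zpool Cm create Ar pool Sy draid2:4d:11c:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk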
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs,
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block is unable to be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to online the device automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
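.Pp
For example, assuming
.Ar sdc
is the hot spare currently standing in for a faulted device,
the in-progress replacement can be cancelled with:
.Dl # Nm zpool Cm detach Ar pool sdc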
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
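.Pp
As a brief sketch, a mirrored pair of log devices might be added to an
existing pool with:
.Dl # Nm zpool Cm add Ar pool Sy log mirror Ar sdc sdd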
.Pp
Log devices can be added, replaced, attached, detached and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously in L2ARC when the pool is imported (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1 GiB ,
the metadata structures required for rebuilding the L2ARC are not written,
in order not to waste space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent ARC).
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored in L2ARC.
This is useful in case of memory pressure,
where the contents of the cache device are not fully restored in L2ARC.
The user can offline and online the cache device when there is less memory
pressure in order to fully restore its contents to L2ARC.
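.Pp
For example, assuming
.Ar sdc
is a cache device whose contents were only partially restored,
the following hypothetical sequence retries the restore once memory pressure
has subsided:
.Dl # Nm zpool Cm offline Ar pool sdc
.Dl # Nm zpool Cm online Ar pool sdc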
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint:
specifically, vdev removal, attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
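.Pp
On Linux, for example, this can be done at runtime by writing to the module
parameter file, as in the following sketch
.Pq see Xr zfs 4 for the authoritative parameter documentation :
.Dl # echo 0 > /sys/module/zfs/parameters/zfs_ddt_data_is_special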
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to nonzero.
See
.Xr zfsprops 7
for more info on this property.
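.Pp
As an illustrative sketch, the following creates a pool with a mirrored
.Sy special
vdev and then opts a dataset's small file blocks of
.Em 32 KiB
or smaller into the special class:
.Dl # Nm zpool Cm create Ar pool Sy raidz Ar sda sdb sdc Sy special mirror Ar sdd sde
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Ar 32K pool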