.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 2, 2021
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
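.Pp
For instance, assuming
.Pa sda
is an unused whole disk, a simple single-disk pool could be created with:
.Dl # Nm zpool Cm create Ar pool Ar sda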
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system on which it resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data.
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
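.Pp
For example, a hypothetical double-parity raidz group of six disks, which
could withstand two disk failures and hold approximately four disks' worth
of data, might be created with:
.Dl # Nm zpool Cm create Ar pool Sy raidz2 Ar sda sdb sdc sdd sde sdf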
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices.
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In regard to I/O, performance is similar to raidz, since for any read all
.Em D No data disks must be accessed.
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
.Pp
Like raidz, a dRAID can have single-, double-, or triple-parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group, Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword, as shown in the example after this list:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
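.Pp
For example, a hypothetical eleven-disk dRAID vdev with double parity, four
data disks per redundancy group, and one distributed spare could be created
with:
.Dl # Nm zpool Cm create Ar pool Sy draid2:4d:11c:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk
Per the formulas above, such a layout holds approximately
.Em (11-1)*(4/(4+2))*X No , or about Em 6.7*X No bytes,
and delivers roughly
.Em floor((11-1)/(4+2)) No = Em 1 No times the random IOPS of a single drive.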
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
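.Pp
A hypothetical third mirror could later be added as another root vdev with:
.Dl # Nm zpool Cm add Ar mypool Sy mirror Ar sde sdf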
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs,
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path, since the path was never
correct in the first place.
.El
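.Pp
The current state of a pool and each of its vdevs can be inspected with, for
example:
.Dl # Nm zpool Cm status Ar pool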
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block cannot be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to online the device automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
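.Pp
On platforms where attachment is not detected, a hypothetical re-attached
device
.Ar sda
could be brought back manually with:
.Dl # Nm zpool Cm online Ar pool Ar sda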
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
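For example, assuming
.Ar sde
is an unused disk, it could be added as an additional spare with:
.Dl # Nm zpool Cm add Ar pool Sy spare Ar sde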
.Pp
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
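As an illustration, a hypothetical pool with a mirrored log device could be
created with:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log mirror Ar sdc sdd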
.Pp
Log devices can be added, replaced, attached, detached, and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low-latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1 GiB ,
we do not write the metadata structures
required for rebuilding the L2ARC, in order not to waste space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent L2ARC).
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored in L2ARC.
This is useful in case of memory pressure,
where the contents of the cache device are not fully restored in L2ARC.
The user can offline and online the cache device when there is less memory
pressure, in order to fully restore its contents to L2ARC.
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state, and in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint.
Specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID are not allowed.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default, this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
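.Pp
For example, a hypothetical pool with a mirrored special vdev, matching the
redundancy of the normal vdev, could be created with:
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy special mirror Ar sdc sdd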
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to a nonzero value.
See
.Xr zfsprops 7
for more info on this property.
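.Pp
For example, small file blocks of up to 32 KiB from a hypothetical dataset
could be routed to the special class with:
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Ar 32K pool/fs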