1.\" 2.\" Copyright (c) 2008 3.\" The DragonFly Project. All rights reserved. 4.\" 5.\" Redistribution and use in source and binary forms, with or without 6.\" modification, are permitted provided that the following conditions 7.\" are met: 8.\" 9.\" 1. Redistributions of source code must retain the above copyright 10.\" notice, this list of conditions and the following disclaimer. 11.\" 2. Redistributions in binary form must reproduce the above copyright 12.\" notice, this list of conditions and the following disclaimer in 13.\" the documentation and/or other materials provided with the 14.\" distribution. 15.\" 3. Neither the name of The DragonFly Project nor the names of its 16.\" contributors may be used to endorse or promote products derived 17.\" from this software without specific, prior written permission. 18.\" 19.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 22.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 23.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 24.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING, 25.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 26.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED 27.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT 29.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 30.\" SUCH DAMAGE. 31.\" 32.\" $DragonFly: src/share/man/man5/hammer.5,v 1.15 2008/11/02 18:56:47 swildner Exp $ 33.\" 34.Dd September 28, 2009 35.Os 36.Dt HAMMER 5 37.Sh NAME 38.Nm HAMMER 39.Nd HAMMER file system 40.Sh SYNOPSIS 41To compile this driver into the kernel, 42place the following line in your 43kernel configuration file: 44.Bd -ragged -offset indent 45.Cd options HAMMER 46.Ed 47.Pp 48Alternatively, to load the driver as a 49module at boot time, place the following line in 50.Xr loader.conf 5 : 51.Bd -literal -offset indent 52hammer_load="YES" 53.Ed 54.Pp 55To mount via 56.Xr fstab 5 : 57.Bd -literal -offset indent 58/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0 59.Ed 60.Sh DESCRIPTION 61The 62.Nm 63file system provides facilities to store file system data onto disk devices 64and is intended to replace 65.Xr ffs 5 66as the default file system for 67.Dx . 68Among its features are instant crash recovery, 69large file systems spanning multiple volumes, 70data integrity checking, 71fine grained history retention, 72mirroring capability, and pseudo file systems. 73.Pp 74All functions related to managing 75.Nm 76file systems are provided by the 77.Xr newfs_hammer 8 , 78.Xr mount_hammer 8 , 79.Xr hammer 8 , 80.Xr chflags 1 , 81and 82.Xr undo 1 83utilities. 84.Pp 85For a more detailed introduction refer to the paper and slides listed in the 86.Sx SEE ALSO 87section. 88For some common usages of 89.Nm 90see the 91.Sx EXAMPLES 92section below. 93.Ss Instant Crash Recovery 94After a non-graceful system shutdown, 95.Nm 96file systems will be brought back into a fully coherent state 97when mounting the file system, usually within a few seconds. 98.Ss Large File Systems & Multi Volume 99A 100.Nm 101file system can be up to 1 Exabyte in size. 
.Pp
The common way of accessing history, however, is by taking snapshots.
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune and reblock operations.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar history ,
.Ar snapshot ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default, only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per-PFS parameter
.Cm prune-min ,
history is guaranteed to be retained for at least that time interval.
All other history is deleted.
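.Pp
For example, to guarantee that at least 30 days of fine-grained history are
retained regardless of snapshots on the PFS mounted at
.Pa /home ,
a setting such as the following could be used
(see
.Xr hammer 8
for the exact
.Cm prune-min
duration syntax):
.Bd -literal -offset indent
hammer pfs-update /home prune-min=30d
.Ed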
.Pp
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available
space.
Reblocking is needed even when using the
.Ar nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar snapshot ,
.Ar prune ,
.Ar prune-everything ,
.Ar rebalance ,
.Ar reblock ,
.Ar reblock-btree ,
.Ar reblock-inodes ,
.Ar reblock-dirs ,
.Ar reblock-data
.Ss Mirroring & Pseudo File Systems
In order to allow inode numbers to be duplicated on the slaves,
.Nm Ap s
mirroring feature uses
.Dq Pseudo File Systems
(PFSs).
A
.Nm
file system supports up to 65535 PFSs.
Multiple slaves per master are supported, but multiple masters per slave
are not.
Slaves are always read-only.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS;
this way, no tools are confused by the PFS root being a symlink
and by inodes not being unique across a
.Nm
file system.
.Pp
Related
.Xr hammer 8
commands:
.Ar pfs-master ,
.Ar pfs-slave ,
.Ar pfs-cleanup ,
.Ar pfs-status ,
.Ar pfs-update ,
.Ar pfs-destroy ,
.Ar pfs-upgrade ,
.Ar pfs-downgrade ,
.Ar mirror-copy ,
.Ar mirror-stream ,
.Ar mirror-read ,
.Ar mirror-read-stream ,
.Ar mirror-write ,
.Ar mirror-dump
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts.
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Do not export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount of a PFS root
(e.g.\&
.Pa /hammer/data
above)
should be exported;
if a directory containing a PFS is exported,
clients may be able to escape into the PFS via its subdirectory.
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that each
.Nm
file system must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi-volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic clean up
(taking snapshots, pruning, and reblocking)
in order to retain access to history
and to keep the file system from filling up.
For this it is recommended to use the
.Xr hammer 8
.Ar cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Ar cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
    >/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Ar snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
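.Pp
Each snapshot softlink encodes such a transaction ID in its target;
listing the snapshot directory might look similar to the following
(the exact targets and IDs will differ):
.Bd -literal -offset indent
ls -l /snaps
(...) snap1 -> /home/@@0x00000001061a8ba6
(...) snap2 -> /home/@@0x00000001061bc180
.Ed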
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8 Ap s
.Ar prune
command, which frees historical data that is not pointed to by any
snapshot link and is not from after the latest snapshot.
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring can be set up using
.Nm Ap s
pseudo file systems.
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Ar pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link remains unusable for as long as no mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Ar mirror-read
command into a
.Ar mirror-write
or, as a short-cut, use the
.Ar mirror-copy
command (which works across a
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
cannot access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step, a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use the
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
This example exports, from the
.Nm
file system
.Pa /hammer ,
the directory
.Pa /hammer/non-pfs
(which contains no PFSs)
and the PFS
.Pa /hammer/pfs/data ;
the latter is first
.Nm null
mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file
system or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are also cached by the drive itself tends to give writes an excessive
priority.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and that
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
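.Pp
For example, to apply the recommended adjustments at runtime with
.Xr sysctl 8
(they can also be made persistent via
.Xr sysctl.conf 5 ) :
.Bd -literal -offset indent
sysctl kern.bioq_reorder_minor_interval=30
sysctl kern.bioq_reorder_burst_bytes=262144
.Ed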
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com .
This manual page was written by
.An Sascha Wildner .