.\"
.\" swapcache - Cache clean filesystem data & meta-data on SSD-based swap
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.Dd February 7, 2010
.Dt SWAPCACHE 8
.Os
.Sh NAME
.Nm swapcache
.Nd a mechanism to use fast swap to cache filesystem data and meta-data
.Sh SYNOPSIS
.Cd sysctl vm.swapcache.accrate=100000
.Cd sysctl vm.swapcache.maxfilesize=0
.Cd sysctl vm.swapcache.maxburst=2000000000
.Cd sysctl vm.swapcache.curburst=4000000000
.Cd sysctl vm.swapcache.minburst=10000000
.Cd sysctl vm.swapcache.read_enable=0
.Cd sysctl vm.swapcache.meta_enable=0
.Cd sysctl vm.swapcache.data_enable=0
.Cd sysctl vm.swapcache.use_chflags=1
.Cd sysctl vm.swapcache.maxlaunder=256
.Cd sysctl vm.swapcache.hysteresis=(vm.stats.vm.v_inactive_target/2)
.Sh DESCRIPTION
.Nm
is a system capability which allows a solid state disk (SSD) in a swap
space configuration to be used to cache clean filesystem data and meta-data
in addition to its normal function of backing anonymous memory.
.Pp
Sysctls are used to manage operational parameters and can be adjusted at
any time.
Typically a large initial burst is desired after system boot,
controlled by the initial
.Va vm.swapcache.curburst
parameter.
This parameter is reduced as data is written to swap by the swapcache
and increased at a rate specified by
.Va vm.swapcache.accrate .
Once this parameter reaches zero, write activity ceases until it has
recovered sufficiently for write activity to resume.
.Pp
.Va vm.swapcache.meta_enable
enables the writing of filesystem meta-data to the swapcache.
Filesystem
meta-data is any data which the filesystem accesses via the disk device
using the buffer cache.
Meta-data is cached globally regardless of file or directory flags.
.Pp
.Va vm.swapcache.data_enable
enables the writing of clean filesystem file data to the swapcache.
Filesystem file data is any data which the filesystem accesses via a
regular file.
In technical terms, this is data accessed when the buffer cache is used
to access a regular file through its vnode.
Do not blindly turn on this option; see the
.Sx PERFORMANCE TUNING
section for more information.
.Pp
.Va vm.swapcache.use_chflags
enables the use of the
.Va cache
and
.Va noscache
.Xr chflags 1
flags to control which files will be data-cached.
If this sysctl is disabled and
.Va data_enable
is enabled, the system will ignore file flags and attempt to
swapcache all regular files.
.Pp
.Va vm.swapcache.read_enable
enables reading from the swapcache and should be set to 1 for normal
operation.
.Pp
.Va vm.swapcache.maxfilesize
controls which files are to be cached based on their size.
If set to a non-zero value, only files smaller than the specified size
will be cached.
Larger files will not be cached.
.Pp
.Va vm.swapcache.maxlaunder
controls the maximum number of clean VM pages which will be added to
the swap cache and written out to swap on each poll.
Swapcache polls ten times a second.
.Pp
.Va vm.swapcache.hysteresis
controls how many pages swapcache waits to be added to the inactive page
queue before continuing its scan.
Once it decides to scan, it continues subject to the above limitations
until it reaches the end of the inactive page queue.
This parameter is designed to make swapcache generate bulkier bursts
to swap, which helps SSDs reduce write amplification effects.
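.Pp
As a minimal sketch of how the enable sysctls described above combine,
the following shell fragment prints the settings for a meta-data-only
configuration with reads from the swapcache turned on.
The function name is hypothetical and the values are only one plausible
starting point, not a recommendation for any particular system.
.Bd -literal -offset indent
```shell
# Hypothetical helper: print sysctl settings for meta-data-only caching,
# one name=value pair per line.
swapcache_meta_only() {
    echo "vm.swapcache.read_enable=1"
    echo "vm.swapcache.meta_enable=1"
    echo "vm.swapcache.data_enable=0"
}

# Preview the settings.  As root they could then be applied with:
#   swapcache_meta_only | while read -r kv; do sysctl "$kv"; done
swapcache_meta_only
```
.Ed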
.Sh PERFORMANCE TUNING
Best operation is achieved when the active data set fits within the
swapcache.
.Pp
.Bl -tag -width 4n -compact
.It Va vm.swapcache.accrate
This specifies the burst accumulation rate in bytes per second and
ultimately controls the write bandwidth to swap averaged over a long
period of time.
This parameter must be carefully chosen to manage the write endurance of
the SSD in order to avoid wearing it out too quickly.
Even though SSDs have limited write endurance, there is a massive
cost/performance benefit to using one in a swapcache configuration.
.Pp
Let's use the old Intel X25-V 40GB MLC SATA SSD as an example.
This device has a write endurance of approximately
40TB (40 terabytes), though this is closer to a minimum value
(see the later notes on this).
Limiting the long term average bandwidth to 100KB/sec leads to no more
than ~9GB/day of writing, which works out to approximately 12 years of
endurance.
Endurance scales linearly with size.
The 80GB version of this SSD
will have a write endurance of approximately 80TB.
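.Pp
The endurance arithmetic above can be checked with a short shell sketch.
The function names are illustrative only, decimal units are assumed
(1KB = 1000 bytes, 1TB = 10^12 bytes), and the shell is assumed to
provide 64-bit integer arithmetic.
.Bd -literal -offset indent
```shell
# usage: daily_write_mb ACCRATE_BYTES_PER_SEC
# Long-term average write volume per day, in whole megabytes.
daily_write_mb() {
    echo $(( $1 * 86400 / 1000000 ))
}

# usage: endurance_years ENDURANCE_TB ACCRATE_BYTES_PER_SEC
# Whole years until the rated write endurance is consumed at that rate.
endurance_years() {
    endurance_bytes=$(( $1 * 1000 * 1000 * 1000 * 1000 ))
    bytes_per_day=$(( $2 * 86400 ))
    echo $(( endurance_bytes / bytes_per_day / 365 ))
}

daily_write_mb 100000       # 8640, i.e. roughly 9GB/day
endurance_years 40 100000   # 12
```
.Ed
Doubling either the rated endurance or halving
.Va accrate
doubles the estimate, which is why the rate should be chosen with the
specific SSD in mind.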
.Pp
MLC SSDs have a 1000-10000x write endurance, while the lower-density,
higher-cost SLC SSDs have a 10000-100000x write endurance, approximately.
MLC SSDs can be used for the swapcache (and swap) as long as the system
manager is cognizant of their limitations.
However, over the years tests have shown that SLC SSDs do not really live
up to their hype and are no more reliable than MLC SSDs.
Instead of
worrying about SLC vs MLC, just use MLC (or TLC or whatever), leave
more space unpartitioned which the SSD can utilize to improve durability,
and be cognizant of the SSD's rate of wear.
.Pp
.It Va vm.swapcache.meta_enable
Turning on just
.Va meta_enable
causes only filesystem meta-data to be cached and will result
in very fast directory operations even over millions of inodes
and even in the face of other invasive operations being run
by other processes.
.Pp
For
.Nm HAMMER
filesystems, meta-data includes the B-Tree, directory entries,
and data related to tiny files.
Approximately 6GB of swapcache is needed
for every 14 million or so inodes cached, effectively giving one the
ability to cache all the meta-data in a multi-terabyte filesystem using
a fairly small SSD.
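.Pp
The rule of thumb above (about 6GB per 14 million inodes) can be turned
into a rough sizing estimate.
This is only a sketch: the function name is hypothetical and the ratio
is the approximate figure quoted above, so real usage will vary.
.Bd -literal -offset indent
```shell
# usage: swapcache_gb_for_inodes INODE_COUNT
# Approximate swapcache space (GB, rounded up) needed to cache the
# meta-data for INODE_COUNT inodes, at ~6GB per 14 million inodes.
swapcache_gb_for_inodes() {
    echo $(( ($1 * 6 + 14000000 - 1) / 14000000 ))
}

swapcache_gb_for_inodes 100000000   # 43
```
.Ed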
.Pp
.It Va vm.swapcache.data_enable
Turning on
.Va data_enable
(with or without other features) allows bulk file data to be cached.
This feature is very useful for web server operation when the
operational data set fits in swap.
However, care must be taken to avoid thrashing the swapcache.
In almost all cases you will want to leave chflags mode enabled
and use 'chflags cache' on governing directories to control which
directory subtrees file data should be cached for.
.Pp
.Dx
uses generously large
.Va kern.maxvnodes
values,
typically in excess of 400K vnodes, but large numbers
of small files can still cause problems for swapcache.
When operating on a filesystem containing a large number of
small files, vnode recycling by the kernel will cause related
swapcache data to be lost and also cause the swapcache to
potentially thrash.
Cache thrashing due to vnode recycling can occur whether chflags
mode is used or not.
.Pp
To solve the thrashing problem you can turn on HAMMER's
double buffering feature via
.Va vfs.hammer.double_buffer .
This causes HAMMER to cache file data via its block device.
HAMMER cannot avoid also caching file data via individual vnodes
but will try to expire the second copy more quickly (hence
the name double buffer mode).
The key point here is that
.Nm
will only cache the data blocks via the block device when
double_buffer mode is used, and since the block device is associated
with the mount, vnode recycling will not mess with it.
This allows the data for any number (potentially millions) of files to
be swapcached.
You should still use chflags mode to keep the size of the dataset
being cached under 75% of configured swap space.
.Pp
Data caching is definitely more wasteful of the SSD's write durability
than meta-data caching.
If not carefully managed the swapcache may exhaust its burst and smack
against the long term average bandwidth limit, causing the SSD to wear
out at the maximum rate you programmed.
Data caching is far less wasteful and more efficient
if you provide a sufficiently large SSD.
.Pp
When caching large data sets you may want to use a medium-sized SSD
with good write performance instead of a small SSD to accommodate
the higher burst write rate data caching incurs and to reduce
interference between reading and writing.
Write durability also tends to scale with larger SSDs, but keep in mind
that newer flash technologies use smaller feature sizes on-chip
which reduce the write durability of the chips, so pay careful attention
to the type of flash employed by the SSD when making durability
assumptions.
For example, an Intel X25-V only has 40MB/s in write performance
and burst writing by swapcache will seriously interfere with
concurrent read operation on the SSD.
The 80GB X25-M on the other hand has double the write performance.
Higher-capacity and larger form-factor SSDs tend to have better
write performance.
But the Intel 310 series SSDs use flash chips with a smaller feature
size, so an 80GB 310 series SSD will wind up with a durability relatively
close to that of the older 40GB X25-V.
.Pp
When data caching is turned on you can fine-tune what gets swapcached
by also turning on swapcache's chflags mode and using
.Xr chflags 1
with the
.Va cache
flag to enable data caching on a directory-tree (recursive) basis.
This flag is tracked by the namecache and does not need to be
recursively set in the directory tree.
Simply setting the flag in a top level directory or mount point
is usually sufficient.
However, the flag does not track across mount points.
A typical setup is something like this:
.Pp
.Dl chflags cache /etc /sbin /bin /usr /home
.Dl chflags noscache /usr/obj
.Pp
It is possible to tell
.Nm
to ignore the cache flag by setting
.Va vm.swapcache.use_chflags
to zero.
In many situations it is convenient to simply not use chflags mode, but
if you have numerous mixed SSDs and HDDs you may want to use this flag
to enable swapcache on the HDDs and disable it on the SSDs even if
you do not care about the fine-grained control offered by
.Nm chflag Ns 'ing .
.Pp
Filesystems such as NFS which do not support flags generally
have a
.Va cache
mount option which enables swapcache operation on the mount.
.Pp
.It Va vm.swapcache.maxfilesize
This may be used to reduce cache thrashing when a focus on a small
potentially fragmented filespace is desired, leaving the
larger (more linearly accessed) files alone.
.Pp
.It Va vm.swapcache.minburst
This controls hysteresis and prevents nickel-and-dime write bursting.
Once
.Va curburst
drops to zero, writing to the swapcache ceases until it has recovered past
.Va minburst .
The idea here is to avoid creating a heavily fragmented swapcache where
reading data from a file must alternate between the cache and the primary
filesystem.
Doing so does not save disk seeks on the primary filesystem
so we want to avoid doing small bursts.
This parameter allows us to do larger bursts.
The larger bursts also tend to improve SSD performance as the SSD itself
can do a better job write-combining and erasing blocks.
.Pp
.It Va vm.swapcache.maxswappct
This controls the maximum amount of swap space
.Nm
may use, in percentage terms.
The default is 75%, leaving the remaining 25% of swap available for normal
paging operations.
.El
.Pp
It is important to ensure that your swap partition is nicely aligned.
The standard
.Dx
.Xr disklabel 8
program guarantees high alignment (~1MB) automatically.
Swap on HDDs benefits because HDDs tend to use a larger physical sector size
than 512 bytes, and proper alignment for SSDs will reduce write amplification
and write-combining inefficiencies.
.Pp
Finally, interleaved swap (multiple SSDs) may be used to increase
swap and swapcache performance even further.
A single SATA-II SSD is typically capable of reading 120-220MB/sec.
Configuring two SSDs for your swap will
improve aggregate swapcache read performance by 1.5x to 1.8x.
In tests with two Intel 40GB SSDs, 300MB/sec was easily achieved.
With two SATA-III SSDs it is possible to achieve 600MB/sec or better
and well over 400MB/sec random-read performance (versus the ~3MB/sec
random read performance a hard drive gives you).
Faster SATA interfaces or newer NVMe technologies have significantly
more read bandwidth (3GB/sec+ for NVMe), but may still lag on the
write bandwidth.
With newer technologies, one swap device is usually plenty.
.Pp
.Dx
defaults to a maximum of 512GB of configured swap.
Keep in mind that each 1GB of actually configured swap requires
approximately 1MB of wired RAM to manage.
.Pp
In addition there will be periods of time where the system is in
steady state and not writing to the swapcache.
During these periods
.Va curburst
will inch back up but will not exceed
.Va maxburst .
Thus the
.Va maxburst
value controls how large a repeated burst can be.
Remember that
.Va curburst
dynamically tracks burst and will go up and down depending on write
activity.
.Pp
A second bursting parameter called
.Va vm.swapcache.minburst
controls bursting when the maximum write bandwidth has been reached.
When
.Va curburst
reaches zero, write activity ceases and
.Va curburst
is allowed to recover up to
.Va minburst
before write activity resumes.
The recommended range for the
.Va minburst
parameter is 1MB to 50MB.
This parameter has a relationship to
how fragmented the swapcache gets when not in a steady state.
Large bursts reduce fragmentation and reduce incidences of
excessive seeking on the hard drive.
If set too low the
swapcache will become fragmented within a single regular file
and the constant back-and-forth between the swapcache and the
hard drive will result in excessive seeking on the hard drive.
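.Pp
The interaction between
.Va accrate ,
.Va curburst ,
.Va minburst
and
.Va maxburst
described above can be illustrated with a simplified model that advances
in one-second steps.
This is a sketch of the behaviour as documented here, not the kernel's
implementation; the function and the
.Va writing
and
.Va written
variables are hypothetical.
.Bd -literal -offset indent
```shell
# Parameters, using the values shown in the SYNOPSIS.
accrate=100000
minburst=10000000
maxburst=2000000000

# Model state: remaining write budget in bytes, and whether writes are
# currently allowed (1) or suspended until the budget recovers (0).
curburst=0
writing=0

# usage: burst_tick BYTES_WANTED
# Advance the model by one second; sets "written" to the bytes allowed.
burst_tick() {
    # The budget accumulates at accrate and is capped at maxburst.
    curburst=$(( curburst + accrate ))
    if [ "$curburst" -gt "$maxburst" ]; then
        curburst=$maxburst
    fi
    # After exhaustion, writes resume only once curburst is past minburst.
    if [ "$writing" -eq 0 ] && [ "$curburst" -gt "$minburst" ]; then
        writing=1
    fi
    written=0
    if [ "$writing" -eq 1 ]; then
        written=$1
        if [ "$written" -gt "$curburst" ]; then
            written=$curburst
        fi
        curburst=$(( curburst - written ))
        # Reaching zero suspends writing until the next recovery.
        if [ "$curburst" -le 0 ]; then
            writing=0
        fi
    fi
}

burst_tick 50000000
echo "$written"   # 0: the budget has not yet recovered past minburst
```
.Ed
In this model a larger
.Va minburst
means fewer but larger bursts, which is the fragmentation trade-off
discussed above.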
.Sh SWAPCACHE SIZE & MANAGEMENT
The swapcache feature will use up to 75% of configured swap space
by default.
The remaining 25% is reserved for normal paging operations.
The system operator should configure swap space equal to at least 4 times
main memory and no less than 8GB.
A typical 128GB SSD might use 64GB for boot + base and 56GB for
swap, with 8GB left unpartitioned.
The system might then have a large
additional hard drive for bulk data.
Even with many packages installed, 64GB is comfortable for
boot + base.
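.Pp
The sizing guidance above can be sketched as a small calculation.
The function name is hypothetical; it applies the 4x-main-memory and
8GB minimums, the default 75% swapcache share, and the estimate of
roughly 1MB of wired RAM per 1GB of configured swap given in
.Sx PERFORMANCE TUNING .
.Bd -literal -offset indent
```shell
# usage: swap_plan RAM_GB
# Print the minimum suggested swap size, the share swapcache may use by
# default, and the approximate wired RAM needed to manage that swap.
swap_plan() {
    swap_gb=$(( $1 * 4 ))
    if [ "$swap_gb" -lt 8 ]; then
        swap_gb=8
    fi
    cache_gb=$(( swap_gb * 75 / 100 ))
    echo "swap=${swap_gb}GB swapcache_max=${cache_gb}GB wired_ram=${swap_gb}MB"
}

swap_plan 16   # swap=64GB swapcache_max=48GB wired_ram=64MB
```
.Ed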
.Pp
When configuring an SSD that will be used for swap or swapcache
it is a good idea to leave around 10% unpartitioned to improve
the SSD's durability.
.Pp
You do not need to use swapcache if you have no hard drives in the
system, though in fact swapcache can help if you use NFS heavily
as a client.
.Pp
The
.Va vm.swapcache.maxswappct
sysctl may be used to change the 75% default.
You may have to change this default if you also use
.Xr tmpfs 5 ,
.Xr vn 4 ,
or if you have not allocated enough swap for reasonable normal paging
activity to occur (in which case you probably shouldn't be using
.Nm
anyway).
.Pp
If swapcache reaches the 75% limit it will begin tearing down swap
in linear bursts by iterating through available VM objects, until
swap space use drops to 70%.
The tear-down is limited by the rate at
which new data is written and this rate in turn is often limited by
.Va vm.swapcache.accrate ,
resulting in an orderly replacement of cached data and meta-data.
The limit is typically only reached when doing full data+meta-data
caching with no file size limitations and serving primarily large
files, or bumping
.Va kern.maxvnodes
up to very high values.
.Sh NORMAL SWAP PAGING ACTIVITY WITH SSD SWAP
This is not a function of
.Nm
per se but instead a normal function of the system.
Most systems have
sufficient memory that they do not need to page memory to swap.
These types of systems are the ones best suited for MLC SSD
configured swap running with a
.Nm
configuration.
Systems which modestly page to swap, in the range of a few hundred
megabytes a day worth of writing, are also well suited for MLC SSD
configured swap.
Desktops usually fall into this category even if they
page out a bit more because swap activity is governed by the actions of
a single person.
.Pp
Systems which page anonymous memory heavily when
.Nm
would otherwise be turned off are not usually well suited for MLC SSD
configured swap.
Heavy paging activity is not governed by
.Nm
bandwidth control parameters and can lead to excessive uncontrolled
writing to the SSD, causing premature wearout.
This isn't to say that
.Nm
would be ineffective, just that the aggregate write bandwidth required
to support the system might be too large to be cost-effective for an SSD.
.Pp
With this caveat in mind, SSD-based paging on systems with insufficient
RAM can be extremely effective in extending the useful life of the system.
For example, a system with a measly 192MB of RAM and SSD swap can run
a -j 8 parallel build world in a little less than twice the time it
would take if the system had 2GB of RAM, whereas it would take 5x to 10x
as long with normal HDD-based swap.
.Sh USING SWAPCACHE WITH NORMAL HARD DRIVES
Although
.Nm
is designed to work with SSD-based storage it can also be used with
HD-based storage as an aid for offloading the primary storage system.
Here we need to make a distinction between using RAID for fanning out
storage versus using RAID for redundancy.
There are numerous situations
where RAID-based redundancy does not make sense.
.Pp
A good example would be in an environment where the servers themselves
are redundant and can suffer a total failure without affecting
ongoing operations.
When the primary storage requirements easily fit onto
a single large-capacity drive it doesn't make a whole lot of sense to
use RAID if your only desire is to improve performance.
If you had a farm
of, say, 20 servers supporting the same facility, adding RAID to each one
would not accomplish anything other than to bloat your deployment and
maintenance costs.
.Pp
In these sorts of situations it may be desirable and convenient to have
the primary filesystem for each machine on a single large drive and then
use the
.Nm
facility to offload the drive and make the machine more effective without
actually distributing the filesystem itself across multiple drives.
For the purposes of offloading, while an SSD would be the most effective
from a performance standpoint, a second medium-sized HD with its much lower
cost and higher capacity might actually be more cost effective.
4448b14c46eSMatthew Dillon.Sh EXPLANATION OF STATIC VS DYNAMIC WEARING LEVELING, AND WRITE-COMBINING
4458b14c46eSMatthew DillonModern SSDs keep track of space that has never been written to.
4468b14c46eSMatthew DillonThis would also include space freed up via TRIM, but simply not
4478b14c46eSMatthew Dillontouching a bit of storage in a factory fresh SSD works just as well.
4488b14c46eSMatthew DillonOnce you touch (write to) the storage all bets are off, even if
4498b14c46eSMatthew Dillonyou reformat/repartition later.  It takes sending the SSD a
4508b14c46eSMatthew Dillonwhole-device TRIM command or special format command to take it back
4518b14c46eSMatthew Dillonto its factory-fresh condition (sans wear already present).
4528b14c46eSMatthew Dillon.Pp
4538b14c46eSMatthew DillonSSDs have wear leveling algorithms which are responsible for trying
4548b14c46eSMatthew Dillonto even out the erase/write cycles across all flash cells in the
4558b14c46eSMatthew Dillonstorage.  The better a job the SSD can do the longer the SSD will
456b979d635SSascha Wildnerremain usable.
.Pp
The more unused storage there is from the SSD's point of view the
easier a time the SSD has running its wear leveling algorithms.
Basically the wear leveling algorithm in a modern SSD (say Intel or OCZ)
uses a combination of static and dynamic leveling.  Static is the
best, allowing the SSD to reuse flash cells that have not been
erased very much by moving static (unchanging) data out of them and
into other cells that have more wear.  Dynamic wear leveling involves
writing data to available flash cells and then marking the cells containing
the previous copy of the data as being free/reusable.  Dynamic wear leveling
is the worst kind but the easiest to implement.  Modern SSDs use a combination
of both algorithms and also do write-combining.
.Pp
USB sticks often use only dynamic wear leveling and have short life spans
because of that.
.Pp
In any case, any unused space in the SSD makes the SSD's dynamic
wear leveling more efficient by giving it more unused space to cycle
data through, above and beyond the physical space it reserves beyond its
stated storage capacity, so the SSD lasts longer in theory.
.Pp
Write-combining is a feature whereby the SSD is able to reduce write
amplification effects by combining OS writes of smaller, discrete,
non-contiguous logical sectors into a single contiguous 128KB physical
flash block.
.Pp
On the flip side write-combining also results in more complex lookup tables
which can become fragmented over time and reduce the SSD's read performance.
Fragmentation can also occur when write-combined blocks are rewritten
piecemeal.
Modern SSDs can regain the lost performance by de-combining previously
write-combined areas as part of their static wear leveling algorithm, but
at the cost of extra write/erase cycles which slightly increase write
amplification effects.
Operating systems can also help maintain the SSD's performance by utilizing
larger blocks.
Write-combining results in a net reduction
of write-amplification effects, but because of the later de-combining and
other fragmentation effects the reduction is not complete.
From testing with Intel devices, write-amplification can be well controlled
in the 2x-4x range with the OS doing 16K writes, versus a worst-case
8x write-amplification with 16K blocks, 32x with 4K blocks, and a truly
horrid worst-case with 512 byte blocks.
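.Pp
The worst-case figures above can be reproduced with simple arithmetic.
This is only a sketch, and it assumes that each OS write of the given
size forces the SSD to rewrite one full 128KB physical flash block.
The 256x result for 512 byte blocks is derived from that assumption and
is not a measured value:
.Bd -literal -offset indent
128KB / 16KB =   8x
128KB /  4KB =  32x
128KB / 512B = 256x
.Ed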
.Pp
The
.Dx
.Nm
feature utilizes 64K-128K writes and is specifically designed to minimize
write amplification and write-combining stresses.
In terms of placing an actual filesystem on the SSD, the
.Dx
.Xr hammer 8
filesystem utilizes 16K blocks and is well behaved as long as you limit
reblocking operations.
For UFS you should create the filesystem with at least a 4K fragment
size, versus the default 2K.
Modern Windows filesystems use 4K clusters but it is unclear how SSD-friendly
NTFS is.
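.Pp
As a sketch of the UFS recommendation above, a 4K fragment size can be
requested with the
.Fl f
option of
.Xr newfs 8 .
The device name below is a hypothetical placeholder and must be replaced
with the actual partition:
.Bd -literal -offset indent
newfs -f 4096 /dev/daXXs1d
.Ed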
.Sh EXPLANATION OF FLASH CHIP FEATURE SIZE VS ERASE/REWRITE CYCLE DURABILITY
Manufacturers continue to produce flash chips with smaller feature sizes.
Smaller flash cells mean reduced erase/rewrite cycle durability which in
turn reduces the durability of the SSD.
.Pp
The older 34nm flash typically had a cell durability of about 10,000
erase/rewrite cycles while the newer
25nm flash is closer to 1000.  The newer flash uses larger ECCs and more
sensitive voltage comparators on-chip to increase the durability closer to
3000 cycles.  Generally speaking you should assume that the new chips have
around 1/3 the durability of the older chips for the same storage
capacity.  If you can squeeze out a 400TB durability from an older 40GB X25-V
using 34nm technology then you should assume around a 400TB durability from
a newer 120GB 310 series SSD using 25nm technology.
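.Pp
The two figures above follow from multiplying the capacity by the
per-cell erase/rewrite cycle count.
This is a back-of-the-envelope sketch which assumes ideal wear leveling
and no write amplification:
.Bd -literal -offset indent
 40GB x 10,000 cycles = 400,000GB = 400TB  (34nm X25-V)
120GB x  3,000 cycles = 360,000GB = 360TB  (25nm 310 series)
.Ed
.Pp
The second result is close enough to 400TB for planning purposes.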
.Sh WARNINGS
I am going to repeat and expand a bit on SSD wear.
Wear on SSDs is a function of the write durability of the cells,
whether the SSD implements static or dynamic wear leveling (or both),
write amplification effects when the OS does not issue write-aligned 128KB
ops or when the SSD is unable to write-combine adjacent logical sectors,
or if the SSD has a poor write-combining algorithm for non-adjacent sectors.
In addition, some erase/rewrite activity occurs from cleanup
operations the SSD performs as part of its static wear leveling algorithms
and its write-decombining algorithms (necessary to maintain performance over
time).  MLC flash uses 128KB physical write/erase blocks while SLC flash
typically uses 64KB physical write/erase blocks.
.Pp
The algorithms the SSD implements in its firmware are probably the most
important part of the device and a major differentiator between e.g. SATA
and USB-based SSDs.  SATA form factor drives will universally be far superior
to USB storage sticks.
SSDs can also have wildly different wearout rates and wildly different
performance curves over time.
For example the performance of a SSD which does not implement
write-decombining can seriously degrade over time as its lookup
tables become severely fragmented.
For the purposes of this manual page we are primarily using Intel and OCZ
drives when describing performance and wear issues.
.Pp
.Nm
parameters should be carefully chosen to avoid early wearout.
For example, the Intel X25-V 40GB SSD has a minimum write durability
of 40TB and an actual durability that can be quite a bit higher.
Generally speaking, you want to select parameters that will give you
at least 10 years of service life.
The most important parameter to control this is
.Va vm.swapcache.accrate .
.Nm
uses a very conservative 100KB/sec default but even a small X25-V
can probably handle 300KB/sec of continuous writing and still last 10 years.
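.Pp
The relationship between
.Va vm.swapcache.accrate
and service life can be estimated as follows.
This is a rough sketch which assumes continuous writing at the
configured rate, 365-day years, and decimal units:
.Bd -literal -offset indent
10 years                     = 315,360,000 seconds
100,000 bytes/sec x 10 years = about 31.5TB
300,000 bytes/sec x 10 years = about 94.6TB
.Ed
.Pp
The default stays under the 40TB vendor minimum, while 300KB/sec relies
on the actual durability being well above that minimum.
A higher rate could then be selected with, for example:
.Bd -literal -offset indent
sysctl vm.swapcache.accrate=300000
.Ed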
.Pp
Depending on the wear leveling algorithm the drive uses, durability
and performance can sometimes be improved by configuring less
space (in a manufacturer-fresh drive) than the drive's probed capacity.
For example, by only using 32GB of a 40GB SSD.
SSDs typically implement 10% more storage than advertised and
use this storage to improve wear leveling.
As cells begin to fail
this overallotment slowly becomes part of the primary storage
until it has been exhausted.
After that the SSD has basically failed.
Keep in mind that if you use a larger portion of the SSD's advertised
storage the SSD will not know if/when you decide to use less unless
appropriate TRIM commands are sent (if supported), or a low level
factory erase is issued.
.Pp
.Nm smartctl
(from
.Xr dports 7 Ap s
.Pa sysutils/smartmontools )
may be used to retrieve the wear indicator from the drive.
One usually runs something like
.Ql smartctl -d sat -a /dev/daXX
(for AHCI/SILI/SCSI), or
.Ql smartctl -a /dev/adXX
for NATA.
Some SSDs
(particularly the Intels) will brick the SATA port when smart operations
are done while the drive is busy with normal activity, so the tool should
only be run when the SSD is idle.
.Pp
ID 232 (0xe8) in the SMART data dump indicates available reserved
space and ID 233 (0xe9) is the wear-out meter.
Reserved space
typically starts at 100 and decrements to 10, after which the SSD
is considered to operate in a degraded mode.
The wear-out meter typically starts at 99 and decrements to 0,
after which the SSD has failed.
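.Pp
To show only these two attributes, the output can be filtered.
This is a sketch which assumes that the drive reports attributes 232 and
233 and that the attribute ID is the first column of the table printed by
.Nm smartctl ;
.Pa /dev/daXX
is a placeholder for the actual device:
.Bd -literal -offset indent
smartctl -d sat -A /dev/daXX | awk '$1 == 232 || $1 == 233'
.Ed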
.Pp
.Nm
tends to use large 64KB writes and tends to cluster multiple writes
linearly.
The SSD is able to take significant advantage of this
and write amplification effects are greatly reduced.
If we take a 40GB Intel X25-V as an example the vendor specifies a write
durability of approximately 40TB, but
.Nm
should be able to squeeze out upwards of 200TB due to the fairly optimal
write clustering it does.
The theoretical limit for the Intel X25-V is 400TB (10,000 erase cycles
per MLC cell, 40GB drive, with 34nm technology), but the firmware doesn't
do perfect static wear leveling so the actual durability is less.
In tests over several hundred days we have validated a write endurance
greater than 200TB on the 40GB Intel X25-V using
.Nm .
.Pp
In contrast, filesystems directly stored on a SSD could have
fairly severe write amplification effects and will have durabilities
ranging closer to the vendor-specified limit.
.Pp
Tests have shown that power cycling (with proper shutdown) and read
operations do not adversely affect a SSD.  Writing within the wearout
constraints provided by the vendor also does not make a powered SSD any
less reliable over time.  Time itself seems to be a factor as the SSD
encounters defects and weak cells in the flash chips.  Writes to a SSD
will affect cold durability (a typical flash chip has 10 years of cold
data retention when fresh and less than 1 year of cold data retention near
the end of its wear life).  Keeping a SSD cool improves its data retention.
.Pp
Beware the standard comparison between SLC, MLC, and TLC-based flash
in terms of wearout and durability.  Over the years, tests have shown
that SLC is not actually any more reliable than MLC, despite having a
significantly larger theoretical durability.  Cell and chip failures seem
to trump theoretical wear limitations in terms of device reliability.
With that in mind, we do not recommend using SLC for anything anymore.
Instead we recommend that the flash simply be over-provisioned to provide
the needed durability.
This is already done in numerous NVMe solutions for the vendor to be able
to provide certain minimum wear guarantees.
Durability scales with the amount of flash storage (but the fab process
typically scales the opposite... smaller feature sizes for flash cells
greatly reduce their durability).
When wear calculations are in years, these differences become huge, but
often the quantity of storage needed trumps the wear life so we expect most
people will be using MLC.
.Pp
Beware the huge difference between larger (e.g. 2.5") form-factor SSDs
and smaller SSDs such as USB sticks or very small M.2 storage.  Smaller
form-factor devices have fewer flash chips, much lower write bandwidths,
less RAM for caching and write-combining, and USB sticks in particular will
usually have unsophisticated wear-leveling algorithms compared to a 2.5"
SSD.  It is generally not a good idea to make a USB stick your primary
storage.  Long-form-factor NGFF/M.2 devices will be better, and 2.5"
form factor devices even better.  The read-bandwidth for a SATA SSD caps
out more quickly than the read-bandwidth for a NVMe SSD, but the larger
form factor of a 2.5" SATA SSD will often have superior write performance
to a NGFF NVMe device.  There are 2.5" NVMe devices as well, requiring a
special connector or PCIe adapter, which give you the best of both worlds.
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr hammer 8 ,
.Xr swapon 8
.Sh HISTORY
.Nm
first appeared in
.Dx 2.5 .
.Sh AUTHORS
.An Matthew Dillon