.\" Copyright (c) 2016 The DragonFly Project.  All rights reserved.
.\"
.\" This code is derived from software contributed to The DragonFly Project
.\" by Matthew Dillon <dillon@backplane.com>
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd June 5, 2015
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express Controller for PCIe-based SSDs
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
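.Pp
If the driver was not compiled into the kernel, it can also be loaded
manually on a running system (assuming the controller does not host the
root filesystem) with
.Xr kldload 8 :
.Bd -literal -offset indent
kldload nvme
.Ed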
.Sh DESCRIPTION
The
.Nm
driver provides support for PCIe storage controllers conforming to the
NVM Express Controller Interface specification.
The host interfaces with an NVMe controller directly over PCIe, and the
controller in turn has a direct connection to the underlying non-volatile
(typically flash) storage, yielding a huge reduction in latency and
increase in performance over
.Xr ahci 4 .
.Pp
In addition, NVMe controllers are capable of supporting up to 65535
independent submission and completion queues, each able to support upwards
of 16384 queue entries.
Each queue may be assigned its own interrupt
vector out of the controller's pool (up to 2048).
.Pp
Actual controllers typically implement lower limits.
While most controllers
allow the maximal number of queue entries, the total number of queues is
often limited to far fewer than 65535; 8-32 queues are commonly supported.
Similarly, while up to 2048 MSI-X vectors are allowed by the spec,
actual controllers typically support fewer vectors.
Still, having several
MSI-X vectors allows interrupts to be distributed to multiple CPUs,
reducing bottlenecks and improving performance.
The multiple queues can
be divvied up across the available CPU cores by the driver, as well as split
up based on the type of I/O operation being performed (such as giving read
and write I/O commands their own queues).
This also significantly
reduces bottlenecks and improves performance, particularly in mixed
read-write environments.
.Sh FORM FACTOR
NVMe boards usually come in one of two flavors: a tiny form factor
with an M.2 or NGFF connector, supplying 2 or 4 PCIe lanes, or a larger
form that slips into a normal PCIe slot.
The larger form typically
implements 2, 4, or 8 lanes.
Also note that adapter cards which fit
into normal PCIe slots and can mount the smaller M.2/NGFF NVMe cards can
be purchased cheaply.
.Sh PERFORMANCE
Typical performance for a 2-lane (x2) board is in the 700 MBytes/s to
1.5 GBytes/s range.
4-lane (x4) boards typically range from 1.0 GBytes/s to 2.5 GBytes/s.
Full-blown PCIe cards run the whole gamut; 2.5 GBytes/s is fairly typical
but performance can exceed 5 GBytes/s in a high-end card.
.Pp
Multi-threaded random-read performance can exceed 300,000 IOPS on an x4 board.
Single-threaded performance is usually in the 40,000 to 100,000 IOPS range.
Sequential submission/completion latencies are typically below 35us while
random submission/completion latencies are typically below 110us.
Performance (uncached) through a filesystem will be bottlenecked by additional
factors, particularly if testing is only being done on a single file.
.Pp
The biggest differentiation between boards is usually write performance.
Small boards with only a few flash chips have relatively low write
performance, usually in the 150 MBytes/s range.
Higher-end boards have
significantly better write performance, potentially exceeding 1.0 GBytes/s.
.Pp
For reference, the SATA-III physical interface is limited to 600 MBytes/s,
the extra phy layer results in higher latencies, and AHCI controllers are
limited to a single 32-entry queue.
109b708a9f9SMatthew Dillon.Sh FEATURES
110cebf490dSSascha WildnerThe
111cebf490dSSascha Wildner.Dx
112b708a9f9SMatthew Dillon.Nm
113b708a9f9SMatthew Dillondriver automatically selects the best SMP-friendly and
114b708a9f9SMatthew DillonI/O-typing queue configuration possible based on what the controller
115b708a9f9SMatthew Dillonsupports.
116b708a9f9SMatthew DillonIt uses a direct disk device API which bypasses CAM, so kernel code paths
117b708a9f9SMatthew Dillonto read and write blocks are SMP-friendly and, depending on the queue
118b708a9f9SMatthew Dillonconfiguration, potentially conflict-free.
119b708a9f9SMatthew DillonThe driver is capable of submitting commands and processing responses on
120b708a9f9SMatthew Dillonmultiple queues simultaniously in a SMP environment.
121b708a9f9SMatthew Dillon.Pp
122b708a9f9SMatthew DillonThe driver pre-reserves DMA memory for all necessary descriptors, queue
123b708a9f9SMatthew Dillonentries, and internal driver structures, and allows for a very generous
12470394f3fSMatthew Dillonnumber of queue entries (1024 x NQueues) for maximum performance.
.Sh HINTS ON NVME CARDS
So far I've only been able to test one Samsung NVMe M.2 card and
an Intel 750 HHHL (half-height / half-length) PCIe card.
.Pp
My recommendation is to go with Samsung.
The firmware is pretty good.
It appears to be implemented reasonably well regardless of the queue
configuration or I/O blocksize employed, giving expected scaling without
any quirky behavior.
.Pp
The Intel 750 has very poorly-implemented firmware.
For example, the more queues the driver configures, the poorer
the single-threaded read performance is.
And no matter the queue configuration, it appears that adding a second
concurrent reader drops performance drastically, after which it slowly
increases as more concurrent readers are added.
In addition, on the 750, the firmware degrades horribly when
reads use a blocksize of 64KB.
The best performance is at 32KB; in fact,
performance again degrades horribly if you drop down to 16KB.
And if that weren't bad enough, the 750 takes over 13 seconds to become
ready after a machine power-up or reset.
.Pp
The grand result of all of this is that filesystem performance through an
Intel NVMe card is going to be hit-or-miss, depending on inconsequential
differences in blocksize and queue configuration.
Regardless of whatever hacks Intel might be employing in their own drivers,
this is just totally unacceptable behavior.
.Pp
I do not recommend rebranders like Plextor or Kingston.
For one thing, if you do buy these, be very careful to get one that is
actually an NVMe card and not an M.2 card with an AHCI controller on it.
Plextor's performance is particularly bad.
Kingston seems to have done a better job, and reading
at 1.0 GBytes/s+ is possible despite the CPU overhead of going through an AHCI
controller (the flash in both cases is directly connected to the controller,
so there is no SATA phy to get in the way).
Of course, if you actually want an AHCI card, then these might be the way
to go, and you might even be able to boot from them.
.Sh HINTS ON CONFIGURING MACHINES (BIOS)
If
.Nm
locks up while probing, the BIOS probably did something horrible to
the PCIe card.
If you have enabled your BIOS's FastBoot option, turn it
off; this may fix the issue.
.Pp
Not all BIOSes can boot from an NVMe card.
Those that can typically require booting via EFI.
.Sh SEE ALSO
.Xr ahci 4 ,
.Xr intro 4 ,
.Xr pci 4 ,
.Xr loader.conf 5 ,
.Xr nvmectl 8
.Sh HISTORY
The
.Nm
driver first appeared in
.Dx 4.5 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver for
.Dx
was written from scratch by
.An Matthew Dillon Aq Mt dillon@backplane.com
based on the NVM Express 1.2a specification.