.\" Copyright (c) 2016 The DragonFly Project.  All rights reserved.
.\"
.\" This code is derived from software contributed to The DragonFly Project
.\" by Matthew Dillon <dillon@backplane.com>
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd June 5, 2015
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express Controller for PCIe-based SSDs
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Sh DESCRIPTION
The
.Nm
driver provides support for PCIe storage controllers conforming to the
NVM Express Controller Interface specification.
NVMe controllers have a direct PCIe host interface to the controller,
which in turn has a direct connection to the underlying non-volatile
(typically flash) storage, yielding a huge reduction in latency and
increase in performance over
.Xr ahci 4 .
.Pp
In addition, NVMe controllers are capable of supporting up to 65535
independent submission and completion queues, each able to support up to
16384 queue entries.
Each queue may be assigned its own interrupt vector out of the
controller's pool (up to 2048).
.Pp
Actual controllers typically implement lower limits.
While most controllers allow the maximal number of queue entries, the
total number of queues is often limited to far fewer than 65535;
8 to 32 queues are commonly supported.
Similarly, while the specification allows up to 2048 MSI-X vectors,
actual controllers typically support fewer.
Still, having several MSI-X vectors allows interrupts to be distributed
to multiple CPUs, reducing bottlenecks and improving performance.
The multiple queues can be divided up across available CPU cores by the
driver, as well as split up based on the type of I/O operation being
performed (such as giving read and write commands their own queues).
This also significantly reduces bottlenecks and improves performance,
particularly in mixed read-write environments.
.Sh FORM FACTOR
NVMe boards usually come in one of two flavors: a tiny form factor
with an M.2 or NGFF connector, supplying 2 or 4 PCIe lanes, or a larger
form that slips into a normal PCIe slot.
The larger form typically implements 2, 4, or 8 lanes.
Also note that adapter cards which fit into normal PCIe slots and can
mount the smaller M.2/NGFF NVMe cards can be cheaply purchased.
.Sh PERFORMANCE
Typical performance for a 2-lane (x2) board is in the 700 MBytes/sec to
1.5 GBytes/sec range.
4-lane (x4) boards typically range from 1.0 GBytes/sec to 2.5 GBytes/sec.
Full-blown PCIe cards run the whole gamut; 2.5 GBytes/sec is fairly typical,
but performance can exceed 5 GBytes/sec in a high-end card.
.Pp
Multi-threaded random-read performance can exceed 300,000 IOPS on an x4 board.
Single-threaded performance is usually in the 40,000 to 100,000 IOPS range.
Sequential submission/completion latencies are typically below 35us, while
random submission/completion latencies are typically below 110us.
Performance (uncached) through a filesystem will be bottlenecked by additional
factors, particularly if testing is only being done on a single file.
.Pp
The biggest differentiation between boards is usually write performance.
Small boards with only a few flash chips have relatively low write
performance, usually in the 150 MBytes/sec range.
Higher-end boards have significantly better write performance,
potentially exceeding 1.0 GBytes/sec.
.Pp
For reference, the SATA-III physical interface is limited to 600 MBytes/sec,
and its extra phy layer results in higher latencies; AHCI controllers are
further limited to a single 32-entry queue.
.Sh FEATURES
The
.Dx
.Nm
driver automatically selects the best SMP-friendly and
I/O-typing queue configuration possible based on what the controller
supports.
It uses a direct disk device API which bypasses CAM, so kernel code paths
to read and write blocks are SMP-friendly and, depending on the queue
configuration, potentially conflict-free.
The driver is capable of submitting commands and processing responses on
multiple queues simultaneously in an SMP environment.
.Pp
The driver pre-reserves DMA memory for all necessary descriptors, queue
entries, and internal driver structures, and allows for a very generous
number of queue entries (1024 x NQueues) for maximum performance.
.Sh HINTS ON NVME CARDS
So far I have only been able to test one Samsung NVMe M.2 card and
an Intel 750 HHHL (half-height / half-length) PCIe card.
.Pp
My recommendation is to go with Samsung.
The firmware is pretty good.
It appears to be implemented reasonably well regardless of the queue
configuration or I/O blocksize employed, giving expected scaling without
any quirky behavior.
.Pp
The Intel 750 has very poorly-implemented firmware.
For example, the more queues the driver configures, the poorer the
single-threaded read performance is.
And no matter the queue configuration, it appears that adding a second
concurrent reader drops performance drastically, after which performance
slowly increases as more concurrent readers are added.
In addition, on the 750, the firmware degrades horribly when reads use
a blocksize of 64KB.
The best performance is at 32KB; in fact, performance again degrades
horribly if you drop down to 16KB.
And if that weren't bad enough, the 750 takes over 13 seconds to become
ready after a machine power-up or reset.
.Pp
The grand result of all of this is that filesystem performance through an
Intel NVMe card is going to be hit-or-miss, depending on inconsequential
differences in blocksize and queue configuration.
Regardless of whatever hacks Intel might be employing in their own drivers,
this is just totally unacceptable behavior.
.Pp
I do not recommend rebranders like Plextor or Kingston.
For one thing, if you do buy these, be very careful to get one that is
actually an NVMe card and not an M.2 card with an AHCI controller on it.
Plextor's performance is particularly bad.
Kingston seems to have done a better job, and reading at 1.0 GBytes/sec
or better is possible despite the CPU overhead of going through an AHCI
controller (the flash in both cases is directly connected to the controller,
so there is no SATA phy to get in the way).
Of course, if you actually want an AHCI card, then these might be the way
to go, and you might even be able to boot from them.
.Sh HINTS ON CONFIGURING MACHINES (BIOS)
If the
.Nm
driver locks up while probing, the BIOS did something horrible to the
PCIe card.
If you have enabled your BIOS's FastBoot option, turn it off; this may
fix the issue.
.Pp
Not all BIOSes can boot from an NVMe card.
Those that can typically require booting via EFI.
.Sh SEE ALSO
.Xr ahci 4 ,
.Xr intro 4 ,
.Xr pci 4 ,
.Xr loader.conf 5 ,
.Xr nvmectl 8
.Sh HISTORY
The
.Nm
driver first appeared in
.Dx 4.5 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver for
.Dx
was written from scratch by
.An Matthew Dillon Aq Mt dillon@backplane.com
based on the NVM Express 1.2a specification.