# User Space Drivers {#userspace}

## Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level -
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). This
separation is aided by features on the CPU itself that enforce memory
separation called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring). Typically,
drivers run in kernel space (i.e. ring 0 on x86). SPDK contains drivers that
instead are designed to run in user space, but they still interface directly
with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device and on Linux is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux -
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt).
These two drivers
are "dummy" drivers in the sense that they mostly indicate to the operating
system that the device has a driver bound to it so it won't automatically try
to re-bind the default driver. They don't actually initialize the hardware in
any way, nor do they even understand what type of device it is. The primary
difference between uio and vfio is that vfio is capable of programming the
platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on
Linux, the device nodes corresponding to it, such as /dev/nvme0n1, will
disappear. Any filesystems mounted on the device are unmounted as well, and
kernel filesystems can no longer interact with the device. In fact, the entire
kernel block storage stack is no longer involved. Instead, SPDK provides
re-imagined implementations of most of the layers in a typical operating
system storage stack, all as C libraries that can be directly embedded into
your application. This includes a [block device abstraction layer](@ref bdev)
primarily, but also [block allocators](@ref blob) and
[filesystem-like components](@ref blobfs).

User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK @ref
nvme, for instance, maps the BAR for the NVMe device and then follows along
with the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.

## Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are a number of reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, and 2) interrupts introduce software jitter and have
significant overhead due to forced context switches. Operations in SPDK are
almost universally asynchronous and allow the user to provide a callback on
completion. The callback is called in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair for a bit flip, and
technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
will ensure that the host memory being checked is present in the CPU cache
after an update by the device.

## Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from lots of different places, either in the operating system or in various
processes on the system, and the thread topology of those processes changes
over time. Most kernel drivers elect to map hardware queues to cores (as close
to 1:1 as possible), and then when a request is submitted they look up the
correct hardware queue for whatever core the current thread happens to be
running on. Often, they'll need to either acquire a lock around the queue or
temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This is a large improvement
over older hardware interfaces that only had a single queue or no queue at
all, but still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist
because the application created them. Therefore, the SPDK drivers choose to
expose the hardware queues directly to the application with the requirement
that a hardware queue is only ever accessed from one thread at a time.
In practice, applications assign one hardware queue to each thread (as opposed
to one hardware queue per core in kernel drivers). This guarantees that the
thread can submit requests without having to perform any sort of coordination
(i.e. locking) with the other threads in the system.