# User Space Drivers {#userspace}

## Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level -
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). This
separation is enforced by features on the CPU itself called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring). Typically,
drivers run in kernel space (i.e. ring 0 on x86). SPDK contains drivers that
instead are designed to run in user space, but they still interface directly
with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device, and on Linux it is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux -
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt). These two drivers
are "dummy" drivers in the sense that they mostly indicate to the operating
system that the device has a driver bound to it, so it won't automatically try
to re-bind the default driver. They don't actually initialize the hardware in
any way, nor do they even understand what type of device it is. The primary
difference between uio and vfio is that vfio is capable of programming the
platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on Linux,
the devices corresponding to it, such as /dev/nvme0n1, will disappear. It further
means that filesystems mounted on the device are also removed and kernel
filesystems can no longer interact with the device. In fact, the entire kernel
block storage stack is no longer involved. Instead, SPDK provides re-imagined
implementations of most of the layers in a typical operating system storage
stack, all as C libraries that can be directly embedded into your application.
This primarily includes a [block device abstraction layer](@ref bdev), but
also [block allocators](@ref blob) and [filesystem-like components](@ref blobfs).

User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK @ref
nvme, for instance, maps the BAR for the NVMe device and then follows the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.
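From the application's point of view, taking control of a device looks roughly
like the sketch below. It is modeled on the `hello_world` example that ships
with SPDK and assumes the NVMe device has already been rebound to uio or vfio
(for example with SPDK's `scripts/setup.sh`); option structures and exact
behavior may differ slightly between SPDK versions.

```c
#include <stdbool.h>
#include <stdio.h>

#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;

/* Called for every NVMe controller discovered on the PCIe bus; returning
 * true tells SPDK to attach its user space driver to the device. */
static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	 struct spdk_nvme_ctrlr_opts *opts)
{
	printf("Probing %s\n", trid->traddr);
	return true;
}

/* Called after the driver has mapped the device's BAR and completed NVMe
 * controller initialization, all from within this process. */
static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	  struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	printf("Attached to %s\n", trid->traddr);
	g_ctrlr = ctrlr;
}

int
main(void)
{
	struct spdk_env_opts opts;

	/* Set up hugepage-backed memory and PCI access for the process. */
	spdk_env_opts_init(&opts);
	opts.name = "userspace_driver_example";
	if (spdk_env_init(&opts) < 0) {
		fprintf(stderr, "Unable to initialize the SPDK environment\n");
		return 1;
	}

	/* Enumerate NVMe devices bound to uio/vfio and attach to each one. */
	if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0) {
		fprintf(stderr, "spdk_nvme_probe() failed\n");
		return 1;
	}

	if (g_ctrlr != NULL) {
		spdk_nvme_detach(g_ctrlr);
	}
	return 0;
}
```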
## Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are a number of reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, and 2) interrupts introduce software jitter and have
significant overhead due to forced context switches. Operations in SPDK are
almost universally asynchronous and allow the user to provide a callback on
completion. The callback is called in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair for a bit flip (the
phase tag in a completion queue entry), and technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
will ensure that the host memory being checked is present in the CPU cache
after an update by the device.

## Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from lots of different places, either in the operating system or in various
processes on the system, and the thread topology of those processes changes
over time. Most kernel drivers elect to map hardware queues to cores (as close
to 1:1 as possible), and then when a request is submitted they look up the
correct hardware queue for whatever core the current thread happens to be
running on. Often, they'll need to either acquire a lock around the queue or
temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This is a large improvement
over older hardware interfaces that only had a single queue or no queue at
all, but still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist
because the application created them. Therefore, the SPDK drivers choose to
expose the hardware queues directly to the application with the requirement
that a hardware queue is only ever accessed from one thread at a time. In
practice, applications assign one hardware queue to each thread (as opposed to
one hardware queue per core in kernel drivers). This guarantees that the thread
can submit requests without having to perform any sort of coordination (i.e.
locking) with the other threads in the system.
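To make the polling and per-thread queue pair model concrete, here is a
minimal sketch of what a single I/O thread might do. The `thread_io` function
is illustrative rather than part of SPDK, and the controller and namespace are
assumed to come from a probe/attach sequence like the one shown earlier; the
allocation flags and option arguments may vary by SPDK version.

```c
#include <stdbool.h>
#include <stdio.h>

#include "spdk/env.h"
#include "spdk/nvme.h"

/* Completion callback, invoked from within
 * spdk_nvme_qpair_process_completions() on the polling thread. */
static void
read_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
	bool *done = arg;

	if (spdk_nvme_cpl_is_error(cpl)) {
		fprintf(stderr, "Read failed\n");
	}
	*done = true;
}

/* Per-thread I/O path: each thread allocates its own queue pair and polls
 * it, so no locking or coordination with other threads is required. */
int
thread_io(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
	struct spdk_nvme_qpair *qpair;
	void *buf;
	bool done = false;

	/* Hardware submission/completion queues owned by this thread only. */
	qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);
	if (qpair == NULL) {
		return -1;
	}

	/* I/O buffers must come from pinned, DMA-able memory. */
	buf = spdk_zmalloc(4096, 4096, NULL, SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);
	if (buf == NULL) {
		spdk_nvme_ctrlr_free_io_qpair(qpair);
		return -1;
	}

	/* Submit a read of one block at LBA 0; this returns immediately. */
	if (spdk_nvme_ns_cmd_read(ns, qpair, buf, 0, 1, read_complete, &done, 0) != 0) {
		spdk_free(buf);
		spdk_nvme_ctrlr_free_io_qpair(qpair);
		return -1;
	}

	/* Poll the completion queue until the callback fires; no interrupts
	 * are involved at any point. */
	while (!done) {
		spdk_nvme_qpair_process_completions(qpair, 0);
	}

	spdk_free(buf);
	spdk_nvme_ctrlr_free_io_qpair(qpair);
	return 0;
}
```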