# User Space Drivers {#userspace}

## Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level:
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). This
separation is enforced by features on the CPU itself called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring). Typically,
drivers run in kernel space (i.e. ring 0 on x86). SPDK contains drivers that
instead are designed to run in user space, but they still interface directly
with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device, and on Linux it is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux:
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt). These two drivers
are "dummy" drivers in the sense that they mostly exist to indicate to the
operating system that the device has a driver bound to it, so that it won't
automatically try to re-bind the default driver. They don't actually initialize
the hardware in any way, nor do they even understand what type of device it is.
The primary difference between uio and vfio is that vfio is capable of
programming the platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

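As a rough illustration, the sketch below performs the unbind and rebind steps
by hand through sysfs, assuming a hypothetical NVMe device at PCI address
0000:01:00.0 being handed to vfio-pci. In practice, SPDK's scripts/setup.sh
automates all of this.

```c
#include <stdio.h>

/* Write a short string to a sysfs file, just as the shell command
 * `echo value > path` would. */
static int
write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");

    if (f == NULL) {
        return -1;
    }
    fprintf(f, "%s", value);
    fclose(f);
    return 0;
}

int
main(void)
{
    /* Ask the current kernel driver (e.g. nvme) to release the device. */
    write_sysfs("/sys/bus/pci/devices/0000:01:00.0/driver/unbind",
                "0000:01:00.0");

    /* Force the next probe to bind vfio-pci instead of the default driver. */
    write_sysfs("/sys/bus/pci/devices/0000:01:00.0/driver_override",
                "vfio-pci");
    write_sysfs("/sys/bus/pci/drivers_probe", "0000:01:00.0");
    return 0;
}
```
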
Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on Linux,
the device nodes corresponding to it, such as /dev/nvme0n1, will disappear. This
further means that filesystems mounted on the device will be removed and that
kernel filesystems can no longer interact with the device. In fact, the entire
kernel block storage stack is no longer involved. Instead, SPDK provides
re-imagined implementations of most of the layers in a typical operating system
storage stack, all as C libraries that can be directly embedded into your
application. This primarily includes a [block device abstraction layer](@ref bdev),
but also [block allocators](@ref blob) and [filesystem-like components](@ref blobfs).

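Because these layers are plain C libraries, an application consumes them
through direct function calls. The following is a minimal sketch of opening a
block device through the bdev layer; it assumes the code runs on an SPDK
thread inside an already-initialized application, and the bdev name "Nvme0n1"
is hypothetical.

```c
#include <stdbool.h>
#include "spdk/bdev.h"

/* Notified of events (e.g. hot-remove, resize) on the open descriptor. */
static void
bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev,
              void *event_ctx)
{
}

/* Open a named bdev and get a per-thread channel for submitting I/O. */
static int
open_bdev(struct spdk_bdev_desc **desc, struct spdk_io_channel **ch)
{
    int rc;

    rc = spdk_bdev_open_ext("Nvme0n1", true /* writable */, bdev_event_cb,
                            NULL, desc);
    if (rc != 0) {
        return rc;
    }

    *ch = spdk_bdev_get_io_channel(*desc);
    return 0;
}
```
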
User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK
NVMe driver (@ref nvme), for instance, maps the BAR for the NVMe device and
then follows the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.

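To make the mechanism concrete, here is a simplified sketch that maps BAR0 of
an NVMe controller into the process and reads one register via MMIO. For
brevity it uses the PCI sysfs resource file rather than the uio/vfio
interfaces SPDK actually uses; the PCI address 0000:01:00.0 is hypothetical,
and the device must already be unbound from the kernel nvme driver.

```c
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
    /* resource0 corresponds to the device's BAR0. */
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0", O_RDWR);
    volatile uint64_t *regs;
    uint64_t cap;

    if (fd < 0) {
        return 1;
    }

    /* Map the BAR into this process; loads and stores through `regs`
     * are now MMIO accesses to the device. */
    regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) {
        close(fd);
        return 1;
    }

    /* Per the NVMe specification, offset 0x0 of BAR0 is the CAP
     * (Controller Capabilities) register. */
    cap = regs[0];
    printf("NVMe CAP: 0x%016" PRIx64 "\n", cap);

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}
```
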
## Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are a number of reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, and 2) interrupts introduce software jitter and have
significant overhead due to forced context switches. Operations in SPDK are
almost universally asynchronous and allow the user to provide a callback on
completion. The callback is invoked in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair's completion entries
for a phase bit flip, and technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
will ensure that the host memory being checked is present in the CPU cache
after an update by the device.

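As a sketch of that model (not a complete program), the function below submits
one asynchronous read and then polls the queue pair until the completion
callback fires. It assumes `ns`, `qpair`, and a DMA-safe `buf` were already
obtained, e.g. via spdk_nvme_probe(), spdk_nvme_ctrlr_alloc_io_qpair(), and
spdk_zmalloc().

```c
#include <stdbool.h>
#include "spdk/nvme.h"

struct read_ctx {
    bool done;
};

static void
read_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
    struct read_ctx *ctx = arg;

    /* Invoked from within spdk_nvme_qpair_process_completions(),
     * not from an interrupt handler. */
    ctx->done = true;
}

static int
read_one_block(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
               void *buf)
{
    struct read_ctx ctx = { .done = false };
    int rc;

    /* Submit an asynchronous read of one block starting at LBA 0. */
    rc = spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* starting LBA */,
                               1 /* number of LBAs */, read_complete,
                               &ctx, 0 /* io_flags */);
    if (rc != 0) {
        return rc;
    }

    /* Poll: this reads completion queue entries in host memory, looking
     * for a phase bit flip; no MMIO and no interrupt is involved. */
    while (!ctx.done) {
        spdk_nvme_qpair_process_completions(qpair, 0);
    }
    return 0;
}
```
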
## Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from lots of different places, either in the operating system or in various
processes on the system, and the thread topology of those processes changes
over time. Most kernel drivers elect to map hardware queues to cores (as close
to 1:1 as possible), and then when a request is submitted they look up the
correct hardware queue for whatever core the current thread happens to be
running on. Often, they'll need to either acquire a lock around the queue or
temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This is a large improvement
over older hardware interfaces that only had a single queue or no queue at
all, but still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist
because the application created them. Therefore, the SPDK drivers choose to
expose the hardware queues directly to the application with the requirement
that a hardware queue is only ever accessed from one thread at a time. In
practice, applications assign one hardware queue to each thread (as opposed to
one hardware queue per core in kernel drivers). This guarantees that the thread
can submit requests without having to perform any sort of coordination (i.e.
locking) with the other threads in the system.
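
As a sketch of that model, each thread below allocates its own queue pair and
uses it without any locking. It assumes a controller that was already attached,
e.g. via spdk_nvme_probe(); the actual I/O submission loop is elided.

```c
#include <pthread.h>
#include "spdk/nvme.h"

/* Each thread allocates a private I/O queue pair; because no other thread
 * ever touches it, submission and completion need no locks. */
static void *
io_thread(void *arg)
{
    struct spdk_nvme_ctrlr *ctrlr = arg;
    struct spdk_nvme_qpair *qpair;

    qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);
    if (qpair == NULL) {
        return NULL;
    }

    /* ... submit I/O with spdk_nvme_ns_cmd_read()/write() and poll with
     * spdk_nvme_qpair_process_completions(), all on this thread ... */

    spdk_nvme_ctrlr_free_io_qpair(qpair);
    return NULL;
}

/* The application spawns one such thread per desired queue, e.g.:
 *   pthread_t t;
 *   pthread_create(&t, NULL, io_thread, ctrlr);
 */
```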