# ublk Target {#ublk}

## Table of Contents {#ublk_toc}

- @ref ublk_intro
- @ref ublk_internal
- @ref ublk_impl
- @ref ublk_op

## Introduction {#ublk_intro}

[ublk](https://docs.kernel.org/block/ublk.html) (or ubd) is a generic framework for
implementing userspace block devices based on `io_uring`.  It is designed to
create a highly efficient data path for userspace storage software to provide
high-performance block device service on the local host.

The whole ublk service involves three parts: ublk driver, ublk server and ublk workload.

![ublk service stack](img/ublk_service.svg)

* __ublk driver__ is a kernel driver added in Linux kernel 6.0.  It delivers I/O requests
  from a ublk block device (`/dev/ublkbN`) to a ublk server.

* __ublk workload__ can be any local host process which submits I/O requests to a ublk
  block device or a kernel filesystem on top of the ublk block device.

* __ublk server__ is the userspace storage software that fetches the I/O requests delivered
  by the ublk driver.  The ublk server processes the I/O requests with its specific block
  service logic and connected backends.  Once the ublk server gets the response from the
  connected backends, it communicates with the ublk driver and completes the I/O requests.

The SPDK ublk target acts as a ublk server.  It can handle ublk I/O requests within the whole
SPDK userspace storage software stack.

A typical usage scenario is container attached storage:

* Real storage resources are assigned to SPDK, like physical NVMe devices and
  distributed block storage.
* SPDK creates refined block devices via the ublk kernel module on top of its organized
  storage resources, based on user configuration.
* Container orchestrator and runtime can then mount and stage the ublk block devices
  for container instances to use.

## ublk Internal {#ublk_internal}

Previously, designs that put I/O processing logic into userspace software always had a
noticeable interaction overhead between the kernel module and the userspace part.

ublk utilizes `io_uring`, which has been proven to be very efficient in decreasing the
interaction overhead.  I/O requests are delivered to the userspace ublk server via the
newly added `io_uring` command.  A buffer shared via `mmap` is used to pass I/O descriptors
from the kernel driver to userspace.  The I/O data is copied only once, by the ublk driver,
between the specified userspace buffer address and the request/bio's pages.

### Control Plane

A control device is created by the ublk kernel module at `/dev/ublk-control`.  The userspace
server sends control commands to the kernel module via the control device using `io_uring`.

Control commands include adding, configuring, and starting a new ublk block device, as well as
retrieving device information and stopping and deleting an existing ublk block device.

The add device command creates a ublk char device `/dev/ublkcN`.
It is used by the ublk userspace server to `mmap` the I/O descriptor buffer.
The start device command exposes a ublk block device `/dev/ublkbN`.
The block device can be formatted and mounted by a kernel filesystem,
or read/written directly by other processes.
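As a small illustration of the naming scheme above, the following shell sketch derives the device nodes that belong to one ublk device ID.  The helper names (`ublk_char_dev`, `ublk_block_dev`, `ublk_nodes_status`) are hypothetical and not part of SPDK or the kernel.  Only the `/dev/ublk-control`, `/dev/ublkcN` and `/dev/ublkbN` paths come from the text.

~~~{.sh}
# Hypothetical helpers, not SPDK or kernel tooling.

# Print the char device created by the add device command for ID $1.
ublk_char_dev() {
    echo "/dev/ublkc$1"
}

# Print the block device exposed by the start device command for ID $1.
ublk_block_dev() {
    echo "/dev/ublkb$1"
}

# Report which of the three nodes currently exist for ID $1.
ublk_nodes_status() {
    for node in /dev/ublk-control "$(ublk_char_dev "$1")" "$(ublk_block_dev "$1")"; do
        if [ -e "$node" ]; then
            echo "$node present"
        else
            echo "$node absent"
        fi
    done
}
~~~

For example, `ublk_nodes_status 1` lists the control device, `/dev/ublkc1` and `/dev/ublkb1`.  A present char device with an absent block device would suggest that the device was added but not yet started.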

### Data Plane

The datapath between the ublk server and the kernel driver consists of `io_uring` and a shared
memory buffer.  The shared memory buffer is an array of I/O descriptors.
Each SQE (Submission Queue Entry) in `io_uring` is assigned one I/O descriptor and
one user buffer address.  When the ublk kernel driver receives I/O requests from the upper
layer, it fills the information of the I/O requests into the I/O descriptors.
The I/O data is copied between the specified user buffer address and
the request/bio's pages at the proper time.

At start, the ublk server needs to fill the `io_uring` SQ (Submission Queue).  Each
SQE is marked with the operation flag `UBLK_IO_FETCH_REQ`, which means the SQE is
ready to receive an I/O request.

When a CQE (Completion Queue Entry) is returned from the `io_uring` indicating an I/O
request, the ublk server gets the position of the I/O descriptor from the CQE.
The ublk server handles the I/O request based on the information in the I/O descriptor.

After the ublk server completes the I/O request, it updates the I/O's completion status
and the ublk operation flag.  This time, the operation flag is `UBLK_IO_COMMIT_AND_FETCH_REQ`,
which informs the kernel module that one I/O request is completed and that the SQE slot
is free to fetch a new I/O request.

`UBLK_IO_COMMIT_AND_FETCH_REQ` is designed for efficiency in ublk.  At runtime, the ublk
server needs to commit I/O results back, and then provide new free SQE slots for fetching
new I/O requests.  Without the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, `io_uring_submit()` would
have to be called twice: once for committing I/O results back, and once for providing free
SQE slots.  With the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, calling `io_uring_submit()` once is
enough because the ublk driver realizes that the submitted SQEs are reused both for committing
back I/O results and fetching new requests.
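The slot lifecycle described above can be summarized as a small state table.  The sketch below is a toy model in plain shell, under the assumption that each slot is tracked by a single state string.  It does not touch `io_uring`, and the function name `next_slot_state` as well as the event names `prime`, `cqe` and `done` are made up for illustration.  Only the two `UBLK_IO_*` operation names come from the text.

~~~{.sh}
# Toy model only: no io_uring calls are made here.
# next_slot_state STATE EVENT prints the state a slot moves to.
#   prime: the server fills the SQ at start
#   cqe:   a CQE arrives, meaning the driver delivered an I/O request
#   done:  the server finished handling the I/O request
next_slot_state() {
    case "$1:$2" in
        idle:prime)
            echo "UBLK_IO_FETCH_REQ" ;;             # slot is ready to receive an I/O request
        UBLK_IO_FETCH_REQ:cqe | UBLK_IO_COMMIT_AND_FETCH_REQ:cqe)
            echo "handling" ;;                      # server works on the I/O descriptor
        handling:done)
            echo "UBLK_IO_COMMIT_AND_FETCH_REQ" ;;  # commit the result and re-arm the slot
        *)
            echo "invalid transition: $1 + $2" >&2
            return 1 ;;
    esac
}
~~~

Walking one slot through `prime`, `cqe`, `done`, `cqe` yields `UBLK_IO_FETCH_REQ`, `handling`, `UBLK_IO_COMMIT_AND_FETCH_REQ`, `handling`.  In this model a slot uses the plain fetch operation only once, and every later pass through the ring uses the combined operation.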

## SPDK Implementation {#ublk_impl}

The SPDK ublk target is implemented as a high performance ublk server.

It creates one ublk spdk_thread on each spdk_reactor by default, or on user-specified
reactors.  When adding a new ublk block device, the SPDK ublk target assigns the queues
of the ublk block device to ublk spdk_threads in round-robin order.
That means one ublk device queue is only processed by one spdk_thread.
One ublk device with multiple queues can get multiple SPDK reactors involved
in processing its I/O requests.
One spdk_thread created by the ublk target may process multiple queues, which can
belong to different ublk devices.
In this way, SPDK reactors can be fully utilized to achieve the best performance
even when there are only a few ublk devices.

ublk is `io_uring` based.  All ublk I/O queues are mapped to `io_uring`s.
A ublk spdk_thread gets I/O requests from available CQEs by polling all its assigned
`io_uring`s.
When there are completed I/O requests, the ublk spdk_thread submits them as SQEs back
to the `io_uring` in batch.

Currently, the ublk driver has a system thread context limitation: one ublk device queue
can only be processed in the context of the system thread which initialized it.  As a result,
SPDK can't schedule a ublk spdk_thread between different SPDK reactors.  In other words, the
SPDK dynamic scheduler can't rebalance ublk workload by rescheduling ublk spdk_threads.
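The round-robin queue assignment described in this section can be pictured with a short shell sketch.  It is a simplified model, not SPDK code: the function name `assign_queues` is hypothetical, threads are numbered from 0, and the optional third argument (the thread at which assignment starts) is an assumption added to show how a later device might continue where an earlier one stopped.

~~~{.sh}
# Simplified model, not SPDK code.
# assign_queues NUM_QUEUES NUM_THREADS [FIRST_THREAD]
# Prints one "queue Q -> thread T" line per queue of a single ublk device.
assign_queues() {
    num_queues=$1
    num_threads=$2
    next=${3:-0}
    q=0
    while [ "$q" -lt "$num_queues" ]; do
        echo "queue $q -> thread $next"
        # Advance to the next thread, wrapping around after the last one.
        next=$(( (next + 1) % num_threads ))
        q=$(( q + 1 ))
    done
}
~~~

`assign_queues 2 3` maps the two queues of one device to threads 0 and 1, and `assign_queues 2 3 2` would place the queues of a second device on threads 2 and 0.  Because of the thread context limitation noted above, such a mapping stays fixed for the lifetime of each queue.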

## Operation {#ublk_op}

### Enabling SPDK ublk target

Build SPDK with the SPDK ublk target enabled.

~~~{.sh}
./configure --with-ublk
make -j
~~~

SPDK ublk target related libraries will then be linked into the SPDK application `spdk_tgt`.
Set up some hugepages for SPDK, and then run the SPDK application `spdk_tgt`.

~~~{.sh}
scripts/setup.sh
build/bin/spdk_tgt &
~~~

Once `spdk_tgt` is initialized, the user can enable the SPDK ublk feature
by creating the ublk target.  However, before creating the ublk target, the ublk kernel module
`ublk_drv` must be loaded using `modprobe`.

~~~{.sh}
modprobe ublk_drv
scripts/rpc.py ublk_create_target
~~~
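Before creating the target it can help to confirm that the host is likely to support ublk.  The following is a minimal preflight sketch, assuming that a kernel release whose major version is 6 or higher carries the driver (a kernel can still be built without it) and that `/dev/ublk-control` appears once `ublk_drv` is loaded.  The function names are hypothetical.

~~~{.sh}
# Hypothetical helpers, not part of SPDK.

# Succeed if a release string such as "6.1.0-13-amd64" has major version >= 6.
kernel_may_have_ublk() {
    major=${1%%.*}
    [ "$major" -ge 6 ] 2>/dev/null
}

# Check the running kernel and the control device, printing the reason on failure.
ublk_preflight() {
    if ! kernel_may_have_ublk "$(uname -r)"; then
        echo "kernel $(uname -r) is older than 6.0" >&2
        return 1
    fi
    if [ ! -e /dev/ublk-control ]; then
        echo "/dev/ublk-control not found, is ublk_drv loaded?" >&2
        return 1
    fi
}
~~~

`ublk_preflight && scripts/rpc.py ublk_create_target` would then skip target creation when either check fails.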

### Creating ublk block device

SPDK bdevs are block devices which will be exposed to the local host kernel
as ublk block devices.  SPDK supports several different types of storage backends,
including NVMe, Linux AIO, malloc ramdisk and Ceph RBD.  Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0.  The following RPC
will create a 256MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 256 512 -b Malloc0
~~~
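The two positional arguments of `bdev_malloc_create` determine how many blocks the bdev exposes.  As a quick arithmetic check, assuming the size argument is counted in units of 1024 * 1024 bytes, the hypothetical helper below computes the block count.

~~~{.sh}
# malloc_num_blocks SIZE_MB BLOCK_SIZE
# Assumes SIZE_MB is in units of 1024 * 1024 bytes.
malloc_num_blocks() {
    size_mb=$1
    block_size=$2
    echo $(( size_mb * 1024 * 1024 / block_size ))
}

malloc_num_blocks 256 512   # prints 524288
~~~

For the values used in this guide that is 524288 blocks of 512 bytes each.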

The following RPC will create a ublk block device exposing the Malloc0 bdev.
The created ublk block device has ID 1.  It internally has 2 queues with
queue depth 128.

~~~{.sh}
scripts/rpc.py ublk_start_disk Malloc0 1 -q 2 -d 128
~~~

This RPC replies with the ID of the ublk block device.

~~~
1
~~~

The location of a ublk block device is determined by its ID.  It is created at `/dev/ublkb${ID}`.
So the device we just created will be accessible to other processes via `/dev/ublkb1`.
Now applications like FIO or DD can work on `/dev/ublkb1` directly.

~~~{.sh}
dd of=/dev/ublkb1 if=/dev/zero bs=512 count=64
~~~
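To confirm that such a write actually landed, the data can be read back and checked.  The sketch below is one way to do that with standard tools.  The function name `write_and_verify_zeroes` is hypothetical, and the target path is a parameter so the same function can be tried on a regular file before pointing it at `/dev/ublkb1`.  On a block device the read may be served from the page cache rather than from the ublk server, so a pass should be treated as a basic sanity check only.

~~~{.sh}
# write_and_verify_zeroes TARGET [BS] [COUNT]
# Writes COUNT blocks of BS zero bytes to TARGET, reads them back and
# succeeds if no non-zero byte is found in the data read.
write_and_verify_zeroes() {
    target=$1
    bs=${2:-512}
    count=${3:-64}
    dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=notrunc 2>/dev/null || return 1
    # Delete all NUL bytes from the data read back; anything left is non-zero.
    nonzero=$(dd if="$target" bs="$bs" count="$count" 2>/dev/null | tr -d '\000' | wc -c)
    [ "$((nonzero))" -eq 0 ]
}
~~~

`write_and_verify_zeroes /dev/ublkb1` repeats the `dd` command above with the same block size and count, and then checks the first 32 KiB of the device.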

A ublk block device is a generic kernel block device that can be formatted and
mounted by a kernel file system.

~~~{.sh}
mkfs /dev/ublkb1
mount /dev/ublkb1 /mnt/
mkdir /mnt/testdir
echo "Hello,SPDK ublk Target" > /mnt/testdir/testfile
umount /mnt
~~~

### Deleting ublk block device and exit

After use, a ublk block device can be stopped and deleted by the RPC `ublk_stop_disk` with its ID.
Specify ID 1, and the device `/dev/ublkb1` will be removed.

~~~{.sh}
scripts/rpc.py ublk_stop_disk 1
~~~

If ublk is not used anymore, the SPDK ublk target can be destroyed to free related SPDK
resources.

~~~{.sh}
scripts/rpc.py ublk_destroy_target
~~~
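When a filesystem on the device is still mounted, it is generally safer to unmount it before stopping the disk.  The following dry-run sketch prints the commands from this guide in that order rather than executing them.  The function name `ublk_teardown_plan` and the optional mount point argument are assumptions for illustration.

~~~{.sh}
# ublk_teardown_plan ID [MOUNT_POINT]
# Dry run: prints the teardown commands in order, executes nothing.
ublk_teardown_plan() {
    id=$1
    mount_point=${2:-}
    if [ -n "$mount_point" ]; then
        echo "umount $mount_point"
    fi
    echo "scripts/rpc.py ublk_stop_disk $id"
    echo "scripts/rpc.py ublk_destroy_target"
}

ublk_teardown_plan 1 /mnt
~~~

Printing the plan first makes it easy to review the steps before running them by hand from the SPDK source directory.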

The SPDK ublk target and all ublk block devices are also destroyed automatically
when the SPDK application is terminated.