# ublk Target {#ublk}

## Table of Contents {#ublk_toc}

- @ref ublk_intro
- @ref ublk_internal
- @ref ublk_impl
- @ref ublk_op

## Introduction {#ublk_intro}

[ublk](https://docs.kernel.org/block/ublk.html) (or ubd) is a generic framework for
implementing userspace block devices based on `io_uring`.  It is designed to
create a highly efficient data path for userspace storage software to provide
high-performance block device service on the local host.

The whole ublk service involves three parts: ublk driver, ublk server and ublk workload.

![ublk service stack](img/ublk_service.svg)

* __ublk driver__ is a kernel driver added in Linux kernel 6.0.  It delivers I/O requests
  from a ublk block device (`/dev/ublkbN`) to a ublk server.

* __ublk workload__ can be any local host process which submits I/O requests to a ublk
  block device or a kernel filesystem on top of the ublk block device.

* __ublk server__ is the userspace storage software that fetches the I/O requests delivered
  by the ublk driver.  The ublk server processes the I/O requests with its specific block
  service logic and connected backends.  Once the ublk server gets the response from the
  connected backends, it communicates with the ublk driver and completes the I/O requests.

SPDK ublk target acts as a ublk server.  It can handle ublk I/O requests within the whole
SPDK userspace storage software stack.

A typical usage scenario is container-attached storage:

* Real storage resources are assigned to SPDK, like physical NVMe devices and
  distributed block storage.
* SPDK creates refined block devices via the ublk kernel module on top of its organized
  storage resources, based on user configuration.
* Container orchestrator and runtime can then mount and stage the ublk block devices
  for container instances to use.

## ublk Internal {#ublk_internal}

Previous designs that put I/O processing logic into userspace software have always incurred
noticeable interaction overhead between the kernel module and the userspace part.

ublk utilizes `io_uring`, which has proven to be very efficient at decreasing this
interaction overhead.  I/O requests are delivered to the userspace ublk server via the
newly added `io_uring` command.  A buffer shared via `mmap` is used by the kernel driver to
pass I/O descriptors to userspace.  The I/O data is copied only once, by the ublk driver,
between the specified userspace buffer address and the request/bio's pages.

### Control Plane

A control device is created by the ublk kernel module at `/dev/ublk-control`.  The userspace
server sends control commands to the kernel module via the control device using `io_uring`.

Control commands include adding, configuring, and starting a new ublk block device, as well
as retrieving device information and stopping and deleting an existing ublk block device.

The add device command creates a ublk char device `/dev/ublkcN`.
It is used by the ublk userspace server to `mmap` the I/O descriptor buffer.
The start device command exposes a ublk block device `/dev/ublkbN`.
The block device can be formatted and mounted by a kernel filesystem,
or read/written directly by other processes.
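
The effect of the control commands on device nodes can be pictured with a small
toy model.  This is only a sketch for illustration: the class `UblkDeviceModel` and its
methods are hypothetical names, it does not talk to `/dev/ublk-control`, and it only
tracks which device nodes would exist after each command.

```python
from dataclasses import dataclass, field


@dataclass
class UblkDeviceModel:
    """Toy model of one ublk device's control-plane lifecycle (not a kernel API)."""
    dev_id: int
    nodes: set = field(default_factory=set)

    def __post_init__(self):
        # "add" creates the char device that the server mmaps for I/O descriptors.
        self.nodes.add(f"/dev/ublkc{self.dev_id}")

    def start(self):
        # "start" exposes the block device to the rest of the host.
        self.nodes.add(f"/dev/ublkb{self.dev_id}")

    def stop(self):
        # "stop" removes the block device; the char device remains.
        self.nodes.discard(f"/dev/ublkb{self.dev_id}")

    def delete(self):
        # "delete" removes every node that belongs to this device.
        self.nodes.clear()
```

Creating `UblkDeviceModel(1)` and calling `start()` leaves both `/dev/ublkc1` and
`/dev/ublkb1` in `nodes`, mirroring the add and start commands described above.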

### Data Plane

The datapath between the ublk server and the kernel driver consists of `io_uring` and a shared
memory buffer.  The shared memory buffer is an array of I/O descriptors.
Each SQE (Submission Queue Entry) in `io_uring` is assigned one I/O descriptor and
one user buffer address.  When the ublk kernel driver receives I/O requests from the upper
layer, it fills the information about each I/O request into an I/O descriptor.
The I/O data is copied between the specified user buffer address and
the request/bio's pages at the proper time.

At start, the ublk server needs to fill the `io_uring` SQ (Submission Queue).  Each
SQE is marked with an operation flag `UBLK_IO_FETCH_REQ`, which means the SQE is
ready to receive an I/O request.

When a CQE (Completion Queue Entry) indicating an I/O request is returned from the
`io_uring`, the ublk server gets the position of the I/O descriptor from the CQE.
The ublk server handles the I/O request based on information in the I/O descriptor.

After the ublk server completes the I/O request, it updates the I/O's completion status
and ublk operation flag.  This time, the operation flag is `UBLK_IO_COMMIT_AND_FETCH_REQ`,
which informs the kernel module that one I/O request is completed and that the SQE slot
is free to fetch a new I/O request.

`UBLK_IO_COMMIT_AND_FETCH_REQ` is designed for efficiency in ublk.  At runtime, the ublk
server needs to commit I/O results back, and then provide new free SQE slots for fetching
new I/O requests.  Without the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, `io_uring_submit()` would
have to be called twice: once for committing I/O results back, and once for providing free
SQE slots.  With the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, calling `io_uring_submit()` once is
enough because the ublk driver recognizes that the submitted SQEs are reused both for
committing back I/O results and for fetching new requests.
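
The fetch/commit cycle described above can be sketched as a small simulation.  This is
a toy model under stated assumptions, not the kernel or SPDK implementation: `QueueModel`,
`driver_deliver()` and `serve()` are hypothetical names, the "driver" is simulated in the
same process, every request succeeds with status 0, requests must be hashable and unique,
and the queue depth is assumed to be at least 1.

```python
from collections import deque

FETCH_REQ = "UBLK_IO_FETCH_REQ"
COMMIT_AND_FETCH_REQ = "UBLK_IO_COMMIT_AND_FETCH_REQ"


class QueueModel:
    """Toy model of one ublk queue; each tag owns one SQE slot and one descriptor."""

    def __init__(self, depth):
        self.descriptors = [None] * depth  # stands in for the mmap'ed descriptor array
        self.armed = set()                 # tags whose SQE is waiting for a request
        self.cqes = deque()                # tags reported back to the server
        self.results = {}                  # request -> completion status
        self.submit_calls = 0

    def submit(self, sqes):
        """One io_uring_submit()-like call carrying a batch of (op, tag, status)."""
        self.submit_calls += 1
        for op, tag, status in sqes:
            if op == COMMIT_AND_FETCH_REQ:
                # Commit the result of the request that occupied this slot.
                self.results[self.descriptors[tag]] = status
                self.descriptors[tag] = None
            # For both operations, the slot is now armed for the next request.
            self.armed.add(tag)

    def driver_deliver(self, request):
        """Simulated driver: fill a descriptor and post a CQE carrying its tag."""
        tag = min(self.armed)
        self.armed.remove(tag)
        self.descriptors[tag] = request
        self.cqes.append(tag)


def serve(queue, requests):
    depth = len(queue.descriptors)
    # At start, fill the SQ so that every slot is ready to receive a request.
    queue.submit([(FETCH_REQ, tag, None) for tag in range(depth)])
    pending = deque(requests)
    while pending or queue.cqes:
        while pending and queue.armed:
            queue.driver_deliver(pending.popleft())
        batch = []
        while queue.cqes:
            tag = queue.cqes.popleft()
            # The request in descriptors[tag] would be handled here.
            batch.append((COMMIT_AND_FETCH_REQ, tag, 0))
        queue.submit(batch)  # a single submit both commits results and re-arms slots
    return queue.results
```

In this model, each round of completed requests costs one `submit()` call, because the
same entry both commits a result and re-arms its slot.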

## SPDK Implementation {#ublk_impl}

SPDK ublk target is implemented as a high performance ublk server.

It creates one ublk spdk_thread on each spdk_reactor by default, or on user-specified
reactors.  When adding a new ublk block device, SPDK ublk target assigns the queues
of the ublk block device to ublk spdk_threads in round-robin order.
That means one ublk device queue is only processed by one spdk_thread.
One ublk device with multiple queues can get multiple SPDK reactors involved
to process its I/O requests.
One spdk_thread created by ublk target may process multiple queues, possibly from
different ublk devices.
In this way, SPDK reactors can be fully utilized to achieve the best performance,
even when there are only a few ublk devices.
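
The round-robin assignment can be illustrated with a short sketch.  The function
`assign_queues()` and the thread names are hypothetical, and the sketch assumes that the
round-robin cursor keeps advancing across devices rather than restarting for each new
device; it is meant to show the resulting queue-to-thread mapping, not SPDK's actual code.

```python
from itertools import cycle


def assign_queues(thread_names, devices):
    """Map (device_id, queue_id) -> thread name in round-robin order.

    `devices` is a list of (device_id, num_queues) tuples in the order the
    devices are added.  The cursor is shared by all devices (an assumption
    of this sketch), so a new device continues where the previous one stopped.
    """
    cursor = cycle(thread_names)
    assignment = {}
    for device_id, num_queues in devices:
        for queue_id in range(num_queues):
            assignment[(device_id, queue_id)] = next(cursor)
    return assignment
```

With three threads and two devices of two queues each, every queue belongs to exactly one
thread, each device is spread over two threads, and one thread ends up serving queues from
both devices.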

ublk is `io_uring` based.  All ublk I/O queues are mapped to `io_uring`.
A ublk spdk_thread gets I/O requests from available CQEs by polling all its assigned
`io_uring`s.
When there are completed I/O requests, the ublk spdk_thread submits them as SQEs back
to `io_uring` in batch.

Currently, the ublk driver has a system thread context limitation: one ublk device queue
can only be processed in the context of the system thread that initialized it.  As a result,
SPDK can't schedule a ublk spdk_thread between different SPDK reactors.  In other words, the
SPDK dynamic scheduler can't rebalance ublk workload by rescheduling ublk spdk_threads.

## Operation {#ublk_op}

### Enabling SPDK ublk target

Build SPDK with SPDK ublk target enabled.

~~~{.sh}
./configure --with-ublk
make -j
~~~

SPDK ublk target related libraries will then be linked into the SPDK application `spdk_tgt`.
Set up some hugepages for SPDK, and then run the SPDK application `spdk_tgt`.

~~~{.sh}
scripts/setup.sh
build/bin/spdk_tgt &
~~~

Once `spdk_tgt` is initialized, the user can enable the SPDK ublk feature
by creating the ublk target.  However, before creating the ublk target, the ublk kernel module
`ublk_drv` should be loaded using `modprobe`.

~~~{.sh}
modprobe ublk_drv
scripts/rpc.py ublk_create_target
~~~

### Creating ublk block device

SPDK bdevs are block devices which will be exposed to the local host kernel
as ublk block devices.  SPDK supports several different types of storage backends,
including NVMe, Linux AIO, malloc ramdisk and Ceph RBD.  Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0.  The following RPC
will create a 256MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 256 512 -b Malloc0
~~~

The following RPC will create a ublk block device exposing the Malloc0 bdev.
The created ublk block device has ID 1.  It internally has 2 queues with
queue depth 128.

~~~{.sh}
scripts/rpc.py ublk_start_disk Malloc0 1 -q 2 -d 128
~~~

This RPC replies with the ID of the ublk block device.

~~~
1
~~~

The location of a ublk block device is determined by its ID.  It is created at `/dev/ublkb${ID}`.
So the device we just created will be accessible to other processes via `/dev/ublkb1`.
Now applications like FIO or DD can work on `/dev/ublkb1` directly.

~~~{.sh}
dd of=/dev/ublkb1 if=/dev/zero bs=512 count=64
~~~
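
When these steps are scripted, the RPC arguments and the resulting device path can be
derived from the same parameters.  The helpers below are a minimal sketch: the function
names are hypothetical, the only flags used are the `-q` and `-d` options shown in this
guide, and the default `rpc_path` assumes the script is run from the SPDK source tree.

```python
def ublk_start_disk_argv(bdev_name, ublk_id, num_queues, queue_depth,
                         rpc_path="scripts/rpc.py"):
    """Return the argv for the ublk_start_disk RPC call used in this guide."""
    return [rpc_path, "ublk_start_disk", bdev_name, str(ublk_id),
            "-q", str(num_queues), "-d", str(queue_depth)]


def ublk_block_device_path(ublk_id):
    """The block device node is derived from the ublk ID: /dev/ublkb${ID}."""
    return f"/dev/ublkb{ublk_id}"
```

The returned list could be passed to `subprocess.run(argv, check=True)` while `spdk_tgt`
is running; the sketch itself does not execute anything.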

A ublk block device is a generic kernel block device that can be formatted and
mounted by a kernel file system.

~~~{.sh}
mkfs /dev/ublkb1
mount /dev/ublkb1 /mnt/
mkdir /mnt/testdir
echo "Hello,SPDK ublk Target" > /mnt/testdir/testfile
umount /mnt
~~~

### Deleting ublk block device and exit

After use, a ublk block device can be stopped and deleted by the RPC `ublk_stop_disk` with its ID.
Specify ID 1, and the device `/dev/ublkb1` will be removed.

~~~{.sh}
scripts/rpc.py ublk_stop_disk 1
~~~

If ublk is no longer used, the SPDK ublk target can be destroyed to free related SPDK
resources.

~~~{.sh}
scripts/rpc.py ublk_destroy_target
~~~

The SPDK ublk target and all ublk block devices are also destroyed automatically
when the SPDK application is terminated.