# ublk Target {#ublk}

## Table of Contents {#ublk_toc}

- @ref ublk_intro
- @ref ublk_internal
- @ref ublk_impl
- @ref ublk_op

## Introduction {#ublk_intro}

[ublk](https://docs.kernel.org/block/ublk.html) (or ubd) is a generic framework for
implementing userspace block devices based on `io_uring`. It is designed to
create a highly efficient data path for userspace storage software to provide
high-performance block device service on the local host.

The whole ublk service involves three parts: ublk driver, ublk server and ublk workload.

* __ublk driver__ is a kernel driver added in Linux kernel 6.0. It delivers I/O requests
  from a ublk block device (`/dev/ublkbN`) to a ublk server.

* __ublk workload__ can be any local host process which submits I/O requests to a ublk
  block device or a kernel filesystem on top of the ublk block device.

* __ublk server__ is the userspace storage software that fetches the I/O requests delivered
  by the ublk driver. The ublk server processes the I/O requests with its specific block
  service logic and connected backends. Once the ublk server gets the response from the
  connected backends, it communicates with the ublk driver and completes the I/O requests.

SPDK ublk target acts as a ublk server.
It can handle ublk I/O requests within the whole
SPDK userspace storage software stack.

A typical usage scenario is container-attached storage:

* Real storage resources are assigned to SPDK, like physical NVMe devices and
  distributed block storage.
* SPDK creates refined block devices via the ublk kernel module on top of its organized
  storage resources, based on user configuration.
* Container orchestrator and runtime can then mount and stage the ublk block devices
  for container instances to use.

## ublk Internal {#ublk_internal}

Previously, designs that put I/O processing logic into userspace software always had a
noticeable interaction overhead between the kernel module and the userspace part.

ublk utilizes `io_uring`, which has been proven to be very efficient in decreasing the
interaction overhead. The I/O request is delivered to the userspace ublk server via the
newly added `io_uring` command. A shared buffer via `mmap` is used for sharing I/O descriptors
with userspace from the kernel driver. The I/O data is copied only once between the specified
userspace buffer address and the request/bio's pages by the ublk driver.

### Control Plane

A control device is created by the ublk kernel module at `/dev/ublk-control`. The userspace server
sends control commands to the kernel module via the control device using `io_uring`.
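
Because every control command goes through this device, a quick sanity check is to confirm that
the node exists before starting a ublk server. The following is a minimal sketch, not part of
SPDK: the function name `ublk_control_ready` and its optional path argument are hypothetical and
only used here for illustration.

```sh
# Hypothetical helper (not an SPDK or kernel tool): succeeds only if the
# ublk control device exists as a character device. The path argument is
# optional and defaults to the location named in this guide.
ublk_control_ready() {
    ctrl_dev="${1:-/dev/ublk-control}"
    [ -c "$ctrl_dev" ]
}

if ublk_control_ready; then
    echo "ublk control device found"
else
    echo "ublk control device missing; the ublk_drv module may not be loaded"
fi
```

If the check fails, the most likely cause is that the `ublk_drv` kernel module has not been loaded yet.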

Control commands include adding, configuring, and starting a new ublk block device, as well as
retrieving device information and stopping and deleting an existing ublk block device.

The add device command creates a ublk char device `/dev/ublkcN`.
It is used by the ublk userspace server to `mmap` the I/O descriptor buffer.
The start device command exposes a ublk block device `/dev/ublkbN`.
The block device can be formatted and mounted by a kernel filesystem,
or read/written directly by other processes.

### Data Plane

The datapath between the ublk server and the kernel driver consists of `io_uring` and a shared
memory buffer. The shared memory buffer is an array of I/O descriptors.
Each SQE (Submission Queue Entry) in `io_uring` is assigned one I/O descriptor and
one user buffer address. When the ublk kernel driver receives I/O requests from the upper
layer, it fills the information of the I/O requests into the I/O descriptors.
The I/O data is copied between the specified user buffer address and
the request/bio's pages at the proper time.

At start, the ublk server needs to fill the `io_uring` SQ (Submission Queue). Each
SQE is marked with an operation flag `UBLK_IO_FETCH_REQ` which means the SQE is
ready to get an I/O request.

When a CQE (Completion Queue Entry) is returned from the `io_uring` indicating an I/O
request, the ublk server gets the position of the I/O descriptor from the CQE.
The ublk server handles the I/O request based on information in the I/O descriptor.

After the ublk server completes the I/O request, it updates the I/O's completion status
and ublk operation flag. This time, the operation flag is `UBLK_IO_COMMIT_AND_FETCH_REQ`
which informs the kernel module that one I/O request is completed, and also that the SQE slot
is free to fetch a new I/O request.

`UBLK_IO_COMMIT_AND_FETCH_REQ` is designed for efficiency in ublk. At runtime, the ublk
server needs to commit I/O results back, and then provide new free SQE slots for fetching
new I/O requests. Without the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, `io_uring_submit()` would have to
be called twice, once for committing I/O results back, once for providing free SQE slots.
With the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, calling `io_uring_submit()` once is enough because
the ublk driver realizes that the submitted SQEs are reused both for committing back I/O
results and fetching new requests.

## SPDK Implementation {#ublk_impl}

SPDK ublk target is implemented as a high performance ublk server.

It creates one ublk spdk_thread on each spdk_reactor by default or on user specified
reactors. When adding a new ublk block device, SPDK ublk target will assign the queues
of the ublk block device to ublk spdk_threads in round-robin.
That means one ublk device queue will only be processed by one spdk_thread.
One ublk device with multiple queues can get multiple SPDK reactors involved
to process its I/O requests.
One spdk_thread created by ublk target may process multiple queues, each from
different ublk devices.
In this way, SPDK reactors can be fully utilized to achieve best performance,
even when there are only a few ublk devices.

ublk is `io_uring` based. All ublk I/O queues are mapped to `io_uring`.
ublk spdk_thread gets I/O requests from available CQEs by polling all its assigned
`io_uring`s.
When there are completed I/O requests, ublk spdk_thread will submit them as SQEs back
to `io_uring` in batch.

Currently, ublk driver has a system thread context limitation that one ublk device queue
can be only processed in the context of the system thread which initialized it. SPDK
can't schedule ublk spdk_thread between different SPDK reactors. In other words, SPDK
dynamic scheduler can't rebalance ublk workload by rescheduling ublk spdk_thread.

## Operation {#ublk_op}

### Enabling SPDK ublk target

Build SPDK with SPDK ublk target enabled.

~~~{.sh}
./configure --with-ublk
make -j
~~~

SPDK ublk target related libraries will then be linked into SPDK application `spdk_tgt`.
Set up some hugepages for SPDK, and then run the SPDK application `spdk_tgt`.

~~~{.sh}
scripts/setup.sh
build/bin/spdk_tgt &
~~~

Once `spdk_tgt` is initialized, the user can enable the SPDK ublk feature
by creating a ublk target. However, before creating the ublk target, the ublk kernel module
`ublk_drv` should be loaded using `modprobe`.

~~~{.sh}
modprobe ublk_drv
scripts/rpc.py ublk_create_target
~~~

### Creating ublk block device

SPDK bdevs are block devices which will be exposed to the local host kernel
as ublk block devices. SPDK supports several different types of storage backends,
including NVMe, Linux AIO, malloc ramdisk and Ceph RBD. Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC
will create a 256MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 256 512 -b Malloc0
~~~

The following RPC will create a ublk block device exposing Malloc0 bdev.
The created ublk block device has ID 1. It internally has 2 queues with
queue depth 128.

~~~{.sh}
scripts/rpc.py ublk_start_disk Malloc0 1 -q 2 -d 128
~~~

This RPC will reply with the ID of the ublk block device.
~~~
1
~~~

The position of the ublk block device is determined by its ID. It is created at `/dev/ublkb${ID}`.
So the device we just created will be accessible to other processes via `/dev/ublkb1`.
Now applications like FIO or DD can work on `/dev/ublkb1` directly.

~~~{.sh}
dd of=/dev/ublkb1 if=/dev/zero bs=512 count=64
~~~

A ublk block device is a generic kernel block device that can be formatted and
mounted by a kernel file system.

~~~{.sh}
mkfs /dev/ublkb1
mount /dev/ublkb1 /mnt/
mkdir /mnt/testdir
echo "Hello,SPDK ublk Target" > /mnt/testdir/testfile
umount /mnt
~~~

### Deleting ublk block device and exit

After usage, the ublk block device can be stopped and deleted by RPC `ublk_stop_disk` with its ID.
Specify ID 1, then device `/dev/ublkb1` will be removed.

~~~{.sh}
scripts/rpc.py ublk_stop_disk 1
~~~

If ublk is not used anymore, SPDK ublk target can be destroyed to free related SPDK
resources.

~~~{.sh}
scripts/rpc.py ublk_destroy_target
~~~

Of course, SPDK ublk target and all ublk block devices would be destroyed automatically
when SPDK application is terminated.
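
The setup and teardown steps from this guide can be collected into a small helper script. The
following is a minimal sketch under stated assumptions: the names `run`, `ublk_dev_path`,
`ublk_setup`, `ublk_teardown`, and the `RPC` and `DRY_RUN` variables are hypothetical and not
part of SPDK. Only the `modprobe` and `scripts/rpc.py` invocations are taken from the commands
shown in this guide. By default the sketch runs in dry-run mode and only prints the commands.

```sh
# Illustrative variables (assumptions, not SPDK options):
#   RPC      path to the SPDK RPC client used throughout this guide
#   DRY_RUN  when 1 (the default here), commands are printed, not executed
RPC="${RPC:-scripts/rpc.py}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Device node of a ublk block device, following the /dev/ublkb${ID} rule.
ublk_dev_path() {
    echo "/dev/ublkb$1"
}

# Arguments: bdev name, ublk ID, number of queues, queue depth.
ublk_setup() {
    run modprobe ublk_drv &&
    run "$RPC" ublk_create_target &&
    run "$RPC" ublk_start_disk "$1" "$2" -q "$3" -d "$4"
}

# Argument: ublk ID. Stops the disk, then destroys the ublk target.
ublk_teardown() {
    run "$RPC" ublk_stop_disk "$1" &&
    run "$RPC" ublk_destroy_target
}
```

For the example in this guide, `ublk_setup Malloc0 1 2 128` prints the three setup commands and
`ublk_teardown 1` prints the two teardown commands. Setting `DRY_RUN=0` would execute them
instead, which requires a running `spdk_tgt` and root privileges.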