# Block Device User Guide {#bdev}

# Introduction {#bdev_ug_introduction}

The SPDK block device layer, often simply called *bdev*, is a C library
intended to be equivalent to the operating system block storage layer that
often sits immediately above the device drivers in a traditional kernel
storage stack. Specifically, this library provides the following
functionality:

* A pluggable module API for implementing block devices that interface with different types of block storage devices.
* Driver modules for NVMe, malloc (ramdisk), Linux AIO, virtio-scsi, Ceph RBD, Pmem, Vhost-SCSI Initiator, and more.
* An application API for enumerating and claiming SPDK block devices and then performing operations (read, write, unmap, etc.) on those devices.
* Facilities to stack block devices to create complex I/O pipelines, including logical volume management (lvol) and partition support (GPT).
* Configuration of block devices via JSON-RPC.
* Request queueing, timeout, and reset handling.
* Multiple, lockless queues for sending I/O to block devices.

A bdev module provides an abstraction layer that exposes a common API for all devices.
Users can use the available bdev modules or create their own module with any type of
device underneath (please refer to @ref bdev_module for details). SPDK
also provides vbdev modules, which create block devices on top of existing bdevs, for
example @ref bdev_ug_logical_volumes or @ref bdev_ug_gpt.

# Prerequisites {#bdev_ug_prerequisites}

This guide assumes that you can already build the standard SPDK distribution
on your platform. The block device layer is a C library with a single public
header file named bdev.h. All SPDK configuration described in the following
chapters is done using JSON-RPC commands. SPDK provides a Python-based
command line tool for sending RPC commands, located at `scripts/rpc.py`. Users
can list the available commands by running this script with the `-h` or `--help` flag.
Additionally, users can retrieve the currently supported set of RPC commands
directly from the SPDK application by running `scripts/rpc.py get_rpc_methods`.
Detailed help for each command can be displayed by adding the `-h` flag as a
command parameter.

# General Purpose RPCs {#bdev_ug_general_rpcs}

## get_bdevs {#bdev_ug_get_bdevs}

A list of the currently available block devices, including detailed information about
them, can be retrieved with the `get_bdevs` RPC command. The optional
parameter `name` restricts the output to the bdev with that name.

Example response

~~~
{
  "num_blocks": 32768,
  "assigned_rate_limits": {
    "rw_ios_per_sec": 10000,
    "rw_mbytes_per_sec": 20
  },
  "supported_io_types": {
    "reset": true,
    "nvme_admin": false,
    "unmap": true,
    "read": true,
    "write_zeroes": true,
    "write": true,
    "flush": true,
    "nvme_io": false
  },
  "driver_specific": {},
  "claimed": false,
  "block_size": 4096,
  "product_name": "Malloc disk",
  "name": "Malloc0"
}
~~~

## set_bdev_qos_limit {#set_bdev_qos_limit}

Users can use the `set_bdev_qos_limit` RPC command to enable, adjust, and disable
rate limits on an existing bdev. Two types of rate limits are supported:
IOPS and bandwidth. The rate limits can be enabled, adjusted, and disabled at any
time for the specified bdev. The bdev name is a required parameter for this
RPC command, and at least one of `rw_ios_per_sec` and `rw_mbytes_per_sec` must be
specified. When both rate limits are enabled, whichever limit is reached first
takes effect. The value 0 may be specified to disable the corresponding rate
limit. Users can run this command with `-h` or `--help` for more information.

## Histograms {#rpc_bdev_histogram}

The `enable_bdev_histogram` RPC command enables or disables gathering of
latency data for the specified bdev.
The histogram can be downloaded by
calling `get_bdev_histogram` and parsed using the `scripts/histogram.py` script.

Example commands

`rpc.py enable_bdev_histogram Nvme0n1 --enable`

This command enables gathering of histogram data on the Nvme0n1 device.

`rpc.py get_bdev_histogram Nvme0n1 | histogram.py`

This command downloads the gathered histogram data. The script parses
the data and shows a table containing the I/O count for each latency range.

`rpc.py enable_bdev_histogram Nvme0n1 --disable`

This command disables the histogram on the Nvme0n1 device.

# Ceph RBD {#bdev_config_rbd}

The SPDK RBD bdev driver provides SPDK block layer access to Ceph RADOS block
devices (RBD). Ceph RBD devices are accessed through the librbd and librados
libraries. To create a Ceph bdev, use the `construct_rbd_bdev` RPC command.

Example command

`rpc.py construct_rbd_bdev rbd foo 512`

This command will create a bdev that represents the 'foo' image from a pool called 'rbd'.

To remove a block device representation use the `delete_rbd_bdev` command.

`rpc.py delete_rbd_bdev Rbd0`

# Crypto Virtual Bdev Module {#bdev_config_crypto}

The crypto virtual bdev module can be configured to provide at rest data encryption
for any underlying bdev. The module relies on the DPDK CryptoDev Framework to provide
all cryptographic functionality. The framework provides support for many different
software-only cryptographic modules as well as hardware-assisted support for the Intel QAT board. The
framework also provides support for cipher, hash, authentication and AEAD functions.
At this
time the SPDK virtual bdev module supports cipher only as follows:

- AES-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
(Note: QAT is functional but is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)

In order to support using the bdev block offset (LBA) as the initialization vector (IV),
the crypto module breaks up all I/O into crypto operations of a size equal to the block
size of the underlying bdev. For example, a 4K I/O to a bdev with a 512B block size
would result in 8 cryptographic operations.

For reads, the buffer provided to the crypto module will be used as the destination buffer
for unencrypted data. For writes, however, a temporary scratch buffer is used as the
destination buffer for encryption, which is then passed on to the underlying bdev as the
write buffer. This is done to avoid encrypting the data in the original source buffer, which
may cause problems in some use cases.

Example command

`rpc.py construct_crypto_bdev -b NVMe1n1 -c CryNvmeA -d crypto_aesni_mb -k 0123456789123456`

This command will create a crypto vbdev called 'CryNvmeA' on top of the NVMe bdev
'NVMe1n1' and will use the DPDK software driver 'crypto_aesni_mb' and the key
'0123456789123456'.

To remove the vbdev use the `delete_crypto_bdev` command.

`rpc.py delete_crypto_bdev CryNvmeA`

# GPT (GUID Partition Table) {#bdev_config_gpt}

The GPT virtual bdev driver is enabled by default and does not require any configuration.
It will automatically detect @ref bdev_ug_gpt on any attached bdev and will create
one or more virtual bdevs, one per detected partition.

## SPDK GPT partition table {#bdev_ug_gpt}

The SPDK partition type GUID is `7c5222bd-8f5d-4087-9c00-bf9843c7b58c`.
Existing SPDK bdevs
can be exposed as Linux block devices via NBD and then can be partitioned with
standard partitioning tools. After partitioning, the bdevs will need to be deleted and
attached again for the GPT bdev module to see any changes. The NBD kernel module must be
loaded first. To expose a bdev as an NBD device, use the `start_nbd_disk` RPC command.

Example command

`rpc.py start_nbd_disk Malloc0 /dev/nbd0`

This will expose an SPDK bdev `Malloc0` under the `/dev/nbd0` block device.

To remove an NBD device, use the `stop_nbd_disk` RPC command.

Example command

`rpc.py stop_nbd_disk /dev/nbd0`

To display the full NBD device list, or a single specified device, use the `get_nbd_disks` RPC command.

Example command

`rpc.py get_nbd_disks -n /dev/nbd0`

## Creating a GPT partition table using NBD {#bdev_ug_gpt_create_part}

~~~
# Expose bdev Nvme0n1 as kernel block device /dev/nbd0 by JSON-RPC
rpc.py start_nbd_disk Nvme0n1 /dev/nbd0

# Create GPT partition table.
parted -s /dev/nbd0 mklabel gpt

# Add a partition consuming 50% of the available space.
parted -s /dev/nbd0 mkpart MyPartition '0%' '50%'

# Change the partition type to the SPDK GUID.
# sgdisk is part of the gdisk package.
sgdisk -t 1:7c5222bd-8f5d-4087-9c00-bf9843c7b58c /dev/nbd0

# Stop the NBD device (stop exporting /dev/nbd0).
rpc.py stop_nbd_disk /dev/nbd0

# Now Nvme0n1 is configured with a GPT partition table, and
# the first partition will be automatically exposed as
# Nvme0n1p1 in SPDK applications.
~~~

# iSCSI bdev {#bdev_config_iscsi}

The SPDK iSCSI bdev driver depends on libiscsi and hence is not enabled by default.
In order to use it, build SPDK with an extra `--with-iscsi-initiator` configure option.

The following command creates an `iSCSI0` bdev from a single LUN exposed at the given iSCSI URL
with `iqn.2016-06.io.spdk:init` as the reported initiator IQN.

`rpc.py construct_iscsi_bdev -b iSCSI0 -i iqn.2016-06.io.spdk:init --url iscsi://127.0.0.1/iqn.2016-06.io.spdk:disk1/0`

The URL is in the following format:
`iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn>/<lun>`

# Linux AIO bdev {#bdev_config_aio}

The SPDK AIO bdev driver provides SPDK block layer access to Linux kernel block
devices or a file on a Linux filesystem via Linux AIO. Note that O_DIRECT is
used and thus bypasses the Linux page cache. This mode is probably as close to
a typical kernel based target as a user space target can get without using a
user-space driver. To create an AIO bdev, use the `construct_aio_bdev` RPC command.

Example commands

`rpc.py construct_aio_bdev /dev/sda aio0`

This command will create an `aio0` device from /dev/sda.

`rpc.py construct_aio_bdev /tmp/file file 8192`

This command will create a `file` device with block size 8192 from /tmp/file.

To delete an AIO bdev use the `delete_aio_bdev` command.

`rpc.py delete_aio_bdev aio0`

# OCF Virtual bdev {#bdev_config_cas}

The OCF virtual bdev module is based on [Open CAS Framework](https://github.com/Open-CAS/ocf), a
high performance block storage caching meta-library.
To enable the module, configure SPDK with `--with-ocf=/path/to/ocf/library`.
An OCF bdev can be used to enable caching for any underlying bdev.

Below is an example command for creating an OCF bdev:

`rpc.py construct_ocf_bdev Cache1 wt Malloc0 Nvme0n1`

This command will create a new OCF bdev `Cache1` with bdev `Malloc0` as the caching device,
`Nvme0n1` as the core device, and `Write-Through` as the initial cache mode.
`Malloc0` will be used as the cache for `Nvme0n1`, so data written to `Cache1` will be present
on `Nvme0n1` eventually.
By default, OCF will be configured with a cache line size of 4KiB
and non-volatile metadata will be disabled.

To remove `Cache1`:

`rpc.py delete_ocf_bdev Cache1`

During removal the OCF cache will be stopped and all cached data will be written to the core device.

Note that OCF has a per-device RAM requirement
of about 56000 + _cache device size_ * 58 / _cache line size_ (in bytes).
To get more information on OCF
please visit [OCF documentation](https://open-cas.github.io/).

# Malloc bdev {#bdev_config_malloc}

Malloc bdevs are ramdisks. By their nature they are volatile. They are created from hugepage memory given to the SPDK
application.

# Null {#bdev_config_null}

The SPDK null bdev driver is a dummy block I/O target that discards all writes and returns undefined
data for reads. It is useful for benchmarking the rest of the bdev I/O stack with minimal block
device overhead and for testing configurations that can't easily be created with the Malloc bdev.
To create a Null bdev, use the `construct_null_bdev` RPC command.

Example command

`rpc.py construct_null_bdev Null0 8589934592 4096`

This command will create an 8 petabyte `Null0` device with block size 4096.

To delete a null bdev use the `delete_null_bdev` command.

`rpc.py delete_null_bdev Null0`

# NVMe bdev {#bdev_config_nvme}

There are two ways to create a block device based on an NVMe device in SPDK. The first
is to attach a local PCIe drive and the second is to connect to an NVMe-oF device.
In both cases, use the `construct_nvme_bdev` RPC command.

Example commands

`rpc.py construct_nvme_bdev -b NVMe1 -t PCIe -a 0000:01:00.0`

This command will create an NVMe bdev for a physical device in the system.

`rpc.py construct_nvme_bdev -b Nvme0 -t RDMA -a 192.168.100.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1`

This command will create an NVMe bdev for an NVMe-oF resource.

To remove an NVMe controller use the `delete_nvme_controller` command.

`rpc.py delete_nvme_controller Nvme0`

This command will remove the NVMe controller named Nvme0.

# Logical volumes {#bdev_ug_logical_volumes}

The Logical Volumes library is a flexible storage space management system. It allows
creating and managing virtual block devices with variable size on top of other bdevs.
The SPDK Logical Volume library is built on top of @ref blob. For a detailed description
please refer to @ref lvol.

## Logical volume store {#bdev_ug_lvol_store}

Before creating any logical volumes (lvols), an lvol store has to be created on the
selected block device. The lvol store is a container for lvols: it is responsible for assigning
space on the underlying bdev to lvol bdevs and for storing metadata. To create an lvol store,
use the `construct_lvol_store` RPC command.

Example command

`rpc.py construct_lvol_store Malloc2 lvs -c 4096`

This will create an lvol store named `lvs` with cluster size 4096, built on top of the
`Malloc2` bdev. The response contains a UUID, which is the unique lvol store
identifier.

The list of available lvol stores can be retrieved with the `get_lvol_stores` RPC command (no
parameters available).

Example response

~~~
{
  "uuid": "330a6ab2-f468-11e7-983e-001e67edf35d",
  "base_bdev": "Malloc2",
  "free_clusters": 8190,
  "cluster_size": 8192,
  "total_data_clusters": 8190,
  "block_size": 4096,
  "name": "lvs"
}
~~~

To delete an lvol store, use the `destroy_lvol_store` RPC command.

Example commands

`rpc.py destroy_lvol_store -u 330a6ab2-f468-11e7-983e-001e67edf35d`

`rpc.py destroy_lvol_store -l lvs`

## Lvols {#bdev_ug_lvols}

To create lvols on an existing lvol store, use the `construct_lvol_bdev` RPC command.
Each created lvol will be represented by a new bdev.

Example commands

`rpc.py construct_lvol_bdev lvol1 25 -l lvs`

`rpc.py construct_lvol_bdev lvol2 25 -u 330a6ab2-f468-11e7-983e-001e67edf35d`

# Passthru {#bdev_config_passthru}

The SPDK Passthru virtual block device module serves as an example of how to write a
virtual block device module. It implements the required functionality of a vbdev module
and demonstrates some other basic features such as the use of per I/O context.

Example commands

`rpc.py construct_passthru_bdev -b aio -p pt`

`rpc.py delete_passthru_bdev pt`

# Pmem {#bdev_config_pmem}

The SPDK pmem bdev driver uses a pmemblk pool as the target for block I/O operations. For
details on persistent memory please refer to the PMDK documentation on the http://pmem.io website.
First, configure SPDK to include PMDK support:

`configure --with-pmdk`

To create a pmemblk pool for use with SPDK, use the `create_pmem_pool` RPC command.

Example command

`rpc.py create_pmem_pool /path/to/pmem_pool 25 4096`

To get information on a created pmem pool file, use the `pmem_pool_info` RPC command.

Example command

`rpc.py pmem_pool_info /path/to/pmem_pool`

To remove a pmem pool file, use the `delete_pmem_pool` RPC command.

Example command

`rpc.py delete_pmem_pool /path/to/pmem_pool`

To create a bdev based on a pmemblk pool file, use the `construct_pmem_bdev` RPC
command.

Example command

`rpc.py construct_pmem_bdev /path/to/pmem_pool -n pmem`

To remove a block device representation use the `delete_pmem_bdev` command.

`rpc.py delete_pmem_bdev pmem`

# Virtio Block {#bdev_config_virtio_blk}

The Virtio-Block driver allows creating SPDK bdevs from Virtio-Block devices.

The following command creates a Virtio-Block device named `VirtioBlk0` from a vhost-user
socket `/tmp/vhost.0` exposed directly by SPDK @ref vhost. The optional `vq-count` and
`vq-size` params specify the number of request queues and the queue depth to be used.

`rpc.py construct_virtio_dev --dev-type blk --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioBlk0`

The driver can also be used inside QEMU-based VMs. The following command creates a Virtio
Block device named `VirtioBlk1` from a Virtio PCI device at address `0000:01:00.0`.
The entire configuration will be read automatically from PCI Configuration Space. It will
reflect all parameters passed to QEMU's vhost-user-blk-pci device.

`rpc.py construct_virtio_dev --dev-type blk --trtype pci --traddr 0000:01:00.0 VirtioBlk1`

Virtio-Block devices can be removed with the following command:

`rpc.py remove_virtio_bdev VirtioBlk0`

# Virtio SCSI {#bdev_config_virtio_scsi}

The Virtio-SCSI driver allows creating SPDK block devices from Virtio-SCSI LUNs.

Virtio-SCSI bdevs are constructed the same way as Virtio-Block ones.

`rpc.py construct_virtio_dev --dev-type scsi --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioScsi0`

`rpc.py construct_virtio_dev --dev-type scsi --trtype pci --traddr 0000:01:00.0 VirtioScsi0`

Each Virtio-SCSI device may export up to 64 block devices named VirtioScsi0t0 ~ VirtioScsi0t63,
one LUN (LUN0) per SCSI device. Each of the two commands above will output the names of all exposed bdevs.

Virtio-SCSI devices can be removed with the following command:

`rpc.py remove_virtio_bdev VirtioScsi0`

Removing a Virtio-SCSI device will destroy all its bdevs.
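
The Virtio-SCSI naming scheme described above can be sketched in a few lines of Python, which may be handy when scripting against the bdev names. This is only an illustration: `virtio_scsi_bdev_names` is a hypothetical helper, not part of SPDK, and it assumes the `<device>t<target>` pattern seen in `VirtioScsi0t0 ~ VirtioScsi0t63` holds for any device name. It lists candidate names only; the names actually created are the ones printed by `construct_virtio_dev`.

```python
from typing import List


def virtio_scsi_bdev_names(device_name: str, max_targets: int = 64) -> List[str]:
    """Return the candidate bdev names for one Virtio-SCSI device.

    Assumption: names follow the "<device>t<target>" pattern, with one
    bdev (LUN0) per SCSI target and at most `max_targets` targets.
    """
    return [f"{device_name}t{target}" for target in range(max_targets)]


# Candidate names for the device created in the examples above.
names = virtio_scsi_bdev_names("VirtioScsi0")
print(names[0], names[-1], len(names))
```

Only targets that are actually present on the device produce a bdev, so a script would normally intersect this list with the names reported by `get_bdevs`.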