# Writing a Custom Block Device Module {#bdev_module}

## Target Audience

This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.

## Introduction

A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or an
existing storage software stack. This guide is intended to demonstrate exactly
how to write a module.

## Creating A New Module

Block device modules are located in subdirectories under module/bdev today. It is not
currently possible to place the code for a bdev module elsewhere, but updates
to the build system could be made to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A great
starting point is to copy the existing 'null' bdev module.

The primary interface that bdev modules will interact with is in
include/spdk/bdev_module.h. In that header a macro is defined that registers
a new bdev module - SPDK_BDEV_MODULE_REGISTER. This macro takes as its argument
a pointer to the spdk_bdev_module structure that describes the new bdev module.

The spdk_bdev_module structure describes the module's properties, such as its
initialization (`module_init`) and teardown (`module_fini`) functions,
the function that returns the context size (`get_ctx_size`) - scratch space that
will be allocated in each I/O request for use by this module, and the callbacks
that are called each time a new bdev is registered by another module
(`examine_config` and `examine_disk`).
Please check the documentation of
struct spdk_bdev_module for more details.

## Creating Bdevs

New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:

~~~{.c}
/*
 * Function table for a block device backend.
 *
 * The backend block device function table provides a set of APIs to allow
 * communication with a backend. The main commands are read/write API
 * calls for I/O via submit_request.
 */
struct spdk_bdev_fn_table {
	/* Destroy the backend block device object */
	int (*destruct)(void *ctx);

	/* Process the IO. */
	void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);

	/* Check if the block device supports a specific I/O type. */
	bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);

	/* Get an I/O channel for the specific bdev for the calling thread. */
	struct spdk_io_channel *(*get_io_channel)(void *ctx);

	/*
	 * Output driver-specific configuration to a JSON stream. Optional - may be NULL.
	 *
	 * The JSON write context will be initialized with an open object, so the bdev
	 * driver should write a name (based on the driver name) followed by a JSON value
	 * (most likely another nested object).
	 */
	int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);

	/* Get spin-time per I/O channel in microseconds.
	 * Optional - may be NULL.
	 */
	uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~

The bdev module must implement these function callbacks, except for those
marked as optional above.

The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module - it may just be
freeing memory or it may be shutting down a piece of hardware.

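As a rough illustration of the pattern described above (allocate a device
structure, point it at a function table, and let `destruct` release whatever
the module allocated), the following sketch uses simplified stand-in types
rather than the real SPDK headers. Every `demo_` name is hypothetical and exists
only for this example; a real module would fill out struct spdk_bdev and
struct spdk_bdev_fn_table and call spdk_bdev_register() instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the function table; only destruct is modelled. */
struct demo_fn_table {
	int (*destruct)(void *ctx);
};

/* Hypothetical stand-in for the generic bdev structure the module fills out. */
struct demo_bdev {
	const char *name;
	void *ctx;                             /* handed back to every callback */
	const struct demo_fn_table *fn_table;
};

/* Module-private device: the generic part plus a RAM backing store. */
struct demo_disk {
	struct demo_bdev bdev;
	unsigned char *backing;
};

/* Number of devices created and not yet destructed. */
static int g_demo_live_disks;

/* Tear down the device. For this RAM-backed sketch that only means freeing memory. */
static int
demo_disk_destruct(void *ctx)
{
	struct demo_disk *disk = ctx;

	free(disk->backing);
	free(disk);
	g_demo_live_disks--;
	return 0;
}

static const struct demo_fn_table g_demo_fn_table = {
	.destruct = demo_disk_destruct,
};

/* Allocate the device and fill out its generic part. */
static struct demo_bdev *
demo_disk_create(const char *name, size_t num_bytes)
{
	struct demo_disk *disk = calloc(1, sizeof(*disk));

	if (disk == NULL) {
		return NULL;
	}

	disk->backing = calloc(1, num_bytes);
	if (disk->backing == NULL) {
		free(disk);
		return NULL;
	}

	disk->bdev.name = name;
	disk->bdev.ctx = disk;
	disk->bdev.fn_table = &g_demo_fn_table;
	g_demo_live_disks++;

	/* A real module would pass the filled-out structure to spdk_bdev_register() here. */
	return &disk->bdev;
}
```

Calling `demo_disk_create("demo0", 4096)` and later invoking
`bdev->fn_table->destruct(bdev->ctx)` mirrors how the generic layer reaches the
module's teardown code through the function table, without knowing anything
about the module's private structure.
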

The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:

~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~

For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (it would have just contained all
0's). If it isn't supported, the generic bdev code is capable of emulating it
by sending regular write requests.

`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all I/O has been completed in some way.

`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They're strictly optional, and it
probably only makes sense to implement those if the backing storage device is
capable of handling NVMe commands.

The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency. The generic bdev layer will
call `get_io_channel` one time per thread, cache the result, and pass that
result to `submit_request`.
It will use the corresponding channel for the
thread it calls `submit_request` on.

The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.

Integrating a new bdev module into the build system requires updates to various
files in the /mk directory.

## Creating Bdevs in an External Repository

A user can build their own bdev module and application on top of existing SPDK libraries. The example in
test/external_code serves as a template for creating, building, and linking an external
bdev module. Refer to test/external_code/README.md and @ref so_linking for further information.

## Creating Virtual Bdevs

Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take the one additional step of claiming the bdev.

The module can open the underlying bdevs it wishes to route I/O to using
spdk_bdev_open_ext(), where the string name is provided by the user via an RPC.
To ensure that other consumers do not modify the underlying bdev in an unexpected
way, the virtual bdev should take a claim on the underlying bdev before
reading from or writing to the underlying bdev.

There are two slightly different APIs for taking and releasing claims. The
preferred interface uses `spdk_bdev_module_claim_bdev_desc()`. This method allows
claims that ensure there is a single writer with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE`, cooperating shared writers with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`, and shared readers that prevent any
writers with `SPDK_BDEV_CLAIM_READ_MANY_WRITE_NONE`.
In all cases,
`spdk_bdev_open_ext()` may be used to open the underlying bdev read-only. If a
read-only bdev descriptor successfully claims a bdev with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE` or `SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`,
the bdev descriptor is promoted to read-write.
Any claim that is obtained with `spdk_bdev_module_claim_bdev_desc()` is
automatically released upon closing the bdev descriptor used to obtain the
claim. Shared claims continue to block new incompatible claims and new writers
until the last claim is released.

The non-preferred interface for obtaining a claim allows the caller to obtain
an exclusive writer claim with `spdk_bdev_module_claim_bdev()`. It may be
released with `spdk_bdev_module_release_bdev()`. If a read-only bdev
descriptor is passed, it is promoted to read-write. NULL may be passed instead
of a bdev descriptor to avoid promotion and to block new writers. New code
should use `spdk_bdev_module_claim_bdev_desc()` with the claim type that is
tailored to the virtual bdev's needs.

The descriptor obtained from the successful spdk_bdev_open_ext() may be used
with spdk_bdev_get_io_channel() to obtain I/O channels for the bdev. This is
likely done in response to the virtual bdev's `get_io_channel` callback.
Channels may be obtained before and/or after claiming the underlying bdev, but
beware there may be other unknown writers until the underlying bdev has been
claimed.

When a virtual bdev module claims an underlying bdev from its `examine_config`
callback, it causes the `examine_disk` callback to only be called for this
module and any others that establish a shared claim. If no claims are taken by
`examine_config` callbacks, all virtual bdevs' `examine_disk` callbacks are
called.

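The compatibility rules for the three claim types described in this section can
be summarized with a small toy model. This is only a sketch of the rules as
stated in this guide, not SPDK's implementation: the `demo_` types and functions
are hypothetical, and the model assumes that two shared claims are compatible
exactly when they have the same type, ignoring any further conditions the real
API may impose on shared writers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three claim types named in this guide. */
enum demo_claim_type {
	DEMO_CLAIM_NONE = 0,
	DEMO_CLAIM_READ_MANY_WRITE_ONE,     /* single writer */
	DEMO_CLAIM_READ_MANY_WRITE_SHARED,  /* cooperating shared writers */
	DEMO_CLAIM_READ_MANY_WRITE_NONE,    /* shared readers, no writers */
};

/* Claim bookkeeping for one underlying bdev in this toy model. */
struct demo_claim_state {
	enum demo_claim_type type;
	unsigned int holders;
};

/* Try to take a claim. Returns true if it is compatible with existing claims. */
static bool
demo_claim(struct demo_claim_state *s, enum demo_claim_type type)
{
	if (type == DEMO_CLAIM_NONE) {
		return false;
	}

	if (s->holders == 0) {
		/* Unclaimed: any claim type may be established. */
		s->type = type;
		s->holders = 1;
		return true;
	}

	/* Already claimed: only another claim of the same shared type may join. */
	if (s->type == type && type != DEMO_CLAIM_READ_MANY_WRITE_ONE) {
		s->holders++;
		return true;
	}

	return false;
}

/* Release one claim. The bdev becomes unclaimed when the last holder leaves. */
static void
demo_release(struct demo_claim_state *s)
{
	if (s->holders > 0 && --s->holders == 0) {
		s->type = DEMO_CLAIM_NONE;
	}
}
```

In this model a second `DEMO_CLAIM_READ_MANY_WRITE_ONE` claim is always
rejected, while a shared claim keeps rejecting incompatible claim types until
its last holder calls `demo_release()`.
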