# Writing a Custom Block Device Module {#bdev_module}

## Target Audience

This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.

## Introduction

A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or an
existing storage software stack. This guide is intended to demonstrate exactly
how to write a module.

## Creating A New Module

Block device modules are located in subdirectories under lib/bdev today. It is not
currently possible to place the code for a bdev module elsewhere, but updates
to the build system could be made to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A great
starting point is to copy the existing 'null' bdev module.

The primary interface that bdev modules will interact with is in
include/spdk/bdev_module.h. In that header a macro is defined that registers
a new bdev module - SPDK_BDEV_MODULE_REGISTER. This macro takes as its argument
a pointer to an spdk_bdev_module structure that is used to register the new
bdev module.

The spdk_bdev_module structure describes the module properties like
initialization (`module_init`) and teardown (`module_fini`) functions,
the function that returns context size (`get_ctx_size`) - scratch space that
will be allocated in each I/O request for use by this module, and callbacks
that will be called each time a new bdev is registered by another module
(`examine_config` and `examine_disk`).
Please check the documentation of
struct spdk_bdev_module for more details.

## Creating Bdevs

New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:

~~~{.c}
/*
 * Function table for a block device backend.
 *
 * The backend block device function table provides a set of APIs to allow
 * communication with a backend. The main commands are read/write API
 * calls for I/O via submit_request.
 */
struct spdk_bdev_fn_table {
	/* Destroy the backend block device object */
	int (*destruct)(void *ctx);

	/* Process the IO. */
	void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);

	/* Check if the block device supports a specific I/O type. */
	bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);

	/* Get an I/O channel for the specific bdev for the calling thread. */
	struct spdk_io_channel *(*get_io_channel)(void *ctx);

	/*
	 * Output driver-specific configuration to a JSON stream. Optional - may be NULL.
	 *
	 * The JSON write context will be initialized with an open object, so the bdev
	 * driver should write a name (based on the driver name) followed by a JSON value
	 * (most likely another nested object).
	 */
	int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);

	/* Get spin-time per I/O channel in microseconds.
	 * Optional - may be NULL.
	 */
	uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~

The bdev module must implement these function callbacks.

The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module - it may just be
freeing memory or it may be shutting down a piece of hardware.
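As a minimal sketch of a `destruct` callback that only frees memory, the
following uses a local stand-in for the function table instead of the real
SPDK header so that it compiles on its own. The names `sketch_fn_table`,
`my_disk`, `my_disk_create`, `my_disk_destruct`, and `g_live_disks` are
hypothetical and exist only for illustration. A real module would include
spdk/bdev_module.h and fill in a `struct spdk_bdev_fn_table` instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in (assumption) holding only the destruct callback. It is
 * not the SPDK type; it mirrors the destruct signature shown above. */
struct sketch_fn_table {
	int (*destruct)(void *ctx);
};

/* Hypothetical per-device state owned by the module. */
struct my_disk {
	void *buf;            /* backing memory for a RAM-style device */
	uint64_t num_blocks;  /* device size in blocks */
	uint32_t block_size;  /* block size in bytes */
};

/* Hypothetical counter of devices that have not been destructed yet. */
static int g_live_disks;

/* Allocate the device state and its backing buffer. Returns NULL on failure. */
static struct my_disk *my_disk_create(uint64_t num_blocks, uint32_t block_size)
{
	struct my_disk *disk = calloc(1, sizeof(*disk));

	if (disk == NULL) {
		return NULL;
	}

	disk->buf = calloc(num_blocks, block_size);
	if (disk->buf == NULL) {
		free(disk);
		return NULL;
	}

	disk->num_blocks = num_blocks;
	disk->block_size = block_size;
	g_live_disks++;
	return disk;
}

/* Tear down the device. Here that means releasing memory only. */
static int my_disk_destruct(void *ctx)
{
	struct my_disk *disk = ctx;

	assert(disk != NULL);
	free(disk->buf);
	free(disk);
	g_live_disks--;
	return 0; /* 0 reports success in this sketch */
}

/* The module-wide table that each device of this type would point at. */
static const struct sketch_fn_table my_disk_fn_table = {
	.destruct = my_disk_destruct,
};
```

In this sketch a device is created with `my_disk_create(8, 512)` and later
torn down with `my_disk_fn_table.destruct(disk)`. The `ctx` pointer handed to
the callback is the module's own device state.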

The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:

~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~

For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (it would have just contained all
0's). If it isn't supported, the generic bdev code is capable of emulating it
by sending regular write requests.

`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all I/O has been completed in some way.

`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They're strictly optional, and it
probably only makes sense to implement those if the backing storage device is
capable of handling NVMe commands.

The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency.
The generic bdev layer will
call `get_io_channel` one time per thread, cache the result, and pass that
result to `submit_request`. It will use the corresponding channel for the
thread it calls `submit_request` on.

The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.

## Creating Virtual Bdevs

Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take one additional step. The module can look up the underlying
bdevs it wishes to route I/O to using spdk_bdev_get_by_name(), where the string
name is provided by the user in a configuration file or via an RPC. The module
may then proceed as normal by opening the bdev to obtain a descriptor, and
creating I/O channels for the bdev (probably in response to the
`get_io_channel` callback). The final step is to have the module use its open
descriptor to call spdk_bdev_module_claim_bdev(), indicating that it is
consuming the underlying bdev. This prevents other users from opening
descriptors with write permissions. This effectively 'promotes' the descriptor
to write-exclusive and is an operation only available to bdev modules.
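To make the idea of routing I/O to other block devices more concrete, here is
a small sketch of the address arithmetic a striping (RAID-0 style) virtual
bdev might use to pick an underlying bdev for a given block. It is not taken
from any SPDK module and makes no SPDK calls. The types `stripe_layout` and
`stripe_target` and the function `stripe_map_block` are hypothetical names
introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how a virtual bdev stripes its blocks. */
struct stripe_layout {
	uint32_t num_base_bdevs;     /* number of underlying bdevs */
	uint64_t strip_size_blocks;  /* consecutive blocks kept on one bdev */
};

/* Where a single virtual block lands after routing. */
struct stripe_target {
	uint32_t base_index;   /* index of the underlying bdev */
	uint64_t base_offset;  /* block offset within that bdev */
};

/* Map a block of the virtual bdev onto one underlying bdev. */
static struct stripe_target stripe_map_block(const struct stripe_layout *layout,
					     uint64_t block)
{
	struct stripe_target target;
	uint64_t strip;
	uint64_t offset_in_strip;

	/* Guard against division by zero on a malformed layout. */
	assert(layout->num_base_bdevs > 0 && layout->strip_size_blocks > 0);

	/* Strips are numbered across the whole virtual device. */
	strip = block / layout->strip_size_blocks;
	offset_in_strip = block % layout->strip_size_blocks;

	/* Strips rotate round-robin over the underlying bdevs. */
	target.base_index = (uint32_t)(strip % layout->num_base_bdevs);

	/* Each full rotation advances every underlying bdev by one strip. */
	target.base_offset = (strip / layout->num_base_bdevs) *
			     layout->strip_size_blocks + offset_in_strip;

	return target;
}
```

With two underlying bdevs and a strip size of four blocks, virtual block 5
falls in the second strip, so this mapping sends it to the bdev at index 1 at
block offset 1. A virtual bdev built this way would perform such a mapping
inside its `submit_request` callback before forwarding the I/O.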