# Writing a Custom Block Device Module {#bdev_module}

## Target Audience

This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.

## Introduction

A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or an
existing storage software stack. This guide demonstrates how to write such a
module.

## Creating A New Module

Block device modules are located in subdirectories under module/bdev today. It is not
currently possible to place the code for a bdev module elsewhere, but updates
to the build system could be made to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A great
starting point is to copy the existing 'null' bdev module.

The primary interface that bdev modules will interact with is in
include/spdk/bdev_module.h. That header defines a macro,
SPDK_BDEV_MODULE_REGISTER, that registers a new bdev module. The macro takes as
an argument a pointer to the spdk_bdev_module structure describing the new module.

The spdk_bdev_module structure describes the module's properties, such as the
initialization (`module_init`) and teardown (`module_fini`) functions,
the function that returns the context size (`get_ctx_size`), which is the amount
of scratch space allocated in each I/O request for use by this module, and the
callbacks that are called each time a new bdev is registered by another module
(`examine_config` and `examine_disk`). See the documentation of
struct spdk_bdev_module for more details.
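
As a rough illustration of how these callbacks fit together, the sketch below
wires up `module_init`, `module_fini`, and `get_ctx_size` for a hypothetical
module. It is written to compile without SPDK, so `struct example_module_ops`
is only a stand-in that mirrors the callback fields named above. A real module
would instead fill in struct spdk_bdev_module from include/spdk/bdev_module.h
and register it with SPDK_BDEV_MODULE_REGISTER. All `example_` names and the
contents of `struct example_io_ctx` are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical per-I/O scratch space. The bdev layer would reserve this many
 * bytes in every I/O request for the module's private use. */
struct example_io_ctx {
	uint64_t submit_time;	/* e.g. when the I/O was handed to the backend */
	int retries;		/* e.g. how many times it has been resubmitted */
};

/* One-time module setup. Returning 0 to signal success is an assumption. */
static int
example_module_init(void)
{
	return 0;
}

/* One-time module teardown. */
static void
example_module_fini(void)
{
}

/* Report how much per-I/O scratch space this module needs. */
static int
example_get_ctx_size(void)
{
	return (int)sizeof(struct example_io_ctx);
}

/* Stand-in for the callback fields of struct spdk_bdev_module discussed
 * above. It is NOT the real SPDK definition. */
struct example_module_ops {
	const char *name;
	int (*module_init)(void);
	void (*module_fini)(void);
	int (*get_ctx_size)(void);
};

static const struct example_module_ops example_ops = {
	.name = "example",
	.module_init = example_module_init,
	.module_fini = example_module_fini,
	.get_ctx_size = example_get_ctx_size,
};
```

Returning `sizeof` of a dedicated structure from `get_ctx_size` keeps the
reported size in step with the structure as fields are added.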

## Creating Bdevs

New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:

~~~{.c}
/*
 * Function table for a block device backend.
 *
 * The backend block device function table provides a set of APIs to allow
 * communication with a backend. The main commands are read/write API
 * calls for I/O via submit_request.
 */
struct spdk_bdev_fn_table {
	/* Destroy the backend block device object */
	int (*destruct)(void *ctx);

	/* Process the IO. */
	void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);

	/* Check if the block device supports a specific I/O type. */
	bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);

	/* Get an I/O channel for the specific bdev for the calling thread. */
	struct spdk_io_channel *(*get_io_channel)(void *ctx);

	/*
	 * Output driver-specific configuration to a JSON stream. Optional - may be NULL.
	 *
	 * The JSON write context will be initialized with an open object, so the bdev
	 * driver should write a name (based on the driver name) followed by a JSON value
	 * (most likely another nested object).
	 */
	int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);

	/* Get spin-time per I/O channel in microseconds.
	 *  Optional - may be NULL.
	 */
	uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~

The bdev module must implement these function callbacks, except for those
marked as optional in the comments above.

The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module - it may just be
freeing memory or it may be shutting down a piece of hardware.
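
For the "just freeing memory" case, a `destruct` callback could look roughly
like the sketch below. `struct example_disk` and `example_destruct` are
hypothetical names, and returning 0 to report success is an assumption of this
sketch rather than something stated in this guide.

```c
#include <stdlib.h>

/* Hypothetical per-device context, passed to the callbacks as ctx. */
struct example_disk {
	void *data;	/* backing memory for a RAM-disk style device */
};

/* destruct callback: release everything the device owns.
 * Returning 0 to report success is an assumption of this sketch. */
static int
example_destruct(void *ctx)
{
	struct example_disk *disk = ctx;

	free(disk->data);
	free(disk);
	return 0;
}
```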

The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:

~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~

For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (the buffer would have contained
only zeroes). If it isn't supported, the generic bdev code is capable of
emulating it by sending regular write requests.

`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all I/O has been completed in some way.

`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They are strictly optional, and it
probably only makes sense to implement them if the backing storage device is
capable of handling NVMe commands.
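
A minimal sketch of `io_type_supported` for the simplest kind of module, one
that handles only reads and writes, might look like this. The enum is repeated
from the listing above only so that the sketch compiles on its own; a real
module would get it from the SPDK headers. `example_io_type_supported` is a
hypothetical name.

```c
#include <stdbool.h>

/* Repeated from the listing above so the sketch compiles standalone. */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};

/* io_type_supported callback for a hypothetical read/write-only module. */
static bool
example_io_type_supported(void *ctx, enum spdk_bdev_io_type io_type)
{
	(void)ctx;	/* this sketch has no per-device capabilities to consult */

	switch (io_type) {
	case SPDK_BDEV_IO_TYPE_READ:
	case SPDK_BDEV_IO_TYPE_WRITE:
		return true;
	default:
		/* Everything else, including WRITE_ZEROES, is reported as
		 * unsupported by this sketch. */
		return false;
	}
}
```

Because this sketch reports `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` as unsupported,
the generic bdev code would emulate it with regular writes, as described above.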

The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency. The generic bdev layer will
call `get_io_channel` one time per thread, cache the result, and pass that
result to `submit_request`. It will use the corresponding channel for the
thread it calls `submit_request` on.

The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.
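
The dispatch pattern inside `submit_request` can be sketched as follows. To
keep the example compilable without SPDK, it uses stand-in types and a stand-in
completion function; every `example_` and `EXAMPLE_` name is hypothetical. In a
real module the request is a struct spdk_bdev_io and completion is reported
with spdk_bdev_io_complete(). The sketch completes each I/O inline only for
brevity.

```c
/* Stand-ins so the sketch compiles without SPDK. In a real module these roles
 * are played by enum spdk_bdev_io_type, struct spdk_bdev_io and
 * spdk_bdev_io_complete(). */
enum example_io_type {
	EXAMPLE_IO_READ,
	EXAMPLE_IO_WRITE,
	EXAMPLE_IO_UNMAP,
};

enum example_io_status {
	EXAMPLE_IO_PENDING,
	EXAMPLE_IO_SUCCESS,
	EXAMPLE_IO_FAILED,
};

struct example_io {
	enum example_io_type type;
	enum example_io_status status;
};

/* Stand-in for spdk_bdev_io_complete(): records the final status. */
static void
example_io_complete(struct example_io *io, enum example_io_status status)
{
	io->status = status;
}

/* submit_request callback: dispatch on the I/O type. */
static void
example_submit_request(void *ch, struct example_io *io)
{
	(void)ch;	/* a real module would use its per-thread channel here */

	switch (io->type) {
	case EXAMPLE_IO_READ:
	case EXAMPLE_IO_WRITE:
		/* A real module would start the transfer here and could complete
		 * the I/O later, for example from a completion callback. */
		example_io_complete(io, EXAMPLE_IO_SUCCESS);
		break;
	default:
		/* This sketch fails unsupported types instead of dropping them,
		 * so that every request is eventually completed. */
		example_io_complete(io, EXAMPLE_IO_FAILED);
		break;
	}
}
```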

Integrating a new bdev module into the build system requires updates to various
files in the /mk directory.

## Creating Bdevs in an External Repository

A user can build their own bdev module and application on top of existing SPDK libraries. The example in
test/external_code serves as a template for creating, building, and linking an external
bdev module. Refer to test/external_code/README.md and @ref so_linking for further information.

## Creating Virtual Bdevs

Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take the one additional step of claiming the bdev.

The module can open the underlying bdevs it wishes to route I/O to using
spdk_bdev_open_ext(), where the string name is provided by the user via an RPC.
To ensure that other consumers do not modify the underlying bdev in an unexpected
way, the virtual bdev should take a claim on the underlying bdev before
reading from or writing to the underlying bdev.

There are two slightly different APIs for taking and releasing claims. The
preferred interface uses `spdk_bdev_module_claim_bdev_desc()`. This method allows
claims that ensure there is a single writer with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE`, cooperating shared writers with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`, and shared readers that prevent any
writers with `SPDK_BDEV_CLAIM_READ_MANY_WRITE_NONE`. In all cases,
`spdk_bdev_open_ext()` may be used to open the underlying bdev read-only. If a
read-only bdev descriptor successfully claims a bdev with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE` or `SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`,
the bdev descriptor is promoted to read-write.
Any claim that is obtained with `spdk_bdev_module_claim_bdev_desc()` is
automatically released upon closing the bdev descriptor used to obtain the
claim. Shared claims continue to block new incompatible claims and new writers
until the last claim is released.

The non-preferred interface for obtaining a claim allows the caller to obtain
an exclusive writer claim with `spdk_bdev_module_claim_bdev()`. It may be
released with `spdk_bdev_module_release_bdev()`. If a read-only bdev
descriptor is passed, it is promoted to read-write. NULL may be passed instead
of a bdev descriptor to avoid promotion and to block new writers. New code
should use `spdk_bdev_module_claim_bdev_desc()` with the claim type that is
tailored to the virtual bdev's needs.

The descriptor obtained from the successful spdk_bdev_open_ext() may be used
with spdk_bdev_get_io_channel() to obtain I/O channels for the bdev. This is
likely done in response to the virtual bdev's `get_io_channel` callback.
Channels may be obtained before and/or after claiming the underlying bdev, but
beware that there may be other unknown writers until the underlying bdev has
been claimed.

When a virtual bdev module claims an underlying bdev from its `examine_config`
callback, it causes the `examine_disk` callback to only be called for this
module and any others that establish a shared claim. If no claims are taken by
`examine_config` callbacks, all virtual bdevs' `examine_disk` callbacks are
called.