# NVMe over Fabrics Target {#nvmf}

@sa @ref nvme_fabrics_host
@sa @ref nvmf_tgt_tracepoints

# NVMe-oF Target Getting Started Guide {#nvmf_getting_started}

The SPDK NVMe over Fabrics target is a user space application that presents block devices over a fabric
such as Ethernet, InfiniBand or Fibre Channel. SPDK currently supports RDMA and TCP transports.

The NVMe over Fabrics specification defines subsystems that can be exported over different transports.
SPDK has chosen to call the software that exports these subsystems a "target", which is the term used
for iSCSI. The specification refers to the "client" that connects to the target as a "host". Many
people will also refer to the host as an "initiator", which is the equivalent term in iSCSI
parlance. SPDK will try to stick to the terms "target" and "host" to match the specification.

The Linux kernel also implements an NVMe-oF target and host, and SPDK is tested for
interoperability with the Linux kernel implementations.

If you want to stop the application with a signal, use SIGTERM so that it can release all of its shared
memory resources before exiting. Killing it with SIGKILL gives the application no chance to release the
shared memory resources, and you may then need to release them manually.
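
For example, to stop a running target cleanly (a minimal sketch, assuming `nvmf_tgt` is the only instance running):

~~~{.sh}
# Send SIGTERM so the target can release its shared memory before exiting.
kill -TERM $(pidof nvmf_tgt)
~~~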

## RDMA transport support {#nvmf_rdma_transport}

The RDMA transport requires an RDMA-capable NIC with its corresponding OFED (OpenFabrics Enterprise
Distribution) software package installed in order to run. Many OS distributions provide packages, but
OFED is also available [here](https://downloads.openfabrics.org/OFED/).

### Prerequisites {#nvmf_prereqs}

To build nvmf_tgt with the RDMA transport, there are some additional dependencies.

Fedora:
~~~{.sh}
dnf install libibverbs-devel librdmacm-devel
~~~

Ubuntu:
~~~{.sh}
apt-get install libibverbs-dev librdmacm-dev
~~~

Then build SPDK with RDMA enabled:

~~~{.sh}
./configure --with-rdma <other config parameters>
make
~~~

Once built, the binary will be in `app/nvmf_tgt`.

### Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}

Before starting our NVMe-oF target with the RDMA transport we must load the InfiniBand and RDMA modules
that allow userspace processes to use InfiniBand/RDMA verbs directly.

~~~{.sh}
modprobe ib_cm
modprobe ib_core
# Please note that ib_ucm does not exist in newer versions of the kernel and is not required.
modprobe ib_ucm || true
modprobe ib_umad
modprobe ib_uverbs
modprobe iw_cm
modprobe rdma_cm
modprobe rdma_ucm
~~~

### Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}

Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.

### Finding RDMA NICs and associated network interfaces

~~~{.sh}
ls /sys/class/infiniband/*/device/net
~~~
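
The listing above pairs each RDMA device with its network interface. To print that mapping explicitly,
a small shell loop such as the following can be used (a sketch, assuming at least one RDMA NIC is present):

~~~{.sh}
# For every RDMA device, print its name and the associated network interface(s).
for dev in /sys/class/infiniband/*; do
    echo "$(basename $dev): $(ls $dev/device/net)"
done
~~~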

#### Mellanox ConnectX-3 RDMA NICs

~~~{.sh}
modprobe mlx4_core
modprobe mlx4_ib
modprobe mlx4_en
~~~

#### Mellanox ConnectX-4 RDMA NICs

~~~{.sh}
modprobe mlx5_core
modprobe mlx5_ib
~~~

#### Assigning IP addresses to RDMA NICs

~~~{.sh}
ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
~~~
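
Equivalently, on systems that use the iproute2 tools rather than `ifconfig` (adjust interface names and
addresses to your own setup):

~~~{.sh}
ip addr add 192.168.100.8/24 dev eth1
ip link set eth1 up
ip addr add 192.168.100.9/24 dev eth2
ip link set eth2 up
~~~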

### RDMA Limitations {#nvmf_rdma_limitations}

Because RDMA NICs limit the number of memory regions that can be registered, the SPDK NVMe-oF
target application may eventually start failing to allocate more DMA-able memory. This is
an imperfection of the DPDK dynamic memory management and is most likely to occur when too
many 2MB hugepages are reserved at runtime. One type of memory bottleneck is the number of NIC memory
regions, e.g., some NICs report as many as 2048 for the maximum number of memory regions. With 2MB
hugepages this gives a total memory limit of 4GB. The limit can be overcome by
using 1GB hugepages or by pre-reserving memory at application startup with the `--mem-size` or `-s`
option. All pre-reserved memory will be registered as a single region, but won't be returned to the
system until the SPDK application is terminated.
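
For example, to pre-reserve 4GB of hugepage memory when starting the target (a sketch only; size the
reservation to your workload):

~~~{.sh}
# -s/--mem-size takes the amount of memory to pre-reserve, in MB.
app/nvmf_tgt/nvmf_tgt -s 4096
~~~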

## TCP transport support {#nvmf_tcp_transport}

The TCP transport is built into nvmf_tgt by default and does not require any special libraries.

## Configuring the SPDK NVMe over Fabrics Target {#nvmf_config}

An NVMe over Fabrics target can be configured using JSON RPCs.
The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about
working with NVMe over Fabrics specific RPCs can be found on the @ref jsonrpc_components_nvmf_tgt RPC page.

Using .ini style configuration files for configuration of the NVMe-oF target is deprecated and should
be replaced with JSON based RPCs. .ini style configuration files can be converted to JSON format by way
of the script `scripts/config_converter.py`.
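
For example, assuming the legacy configuration lives in a file called `nvmf.conf` (a hypothetical name)
and that the converter reads the old format on stdin and writes JSON to stdout:

~~~{.sh}
scripts/config_converter.py < nvmf.conf > nvmf_config.json
~~~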

### Using RPCs {#nvmf_config_rpc}

Start the nvmf_tgt application with elevated privileges. Once the target is started,
the nvmf_create_transport RPC can be used to initialize a given transport. Below is an
example where the target is started and configured with two different transports.
The RDMA transport is configured with an I/O unit size of 8192 bytes, 4 max qpairs per controller,
and an in capsule data size of 0 bytes. The TCP transport is configured with an I/O unit size of
16384 bytes, 8 max qpairs per controller, and an in capsule data size of 8192 bytes.

~~~{.sh}
app/nvmf_tgt/nvmf_tgt
scripts/rpc.py nvmf_create_transport -t RDMA -u 8192 -p 4 -c 0
scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -p 8 -c 8192
~~~
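
RPC names have changed across SPDK releases. If a call is rejected, the list of methods supported by the
running target can be queried; the method shown below is named `get_rpc_methods` on older releases and
`rpc_get_methods` on newer ones:

~~~{.sh}
scripts/rpc.py get_rpc_methods
~~~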

Below is an example of creating a malloc bdev and assigning it to a subsystem. Adjust the bdevs,
NQN, serial number, and IP address for the RDMA transport to your own circumstances. If you replace
"rdma" with "TCP", then the subsystem will add a listener with the TCP transport.

~~~{.sh}
scripts/rpc.py construct_malloc_bdev -b Malloc0 512 512
scripts/rpc.py nvmf_subsystem_create nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -a 192.168.100.8 -s 4420
~~~
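
The configuration built up this way can be captured so it does not have to be recreated by hand on the
next start; a sketch using the generic `save_config` and `load_config` RPCs (the file name is arbitrary):

~~~{.sh}
# Dump the current target configuration as JSON.
scripts/rpc.py save_config > nvmf_config.json
# Replay it later into a freshly started target.
scripts/rpc.py load_config < nvmf_config.json
~~~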

### NQN Formal Definition

NVMe qualified names or NQNs are defined in section 7.9 of the
[NVMe specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf). SPDK has attempted to
formalize that definition using [Extended Backus-Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
SPDK modules use this formal definition (provided below) when validating NQNs.

~~~{.sh}

Basic Types
year = 4 * digit ;
month = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
hex digit = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;

NQN Definition
NVMe Qualified Name = ( NVMe-oF Discovery NQN | NVMe UUID NQN | NVMe Domain NQN ), '\0' ;
NVMe-oF Discovery NQN = "nqn.2014-08.org.nvmexpress.discovery" ;
NVMe UUID NQN = "nqn.2014-08.org.nvmexpress:uuid:", string UUID ;
string UUID = 8 * hex digit, '-', 3 * (4 * hex digit, '-'), 12 * hex digit ;
NVMe Domain NQN = "nqn.", year, '-', month, '.', reverse domain, ':', utf-8 string ;

~~~

Please note that the following types from the definition above are defined elsewhere:

1. utf-8 string: Defined in [RFC 3629](https://tools.ietf.org/html/rfc3629).
2. reverse domain: Equivalent to domain name as defined in [RFC 1034](https://tools.ietf.org/html/rfc1034).

While not stated in the formal definition, SPDK enforces the requirement from the spec that the
"maximum name is 223 bytes in length". SPDK does not include the null terminating character when
defining the length of an NQN, and will accept an NQN containing up to 223 valid bytes with an
additional null terminator. To be precise, SPDK follows the same conventions as the C standard
library function [strlen()](http://man7.org/linux/man-pages/man3/strlen.3.html).

#### NQN Comparisons

SPDK compares NQNs byte for byte without case matching or unicode normalization. This has specific implications for
UUID-based NQNs. The following pair of NQNs, for example, would not match when compared in the SPDK NVMe-oF Target:

nqn.2014-08.org.nvmexpress:uuid:11111111-aaaa-bbdd-ffee-123456789abc
nqn.2014-08.org.nvmexpress:uuid:11111111-AAAA-BBDD-FFEE-123456789ABC

In order to ensure the consistency of UUID-based NQNs while using SPDK, users should use lowercase when representing
alphabetic hex digits in their NQNs.
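
For example, a UUID NQN received in mixed case can be normalized to lowercase before it is handed to SPDK
(a trivial sketch using standard shell tools):

~~~{.sh}
echo "nqn.2014-08.org.nvmexpress:uuid:11111111-AAAA-BBDD-FFEE-123456789ABC" | tr '[:upper:]' '[:lower:]'
~~~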

### Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}

SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.
To ensure the SPDK NVMe-oF target has the best performance, configure the NICs and NVMe devices to
be located on the same NUMA node.
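
To check which NUMA node a NIC or another PCI device belongs to, the standard sysfs attribute can be read
(a sketch, assuming `eth1` is the RDMA interface and `0000:81:00.0` is a hypothetical NVMe device address):

~~~{.sh}
cat /sys/class/net/eth1/device/numa_node
cat /sys/bus/pci/devices/0000:81:00.0/numa_node
~~~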

The `-m` core mask option specifies a bit mask of the CPU cores that
SPDK is allowed to execute work items on.
For example, to allow SPDK to use cores 24, 25, 26 and 27:
~~~{.sh}
app/nvmf_tgt/nvmf_tgt -m 0xF000000
~~~

## Configuring the Linux NVMe over Fabrics Host {#nvmf_host}

Both the Linux kernel and SPDK implement an NVMe over Fabrics host.
Linux kernel NVMe-oF host support is provided by the `nvme-rdma` driver (for the RDMA transport)
and the `nvme-tcp` driver (for the TCP transport). The following commands load the two drivers:

~~~{.sh}
modprobe nvme-rdma
modprobe nvme-tcp
~~~

The nvme-cli tool may be used to interface with the Linux kernel NVMe over Fabrics host.
See below for examples of the discover, connect and disconnect commands. In all three instances, the
transport can be changed to TCP by substituting 'tcp' for 'rdma'.

Discovery:
~~~{.sh}
nvme discover -t rdma -a 192.168.100.8 -s 4420
~~~

Connect:
~~~{.sh}
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.100.8 -s 4420
~~~

Disconnect:
~~~{.sh}
nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"
~~~
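
After a successful connect, the remote namespaces appear on the host as local NVMe block devices; they can
be listed with nvme-cli (assuming it is installed):

~~~{.sh}
nvme list
~~~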

## Enabling NVMe-oF target tracepoints for offline analysis and debug {#nvmf_trace}

SPDK has a tracing framework for capturing low-level event information at runtime.
@ref nvmf_tgt_tracepoints enable analysis of both performance and application crashes.
249