# NVMe over Fabrics Target {#nvmf}

@sa @ref nvme_fabrics_host
@sa @ref nvmf_tgt_tracepoints

## NVMe-oF Target Getting Started Guide {#nvmf_getting_started}

The SPDK NVMe over Fabrics target is a user space application that presents block devices over a
fabric such as Ethernet, InfiniBand or Fibre Channel. SPDK currently supports RDMA and TCP transports.

The NVMe over Fabrics specification defines subsystems that can be exported over different transports.
SPDK has chosen to call the software that exports these subsystems a "target", which is the term used
for iSCSI. The specification refers to the "client" that connects to the target as a "host". Many
people will also refer to the host as an "initiator", which is the equivalent term in iSCSI
parlance. SPDK will try to stick to the terms "target" and "host" to match the specification.

The Linux kernel also implements an NVMe-oF target and host, and SPDK is tested for
interoperability with the Linux kernel implementations.

If you need to kill the application with a signal, use SIGTERM so that the application releases all
of its shared memory resources before exiting. SIGKILL gives the application no chance to release
shared memory, and you may need to release those resources manually.

## RDMA transport support {#nvmf_rdma_transport}

The RDMA transport requires an RDMA-capable NIC with its corresponding OFED (OpenFabrics Enterprise
Distribution) software package installed. Many OS distributions provide these packages, but OFED is
also available [here](https://downloads.openfabrics.org/OFED/).

### Prerequisites {#nvmf_prereqs}

To build nvmf_tgt with the RDMA transport, there are some additional dependencies,
which can be installed using the pkgdep.sh script.

~~~{.sh}
sudo scripts/pkgdep.sh --rdma
~~~

Then build SPDK with RDMA enabled:

~~~{.sh}
./configure --with-rdma <other config parameters>
make
~~~

Once built, the binary will be in `build/bin`.

### Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}

Before starting our NVMe-oF target with the RDMA transport, we must load the InfiniBand and RDMA
modules that allow userspace processes to use InfiniBand/RDMA verbs directly.

~~~{.sh}
modprobe ib_cm
modprobe ib_core
# Please note that ib_ucm does not exist in newer versions of the kernel and is not required.
modprobe ib_ucm || true
modprobe ib_umad
modprobe ib_uverbs
modprobe iw_cm
modprobe rdma_cm
modprobe rdma_ucm
~~~

### Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}

Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.

### Finding RDMA NICs and associated network interfaces

~~~{.sh}
ls /sys/class/infiniband/*/device/net
~~~

#### Mellanox ConnectX-3 RDMA NICs

~~~{.sh}
modprobe mlx4_core
modprobe mlx4_ib
modprobe mlx4_en
~~~

#### Mellanox ConnectX-4 RDMA NICs

~~~{.sh}
modprobe mlx5_core
modprobe mlx5_ib
~~~

#### Assigning IP addresses to RDMA NICs

~~~{.sh}
ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
~~~
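
On systems where the legacy `ifconfig` tool is not installed, the same addresses can be assigned
with `ip` from iproute2. This is a minimal equivalent sketch, assuming the same eth1/eth2 interface
names and addresses used above:

~~~{.sh}
# Equivalent configuration using iproute2 (assumes the same interface names as above).
ip addr add 192.168.100.8/24 dev eth1
ip addr add 192.168.100.9/24 dev eth2
ip link set eth1 up
ip link set eth2 up
~~~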
### RDMA Limitations {#nvmf_rdma_limitations}

RDMA NICs place a limit on the number of memory regions that can be registered, so the SPDK NVMe-oF
target application may eventually start failing to allocate more DMA-able memory. This is an
imperfection of the DPDK dynamic memory management and is most likely to occur when too many 2MB
hugepages are reserved at runtime. One type of memory bottleneck is the number of NIC memory
regions; for example, some NICs report a maximum of 2048 memory regions. With 2MB hugepages this
gives a limit of 4GB of total registered memory. It can be overcome by using 1GB hugepages or by
pre-reserving memory at application startup with the `--mem-size` or `-s` option. All pre-reserved
memory will be registered as a single region, but won't be returned to the system until the SPDK
application is terminated.

Another known issue occurs when using the E810 NICs in RoCE mode. Specifically, the NVMe-oF target
sometimes cannot destroy a qpair because its posted work requests don't get flushed. This can
prevent the NVMe-oF target application from terminating cleanly.

## TCP transport support {#nvmf_tcp_transport}

The TCP transport is built into nvmf_tgt by default, and it does not need any special libraries.

## FC transport support {#nvmf_fc_transport}

To build nvmf_tgt with the FC transport, there is an additional FC LLD (Low Level Driver) code
dependency. Please contact your FC vendor for instructions on obtaining the FC driver module.

### Broadcom FC LLD code

The FC LLD driver for Broadcom FC NVMe-capable adapters can be obtained from
https://github.com/ecdufcdrvr/bcmufctdrvr.

### Fetch FC LLD module and then build SPDK with FC enabled

After cloning the SPDK repo and initializing its submodules, build the FC LLD library, which can
then be linked with the FC transport.

~~~{.sh}
git clone https://github.com/spdk/spdk spdk
git clone https://github.com/ecdufcdrvr/bcmufctdrvr fc
cd spdk
git submodule update --init
cd ../fc
make DPDK_DIR=../spdk/dpdk/build SPDK_DIR=../spdk
cd ../spdk
./configure --with-fc=../fc/build
make
~~~

## Configuring the SPDK NVMe over Fabrics Target {#nvmf_config}

An NVMe over Fabrics target can be configured using JSON RPCs.
The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about
working with NVMe over Fabrics specific RPCs can be found on the @ref jsonrpc_components_nvmf_tgt RPC page.

### Using RPCs {#nvmf_config_rpc}

Start the nvmf_tgt application with elevated privileges. Once the target is started,
the nvmf_create_transport RPC can be used to initialize a given transport. Below is an
example where the target is started and configured with two different transports.
The RDMA transport is configured with an I/O unit size of 8192 bytes, a max I/O size of 131072 bytes
and an in-capsule data size of 8192 bytes. The TCP transport is configured with an I/O unit size of
16384 bytes, 8 max qpairs per controller, and an in-capsule data size of 8192 bytes.

~~~{.sh}
build/bin/nvmf_tgt
scripts/rpc.py nvmf_create_transport -t RDMA -u 8192 -i 131072 -c 8192
scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
~~~
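
After the transports are created, it can be useful to confirm the running configuration and save it
for later restarts. The sketch below uses the `nvmf_get_transports` and `save_config` RPCs; the
`nvmf_config.json` file name is only an example:

~~~{.sh}
# Query the transports that were just created.
scripts/rpc.py nvmf_get_transports
# Save the running configuration to a JSON file (example name) so the target can
# typically be restarted later with "build/bin/nvmf_tgt --json nvmf_config.json".
scripts/rpc.py save_config > nvmf_config.json
~~~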
Below is an example of creating a malloc bdev and assigning it to a subsystem. Adjust the bdev,
NQN, serial number, and IP address of the RDMA listener to your own circumstances. If you replace
"rdma" with "TCP", then the subsystem will add a listener with the TCP transport instead.

~~~{.sh}
scripts/rpc.py bdev_malloc_create -b Malloc0 512 512
scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -a 192.168.100.8 -s 4420
~~~

### NQN Formal Definition

NVMe qualified names or NQNs are defined in section 7.9 of the
[NVMe specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf). SPDK has attempted to
formalize that definition using [Extended Backus-Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
SPDK modules use this formal definition (provided below) when validating NQNs.

~~~{.sh}

Basic Types
year = 4 * digit ;
month = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
hex digit = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | '0' |
            '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;

NQN Definition
NVMe Qualified Name = ( NVMe-oF Discovery NQN | NVMe UUID NQN | NVMe Domain NQN ), '\0' ;
NVMe-oF Discovery NQN = "nqn.2014-08.org.nvmexpress.discovery" ;
NVMe UUID NQN = "nqn.2014-08.org.nvmexpress:uuid:", string UUID ;
string UUID = 8 * hex digit, '-', 3 * (4 * hex digit, '-'), 12 * hex digit ;
NVMe Domain NQN = "nqn.", year, '-', month, '.', reverse domain, ':', utf-8 string ;

~~~

Please note that the following types from the definition above are defined elsewhere:

1. utf-8 string: Defined in [rfc 3629](https://tools.ietf.org/html/rfc3629).
2. reverse domain: Equivalent to domain name as defined in [rfc 1034](https://tools.ietf.org/html/rfc1034).

While not stated in the formal definition, SPDK enforces the requirement from the spec that the
"maximum name is 223 bytes in length". SPDK does not include the null terminating character when
defining the length of an NQN, and will accept an NQN containing up to 223 valid bytes with an
additional null terminator. To be precise, SPDK follows the same conventions as the C standard
library function [strlen()](http://man7.org/linux/man-pages/man3/strlen.3.html).

#### NQN Comparisons

SPDK compares NQNs byte for byte without case matching or Unicode normalization. This has specific
implications for UUID-based NQNs. The following pair of NQNs, for example, would not match when
compared in the SPDK NVMe-oF Target:

nqn.2014-08.org.nvmexpress:uuid:11111111-aaaa-bbdd-ffee-123456789abc
nqn.2014-08.org.nvmexpress:uuid:11111111-AAAA-BBDD-FFEE-123456789ABC

In order to ensure the consistency of UUID-based NQNs while using SPDK, users should use lowercase
when representing alphabetic hex digits in their NQNs.

### Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}

SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.
To ensure the SPDK NVMe-oF target has the best performance, configure the NICs and NVMe devices to
be located on the same NUMA node.
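
A quick way to check NUMA placement is through sysfs. This is a minimal sketch; the interface name
(eth1) and NVMe PCI address (0000:84:00.0) are assumptions and should be replaced with your own
devices:

~~~{.sh}
# NUMA node of the RDMA-capable network interface (assumed name eth1).
cat /sys/class/net/eth1/device/numa_node
# NUMA node of an NVMe device (assumed PCI address 0000:84:00.0).
cat /sys/bus/pci/devices/0000:84:00.0/numa_node
~~~

Cores for the `-m` mask described below can then be chosen from the matching NUMA node.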
The `-m` core mask option specifies a bit mask of the CPU cores that
SPDK is allowed to execute work items on.
For example, to allow SPDK to use cores 24, 25, 26 and 27:
~~~{.sh}
build/bin/nvmf_tgt -m 0xF000000
~~~

## Configuring the Linux NVMe over Fabrics Host {#nvmf_host}

Both the Linux kernel and SPDK implement an NVMe over Fabrics host.
Linux kernel NVMe-oF host support is provided by the `nvme-rdma` driver (for the RDMA transport)
and the `nvme-tcp` driver (for the TCP transport). The following commands load these drivers.

~~~{.sh}
modprobe nvme-rdma
modprobe nvme-tcp
~~~

The nvme-cli tool may be used to interface with the Linux kernel NVMe over Fabrics host.
See below for examples of the discover, connect and disconnect commands. In all three instances, the
transport can be changed to TCP by replacing 'rdma' with 'tcp'.

Discovery:
~~~{.sh}
nvme discover -t rdma -a 192.168.100.8 -s 4420
~~~

Connect:
~~~{.sh}
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.100.8 -s 4420
~~~

Disconnect:
~~~{.sh}
nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"
~~~

## Enabling NVMe-oF target tracepoints for offline analysis and debug {#nvmf_trace}

SPDK has a tracing framework for capturing low-level event information at runtime.
@ref nvmf_tgt_tracepoints enable analysis of both performance and application crashes.

## Enabling NVMe-oF Multipath

The SPDK NVMe-oF target and initiator support multiple independent paths to the same NVMe-oF subsystem.
For step-by-step instructions for configuring and switching between paths, see @ref nvmf_multipath_howto .
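
As a quick illustration only (a hedged sketch; see @ref nvmf_multipath_howto for the actual
procedure), a Linux host with native NVMe multipath enabled can connect to the same subsystem
through two listeners. The second address 192.168.101.8 is an assumed additional listener, not part
of the configuration shown earlier:

~~~{.sh}
# Connect to the same subsystem over two assumed listener addresses.
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.100.8 -s 4420
nvme connect -t rdma -n "nqn.2016-06.io.spdk:cnode1" -a 192.168.101.8 -s 4420
# Show the subsystem and the paths (controllers) associated with it.
nvme list-subsys
~~~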