# SPDK Docker suite

This suite is meant to serve as an example of how SPDK can be encapsulated
into docker container images. The example containers consist of an SPDK NVMe-oF
target sharing devices with another SPDK NVMe-oF application, which serves
as both initiator and target. Finally, a traffic generator based on FIO
issues I/O to the connected devices.
Please note that some simplifications have been made to the configuration files
for the purpose of the example, so do not use the files directly in
a production environment.

## Prerequisites

docker: We recommend version 20.10 or newer because it supports cgroups v2 for
customization of host resources like CPUs, memory, and block I/O.

docker-compose: We recommend version 1.29.2 or newer.

kernel: Hugepages must be allocated prior to running the containers, and a hugetlbfs
mount must be available under /dev/hugepages. Also, tmpfs should be mounted
under /dev/shm. Depending on the use case, some kernel modules should also be
loaded into the kernel prior to running the containers.

proxy: If you are working behind a firewall, make sure dockerd is aware of the
proxy. Please refer to:
[docker-proxy](https://docs.docker.com/config/daemon/systemd/#httphttps-proxy)

To pass `$http_proxy` to docker-compose build use:

~~~{.sh}
docker-compose build --build-arg PROXY=$http_proxy
~~~

## How-To

`docker-compose.yaml` shows an example deployment of the storage containers based on SPDK.
Running `docker-compose build` creates 5 docker images:

- build_base
- storage-target
- proxy-container
- traffic-generator-nvme
- traffic-generator-virtio

The `build_base` image provides the core components required to containerize SPDK
applications. It starts from the fedora:35 image from the Fedora Container Registry and then installs SPDK from the provided `build_base/spdk.tar.gz`.
See the `build_base` folder for details on what's included in the final image.

Running `docker-compose up` creates 4 docker containers:

- storage-target: Contains an SPDK NVMe-oF target exposing a single subsystem, based on a malloc bdev, to `proxy-container`.
- proxy-container: Connects to `storage-target` and then exposes the same devices to `traffic-generator-nvme` using NVMe-oF and to `traffic-generator-virtio` using Virtio.
- traffic-generator-nvme: Contains FIO using the SPDK plugin to connect to `proxy-container` and runs a sample workload.
- traffic-generator-virtio: Contains FIO using the SPDK plugin to connect to `proxy-container` and runs a sample workload.

Each container is connected to a separate "spdk" network which is created before
deploying the containers. See `docker-compose.yaml` for the network's detailed setup and IP assignment.

All the above boils down to:

~~~{.sh}
cd docker
tar -czf build_base/spdk.tar.gz --exclude='docker/*' -C .. .
docker-compose build
docker-compose up
~~~

The `storage-target` and `proxy-container` can be started as services,
allowing multiple traffic generator containers to connect.

~~~{.sh}
docker-compose up -d proxy-container
docker-compose run traffic-generator-nvme
docker-compose run traffic-generator-virtio
~~~

Environment variables can be passed to containers as shown in the
[docs](https://docs.docker.com/compose/environment-variables/).
For example, extra arguments can be passed to fio like so:

~~~{.sh}
docker-compose run -e FIO_ARGS="--minimal" traffic-generator-nvme
~~~

As each container includes an SPDK installation, it is possible to use rpc.py to
examine the final setup. E.g.:

~~~{.sh}
docker-compose exec storage-target rpc.py bdev_get_bdevs
docker-compose exec proxy-container rpc.py nvmf_get_subsystems
~~~

## Monitoring

`docker-compose.monitoring.yaml` shows an example deployment of the storage containers based on SPDK.

Running `docker-compose -f docker-compose.monitoring.yaml up` creates 3 docker containers:

- storage-target: Contains an SPDK NVMe-oF target exposing a single subsystem based on a malloc bdev.
- [telegraf](https://www.influxdata.com/time-series-platform/telegraf/) is an agent with a very small memory footprint for collecting and sending metrics and events.
- [prometheus](https://prometheus.io/) is a leading open-source monitoring solution.

`telegraf` connects to `spdk` via `rpc_http_proxy.py` and uses `bdev_get_iostat` commands to fetch bdev statistics.

To see the data change, once all 3 containers are up, use `docker-compose run traffic-generator-nvme` to generate some traffic.

Open the Prometheus UI or query from the command line. E.g.:

~~~{.sh}
curl --fail 'http://127.0.0.1:9090/api/v1/query?query=spdk_bytes_read'
curl --fail 'http://127.0.0.1:9090/api/v1/query?query=spdk_bytes_written'
~~~

## Caveats

- If you run docker < 20.10 under a distro which switched fully to cgroups2
  (e.g. f33), make sure that /sys/fs/cgroup/systemd exists, otherwise docker/build
  will simply fail.
- Each SPDK app inside the containers is limited to a single, separate CPU.
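
The kernel prerequisites and the cgroups caveat above can be checked on the host before building. The following is only a minimal sketch, assuming bash, awk and GNU coreutils (`sort -V`) are available. The helper names (`version_ge`, `has_mount`, `report`, `check_prereqs`) are made up for this illustration and are not part of SPDK.

~~~{.sh}
#!/usr/bin/env bash
# Illustrative pre-flight helpers; none of these functions ship with SPDK.

# version_ge A B: succeed if version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# has_mount FSTYPE MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTS_FILE
# (default: /proc/mounts) lists a filesystem of type FSTYPE at MOUNTPOINT.
has_mount() {
    awk -v t="$1" -v m="$2" \
        '$2 == m && $3 == t { found = 1 } END { exit !found }' \
        "${3:-/proc/mounts}"
}

# report LABEL CMD...: run CMD and print "OK" or "MISSING" next to LABEL.
report() {
    local label=$1
    shift
    if "$@"; then echo "OK      $label"; else echo "MISSING $label"; fi
}

# check_prereqs: run every check against the current host.
check_prereqs() {
    local docker_ver
    docker_ver=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
    report "docker >= 20.10" version_ge "${docker_ver:-0}" 20.10
    report "hugetlbfs on /dev/hugepages" has_mount hugetlbfs /dev/hugepages
    report "tmpfs on /dev/shm" has_mount tmpfs /dev/shm
    # Looks only at the pool of default-sized hugepages.
    report "hugepages allocated" \
        grep -Eq '^HugePages_Total:[[:space:]]+[1-9]' /proc/meminfo
    # Only matters for docker < 20.10 on a cgroups2-only distro.
    report "/sys/fs/cgroup/systemd present" test -d /sys/fs/cgroup/systemd
}
~~~

After sourcing the file, calling `check_prereqs` prints one `OK` or `MISSING` line per check. It does not verify kernel modules, since which ones are needed depends on the use case.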