# Running NVMe-OF Performance Test Cases

Scripts contained in this directory are used to run TCP and RDMA benchmark tests,
the results of which are later published in the [spdk.io performance reports section](https://spdk.io/doc/performance_reports.html).
To run the scripts in your environment please follow the steps below.

## Test Systems Requirements

- The OS installed on test systems must be a Linux OS.
  Scripts were primarily used on systems with Fedora and
  Ubuntu 18.04/20.04 distributions.
- Each test system must have at least one RDMA-capable NIC installed for RDMA tests.
  For TCP tests any TCP-capable NIC will do. However, high-bandwidth,
  high-performance NICs like Intel E810 CQDA2 or Mellanox ConnectX-5 are
  suggested because the NVMe-oF workload is network bound.
  If you use NICs capable of less than 100Gbps on the NVMe-oF target
  system, the NICs will quickly become saturated.
- A Python3 interpreter must be available on all test systems.
  The Paramiko and Pandas modules must be installed.
- The nvmecli package must be installed on all test systems.
- fio must be downloaded from [Github](https://github.com/axboe/fio) and built.
  This must be done on the Initiator test systems so that SPDK can later be built
  with the "--with-fio" option (see the build sketch after this list).
- All test systems must have a user account with a common name and
  password, and passwordless sudo must be enabled.
- The [mlnx-tools](https://github.com/Mellanox/mlnx-tools) package must be downloaded
  to the /usr/src/local directory in order to configure NIC port IRQ affinity.
  If a custom directory is to be used, then it must be set using the irq_scripts_dir
  option in the Target and Initiator configuration sections.
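
The snippet below is a minimal sketch of the fio build and the SPDK "--with-fio" build
on an Initiator system. The /usr/src/fio location matches the default fio_bin path used
later in this document; the SPDK source path is a placeholder.

``` ~sh
# Build fio from source (needed later for the SPDK fio_plugin).
git clone https://github.com/axboe/fio /usr/src/fio
cd /usr/src/fio
make

# Configure and build SPDK against those fio sources.
cd /path/to/spdk
./configure --with-fio=/usr/src/fio
make
```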

### Optional

- For tests using the Kernel Target, nvmet-cli must be downloaded and built on the Target system.
  nvmet-cli is available [here](http://git.infradead.org/users/hch/nvmetcli.git).
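
A minimal sketch of fetching and installing nvmetcli, assuming the standard
setuptools-based install; the clone location is a placeholder.

``` ~sh
git clone http://git.infradead.org/users/hch/nvmetcli.git /usr/src/nvmetcli
cd /usr/src/nvmetcli
sudo python3 setup.py install
```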

## Manual configuration

Before running the scripts some manual configuration of the test systems is required:

- Configure IP address assignment on the NIC ports that will be used for the test.
  Make sure these assignments are persistent, as in some cases NIC drivers may be reloaded.
- Adjust the firewall service to allow traffic on the IP - port pairs used in the test
  (or disable the firewall service completely if possible).
- Adjust or completely disable local security engines like AppArmor or SELinux.
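
For example, on a Fedora system the firewall and SELinux can be relaxed as sketched
below; exact service names and commands differ between distributions (e.g. AppArmor
on Ubuntu), so treat this as an illustration rather than a required procedure.

``` ~sh
# Stop and disable the firewall service.
sudo systemctl stop firewalld
sudo systemctl disable firewalld

# Put SELinux into permissive mode for the current boot.
sudo setenforce 0
```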

## JSON configuration for test run automation

An example JSON configuration file with the minimum configuration required
to automate NVMe-oF testing is provided in this repository.
The following sub-chapters describe each configuration section in more detail.

### General settings section

``` ~sh
"general": {
    "username": "user",
    "password": "password",
    "transport": "transport_type",
    "skip_spdk_install": bool
}
```

Required:

- username - username for the SSH session
- password - password for the SSH session
- transport - transport layer to be used throughout the test ("tcp" or "rdma")

Optional:

- skip_spdk_install - by default SPDK sources will be copied from the Target
  to the Initiator systems each time the run_nvmf.py script is run. If SPDK
  is already in place on the Initiator systems and there's no need to re-build it,
  then set this option to true.
  Default: false.

### Target System Configuration

``` ~sh
"target": {
  "mode": "spdk",
  "nic_ips": ["192.0.1.1", "192.0.2.1"],
  "core_mask": "[1-10]",
  "null_block_devices": 8,
  "nvmet_bin": "/path/to/nvmetcli",
  "sar_settings": [true, 30, 1, 60],
  "pcm_settings": ["/tmp/pcm", 30, 1, 60],
  "enable_bandwidth": [true, 60],
  "enable_dpdk_memory": [true, 30],
  "num_shared_buffers": 4096,
  "scheduler_settings": "static",
  "zcopy_settings": false,
  "dif_insert_strip": true,
  "null_block_dif_type": 3
}
```

Required:

- mode - Target application mode, "spdk" or "kernel".
- nic_ips - IP addresses of NIC ports to be used by the target to export
  NVMe-oF subsystems.
- core_mask - Used by SPDK target only.
  CPU core mask, either in the form of an actual mask (e.g. 0xAAAA) or a core list
  (e.g. [0,1,2-5,6]).
  At this moment the scripts cannot restrict the Kernel target to only
  use certain CPU cores. Important: the upper bound of a range is inclusive!
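
For illustration, the two core_mask forms below select the same set of CPU cores
(0 through 6); note that the 2-5 range includes core 5.

``` ~sh
"core_mask": "[0,1,2-5,6]"
"core_mask": "0x7F"
```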

Optional, common:

- null_block_devices - int, number of null block devices to create.
  Detected NVMe devices are not used if this option is present. Default: 0.
- sar_settings - [bool, int(x), int(y), int(z)];
  Enable SAR CPU utilization measurement on the Target side.
  Wait for "x" seconds before starting measurements, then do "z" samples
  with "y" second intervals between them (see the worked example after this list).
  Default: disabled.
- pcm_settings - [path, int(x), int(y), int(z)];
  Enable [PCM](https://github.com/opcm/pcm.git) measurements on the Target side.
  Measurements include CPU, memory and power consumption. "path" points to a
  directory where the pcm executables are present. Default: disabled.
- enable_bandwidth - [bool, int]. Wait a given number of seconds and run
  bwm-ng until the end of the test to measure bandwidth utilization on network
  interfaces. Default: disabled.
- tuned_profile - tunedadm profile to apply on the system before starting
  the test.
- irq_scripts_dir - path to the scripts directory of the Mellanox mlnx-tools package;
  Used to run the set_irq_affinity.sh script.
  Default: /usr/src/local/mlnx-tools/ofed_scripts
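
A worked reading of the example values from the snippet above (an illustration of the
timing fields, not additional options):

``` ~sh
"sar_settings": [true, 30, 1, 60]        # wait 30 s, then take 60 SAR samples at 1 s intervals
"pcm_settings": ["/tmp/pcm", 30, 1, 60]  # same timing; pcm executables located in /tmp/pcm
```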

Optional, Kernel Target only:

- nvmet_bin - path to the nvmetcli binary, if not available in $PATH.
  Only for the Kernel Target. Default: "nvmetcli".

Optional, SPDK Target only:

- zcopy_settings - bool. Disable or enable the target-side zero-copy option.
  Default: false.
- scheduler_settings - str. Select the SPDK Target thread scheduler (static/dynamic).
  Default: static.
- num_shared_buffers - int, number of shared buffers to allocate when
  creating the transport layer. Default: 4096.
- dif_insert_strip - bool. Only for TCP transport. Enable the DIF option when
  creating the transport layer. Default: false.
- null_block_dif_type - int, 0-3. Level of DIF type to use when creating
  null block bdevs. Default: 0.
- enable_dpdk_memory - [bool, int]. Wait for a given number of seconds and
  call the env_dpdk_get_mem_stats RPC to dump DPDK memory stats. Typically
  the wait time should be at least the fio ramp_time described in the fio section.
- adq_enable - bool; only for TCP transport.
  Configure system modules, NIC settings and create priority traffic classes
  for ADQ testing. You need an ADQ-capable NIC like the Intel E810.

### Initiator system settings section

There can be one or more `initiatorX` setting sections, depending on the test setup.

``` ~sh
"initiator1": {
  "ip": "10.0.0.1",
  "nic_ips": ["192.0.1.2"],
  "target_nic_ips": ["192.0.1.1"],
  "mode": "spdk",
  "fio_bin": "/path/to/fio/bin",
  "nvmecli_bin": "/path/to/nvmecli/bin",
  "cpus_allowed": "0,1,10-15",
  "cpus_allowed_policy": "shared",
  "num_cores": 4,
  "cpu_frequency": 2100000,
  "adq_enable": false
}
```

Required:

- ip - management IP address of the initiator system, used to set up the SSH connection.
- nic_ips - list of IP addresses of NIC ports to be used in the test,
  local to the given initiator system.
- target_nic_ips - list of IP addresses of Target NIC ports to which the initiator
  will attempt to connect.
- mode - initiator mode, "spdk" or "kernel". For "spdk", the bdev fio plugin
  will be used to connect to NVMe-oF subsystems and submit I/O. For "kernel",
  nvmecli will be used to connect to NVMe-oF subsystems and fio will use the
  libaio ioengine to submit I/Os.
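
In kernel mode the connection established by the script is roughly equivalent to a
manual nvme-cli connect call such as the one below; the subsystem NQN, port and
transport shown here are placeholders, as the script uses the values exported by the Target.

``` ~sh
sudo nvme connect -t tcp -a 192.0.1.1 -s 4420 -n nqn.2016-06.io.spdk:cnode1
```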

Optional, common:

- nvmecli_bin - path to the nvmecli binary; will be used for the "discovery" command
  (for both SPDK and Kernel modes) and for "connect" (in case of Kernel mode).
  Default: system-wide "nvme".
- fio_bin - path to a custom fio binary, which will be used to run IO.
  Additionally, the directory where the binary is located should also contain
  the fio sources needed to build the SPDK fio_plugin for spdk initiator mode.
  Default: /usr/src/fio/fio.
- cpus_allowed - str, list of CPU cores to run fio threads on. Takes precedence
  over the `num_cores` setting. Default: None (CPU cores randomly allocated).
  For more information see `man fio`.
- cpus_allowed_policy - str, "shared" or "split". CPU sharing policy for fio
  threads. Default: shared. For more information see `man fio`.
- num_cores - By default fio threads on the initiator side will use as many CPUs
  as there are connected subsystems. This option limits the number of CPU cores
  used for fio threads; cores are allocated randomly and fio
  `filename` parameters are grouped if needed. The `cpus_allowed` option takes
  precedence and `num_cores` is ignored if both are present in the config.
- cpu_frequency - int, custom CPU frequency to set. By default test setups are
  configured to run in performance mode at max frequencies. This option allows
  the user to select a CPU frequency instead of running at max frequency. Before
  using this option `intel_pstate=disable` must be set in the boot options and
  the cpupower governor must be set to `userspace` (see the sketch after this list).
- tuned_profile - tunedadm profile to apply on the system before starting
  the test.
- irq_scripts_dir - path to the scripts directory of the Mellanox mlnx-tools package;
  Used to run the set_irq_affinity.sh script.
  Default: /usr/src/local/mlnx-tools/ofed_scripts
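
A sketch of the cpu_frequency prerequisites on a Fedora-like system; the grub update
command and paths differ between distributions, so treat this as an illustration only.

``` ~sh
# 1. Add intel_pstate=disable to the kernel command line, e.g. in /etc/default/grub:
#    GRUB_CMDLINE_LINUX="... intel_pstate=disable"
#    then regenerate the grub configuration and reboot.
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# 2. Switch the cpufreq governor to userspace so a fixed frequency can be requested.
sudo cpupower frequency-set --governor userspace
```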

Optional, SPDK Initiator only:

- adq_enable - bool; only for TCP transport. Configure system modules, NIC
  settings and create priority traffic classes for ADQ testing.
  You need an ADQ-capable NIC like the Intel E810.

### Fio settings section

``` ~sh
"fio": {
  "bs": ["4k", "128k"],
  "qd": [32, 128],
  "rw": ["randwrite", "write"],
  "rwmixread": 100,
  "rate_iops": 10000,
  "num_jobs": 2,
  "run_time": 30,
  "ramp_time": 30,
  "run_num": 3
}
```

Required:

- bs - fio IO block size
- qd - fio iodepth
- rw - fio rw mode
- rwmixread - read operations percentage in case of mixed workloads
- num_jobs - fio numjobs parameter
  Note: may affect the total number of CPU cores used by initiator systems
- run_time - fio run time
- ramp_time - fio ramp time; no measurements are taken during this period
- run_num - number of times each workload combination is run.
  If more than 1, then the final result is the average of all runs.

Optional:

- rate_iops - limit IOPS to this number

#### Test Combinations

It is possible to specify more than one value for the bs, qd and rw parameters.
In such a case the script creates a list of their combinations and runs IO tests
for all of these combinations. For example, the following configuration:

``` ~sh
  "bs": ["4k"],
  "qd": [32, 128],
  "rw": ["write", "read"]
```

results in the following workloads being tested:

- 4k-write-32
- 4k-write-128
- 4k-read-32
- 4k-read-128

#### Important note about queue depth parameter

qd in the fio settings section refers to the iodepth generated per single fio target
device ("filename" in the resulting fio configuration file). It is re-calculated
while the script is running, so the generated fio configuration file might contain
a different value than the one specified by the user, especially when also
using the "numjobs" or initiator "num_cores" parameters. For example:

The Target system exposes 4 NVMe-oF subsystems. One initiator system connects to
all of these subsystems.

Initiator configuration (relevant settings only):

``` ~sh
"initiator1": {
  "num_cores": 1
}
```

Fio configuration:

``` ~sh
"fio": {
  "bs": ["4k"],
  "qd": [128],
  "rw": ["randread"],
  "rwmixread": 100,
  "num_jobs": 1,
  "run_time": 30,
  "ramp_time": 30,
  "run_num": 1
}
```

In this case the generated fio configuration will look like this
(relevant settings only):

``` ~sh
[global]
numjobs=1

[job_section0]
filename=Nvme0n1
filename=Nvme1n1
filename=Nvme2n1
filename=Nvme3n1
iodepth=512
```

The `num_cores` option results in the 4 connected subsystems being grouped under a
single fio thread (job_section0). Because `iodepth` is local to `job_section0`,
it is distributed between the `filename` entries of that job section in round-robin
(by default) fashion. For fio targets with the same characteristics
(IOPS & bandwidth capabilities) this means that the iodepth is distributed **roughly**
equally. Ultimately the above fio configuration results in iodepth=128 per filename.

A `numjobs` value higher than 1 is also taken into account, so that the desired qd per
filename is retained:

``` ~sh
[global]
numjobs=2

[job_section0]
filename=Nvme0n1
filename=Nvme1n1
filename=Nvme2n1
filename=Nvme3n1
iodepth=256
```

With the exception of `run_num`, more information on these options can be found in `man fio`.

## Running the test

Before running the test script, run the spdk/scripts/setup.sh script on the Target
system. This binds the devices to the VFIO/UIO userspace driver and allocates
hugepages for the SPDK process.
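
A minimal sketch of that preparation step; the HUGEMEM variable (hugepage memory
in MB) is shown only as an example and can be omitted to use the default allocation.

``` ~sh
cd spdk
sudo HUGEMEM=8192 ./scripts/setup.sh
```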

Run the script on the NVMe-oF target system:

``` ~sh
cd spdk
sudo PYTHONPATH=$PYTHONPATH:$PWD/scripts scripts/perf/nvmf/run_nvmf.py
```

By default the script uses the config.json configuration file in the scripts/perf/nvmf
directory. You can specify a different configuration file at runtime as shown below:

``` ~sh
sudo PYTHONPATH=$PYTHONPATH:$PWD/scripts scripts/perf/nvmf/run_nvmf.py -c /path/to/config.json
```

The PYTHONPATH environment variable is needed because the script uses SPDK-local Python
modules. If you'd like to get rid of `PYTHONPATH=$PYTHONPATH:$PWD/scripts`,
you need to modify your environment so that the Python interpreter is aware of
the `spdk/scripts` directory, for example as shown below.
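
One way to make this persistent, assuming SPDK is cloned at a placeholder path:

``` ~sh
echo 'export PYTHONPATH=$PYTHONPATH:/path/to/spdk/scripts' >> ~/.bashrc
```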

## Test Results

Test results for all workload combinations are printed to the screen once the tests
are finished. Additionally, all aggregate results are saved to /tmp/results/nvmf_results.conf.
The results directory path can be changed with the -r script parameter.
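
For example, to store results in a different directory (the path is a placeholder):

``` ~sh
sudo PYTHONPATH=$PYTHONPATH:$PWD/scripts scripts/perf/nvmf/run_nvmf.py -r /path/to/results_dir
```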
377