# Running NVMe-OF Performance Test Cases

Scripts contained in this directory are used to run TCP and RDMA benchmark tests,
which are later published in the [spdk.io performance reports section](https://spdk.io/doc/performance_reports.html).
To run the scripts in your environment, follow the steps below.

## Test Systems Requirements

- The OS installed on test systems must be a Linux OS.
  Scripts were primarily used on systems with Fedora and
  Ubuntu 18.04/20.04 distributions.
- Each test system must have at least one RDMA-capable NIC installed for RDMA tests.
  For TCP tests any TCP-capable NIC will do. However, high-bandwidth,
  high-performance NICs like Intel E810 CQDA2 or Mellanox ConnectX-5 are
  suggested because the NVMe-oF workload is network bound.
  If the NICs on the NVMe-oF target system are capable of less than 100 Gbps,
  they will quickly become saturated.
- A Python3 interpreter must be available on all test systems.
  The Paramiko and Pandas modules must be installed.
- The nvmecli package must be installed on all test systems.
- fio must be downloaded from [GitHub](https://github.com/axboe/fio) and built.
  This must be done on Initiator test systems to later build SPDK with the
  "--with-fio" option (see the preparation sketch after this list).
- All test systems must have a user account with a common name,
  password and passwordless sudo enabled.
- The [mlnx-tools](https://github.com/Mellanox/mlnx-tools) package must be downloaded
  to the /usr/src/local directory in order to configure NIC ports IRQ affinity.
  If a custom directory is to be used, it must be set using the irq_scripts_dir
  option in the Target and Initiator configuration sections.

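A minimal preparation sketch for an Initiator system, assuming the default
paths used later in this document (fio under /usr/src/fio, mlnx-tools under
/usr/src/local); adjust paths and package tooling to your environment:

``` ~sh
# Python modules used by the automation scripts
sudo pip3 install paramiko pandas

# fio sources and binary; /usr/src/fio/fio is the default fio_bin location
sudo git clone https://github.com/axboe/fio /usr/src/fio
cd /usr/src/fio && sudo ./configure && sudo make

# mlnx-tools; ofed_scripts is the default irq_scripts_dir location
sudo git clone https://github.com/Mellanox/mlnx-tools /usr/src/local/mlnx-tools
```
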
### Optional

- For tests using the Kernel Target, nvmet-cli must be downloaded and built on the Target system.
  nvmet-cli is available [here](http://git.infradead.org/users/hch/nvmetcli.git)
  (see the sketch below).

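A rough sketch of fetching and installing nvmetcli, assuming a standard
setuptools-based build (verify against the nvmetcli README for your version):

``` ~sh
sudo git clone http://git.infradead.org/users/hch/nvmetcli.git /usr/src/nvmetcli
cd /usr/src/nvmetcli && sudo python3 setup.py install
```
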
## Manual configuration

Before running the scripts some manual configuration of the test systems is required:

- Configure IP address assignment on the NIC ports that will be used for tests.
  Make sure to make these assignments persistent, as in some cases NIC drivers may be reloaded.
- Adjust the firewall service to allow traffic on the IP-port pairs used in tests
  (or disable the firewall service completely if possible).
- Adjust or completely disable local security engines like AppArmor or SELinux
  (see the sketch after this list).

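For example, on a Fedora-like system the firewall and SELinux could be relaxed
as sketched below; treat this as illustrative only and follow your own
security policy:

``` ~sh
# Stop and disable the firewall service
sudo systemctl stop firewalld
sudo systemctl disable firewalld

# Put SELinux into permissive mode for the current boot
sudo setenforce 0
# To make it persistent, set SELINUX=permissive in /etc/selinux/config
```
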
## JSON configuration for test run automation

An example JSON configuration file with the minimum configuration required
to automate NVMe-oF testing is provided in this repository.
The following sub-chapters describe each configuration section in more detail.

### General settings section

``` ~sh
"general": {
    "username": "user",
    "password": "password",
    "transport": "transport_type",
    "skip_spdk_install": bool
}
```

Required:

- username - username for the SSH session
- password - password for the SSH session
- transport - transport layer to be used throughout the test ("tcp" or "rdma")

Optional:

- skip_spdk_install - by default SPDK sources are copied from the Target
  to the Initiator systems each time the run_nvmf.py script is run. If SPDK
  is already in place on the Initiator systems and there is no need to re-build it,
  set this option to true.
  Default: false.

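A hypothetical, concrete version of this section could look as follows (all
values here are examples only):

``` ~sh
"general": {
    "username": "spdkuser",
    "password": "secret",
    "transport": "tcp",
    "skip_spdk_install": false
}
```
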
### Target System Configuration

``` ~sh
"target": {
  "mode": "spdk",
  "nic_ips": ["192.0.1.1", "192.0.2.1"],
  "core_mask": "[1-10]",
  "null_block_devices": 8,
  "nvmet_bin": "/path/to/nvmetcli",
  "sar_settings": [true, 30, 1, 60],
  "pcm_settings": ["/tmp/pcm", 30, 1, 60],
  "enable_bandwidth": [true, 60],
  "enable_dpdk_memory": [true, 30],
  "num_shared_buffers": 4096,
  "scheduler_settings": "static",
  "zcopy_settings": false,
  "dif_insert_strip": true,
  "null_block_dif_type": 3,
  "enable_pm": true
}
```

Required:

- mode - Target application mode, "spdk" or "kernel".
- nic_ips - IP addresses of NIC ports to be used by the target to export
  NVMe-oF subsystems.
- core_mask - Used by the SPDK target only.
  CPU core mask either in form of an actual mask (e.g. 0xAAAA) or a core list
  (e.g. [0,1,2-5,6]).
  At this moment the scripts cannot restrict the Kernel target to only
  use certain CPU cores. Important: the upper bound of the range is inclusive!

Optional, common:

- null_block_devices - int, number of null block devices to create.
  Detected NVMe devices are not used if this option is present. Default: 0.
- sar_settings - [bool, int(x), int(y), int(z)];
  Enable SAR CPU utilization measurement on the Target side.
  Wait for "x" seconds before starting measurements, then take "z" samples
  with "y" second intervals between them. Default: disabled.
- pcm_settings - [path, int(x), int(y), int(z)];
  Enable [PCM](https://github.com/opcm/pcm.git) measurements on the Target side.
  Measurements include CPU, memory and power consumption. "path" points to a
  directory where the pcm executables are present.
  "x" - time to wait before starting measurements (suggested to be equal to the
  fio ramp_time).
  "y" - time interval between measurements.
  "z" - number of measurement samples.
  Default: disabled.
- enable_bandwidth - [bool, int]. Wait a given number of seconds and run
  bwm-ng until the end of the test to measure bandwidth utilization on network
  interfaces. Default: disabled.
- tuned_profile - tuned-adm profile to apply on the system before starting
  the test.
- irq_scripts_dir - path to the scripts directory of the Mellanox mlnx-tools package;
  used to run the set_irq_affinity.sh script.
  Default: /usr/src/local/mlnx-tools/ofed_scripts
- enable_pm - if set to true, power measurement is enabled via collect-bmc-pm
  on the target side.

Optional, Kernel Target only:

- nvmet_bin - path to the nvmetcli binary, if not available in $PATH.
  Only for the Kernel Target. Default: "nvmetcli".

Optional, SPDK Target only:

- zcopy_settings - bool. Disable or enable the target-side zero-copy option.
  Default: false.
- scheduler_settings - str. Select the SPDK Target thread scheduler (static/dynamic).
  Default: static.
- num_shared_buffers - int, number of shared buffers to allocate when
  creating the transport layer. Default: 4096.
- max_queue_depth - int, max number of outstanding I/Os per queue. Default: 128.
- dif_insert_strip - bool. Only for TCP transport. Enable the DIF option when
  creating the transport layer. Default: false.
- null_block_dif_type - int, 0-3. Level of DIF type to use when creating
  null block bdevs. Default: 0.
- enable_dpdk_memory - [bool, int]. Wait for a given number of seconds and
  call the env_dpdk_get_mem_stats RPC to dump DPDK memory stats. Typically the
  wait time should be at least the fio ramp_time described in another section.
- adq_enable - bool; only for TCP transport.
  Configure system modules and NIC settings and create priority traffic classes
  for ADQ testing. You need an ADQ-capable NIC like the Intel E810.
- bpf_scripts - list of bpftrace scripts that will be attached during the
  test run. Available scripts can be found in the spdk/scripts/bpf directory.
- dsa_settings - bool. Only for TCP transport. Enable offloading CRC32C
  calculation to DSA. You need a CPU with the Intel(R) Data Streaming
  Accelerator (DSA) engine.
- scheduler_core_limit - int, 0-100. Dynamic scheduler option: the load limit at
  which a core is considered full.

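For reference, a hypothetical SPDK target section limited to the required
options plus null block devices might look like this (untested, values
illustrative):

``` ~sh
"target": {
  "mode": "spdk",
  "nic_ips": ["192.0.1.1"],
  "core_mask": "[1-4]",
  "null_block_devices": 4
}
```
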
### Initiator system settings section

There can be one or more `initiatorX` setting sections, depending on the test setup.

``` ~sh
"initiator1": {
  "ip": "10.0.0.1",
  "nic_ips": ["192.0.1.2"],
  "target_nic_ips": ["192.0.1.1"],
  "mode": "spdk",
  "fio_bin": "/path/to/fio/bin",
  "nvmecli_bin": "/path/to/nvmecli/bin",
  "cpus_allowed": "0,1,10-15",
  "cpus_allowed_policy": "shared",
  "num_cores": 4,
  "cpu_frequency": 2100000,
  "adq_enable": false,
  "kernel_engine": "io_uring"
}
```

Required:

- ip - management IP address of the initiator system, used to set up the SSH connection.
- nic_ips - list of IP addresses of NIC ports to be used in the test,
  local to the given initiator system.
- target_nic_ips - list of IP addresses of Target NIC ports to which the initiator
  will attempt to connect.
- mode - initiator mode, "spdk" or "kernel". For "spdk", the bdev fio plugin
  will be used to connect to NVMe-oF subsystems and submit I/O. For "kernel",
  nvmecli will be used to connect to NVMe-oF subsystems and fio will use the
  libaio ioengine to submit I/Os.

Optional, common:

- nvmecli_bin - path to the nvmecli binary; will be used for the "discovery" command
  (in both SPDK and Kernel modes) and for "connect" (in Kernel mode).
  Default: system-wide "nvme".
- fio_bin - path to a custom fio binary, which will be used to run I/O.
  Additionally, the directory where the binary is located should also contain
  the fio sources needed to build the SPDK fio_plugin for the spdk initiator mode.
  Default: /usr/src/fio/fio.
- cpus_allowed - str, list of CPU cores to run fio threads on. Takes precedence
  over the `num_cores` setting. Default: None (CPU cores allocated randomly).
  For more information see `man fio`.
- cpus_allowed_policy - str, "shared" or "split". CPU sharing policy for fio
  threads. Default: shared. For more information see `man fio`.
- num_cores - By default fio threads on the initiator side will use as many CPUs
  as there are connected subsystems. This option limits the number of CPU cores
  used for fio threads to this number; cores are allocated randomly and fio
  `filename` parameters are grouped if needed. The `cpus_allowed` option takes
  precedence and `num_cores` is ignored if both are present in the config.
- cpu_frequency - int, custom CPU frequency to set. By default test setups are
  configured to run in performance mode at max frequencies. This option allows
  the user to select a CPU frequency instead of running at max frequency. Before
  using this option, `intel_pstate=disable` must be set in the boot options and
  the cpupower governor must be set to `userspace`.
- tuned_profile - tuned-adm profile to apply on the system before starting
  the test.
- irq_scripts_dir - path to the scripts directory of the Mellanox mlnx-tools package;
  used to run the set_irq_affinity.sh script.
  Default: /usr/src/local/mlnx-tools/ofed_scripts
- kernel_engine - select the fio ioengine mode to run tests. io_uring libraries and
  an io_uring-capable fio binary must be present on Initiator systems!
  Available options:
  - libaio (default)
  - io_uring

Optional, SPDK Initiator only:

- adq_enable - bool; only for TCP transport. Configure system modules and NIC
  settings and create priority traffic classes for ADQ testing.
  You need an ADQ-capable NIC like the Intel E810.
- enable_data_digest - bool; only for TCP transport. Enable the data
  digest for the bdev controller. The target can use IDXD to calculate the
  data digest or fall back to a software-optimized implementation on systems
  that don't have the Intel(R) Data Streaming Accelerator (DSA) engine.

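For reference, a hypothetical kernel-mode initiator section limited to the
required options might look like this (values are illustrative):

``` ~sh
"initiator1": {
  "ip": "10.0.0.1",
  "nic_ips": ["192.0.1.2"],
  "target_nic_ips": ["192.0.1.1"],
  "mode": "kernel"
}
```
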
### Fio settings section

``` ~sh
"fio": {
  "bs": ["4k", "128k"],
  "qd": [32, 128],
  "rw": ["randwrite", "write"],
  "rwmixread": 100,
  "rate_iops": 10000,
  "num_jobs": 2,
  "offset": true,
  "offset_inc": 10,
  "run_time": 30,
  "ramp_time": 30,
  "run_num": 3
}
```

Required:

- bs - fio IO block size
- qd - fio iodepth
- rw - fio rw mode
- rwmixread - read operations percentage in case of mixed workloads
- num_jobs - fio numjobs parameter
  Note: may affect the total number of CPU cores used by initiator systems
- run_time - fio run time
- ramp_time - fio ramp time; no measurements are taken during this period
- run_num - number of times each workload combination is run.
  If more than 1, the final result is the average of all runs.

Optional:

- rate_iops - limit IOPS to this number
- offset - bool; enable offsetting of the IO into the file. When this option is
  enabled the file is "split" into a number of chunks equal to the "num_jobs"
  parameter value, and each of the "num_jobs" fio threads gets its own chunk to
  work with (see the illustration below).
  For more detail see "offset", "offset_increment" and "size" in the fio man
  pages. Default: false.
- offset_inc - int; percentage value determining the offset, size and
  offset_increment when the "offset" option is enabled. By default, if "offset"
  is enabled the fio file is split evenly between the fio threads doing the
  IO. offset_inc can be used to specify a custom value.

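A rough illustration (not the script's exact output) of the fio options that
such a split could translate into with "num_jobs": 4 and "offset" enabled,
assuming the default even split:

``` ~sh
[job_section0]
numjobs=4
offset=0%
offset_increment=25%
size=25%
```
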
#### Test Combinations

It is possible to specify more than one value for the bs, qd and rw parameters.
In that case the script creates a list of their combinations and runs IO tests
for all of these combinations. For example, the following configuration:

``` ~sh
  "bs": ["4k"],
  "qd": [32, 128],
  "rw": ["write", "read"]
```

results in the following workloads being tested:

- 4k-write-32
- 4k-write-128
- 4k-read-32
- 4k-read-128

#### Important note about queue depth parameter

qd in the fio settings section refers to the iodepth generated per single fio
target device ("filename" in the resulting fio configuration file). It is re-calculated
while the script is running, so the generated fio configuration file might contain
a different value than what the user specified at input, especially when also
using the "numjobs" or initiator "num_cores" parameters. For example:

The Target system exposes 4 NVMe-oF subsystems. One initiator system connects to
all of these subsystems.

Initiator configuration (relevant settings only):

``` ~sh
"initiator1": {
  "num_cores": 1
}
```

Fio configuration:

``` ~sh
"fio": {
  "bs": ["4k"],
  "qd": [128],
  "rw": ["randread"],
  "rwmixread": 100,
  "num_jobs": 1,
  "run_time": 30,
  "ramp_time": 30,
  "run_num": 1
}
```

In this case the generated fio configuration will look like this
(relevant settings only):

``` ~sh
[global]
numjobs=1

[job_section0]
filename=Nvme0n1
filename=Nvme1n1
filename=Nvme2n1
filename=Nvme3n1
iodepth=512
```

The `num_cores` option results in the 4 connected subsystems being grouped under a
single fio thread (job_section0). Because `iodepth` is local to `job_section0`,
it is distributed between each `filename` local to the job section in round-robin
(by default) fashion. In case of fio targets with the same characteristics
(IOPS & bandwidth capabilities) this means that iodepth is distributed **roughly**
equally. Ultimately, the above fio configuration results in iodepth=128 per filename.

`numjobs` higher than 1 is also taken into account, so that the desired qd per
filename is retained:

``` ~sh
[global]
numjobs=2

[job_section0]
filename=Nvme0n1
filename=Nvme1n1
filename=Nvme2n1
filename=Nvme3n1
iodepth=256
```

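A rough rule of thumb for the arithmetic in the two examples above, assuming
I/O is distributed evenly across filenames:

``` ~sh
# generated iodepth = qd * filenames_in_job_section / numjobs
# example 1: 128 * 4 / 1 = 512  ->  ~128 outstanding I/Os per filename
# example 2: 128 * 4 / 2 = 256  ->  ~128 outstanding I/Os per filename (2 jobs)
```
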
Besides `run_num`, more information on these options can be found in `man fio`.

## Running the test

Before running the test script, run the spdk/scripts/setup.sh script on the Target
system. This binds the devices to the VFIO/UIO userspace driver and allocates
hugepages for the SPDK process.

Run the script on the NVMe-oF target system:

``` ~sh
cd spdk
sudo PYTHONPATH=$PYTHONPATH:$PWD/python scripts/perf/nvmf/run_nvmf.py
```

By default the script uses the config.json configuration file in the scripts/perf/nvmf
directory. You can specify a different configuration file at runtime as below:

``` ~sh
sudo PYTHONPATH=$PYTHONPATH:$PWD/python scripts/perf/nvmf/run_nvmf.py -c /path/to/config.json
```

The PYTHONPATH environment variable is needed because the script uses SPDK-local Python
modules. If you'd like to get rid of `PYTHONPATH=$PYTHONPATH:$PWD/python`,
you need to modify your environment so that the Python interpreter is aware of
the `spdk/python` directory.

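One possible way to make that permanent for your user (the path is a
placeholder, adjust to your SPDK checkout location):

``` ~sh
echo 'export PYTHONPATH=$PYTHONPATH:/path/to/spdk/python' >> ~/.bashrc
```
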
## Test Results

Test results for all workload combinations are printed to the screen once the tests
are finished. Additionally, all aggregate results are saved to /tmp/results/nvmf_results.conf.
The results directory path can be changed with the -r script parameter.
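
For example, to store results in a custom directory:

``` ~sh
sudo PYTHONPATH=$PYTHONPATH:$PWD/python scripts/perf/nvmf/run_nvmf.py -r /path/to/results/dir
```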
415