..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

dpdk-test-mldev Application
===========================

The ``dpdk-test-mldev`` tool is a Data Plane Development Kit (DPDK) application
that allows testing various mldev use cases.
This application has a generic framework to add new mldev based test cases
to verify functionality
and measure the performance of inference execution on DPDK ML devices.


Application and Options
-----------------------

The application has a number of command line options:

.. code-block:: console

   dpdk-test-mldev [EAL Options] -- [application options]


EAL Options
~~~~~~~~~~~

The following are the EAL command-line options that can be used
with the ``dpdk-test-mldev`` application.
See the DPDK Getting Started Guides for more information on these options.

``-c <COREMASK>`` or ``-l <CORELIST>``
  Select the cores to run on.
  ``COREMASK`` is a hexadecimal bitmask of the cores;
  ``CORELIST`` is a list of cores to use.

``-a <PCI_ID>``
  Attach a PCI based ML device.
  Specific to drivers using a PCI based ML device.

``--vdev <driver>``
  Add a virtual mldev device.
  Specific to drivers using an ML virtual device.


Application Options
~~~~~~~~~~~~~~~~~~~

The following are the command-line options supported by the test application.

``--test <name>``
  Name of the test to execute.
  ML tests are divided into three groups: Device, Model and Inference tests.
  Test name should be one of the following supported tests.

  **ML Device Tests** ::

    device_ops

  **ML Model Tests** ::

    model_ops

  **ML Inference Tests** ::

    inference_ordered
    inference_interleave

``--dev_id <n>``
  Set the device ID of the ML device to be used for the test.
  Default value is ``0``.

``--socket_id <n>``
  Set the socket ID of the application resources.
  Default value is ``SOCKET_ID_ANY``.

``--models <model_list>``
  Set the list of model files to be used for the tests.
  Application expects the ``model_list`` in comma separated form
  (e.g. ``--models model_A.bin,model_B.bin``).
  Maximum number of models supported by the test is ``8``.

``--filelist <file_list>``
  Set the list of model, input, output and reference files to be used for the tests.
  Application expects the ``file_list`` to be in comma separated form
  (i.e. ``--filelist <model,input,output>[,reference]``).

  Multiple filelist entries can be specified when running the tests with multiple models.
  Both quantized and dequantized outputs are written to the disk.
  The dequantized output file takes the name specified by the user through the ``--filelist`` option.
  A suffix ``.q`` is appended to the quantized output filename.
  Maximum number of filelist entries supported by the test is ``8``.

``--repetitions <n>``
  Set the number of inference repetitions to be executed in the test for each model.
  Default value is ``1``.

``--burst_size <n>``
  Set the burst size to be used when enqueuing / dequeuing inferences.
  Default value is ``1``.

``--queue_pairs <n>``
  Set the number of queue-pairs to be used for inference enqueue and dequeue operations.
  Default value is ``1``.

``--queue_size <n>``
  Set the size of queue-pair to be created for inference enqueue / dequeue operations.
  The queue size translates into the ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation.
  Default value is ``1``.

``--batches <n>``
  Set the number of batches in the input file provided for inference run.
  When not specified, the test assumes the number of batches
  is the batch size of the model.

``--tolerance <n>``
  Set the tolerance value in percentage to be used for output validation.
  Default value is ``0``.

``--stats``
  Enable reporting device extended stats.

``--debug``
  Enable the tests to run in debug mode.

``--help``
  Print help message.


ML Device Tests
---------------

ML device tests are functional tests to validate the ML device API.
Device tests validate the ML device configure, close, start and stop APIs.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``device_ops`` test are as follows::

   --debug
   --test
   --dev_id
   --socket_id
   --queue_pairs
   --queue_size


DEVICE_OPS Test
~~~~~~~~~~~~~~~

Device ops test validates the device configuration and reconfiguration support.
The test configures ML device based on the options
``--queue_pairs`` and ``--queue_size`` specified by the user,
and later reconfigures the ML device with the number of queue pairs and queue size
based on the maximum specified through the device info.


Example
^^^^^^^

Command to run ``device_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops

Command to run ``device_ops`` test with user options:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops --queue_pairs <M> --queue_size <N>


ML Model Tests
--------------

Model tests are functional tests to validate the ML model API.
Model tests validate the load, start, stop and unload operations on ML models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``model_ops`` test are as follows::

   --debug
   --test
   --dev_id
   --socket_id
   --models

List of model files to be used for the ``model_ops`` test can be specified
through the option ``--models <model_list>`` as a comma separated list.
Maximum number of models supported in the test is ``8``.

.. note::

   * The ``--models <model_list>`` is a mandatory option for running this test.
   * Options not supported by the test are ignored if specified.


MODEL_OPS Test
~~~~~~~~~~~~~~

The test is a collection of multiple sub-tests,
each with a different order of slow-path operations
when handling `N` models.

**Sub-test A:**
executes the sequence of load / start / stop / unload for a model in order,
followed by next model.

.. _figure_mldev_model_ops_subtest_a:

.. figure:: img/mldev_model_ops_subtest_a.*

   Execution sequence of model_ops subtest A.

**Sub-test B:**
executes load for all models, followed by a start for all models.
Upon successful start of all models, stop is invoked for all models followed by unload.

.. _figure_mldev_model_ops_subtest_b:

.. figure:: img/mldev_model_ops_subtest_b.*

   Execution sequence of model_ops subtest B.

**Sub-test C:**
loads all models, followed by a start and stop of all models in order.
Upon completion of stop, unload is invoked for all models.

.. _figure_mldev_model_ops_subtest_c:

.. figure:: img/mldev_model_ops_subtest_c.*

   Execution sequence of model_ops subtest C.

**Sub-test D:**
executes load and start for all models available.
Upon successful start of all models, stop is executed for the models.

.. _figure_mldev_model_ops_subtest_d:

.. figure:: img/mldev_model_ops_subtest_d.*

   Execution sequence of model_ops subtest D.
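
The orderings of sub-tests A, B and C can be sketched in a few lines of Python.
This is only an illustrative model, not the application's C implementation:
the function name ``subtest_sequence`` and the ``(operation, model_index)`` tuples
are assumptions introduced here, and sub-test C is read as
starting and stopping one model at a time.
Sub-test D is left out because its description does not spell out a complete order.

.. code-block:: python

   OPS = ("load", "start", "stop", "unload")


   def subtest_sequence(subtest, n_models):
       """Return the (operation, model_index) order for a model_ops sub-test."""
       models = range(n_models)
       if subtest == "A":
           # Full load / start / stop / unload cycle for one model, then the next.
           return [(op, m) for m in models for op in OPS]
       if subtest == "B":
           # Each operation is applied to all models before the next operation.
           return [(op, m) for op in OPS for m in models]
       if subtest == "C":
           # Load all models, start and stop each model in turn, unload all models.
           sequence = [("load", m) for m in models]
           for m in models:
               sequence += [("start", m), ("stop", m)]
           sequence += [("unload", m) for m in models]
           return sequence
       raise ValueError("unknown sub-test: %s" % subtest)


   for name in "ABC":
       print(name, subtest_sequence(name, 2))

With two models, sub-test A finishes all four operations on model 0 before touching model 1,
while sub-test B loads both models before starting either.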


Example
^^^^^^^

Command to run ``model_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=model_ops --models model_1.bin,model_2.bin,model_3.bin,model_4.bin


ML Inference Tests
------------------

Inference tests are a set of tests to validate end-to-end inference execution on ML device.
These tests execute the full sequence of operations required to run inferences
with one or multiple models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for inference tests are as follows::

   --debug
   --test
   --dev_id
   --socket_id
   --filelist
   --repetitions
   --burst_size
   --queue_pairs
   --queue_size
   --batches
   --tolerance
   --stats

List of files to be used for the inference tests can be specified
through the option ``--filelist <file_list>`` as a comma separated list.
A filelist entry would be of the format
``--filelist <model_file,input_file,output_file>[,reference_file]``
and is used to specify the list of files required to test with a single model.
Multiple filelist entries are supported by the test, one entry per model.
Maximum number of file entries supported by the test is ``8``.
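
The filelist format can be illustrated with a small Python sketch.
It is not the parser used by the application:
``parse_filelist``, ``parse_filelists`` and the dictionary keys are hypothetical names,
and only the entry format, the ``.q`` suffix and the limit of ``8`` entries come from this guide.

.. code-block:: python

   MAX_FILELIST_ENTRIES = 8  # limit stated in this guide


   def parse_filelist(entry):
       """Split one --filelist value into model, input, output and optional reference."""
       parts = entry.split(",")
       if len(parts) not in (3, 4):
           raise ValueError("expected <model,input,output>[,reference]")
       model, input_file, output_file = parts[:3]
       return {
           "model": model,
           "input": input_file,
           "output": output_file,                   # dequantized output
           "output_quantized": output_file + ".q",  # quantized output
           "reference": parts[3] if len(parts) == 4 else None,
       }


   def parse_filelists(entries):
       """Parse every --filelist value, one entry per model."""
       if len(entries) > MAX_FILELIST_ENTRIES:
           raise ValueError("at most %d filelist entries" % MAX_FILELIST_ENTRIES)
       return [parse_filelist(entry) for entry in entries]


   print(parse_filelist("model.bin,input.bin,output.bin,reference.bin"))

An entry without a fourth field yields ``reference`` set to ``None``,
which corresponds to running the test without output validation.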

When the ``--burst_size <num>`` option is specified for the test,
each enqueue or dequeue burst call tries to enqueue or dequeue
``num`` inferences respectively.

In the inference test, a pair of lcores is mapped to each queue pair.
Minimum number of lcores required for the tests is equal to ``(queue_pairs * 2 + 1)``.
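
As a worked example of the lcore requirement, the following Python sketch
computes the minimum lcore count and a matching ``-c`` coremask.
The helper names are hypothetical, and the mask assumes
that the lowest-numbered cores (starting at core 0) are the ones to use.

.. code-block:: python

   def min_lcores(queue_pairs):
       """Minimum lcores for the inference tests: queue_pairs * 2 + 1."""
       return queue_pairs * 2 + 1


   def coremask_for(n_lcores):
       """Hexadecimal mask selecting cores 0 .. n_lcores - 1 (assumed contiguous)."""
       return hex((1 << n_lcores) - 1)


   for qp in (1, 4, 8):
       n = min_lcores(qp)
       print("queue_pairs=%d needs %d lcores, e.g. -c %s" % (qp, n, coremask_for(n)))

For ``--queue_pairs 4`` this gives 9 lcores and the mask ``0x1ff``;
for ``--queue_pairs 8`` it gives 17 lcores and the mask ``0x1ffff``.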

Output validation of inference is enabled only
when a reference file is specified through the ``--filelist`` option.
The application additionally considers the tolerance value
provided through the ``--tolerance`` option during validation.
When the tolerance value is 0, the CRC32 hashes of the inference output
and the reference output are compared.
When the tolerance is non-zero, element wise comparison of the output is performed.
Validation is considered successful only
when all the elements of the output tensor are within the tolerance range specified.
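
The two validation modes can be sketched in Python as follows.
This is a hedged illustration rather than the application's logic:
it assumes little-endian float32 output elements,
and it assumes the tolerance is applied as a relative deviation
in percent of the reference value, which this guide does not specify.
The function names are hypothetical.

.. code-block:: python

   import struct
   import zlib


   def within_tolerance(out, ref, tolerance_pct):
       """Assumed rule: relative deviation from the reference, in percent."""
       if ref == 0.0:
           return out == 0.0  # assumption: a zero reference needs an exact match
       return abs(out - ref) / abs(ref) * 100.0 <= tolerance_pct


   def validate(output, reference, tolerance_pct=0.0):
       """Compare two byte buffers holding float32 elements (assumed layout)."""
       if len(output) != len(reference):
           return False
       if tolerance_pct == 0:
           # Zero tolerance: compare the CRC32 of both buffers.
           return zlib.crc32(output) == zlib.crc32(reference)
       if len(output) % 4:
           raise ValueError("buffer length is not a multiple of 4 bytes")
       count = len(output) // 4
       outs = struct.unpack("<%df" % count, output)
       refs = struct.unpack("<%df" % count, reference)
       # Non-zero tolerance: every element must be within the range.
       return all(within_tolerance(o, r, tolerance_pct) for o, r in zip(outs, refs))


   reference = struct.pack("<3f", 1.0, 2.0, 4.0)
   output = struct.pack("<3f", 1.0, 2.01, 4.0)
   print(validate(output, reference))       # CRC32 comparison
   print(validate(output, reference, 1.0))  # element wise, 1% tolerance

In this example the second element deviates by about 0.5% from the reference,
so the buffers differ under the CRC32 comparison
but pass the element wise check with a tolerance of ``1.0``.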

Enabling ``--stats`` would print the extended stats supported by the driver.

.. note::

   * The ``--filelist <file_list>`` is a mandatory option for running inference tests.
   * Options not supported by the tests are ignored if specified.
   * Element wise comparison is not supported when
     the output dtype is either fp8, fp16 or bfloat16.
     This is applicable only when the tolerance is greater than zero
     and for pre-quantized models only.


INFERENCE_ORDERED Test
~~~~~~~~~~~~~~~~~~~~~~

This is a functional test for validating the end-to-end inference execution on ML device.
This test configures ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
the first model specified through the filelist is loaded to the device
and inferences are enqueued by a pool of worker threads to the ML device.
Total number of inferences enqueued for the model is equal to the repetitions specified.
A dedicated pool of worker threads would dequeue the inferences from the device.
The model is unloaded upon completion of all inferences for the model.
The test would continue loading and executing inference requests for all models
specified through the ``--filelist`` option in an ordered manner.

.. _figure_mldev_inference_ordered:

.. figure:: img/mldev_inference_ordered.*

   Execution of inference_ordered on single model.
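
The ordered flow can be modelled as a simple event list.
This Python sketch is an illustration under stated assumptions, not the application's code:
``inference_ordered_events`` and the event names are hypothetical,
and the enqueue and dequeue worker threads are collapsed into a single ``infer`` event.

.. code-block:: python

   def inference_ordered_events(models, repetitions=1):
       """List the events for running the models one after another."""
       events = []
       for model in models:
           events.append(("load", model))
           # One "infer" event per repetition; enqueue and dequeue are not modelled separately.
           events.extend(("infer", model) for _ in range(repetitions))
           events.append(("unload", model))
       return events


   for event in inference_ordered_events(["model_A.bin", "model_B.bin"], repetitions=2):
       print(event)

Each model is unloaded before the next one is loaded,
so at most one model is resident on the device at any time in this model of the test.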


Example
^^^^^^^

Example command to run ``inference_ordered`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin

Example command to run ``inference_ordered`` test with a specific burst size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --burst_size 12

Example command to run ``inference_ordered`` test with multiple queue-pairs and queue size
(four queue pairs need at least nine lcores, hence the wider coremask):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0x1ff -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --queue_pairs 4 --queue_size 16

Example command to run ``inference_ordered`` with output validation using tolerance of ``1%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin,reference.bin \
        --tolerance 1.0


INFERENCE_INTERLEAVE Test
~~~~~~~~~~~~~~~~~~~~~~~~~

This is a stress test for validating the end-to-end inference execution on ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
all models specified through the filelist are loaded to the device.
Inferences for multiple models are enqueued by a pool of worker threads in parallel.
Inference execution by the device is interleaved between multiple models.
Total number of inferences enqueued for a model is equal to the repetitions specified.
An additional pool of threads would dequeue the inferences from the device.
Models would be unloaded upon completion of inferences for all models loaded.

.. _figure_mldev_inference_interleave:

.. figure:: img/mldev_inference_interleave.*

   Execution of inference_interleave on single model.
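
The interleaved flow can be modelled as a simple event list.
This Python sketch is an illustration under stated assumptions, not the application's code:
``inference_interleave_events`` and the event names are hypothetical,
and the strict round-robin order is only one possible interleaving,
since the real order depends on how the worker threads and the device are scheduled.

.. code-block:: python

   def inference_interleave_events(models, repetitions=1):
       """List the events for running all models with interleaved inferences."""
       # All models are loaded before any inference is enqueued.
       events = [("load", model) for model in models]
       # Assumption: round-robin across models, one inference per model per round.
       for _ in range(repetitions):
           events.extend(("infer", model) for model in models)
       # Models are unloaded only after inferences for all models have completed.
       events.extend(("unload", model) for model in models)
       return events


   for event in inference_interleave_events(["model_A.bin", "model_B.bin"], repetitions=2):
       print(event)

In this model every loaded model stays resident on the device until all inferences have finished.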


Example
^^^^^^^

Example command to run ``inference_interleave`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin

Example command to run ``inference_interleave`` test with multiple models:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model_A.bin,input_A.bin,output_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin

Example command to run ``inference_interleave`` test
with a specific burst size, multiple queue-pairs and queue size
(eight queue pairs need at least 17 lcores, hence the wider coremask):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0x1ffff -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin \
        --queue_pairs 8 --queue_size 12 --burst_size 16

Example command to run ``inference_interleave`` test
with multiple models and output validation using tolerance of ``2.0%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave \
        --filelist model_A.bin,input_A.bin,output_A.bin,reference_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin,reference_B.bin \
        --tolerance 2.0


Debug mode
----------

ML tests can be executed in debug mode by enabling the option ``--debug``.
Execution of tests in debug mode enables additional prints.

When a validation failure is observed, the output from that buffer is written to the disk,
using the same filename convention as when the test passes.
Additionally, the index of the buffer is appended to the filenames.
452