..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

dpdk-test-mldev Application
===========================

The ``dpdk-test-mldev`` tool is a Data Plane Development Kit (DPDK) application
that allows testing various mldev use cases.
This application has a generic framework to add new mldev based test cases
to verify functionality
and measure the performance of inference execution on DPDK ML devices.


Application and Options
-----------------------

The application has a number of command line options:

.. code-block:: console

   dpdk-test-mldev [EAL Options] -- [application options]


EAL Options
~~~~~~~~~~~

The following are the EAL command-line options that can be used
with the ``dpdk-test-mldev`` application.
See the DPDK Getting Started Guides for more information on these options.

``-c <COREMASK>`` or ``-l <CORELIST>``
  Set the hexadecimal bitmask of the cores to run on,
  or the list of cores to use.

``-a <PCI_ID>``
  Attach a PCI-based ML device.
  Specific to drivers using a PCI-based ML device.

``--vdev <driver>``
  Add a virtual mldev device.
  Specific to drivers using an ML virtual device.


Application Options
~~~~~~~~~~~~~~~~~~~

The following are the command-line options supported by the test application.

``--test <name>``
  Name of the test to execute.
  ML tests are divided into three groups: Device, Model and Inference tests.
  Test name should be one of the following supported tests.

  **ML Device Tests** ::

    device_ops

  **ML Model Tests** ::

    model_ops

  **ML Inference Tests** ::

    inference_ordered
    inference_interleave

``--dev_id <n>``
  Set the device ID of the ML device to be used for the test.
  Default value is ``0``.

``--socket_id <n>``
  Set the socket ID of the application resources.
  Default value is ``SOCKET_ID_ANY``.

``--models <model_list>``
  Set the list of model files to be used for the tests.
  The application expects ``model_list`` in comma-separated form
  (e.g. ``--models model_A.bin,model_B.bin``).
  Maximum number of models supported by the test is ``8``.

``--filelist <file_list>``
  Set the list of model, input, output and reference files to be used for the tests.
  The application expects ``file_list`` to be in comma-separated form
  (e.g. ``--filelist <model,input,output>[,reference]``).

  Multiple filelist entries can be specified when running the tests with multiple models.
  Both quantized and dequantized outputs are written to disk.
  The dequantized output file has the name specified by the user through the ``--filelist`` option,
  while a ``.q`` suffix is appended to the quantized output filename.
  Maximum number of filelist entries supported by the test is ``8``.

``--repetitions <n>``
  Set the number of inference repetitions to be executed in the test for each model.
  Default value is ``1``.

``--burst_size <n>``
  Set the burst size to be used when enqueuing / dequeuing inferences.
  Default value is ``1``.

``--queue_pairs <n>``
  Set the number of queue-pairs to be used for inference enqueue and dequeue operations.
  Default value is ``1``.

``--queue_size <n>``
  Set the size of the queue-pairs to be created for inference enqueue / dequeue operations.
  The queue size translates into the ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation.
  Default value is ``1``.

``--tolerance <n>``
  Set the tolerance value in percentage to be used for output validation.
  Default value is ``0``.

``--stats``
  Enable reporting device extended stats.

``--debug``
  Enable the tests to run in debug mode.

``--help``
  Print help message.


ML Device Tests
---------------

ML device tests are functional tests to validate the ML device API.
Device tests validate the ML device handling APIs: configure, close, start and stop.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``device_ops`` test are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --queue_pairs
   --queue_size


DEVICE_OPS Test
~~~~~~~~~~~~~~~

The device ops test validates device configuration and reconfiguration support.
The test first configures the ML device based on the ``--queue_pairs`` and ``--queue_size``
options specified by the user,
and later reconfigures the ML device with the number of queue pairs and queue size
based on the maximum values reported through device info.
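
The configure / reconfigure sequence exercised by this test can be sketched
with the public ``rte_mldev`` API as below.
This is a minimal illustrative sketch, not the application's actual code:
error handling is trimmed, the helper names are hypothetical,
and the ``rte_ml_dev_info`` fields used for the maximum values
(``max_queue_pairs`` and ``max_desc``) are assumptions about what the driver reports.

.. code-block:: c

   #include <stdint.h>
   #include <rte_mldev.h>

   /* Hypothetical helper: configure the device and set up its queue pairs. */
   static int
   setup_device(int16_t dev_id, uint16_t nb_qp, uint16_t qp_size, int socket_id)
   {
       struct rte_ml_dev_config conf = {
           .socket_id = socket_id,
           .nb_models = 1,          /* assumption: a single model slot is enough here */
           .nb_queue_pairs = nb_qp, /* --queue_pairs */
       };
       struct rte_ml_dev_qp_conf qp_conf = {
           .nb_desc = qp_size,      /* --queue_size maps to nb_desc */
       };
       uint16_t qp_id;

       if (rte_ml_dev_configure(dev_id, &conf) != 0)
           return -1;

       for (qp_id = 0; qp_id < nb_qp; qp_id++)
           if (rte_ml_dev_queue_pair_setup(dev_id, qp_id, &qp_conf, socket_id) != 0)
               return -1;

       return rte_ml_dev_start(dev_id);
   }

   /* Hypothetical outline of the device_ops flow: configure with the
    * user-supplied values, then reconfigure with the device maximums.
    */
   static int
   device_ops_flow(int16_t dev_id, uint16_t user_qp, uint16_t user_qsize, int socket_id)
   {
       struct rte_ml_dev_info info;

       if (setup_device(dev_id, user_qp, user_qsize, socket_id) != 0)
           return -1;
       rte_ml_dev_stop(dev_id);

       rte_ml_dev_info_get(dev_id, &info);
       if (setup_device(dev_id, info.max_queue_pairs, info.max_desc, socket_id) != 0)
           return -1;

       rte_ml_dev_stop(dev_id);
       return rte_ml_dev_close(dev_id);
   }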


Example
^^^^^^^

Command to run ``device_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops

Command to run ``device_ops`` test with user options:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops --queue_pairs <M> --queue_size <N>


ML Model Tests
--------------

Model tests are functional tests to validate the ML model API.
Model tests validate the functioning of the load, start, stop and unload operations on ML models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``model_ops`` test are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --models

The list of model files to be used for the ``model_ops`` test can be specified
through the option ``--models <model_list>`` as a comma-separated list.
Maximum number of models supported in the test is ``8``.

.. note::

   * The ``--models <model_list>`` option is mandatory for running this test.
   * Options not supported by the test are ignored if specified.


MODEL_OPS Test
~~~~~~~~~~~~~~

The test is a collection of multiple sub-tests,
each with a different order of slow-path operations
when handling `N` models.

**Sub-test A:**
executes the sequence of load / start / stop / unload for a model in order,
followed by the next model.

.. _figure_mldev_model_ops_subtest_a:

.. figure:: img/mldev_model_ops_subtest_a.*

   Execution sequence of model_ops subtest A.

**Sub-test B:**
executes load for all models, followed by a start for all models.
Upon successful start of all models, stop is invoked for all models, followed by unload.

.. _figure_mldev_model_ops_subtest_b:

.. figure:: img/mldev_model_ops_subtest_b.*

   Execution sequence of model_ops subtest B.

**Sub-test C:**
loads all models, followed by a start and stop of all models in order.
Upon completion of stop, unload is invoked for all models.

.. _figure_mldev_model_ops_subtest_c:

.. figure:: img/mldev_model_ops_subtest_c.*

   Execution sequence of model_ops subtest C.

**Sub-test D:**
executes load and start for all models available.
Upon successful start of all models, stop is executed for the models.

.. _figure_mldev_model_ops_subtest_d:

.. figure:: img/mldev_model_ops_subtest_d.*

   Execution sequence of model_ops subtest D.
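
As an illustration of how the sub-test orderings differ,
the loops below sketch sub-tests A and B using the ``rte_ml_model_*`` slow-path API.
This is a hedged sketch rather than the application code:
the exact ``rte_ml_model_load()`` parameters are assumed,
return values are ignored, and model-ID bookkeeping is simplified.

.. code-block:: c

   #include <stdint.h>
   #include <rte_mldev.h>

   /* Sub-test A (sketch): complete the full load/start/stop/unload cycle
    * for one model before moving to the next model.
    */
   static void
   model_ops_subtest_a(int16_t dev_id, struct rte_ml_model_params *params, uint16_t nb_models)
   {
       uint16_t i, model_id;

       for (i = 0; i < nb_models; i++) {
           rte_ml_model_load(dev_id, &params[i], &model_id);
           rte_ml_model_start(dev_id, model_id);
           rte_ml_model_stop(dev_id, model_id);
           rte_ml_model_unload(dev_id, model_id);
       }
   }

   /* Sub-test B (sketch): apply each operation to all models before
    * moving on to the next operation.
    */
   static void
   model_ops_subtest_b(int16_t dev_id, struct rte_ml_model_params *params, uint16_t nb_models)
   {
       uint16_t i, model_id[8]; /* the test supports up to 8 models */

       for (i = 0; i < nb_models; i++)
           rte_ml_model_load(dev_id, &params[i], &model_id[i]);
       for (i = 0; i < nb_models; i++)
           rte_ml_model_start(dev_id, model_id[i]);
       for (i = 0; i < nb_models; i++)
           rte_ml_model_stop(dev_id, model_id[i]);
       for (i = 0; i < nb_models; i++)
           rte_ml_model_unload(dev_id, model_id[i]);
   }

Sub-tests C and D follow the same pattern, with the orderings described above.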


Example
^^^^^^^

Command to run ``model_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=model_ops --models model_1.bin,model_2.bin,model_3.bin,model_4.bin


ML Inference Tests
------------------

Inference tests are a set of tests to validate end-to-end inference execution on an ML device.
These tests execute the full sequence of operations required to run inferences
with one or multiple models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for inference tests are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --filelist
   --repetitions
   --burst_size
   --queue_pairs
   --queue_size
   --tolerance
   --stats

The list of files to be used for the inference tests can be specified
through the option ``--filelist <file_list>`` as a comma-separated list.
A filelist entry is of the format
``--filelist <model_file,input_file,output_file>[,reference_file]``
and is used to specify the list of files required to test with a single model.
Multiple filelist entries are supported by the test, one entry per model.
Maximum number of filelist entries supported by the test is ``8``.

When the ``--burst_size <num>`` option is specified for the test,
each enqueue and dequeue burst tries to enqueue or dequeue
``num`` inferences per call.

In the inference tests, a pair of lcores is mapped to each queue pair,
one for enqueue and one for dequeue.
The minimum number of lcores required for the tests is equal to ``(queue_pairs * 2 + 1)``,
i.e. two lcores per queue pair plus the main lcore
(for example, ``--queue_pairs 4`` needs at least 9 lcores).
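
The split of each lcore pair between enqueue and dequeue can be pictured with the sketch below:
one lcore submits bursts of ``rte_ml_op`` objects to its queue pair
while its partner drains completions.
This is a simplified, hedged illustration with hypothetical helper names;
op preparation, mempool handling and error handling are omitted.

.. code-block:: c

   #include <stdint.h>
   #include <rte_mldev.h>

   /* Sketch of the enqueue side of an lcore pair (op preparation omitted). */
   static void
   enqueue_worker(int16_t dev_id, uint16_t qp_id, struct rte_ml_op **ops,
                  uint16_t burst_size, uint64_t total)
   {
       uint64_t enqueued = 0;

       while (enqueued < total) {
           uint64_t left = total - enqueued;
           uint16_t n = (left < burst_size) ? (uint16_t)left : burst_size;

           /* Partial enqueues are possible when the queue pair is full. */
           enqueued += rte_ml_enqueue_burst(dev_id, qp_id, ops, n);
       }
   }

   /* Sketch of the dequeue side of the same lcore pair. */
   static void
   dequeue_worker(int16_t dev_id, uint16_t qp_id, struct rte_ml_op **ops,
                  uint16_t burst_size, uint64_t total)
   {
       uint64_t dequeued = 0;

       while (dequeued < total)
           dequeued += rte_ml_dequeue_burst(dev_id, qp_id, ops, burst_size);
   }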

Output validation of inferences is enabled only
when a reference file is specified through the ``--filelist`` option.
The application additionally considers the tolerance value
provided through the ``--tolerance`` option during validation.
When the tolerance value is 0, the CRC32 hashes of the inference output
and the reference output are compared.
When the tolerance is non-zero, an element-wise comparison of the output is performed.
Validation is considered successful only
when all the elements of the output tensor are within the specified tolerance range.
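
The semantics of the two validation modes can be summarized with the illustrative check below.
This is not the application's implementation:
it assumes a float32 dequantized output,
treats the tolerance as a percentage of the reference value,
and uses ``rte_hash_crc()`` merely as an example of a CRC32-style hash.

.. code-block:: c

   #include <math.h>
   #include <stdbool.h>
   #include <stdint.h>

   #include <rte_hash_crc.h>

   /* Illustrative check only: compare an output buffer against the reference
    * with the tolerance semantics described above (float32 output assumed).
    */
   static bool
   output_matches(const float *out, const float *ref, uint32_t nb_elements,
                  uint32_t nb_bytes, float tolerance)
   {
       uint32_t i;

       if (tolerance == 0.0f) {
           /* Zero tolerance: compare CRC32 hashes of the two buffers. */
           return rte_hash_crc(out, nb_bytes, 0) == rte_hash_crc(ref, nb_bytes, 0);
       }

       /* Non-zero tolerance: element-wise check, each element must lie
        * within +/- tolerance percent of the reference value.
        */
       for (i = 0; i < nb_elements; i++) {
           float delta = fabsf(ref[i]) * tolerance / 100.0f;

           if (fabsf(out[i] - ref[i]) > delta)
               return false;
       }

       return true;
   }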

Enabling ``--stats`` prints the extended stats supported by the driver.

.. note::

   * The ``--filelist <file_list>`` option is mandatory for running inference tests.
   * Options not supported by the tests are ignored if specified.
   * Element-wise comparison is not supported when
     the output dtype is either fp8, fp16 or bfloat16.
     This applies only when the tolerance is greater than zero
     and only for pre-quantized models.


INFERENCE_ORDERED Test
~~~~~~~~~~~~~~~~~~~~~~

This is a functional test for validating end-to-end inference execution on an ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
the first model specified through the filelist is loaded to the device
and inferences are enqueued to the ML device by a pool of worker threads.
The total number of inferences enqueued for the model is equal to the repetitions specified.
A dedicated pool of worker threads dequeues the inferences from the device.
The model is unloaded upon completion of all inferences for the model.
The test continues loading and executing inference requests for all models
specified through the ``--filelist`` option in an ordered manner.
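
The ordered flow is outlined in the hedged sketch below, complementing the figure that follows:
each model completes its full load / inference / unload cycle before the next model is taken up.
Queue-pair setup and worker-thread launch are omitted, and the function name is hypothetical.

.. code-block:: c

   #include <stdint.h>
   #include <rte_mldev.h>

   /* Hypothetical outline of the inference_ordered flow: one model at a time. */
   static void
   inference_ordered_flow(int16_t dev_id, struct rte_ml_model_params *params,
                          uint16_t nb_models)
   {
       uint16_t i, model_id;

       for (i = 0; i < nb_models; i++) {
           rte_ml_model_load(dev_id, &params[i], &model_id);
           rte_ml_model_start(dev_id, model_id);

           /*
            * Fast path runs here: the enqueue/dequeue lcore pairs process
            * all --repetitions inferences for this model (see the worker
            * sketch earlier in this section).
            */

           rte_ml_model_stop(dev_id, model_id);
           rte_ml_model_unload(dev_id, model_id);
       }
   }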

.. _figure_mldev_inference_ordered:

.. figure:: img/mldev_inference_ordered.*

   Execution of inference_ordered on a single model.


Example
^^^^^^^

Example command to run ``inference_ordered`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin

Example command to run ``inference_ordered`` test with a specific burst size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --burst_size 12

Example command to run ``inference_ordered`` test with multiple queue-pairs and queue size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --queue_pairs 4 --queue_size 16

Example command to run ``inference_ordered`` with output validation using a tolerance of ``1%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin,reference.bin \
        --tolerance 1.0


INFERENCE_INTERLEAVE Test
~~~~~~~~~~~~~~~~~~~~~~~~~

This is a stress test for validating end-to-end inference execution on an ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
all models specified through the filelist are loaded to the device.
Inferences for multiple models are enqueued by a pool of worker threads in parallel.
Inference execution by the device is interleaved between multiple models.
The total number of inferences enqueued for a model is equal to the repetitions specified.
An additional pool of threads dequeues the inferences from the device.
Models are unloaded upon completion of inferences for all models loaded.
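
The interleaved flow is outlined in the hedged sketch below, complementing the figure that follows:
all models are loaded and started up front,
the fast path runs against all of them in parallel,
and stop / unload happen only at the end.
Worker-thread launch is omitted and the function name is hypothetical.

.. code-block:: c

   #include <stdint.h>
   #include <rte_mldev.h>

   /* Hypothetical outline of the inference_interleave flow: all models stay
    * resident on the device while inferences are interleaved across them.
    */
   static void
   inference_interleave_flow(int16_t dev_id, struct rte_ml_model_params *params,
                             uint16_t nb_models)
   {
       uint16_t i, model_id[8]; /* the test supports up to 8 models */

       for (i = 0; i < nb_models; i++) {
           rte_ml_model_load(dev_id, &params[i], &model_id[i]);
           rte_ml_model_start(dev_id, model_id[i]);
       }

       /*
        * Fast path runs here: the enqueue/dequeue lcore pairs submit
        * inferences for all loaded models in parallel, so the device
        * interleaves execution across models.
        */

       for (i = 0; i < nb_models; i++) {
           rte_ml_model_stop(dev_id, model_id[i]);
           rte_ml_model_unload(dev_id, model_id[i]);
       }
   }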

.. _figure_mldev_inference_interleave:

.. figure:: img/mldev_inference_interleave.*

   Execution of inference_interleave on a single model.


Example
^^^^^^^

Example command to run ``inference_interleave`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin

Example command to run ``inference_interleave`` test with multiple models:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model_A.bin,input_A.bin,output_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin

Example command to run ``inference_interleave`` test
with a specific burst size, multiple queue-pairs and queue size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin \
        --queue_pairs 8 --queue_size 12 --burst_size 16

Example command to run ``inference_interleave`` test
with multiple models and output validation using a tolerance of ``2.0%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave \
        --filelist model_A.bin,input_A.bin,output_A.bin,reference_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin,reference_B.bin \
        --tolerance 2.0


Debug mode
----------

ML tests can be executed in debug mode by enabling the option ``--debug``.
Execution of the tests in debug mode enables additional prints.

When a validation failure is observed, the output from that buffer is written to disk,
with the filenames following the same convention as when the test passes.
Additionally, the index of the buffer is appended to the filenames.