..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

dpdk-test-mldev Application
===========================

The ``dpdk-test-mldev`` tool is a Data Plane Development Kit (DPDK) application
that allows testing various mldev use cases.
This application has a generic framework to add new mldev based test cases
to verify functionality
and measure the performance of inference execution on DPDK ML devices.


Application and Options
-----------------------

The application has a number of command line options:

.. code-block:: console

   dpdk-test-mldev [EAL Options] -- [application options]


EAL Options
~~~~~~~~~~~

The following are the EAL command-line options that can be used
with the ``dpdk-test-mldev`` application.
See the DPDK Getting Started Guides for more information on these options.

``-c <COREMASK>`` or ``-l <CORELIST>``
  ``-c`` sets the hexadecimal bitmask of the cores to run on.
  ``-l`` sets the list of cores to use.

``-a <PCI_ID>``
  Attach a PCI based ML device.
  Specific to drivers using a PCI based ML device.

``--vdev <driver>``
  Add a virtual mldev device.
  Specific to drivers using an ML virtual device.


Application Options
~~~~~~~~~~~~~~~~~~~

The following are the command-line options supported by the test application.

``--test <name>``
  Name of the test to execute.
  ML tests are divided into three groups: Device, Model and Inference tests.
  The test name must be one of the following supported tests.

  **ML Device Tests** ::

    device_ops

  **ML Model Tests** ::

    model_ops

  **ML Inference Tests** ::

    inference_ordered
    inference_interleave

``--dev_id <n>``
  Set the device ID of the ML device to be used for the test.
  Default value is ``0``.

``--socket_id <n>``
  Set the socket ID of the application resources.
  Default value is ``SOCKET_ID_ANY``.

``--models <model_list>``
  Set the list of model files to be used for the tests.
  The application expects ``model_list`` in comma-separated form
  (e.g. ``--models model_A.bin,model_B.bin``).
  The maximum number of models supported by the test is ``8``.

``--filelist <file_list>``
  Set the list of model, input, output and reference files to be used for the tests.
  The application expects ``file_list`` in comma-separated form
  (i.e. ``--filelist <model,input,output>[,reference]``).

  Multiple filelist entries can be specified when running the tests with multiple models.
  Both quantized and dequantized outputs are written to the disk.
  The dequantized output file has the name specified by the user through the ``--filelist`` option.
  A ``.q`` suffix is appended to the quantized output filename.
  The maximum number of filelist entries supported by the test is ``8``.

``--quantized_io``
  Disable IO quantization and dequantization.

``--repetitions <n>``
  Set the number of inference repetitions to be executed in the test for each model.
  Default value is ``1``.

``--burst_size <n>``
  Set the burst size to be used when enqueuing / dequeuing inferences.
  Default value is ``1``.

``--queue_pairs <n>``
  Set the number of queue-pairs to be used for inference enqueue and dequeue operations.
  Default value is ``1``.

``--queue_size <n>``
  Set the size of the queue-pair to be created for inference enqueue / dequeue operations.
  The queue size translates into the ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation.
  Default value is ``1``.

``--tolerance <n>``
  Set the tolerance value, as a percentage, to be used for output validation.
  Default value is ``0``.

``--stats``
  Enable reporting of device extended stats.

``--debug``
  Enable the tests to run in debug mode.

``--help``
  Print help message.


ML Device Tests
---------------

ML device tests are functional tests to validate the ML device API.
Device tests validate the ML device configure, close, start and stop APIs.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``device_ops`` test are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --queue_pairs
   --queue_size


DEVICE_OPS Test
~~~~~~~~~~~~~~~

The device ops test validates the device configuration and reconfiguration support.
The test configures the ML device based on the options
``--queue_pairs`` and ``--queue_size`` specified by the user,
and later reconfigures the ML device with the number of queue pairs and queue size
based on the maximum specified through the device info.


Example
^^^^^^^

Command to run ``device_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops

Command to run ``device_ops`` test with user options:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops --queue_pairs <M> --queue_size <N>


ML Model Tests
--------------

Model tests are functional tests to validate the ML model API.
Model tests validate the load, start, stop and unload operations on ML models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``model_ops`` test are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --models

The list of model files to be used for the ``model_ops`` test can be specified
through the option ``--models <model_list>`` as a comma-separated list.
The maximum number of models supported in the test is ``8``.

.. note::

   * The ``--models <model_list>`` option is mandatory for running this test.
   * Options not supported by the test are ignored if specified.


MODEL_OPS Test
~~~~~~~~~~~~~~

The test is a collection of multiple sub-tests,
each with a different order of slow-path operations
when handling ``N`` models.

**Sub-test A:**
executes the sequence of load / start / stop / unload for a model in order,
followed by the next model.

.. _figure_mldev_model_ops_subtest_a:

.. figure:: img/mldev_model_ops_subtest_a.*

   Execution sequence of model_ops subtest A.

**Sub-test B:**
executes load for all models, followed by a start for all models.
Upon successful start of all models, stop is invoked for all models followed by unload.

.. _figure_mldev_model_ops_subtest_b:

.. figure:: img/mldev_model_ops_subtest_b.*

   Execution sequence of model_ops subtest B.

**Sub-test C:**
loads all models, followed by a start and stop of all models in order.
Upon completion of stop, unload is invoked for all models.

.. _figure_mldev_model_ops_subtest_c:

.. figure:: img/mldev_model_ops_subtest_c.*

   Execution sequence of model_ops subtest C.

**Sub-test D:**
executes load and start for all models available.
Upon successful start of all models, stop is executed for the models.

.. _figure_mldev_model_ops_subtest_d:

.. figure:: img/mldev_model_ops_subtest_d.*

   Execution sequence of model_ops subtest D.
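
The four orderings can be summarised in a short Python sketch.
It is illustrative only and is not taken from the application source:
``subtest_sequence`` and the model names are hypothetical,
and the sketch follows one reading of the descriptions above
(for sub-test D it stops at the stop operation, as the text does).

.. code-block:: python

   def subtest_sequence(subtest, models):
       """Return the (operation, model) steps for one model_ops sub-test."""
       steps = []
       if subtest == "A":
           # Full load/start/stop/unload cycle, one model at a time.
           for m in models:
               steps += [(op, m) for op in ("load", "start", "stop", "unload")]
       elif subtest == "B":
           # Each operation is applied to all models before the next one.
           for op in ("load", "start", "stop", "unload"):
               steps += [(op, m) for m in models]
       elif subtest == "C":
           # Load all, then start/stop each model in turn, then unload all.
           steps += [("load", m) for m in models]
           for m in models:
               steps += [("start", m), ("stop", m)]
           steps += [("unload", m) for m in models]
       elif subtest == "D":
           # Load and start each model in turn, then stop the models.
           for m in models:
               steps += [("load", m), ("start", m)]
           steps += [("stop", m) for m in models]
       else:
           raise ValueError(f"unknown sub-test: {subtest}")
       return steps


   for name in "ABCD":
       ops = subtest_sequence(name, ["model_1", "model_2"])
       print(name, " ".join(f"{op}({m})" for op, m in ops))

Printing the sequences for two models makes the difference between the sub-tests
easier to compare with the figures.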


Example
^^^^^^^

Command to run ``model_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=model_ops --models model_1.bin,model_2.bin,model_3.bin,model_4.bin


ML Inference Tests
------------------

Inference tests are a set of tests to validate end-to-end inference execution on an ML device.
These tests execute the full sequence of operations required to run inferences
with one or multiple models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for inference tests are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --filelist
   --repetitions
   --burst_size
   --queue_pairs
   --queue_size
   --tolerance
   --stats

The list of files to be used for the inference tests can be specified
through the option ``--filelist <file_list>`` as a comma-separated list.
A filelist entry has the format
``--filelist <model_file,input_file,output_file>[,reference_file]``
and specifies the files required to test with a single model.
Multiple filelist entries are supported by the test, one entry per model.
The maximum number of filelist entries supported by the test is ``8``.
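
The entry format can be made concrete with a small Python sketch.
This is not the application's own parser:
``parse_filelist_entry`` and ``parse_filelists`` are hypothetical helpers
that only mirror the rules stated above
(three mandatory files, an optional reference file, and at most ``8`` entries).

.. code-block:: python

   MAX_FILELIST_ENTRIES = 8  # limit stated in this guide


   def parse_filelist_entry(entry):
       """Split one --filelist value into its named files."""
       parts = entry.split(",")
       if len(parts) not in (3, 4) or not all(parts):
           raise ValueError(f"expected model,input,output[,reference]: {entry!r}")
       model, input_file, output_file = parts[:3]
       reference = parts[3] if len(parts) == 4 else None
       return {
           "model": model,
           "input": input_file,
           "output": output_file,
           "reference": reference,  # None when no reference file is given
       }


   def parse_filelists(entries):
       """Parse all --filelist values, one entry per model."""
       if not 1 <= len(entries) <= MAX_FILELIST_ENTRIES:
           raise ValueError(f"expected 1 to {MAX_FILELIST_ENTRIES} filelist entries")
       return [parse_filelist_entry(entry) for entry in entries]


   files = parse_filelists([
       "model_A.bin,input_A.bin,output_A.bin",
       "model_B.bin,input_B.bin,output_B.bin,reference_B.bin",
   ])
   print(files[1]["reference"])

In this sketch an entry without a reference file yields ``None`` for ``reference``,
which corresponds to a run without output validation.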

When the ``--burst_size <num>`` option is specified for the test,
each enqueue burst call tries to enqueue ``num`` inferences
and each dequeue burst call tries to dequeue ``num`` inferences.

In the inference tests, a pair of lcores is mapped to each queue pair.
The minimum number of lcores required for the tests is equal to ``(queue_pairs * 2 + 1)``.
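
As a quick check of the lcore requirement,
the following Python sketch compares the formula above
with the number of cores selected by a ``-c`` coremask.
Both helper names are hypothetical and exist only for this illustration.

.. code-block:: python

   def min_lcores(queue_pairs):
       """Minimum lcores from the formula above: queue_pairs * 2 + 1."""
       return queue_pairs * 2 + 1


   def lcores_in_coremask(coremask):
       """Count the cores selected by a hexadecimal -c coremask string."""
       return bin(int(coremask, 16)).count("1")


   for queue_pairs, mask in ((1, "0xf"), (4, "0xf"), (4, "0x1ff")):
       need = min_lcores(queue_pairs)
       have = lcores_in_coremask(mask)
       print(f"queue_pairs={queue_pairs} coremask={mask}: "
             f"need {need}, have {have}, enough={have >= need}")

With the default of one queue pair the tests need 3 lcores,
so a coremask selecting 4 cores is enough;
4 queue pairs need 9 lcores.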

Output validation of inference is enabled only
when a reference file is specified through the ``--filelist`` option.
The application additionally considers the tolerance value
provided through the ``--tolerance`` option during validation.
When the tolerance value is 0, the CRC32 hashes of the inference output
and the reference output are compared.
When the tolerance is non-zero, an element-wise comparison of the output is performed.
Validation is considered successful only
when all the elements of the output tensor are within the tolerance range specified.
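
The two validation modes can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not the application's code:
``validate`` is a hypothetical helper,
the outputs are assumed to be float32 elements,
the tolerance is assumed to be a percentage of the reference value,
and ``zlib.crc32`` stands in for whichever CRC32 routine the application uses.

.. code-block:: python

   import struct
   import zlib


   def to_bytes(values):
       """Pack a list of floats as little-endian float32 data."""
       return struct.pack(f"<{len(values)}f", *values)


   def validate(output, reference, tolerance):
       """Return True when output matches reference for the given tolerance (%)."""
       if len(output) != len(reference):
           return False
       if tolerance == 0:
           # Zero tolerance: compare checksums of the raw buffers.
           return zlib.crc32(to_bytes(output)) == zlib.crc32(to_bytes(reference))
       # Non-zero tolerance: every element must be within the allowed deviation
       # (assumed here to be relative to the reference element).
       return all(
           abs(out - ref) <= abs(ref) * tolerance / 100.0
           for out, ref in zip(output, reference)
       )


   print(validate([1.0, 2.0], [1.0, 2.0], 0))
   print(validate([1.0, 2.01], [1.0, 2.0], 1.0))
   print(validate([1.0, 2.05], [1.0, 2.0], 1.0))

A single element outside the tolerance range is enough to fail the validation,
which matches the rule that all elements must be within range.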

Enabling ``--stats`` prints the extended stats supported by the driver.

.. note::

   * The ``--filelist <file_list>`` option is mandatory for running inference tests.
   * Options not supported by the tests are ignored if specified.
   * Element-wise comparison is not supported when
     the output dtype is fp8, fp16 or bfloat16.
     This limitation applies only when the tolerance is greater than zero,
     and only to pre-quantized models.


INFERENCE_ORDERED Test
~~~~~~~~~~~~~~~~~~~~~~

This is a functional test for validating the end-to-end inference execution on an ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
the first model specified through the filelist is loaded to the device
and inferences are enqueued by a pool of worker threads to the ML device.
The total number of inferences enqueued for the model is equal to the repetitions specified.
A dedicated pool of worker threads dequeues the inferences from the device.
The model is unloaded upon completion of all inferences for the model.
The test continues loading and executing inference requests for all models
specified through the ``--filelist`` option in an ordered manner.

.. _figure_mldev_inference_ordered:

.. figure:: img/mldev_inference_ordered.*

   Execution of inference_ordered on single model.


Example
^^^^^^^

Example command to run ``inference_ordered`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin

Example command to run ``inference_ordered`` test with a specific burst size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --burst_size 12

Example command to run ``inference_ordered`` test with multiple queue-pairs and queue size
(4 queue pairs need ``4 * 2 + 1 = 9`` lcores, hence the wider coremask):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0x1ff -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --queue_pairs 4 --queue_size 16

Example command to run ``inference_ordered`` with output validation using tolerance of ``1%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin,reference.bin \
        --tolerance 1.0


INFERENCE_INTERLEAVE Test
~~~~~~~~~~~~~~~~~~~~~~~~~

This is a stress test for validating the end-to-end inference execution on an ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
all models specified through the filelist are loaded to the device.
Inferences for multiple models are enqueued by a pool of worker threads in parallel.
Inference execution by the device is interleaved between multiple models.
The total number of inferences enqueued for a model is equal to the repetitions specified.
An additional pool of threads dequeues the inferences from the device.
Models are unloaded upon completion of inferences for all models loaded.

.. _figure_mldev_inference_interleave:

.. figure:: img/mldev_inference_interleave.*

   Execution of inference_interleave on single model.
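
The difference between the ``inference_ordered`` and ``inference_interleave`` tests
can be sketched as two event sequences in Python.
Both functions are hypothetical and purely illustrative:
the round-robin order used for the interleaved case is only one possible interleaving,
since the real order depends on how the worker threads and the device schedule the requests.

.. code-block:: python

   def ordered_schedule(models, repetitions):
       """inference_ordered: load, run and unload one model at a time."""
       events = []
       for m in models:
           events.append(("load", m))
           events += [("infer", m)] * repetitions
           events.append(("unload", m))
       return events


   def interleave_schedule(models, repetitions):
       """inference_interleave: load all models, mix inferences, unload all."""
       events = [("load", m) for m in models]
       # Round-robin stands in for the parallel enqueue by the worker threads.
       for _ in range(repetitions):
           events += [("infer", m) for m in models]
       events += [("unload", m) for m in models]
       return events


   for schedule in (ordered_schedule, interleave_schedule):
       events = schedule(["model_A", "model_B"], repetitions=2)
       print(schedule.__name__, " ".join(f"{op}({m})" for op, m in events))

In both sequences each model receives the same number of inferences;
only the point at which models are loaded and unloaded
and the mixing of requests differ.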


Example
^^^^^^^

Example command to run ``inference_interleave`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin

Example command to run ``inference_interleave`` test with multiple models:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model_A.bin,input_A.bin,output_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin

Example command to run ``inference_interleave`` test
with a specific burst size, multiple queue-pairs and queue size
(8 queue pairs need ``8 * 2 + 1 = 17`` lcores, hence the wider coremask):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0x1ffff -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin \
        --queue_pairs 8 --queue_size 12 --burst_size 16

Example command to run ``inference_interleave`` test
with multiple models and output validation using tolerance of ``2.0%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave \
        --filelist model_A.bin,input_A.bin,output_A.bin,reference_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin,reference_B.bin \
        --tolerance 2.0


Debug mode
----------

ML tests can be executed in debug mode by enabling the option ``--debug``.
Executing tests in debug mode enables additional prints.

When a validation failure is observed, the output from that buffer is written to the disk,
with filenames following the same convention as when the test passes.
Additionally, the index of the buffer is appended to the filenames.
449