..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

dpdk-test-mldev Application
===========================

The ``dpdk-test-mldev`` tool is a Data Plane Development Kit (DPDK) application
that allows testing various mldev use cases.
This application has a generic framework to add new mldev based test cases
to verify functionality
and measure the performance of inference execution on DPDK ML devices.


Application and Options
-----------------------

The application has a number of command line options:

.. code-block:: console

   dpdk-test-mldev [EAL Options] -- [application options]


EAL Options
~~~~~~~~~~~

The following are the EAL command-line options that can be used
with the ``dpdk-test-mldev`` application.
See the DPDK Getting Started Guides for more information on these options.

``-c <COREMASK>`` or ``-l <CORELIST>``
  Set the hexadecimal bitmask of the cores to run on.
  The corelist is a list of cores to use.

``-a <PCI_ID>``
  Attach a PCI based ML device.
  Specific to drivers using a PCI based ML device.

``--vdev <driver>``
  Add a virtual mldev device.
  Specific to drivers using an ML virtual device.


Application Options
~~~~~~~~~~~~~~~~~~~

The following are the command-line options supported by the test application.

``--test <name>``
  Name of the test to execute.
  ML tests are divided into three groups: Device, Model and Inference tests.
  The test name should be one of the following supported tests.

  **ML Device Tests** ::

     device_ops

  **ML Model Tests** ::

     model_ops

  **ML Inference Tests** ::

     inference_ordered
     inference_interleave

``--dev_id <n>``
  Set the device ID of the ML device to be used for the test.
  Default value is ``0``.

``--socket_id <n>``
  Set the socket ID of the application resources.
  Default value is ``SOCKET_ID_ANY``.

``--models <model_list>``
  Set the list of model files to be used for the tests.
  The application expects the ``model_list`` in comma separated form
  (e.g. ``--models model_A.bin,model_B.bin``).
  Maximum number of models supported by the test is ``8``.

``--filelist <file_list>``
  Set the list of model, input, output and reference files to be used for the tests.
  The application expects the ``file_list`` to be in comma separated form
  (e.g. ``--filelist <model,input,output>[,reference]``).

  Multiple filelist entries can be specified when running the tests with multiple models.
  Both quantized and dequantized outputs are written to the disk.
  The dequantized output file has the name specified by the user through the ``--filelist`` option.
  A suffix ``.q`` is appended to the quantized output filename.
  Maximum number of filelist entries supported by the test is ``8``.

``--repetitions <n>``
  Set the number of inference repetitions to be executed in the test per model.
  Default value is ``1``.

``--burst_size <n>``
  Set the burst size to be used when enqueuing / dequeuing inferences.
  Default value is ``1``.

``--queue_pairs <n>``
  Set the number of queue-pairs to be used for inference enqueue and dequeue operations.
  Default value is ``1``.

``--queue_size <n>``
  Set the size of the queue-pair to be created for inference enqueue / dequeue operations.
  The queue size translates into the ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation.
  Default value is ``1``.

``--batches <n>``
  Set the number of batches in the input file provided for the inference run.
  When not specified, the test assumes the number of batches
  is the batch size of the model.

``--tolerance <n>``
  Set the tolerance value in percentage to be used for output validation.
  Default value is ``0``.

``--stats``
  Enable reporting device extended stats.

``--debug``
  Enable the tests to run in debug mode.

``--help``
  Print help message.

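
As an illustration of how the EAL and application options combine,
an example command to run the ``device_ops`` test on ML device ``0``
with application resources allocated on socket ``0``
(the PCI address is a placeholder; the device and socket IDs are example values):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops --dev_id 0 --socket_id 0
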

ML Device Tests
---------------

ML device tests are functional tests to validate the ML device API.
Device tests validate the handling of the device configure, close, start and stop APIs.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``device_ops`` test are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --queue_pairs
   --queue_size


DEVICE_OPS Test
~~~~~~~~~~~~~~~

The device ops test validates device configuration and reconfiguration support.
The test configures the ML device based on the options
``--queue_pairs`` and ``--queue_size`` specified by the user,
and later reconfigures the ML device with the number of queue pairs and queue size
based on the maximum values reported in the device info.


Example
^^^^^^^

Command to run the ``device_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops

Command to run the ``device_ops`` test with user options:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=device_ops --queue_pairs <M> --queue_size <N>

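
Command to run the ``device_ops`` test on a virtual ML device,
added through the EAL ``--vdev`` option
(the driver name is a placeholder for whichever virtual mldev driver is in use):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf --vdev <driver> -- \
        --test=device_ops
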

ML Model Tests
--------------

Model tests are functional tests to validate the ML model API.
Model tests validate the load, start, stop and unload operations of ML models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the ``model_ops`` test are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --models

The list of model files to be used for the ``model_ops`` test can be specified
through the option ``--models <model_list>`` as a comma separated list.
Maximum number of models supported in the test is ``8``.

.. note::

   * The ``--models <model_list>`` option is mandatory for running this test.
   * Options not supported by the test are ignored if specified.


MODEL_OPS Test
~~~~~~~~~~~~~~

The test is a collection of multiple sub-tests,
each with a different order of slow-path operations
when handling `N` models.

**Sub-test A:**
executes the sequence of load / start / stop / unload for a model in order,
followed by the next model.

.. _figure_mldev_model_ops_subtest_a:

.. figure:: img/mldev_model_ops_subtest_a.*

   Execution sequence of model_ops subtest A.

**Sub-test B:**
executes load for all models, followed by a start for all models.
Upon successful start of all models, stop is invoked for all models, followed by unload.

.. _figure_mldev_model_ops_subtest_b:

.. figure:: img/mldev_model_ops_subtest_b.*

   Execution sequence of model_ops subtest B.

**Sub-test C:**
loads all models, followed by a start and stop of all models in order.
Upon completion of stop, unload is invoked for all models.

.. _figure_mldev_model_ops_subtest_c:

.. figure:: img/mldev_model_ops_subtest_c.*

   Execution sequence of model_ops subtest C.

**Sub-test D:**
executes load and start for all models available.
Upon successful start of all models, stop is executed for the models.

.. _figure_mldev_model_ops_subtest_d:

.. figure:: img/mldev_model_ops_subtest_d.*

   Execution sequence of model_ops subtest D.


Example
^^^^^^^

Command to run the ``model_ops`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=model_ops --models model_1.bin,model_2.bin,model_3.bin,model_4.bin


ML Inference Tests
------------------

Inference tests are a set of tests to validate end-to-end inference execution on an ML device.
These tests execute the full sequence of operations required to run inferences
with one or multiple models.


Application Options
~~~~~~~~~~~~~~~~~~~

Supported command line options for the inference tests are the following::

   --debug
   --test
   --dev_id
   --socket_id
   --filelist
   --repetitions
   --burst_size
   --queue_pairs
   --queue_size
   --batches
   --tolerance
   --stats

The list of files to be used for the inference tests can be specified
through the option ``--filelist <file_list>`` as a comma separated list.
A filelist entry is of the format
``--filelist <model_file,input_file,output_file>[,reference_file]``
and is used to specify the list of files required to test with a single model.
Multiple filelist entries are supported by the test, one entry per model.
Maximum number of filelist entries supported by the test is ``8``.

When the ``--burst_size <num>`` option is specified for the test,
each enqueue and dequeue call tries to enqueue or dequeue
``num`` inferences respectively.

In the inference tests, a pair of lcores is mapped to each queue pair.
The minimum number of lcores required for the tests is equal to ``(queue_pairs * 2 + 1)``.

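
For example, a run with four queue pairs needs at least nine lcores (``4 * 2 + 1``).
An example command satisfying this requirement with a nine-lcore corelist
(the PCI address and file names are placeholders):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -l 0-8 -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --queue_pairs 4
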

Output validation of an inference is enabled only
when a reference file is specified through the ``--filelist`` option.
The application additionally considers the tolerance value
provided through the ``--tolerance`` option during validation.
When the tolerance value is 0, the CRC32 hash of the inference output
and the reference output are compared.
When the tolerance is non-zero, an element-wise comparison of the output is performed.
Validation is considered successful only
when all the elements of the output tensor are within the specified tolerance range.

Enabling ``--stats`` prints the extended stats supported by the driver.

.. note::

   * The ``--filelist <file_list>`` option is mandatory for running the inference tests.
   * Options not supported by the tests are ignored if specified.
   * Element-wise comparison is not supported when
     the output dtype is either fp8, fp16 or bfloat16.
     This is applicable only when the tolerance is greater than zero
     and for pre-quantized models only.


INFERENCE_ORDERED Test
~~~~~~~~~~~~~~~~~~~~~~

This is a functional test for validating end-to-end inference execution on an ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
the first model specified through the filelist is loaded to the device
and inferences are enqueued to the ML device by a pool of worker threads.
The total number of inferences enqueued for the model is equal to the repetitions specified.
A dedicated pool of worker threads dequeues the inferences from the device.
The model is unloaded upon completion of all inferences for the model.
The test continues loading and executing inference requests for all models
specified through the ``--filelist`` option in an ordered manner.

.. _figure_mldev_inference_ordered:

.. figure:: img/mldev_inference_ordered.*

   Execution of inference_ordered on a single model.


Example
^^^^^^^

Example command to run the ``inference_ordered`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin

Example command to run the ``inference_ordered`` test with a specific burst size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --burst_size 12

Example command to run the ``inference_ordered`` test with multiple queue-pairs and queue size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --queue_pairs 4 --queue_size 16

Example command to run ``inference_ordered`` with output validation using a tolerance of ``1%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin,reference.bin \
        --tolerance 1.0

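
Example command to run the ``inference_ordered`` test with ``--repetitions`` and ``--stats``
(the repetition count and burst size are illustrative values only):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --repetitions 1000 --burst_size 16 --stats
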

INFERENCE_INTERLEAVE Test
~~~~~~~~~~~~~~~~~~~~~~~~~

This is a stress test for validating end-to-end inference execution on an ML device.
The test configures the ML device and queue pairs
as per the queue-pair related options (``--queue_pairs`` and ``--queue_size``) specified by the user.
Upon successful configuration of the device and queue pairs,
all models specified through the filelist are loaded to the device.
Inferences for multiple models are enqueued by a pool of worker threads in parallel.
Inference execution by the device is interleaved between multiple models.
The total number of inferences enqueued for a model is equal to the repetitions specified.
An additional pool of threads dequeues the inferences from the device.
The models are unloaded upon completion of inferences for all loaded models.

.. _figure_mldev_inference_interleave:

.. figure:: img/mldev_inference_interleave.*

   Execution of inference_interleave on a single model.


Example
^^^^^^^

Example command to run the ``inference_interleave`` test:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin

Example command to run the ``inference_interleave`` test with multiple models:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model_A.bin,input_A.bin,output_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin

Example command to run the ``inference_interleave`` test
with a specific burst size, multiple queue-pairs and queue size:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave --filelist model.bin,input.bin,output.bin \
        --queue_pairs 8 --queue_size 12 --burst_size 16

Example command to run the ``inference_interleave`` test
with multiple models and output validation using a tolerance of ``2.0%``:

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_interleave \
        --filelist model_A.bin,input_A.bin,output_A.bin,reference_A.bin \
        --filelist model_B.bin,input_B.bin,output_B.bin,reference_B.bin \
        --tolerance 2.0


Debug mode
----------

ML tests can be executed in debug mode by enabling the option ``--debug``.
Execution of tests in debug mode enables additional prints.

When a validation failure is observed, the output from that buffer is written to the disk,
with the filenames following the same convention as when the test passes.
Additionally, the index of the buffer is appended to the filenames.
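
For instance, an example command to run the ``inference_ordered`` test in debug mode
(the PCI address and file names are placeholders):

.. code-block:: console

   sudo <build_dir>/app/dpdk-test-mldev -c 0xf -a <PCI_ID> -- \
        --test=inference_ordered --filelist model.bin,input.bin,output.bin \
        --debug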