..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

Machine Learning (ML) Device Library
====================================

The Machine Learning (ML) Device library provides a Machine Learning device framework
for the management and provisioning of hardware and software ML poll mode drivers,
defining an API which supports a number of ML operations,
including device handling and inference processing.
ML model creation and training are outside the scope of this library.

The ML framework is built on the following model:

.. _figure_mldev_work_flow:

.. figure:: img/mldev_flow.*

   Work flow of inference on MLDEV

ML Device
  A hardware or software-based implementation of the ML device API
  for running inferences using a pre-trained ML model.

ML Model
  An ML model is an algorithm trained over a dataset.
  A model consists of a procedure/algorithm and the data/patterns
  required to make predictions on live data.
  Once the model is created and trained outside of the DPDK scope,
  it can be loaded via ``rte_ml_model_load()``
  and started using the ``rte_ml_model_start()`` API function.
  ``rte_ml_model_params_update()`` can be used to update model parameters
  such as weights and bias, without first unloading the model
  using ``rte_ml_model_unload()``.

ML Inference
  ML inference is the process of feeding data to the model
  via the ``rte_ml_enqueue_burst()`` API function
  and using the ``rte_ml_dequeue_burst()`` API function
  to retrieve the computed outputs/predictions from the started model.


Design Principles
-----------------

The MLDEV library follows the same basic principles as those used in DPDK's
Ethernet Device framework and the Crypto framework.
The MLDEV framework provides a generic Machine Learning device framework
which supports both physical (hardware) and virtual (software) ML devices,
as well as an ML API to manage and configure ML devices.
The API also supports performing ML inference operations
through an ML poll mode driver.


Device Operations
-----------------

Device Creation
~~~~~~~~~~~~~~~

Physical ML devices are discovered during the PCI probe/enumeration,
through the EAL functions which are executed at DPDK initialization,
based on their PCI device identifier: each unique PCI BDF (bus, device, function).
ML physical devices, like other physical devices in DPDK,
can be allowed or blocked using the EAL command line options.


Device Identification
~~~~~~~~~~~~~~~~~~~~~

Each device, whether virtual or physical, is uniquely designated by two identifiers:

- A unique device index used to designate the ML device
  in all functions exported by the MLDEV API.

- A device name used to designate the ML device in console messages,
  for administration or debugging purposes.


Device Features and Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ML devices may support different feature sets.
To query the features supported by a PMD, use the ``rte_ml_dev_info_get()`` API,
which returns the device information together with its supported features.
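
As an illustration, the sketch below enumerates the available ML devices
and prints a few capability fields.
It assumes the ``rte_ml_dev_count()`` helper and the ``driver_name``,
``max_models`` and ``max_queue_pairs`` fields of ``rte_ml_dev_info``,
which may vary between DPDK releases.

.. code-block:: c

   #include <stdio.h>

   #include <rte_mldev.h>

   /* Hedged sketch: enumerate ML devices and print a few capability
    * fields. The rte_ml_dev_info fields shown here are illustrative
    * and may differ between DPDK releases. */
   static void
   ml_dump_devices(void)
   {
       uint16_t count = rte_ml_dev_count();
       int16_t dev_id;

       for (dev_id = 0; dev_id < (int16_t)count; dev_id++) {
           struct rte_ml_dev_info info;

           if (rte_ml_dev_info_get(dev_id, &info) != 0)
               continue;

           printf("ml_dev %d: driver=%s max_models=%u max_qps=%u\n",
                  dev_id, info.driver_name, info.max_models,
                  info.max_queue_pairs);
       }
   }

An application would typically run such a query once at startup
and size its configuration request against the reported limits.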
Device Configuration
~~~~~~~~~~~~~~~~~~~~

The configuration of each ML device includes the following operations:

- Allocation of resources, including hardware resources if a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.

The ``rte_ml_dev_configure()`` API is used to configure an ML device.

.. code-block:: c

   int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);

The ``rte_ml_dev_config`` structure is used to pass the configuration parameters
for the ML device, for example the number of queue pairs, the maximum number of models,
the maximum size of a model, and so on.

Configuration of Queue Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each ML device can be configured with a number of queue pairs.
Each queue pair is configured using ``rte_ml_dev_queue_pair_setup()``.


Logical Cores, Memory and Queue Pair Relationships
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple logical cores should never share the same queue pair
for enqueuing operations or dequeuing operations on the same ML device,
since this would require global locks and hinder performance.


Configuration of Machine Learning models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pre-trained ML models that are built using external ML compiler / training frameworks
are used to perform inference operations.
These models are configured on an ML device in a two-stage process
that includes loading the model on an ML device,
and starting the model to accept inference operations.
Inference operations can be queued for a model
only when the model is in the started state.
The model load stage assigns a Model ID,
which is unique for the model in a driver's context.
The Model ID is used during all subsequent slow-path and fast-path operations.

Model loading and starting are done
through the ``rte_ml_model_load()`` and ``rte_ml_model_start()`` functions.

Similarly, stopping and unloading are done
through the ``rte_ml_model_stop()`` and ``rte_ml_model_unload()`` functions.

The stop and unload functions release the resources allocated for the model.
Inference tasks cannot be queued for a model that is stopped.

Detailed information related to the model can be retrieved from the driver
using the function ``rte_ml_model_info_get()``.
Model information is accessible to the application
through the ``rte_ml_model_info`` structure.
Information available to the user includes the details related to
the inputs and outputs, and the maximum batch size supported by the model.

The user can optionally update model parameters such as weights and bias,
without unloading the model, through the ``rte_ml_model_params_update()`` function.
A model must be in the stopped state to update its parameters,
and has to be started again before inference requests can be enqueued.
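
The two-stage lifecycle can be outlined as follows.
This is a minimal sketch using the functions named above;
the ``rte_ml_model_params`` fields (``addr``/``size`` of the model binary)
are assumptions that may differ between DPDK releases,
and error handling is kept to a minimum.
The teardown mirrors the setup: the model is stopped before it is unloaded.

.. code-block:: c

   #include <rte_mldev.h>

   /* Minimal model lifecycle sketch. The rte_ml_model_params fields
    * (addr/size of the model binary) are assumptions and may differ
    * between DPDK releases. */
   int
   ml_model_lifecycle(int16_t dev_id, void *model_buf, size_t model_len)
   {
       struct rte_ml_model_params params = {
           .addr = model_buf,  /* pre-trained model binary */
           .size = model_len,
       };
       struct rte_ml_model_info info;
       uint16_t model_id;

       /* Stage 1: load the model; the driver assigns a unique Model ID. */
       if (rte_ml_model_load(dev_id, &params, &model_id) != 0)
           return -1;

       /* Stage 2: start the model so inference ops can be enqueued. */
       if (rte_ml_model_start(dev_id, model_id) != 0) {
           rte_ml_model_unload(dev_id, model_id);
           return -1;
       }

       /* Query input/output details and the maximum supported batch size. */
       rte_ml_model_info_get(dev_id, model_id, &info);

       /* ... enqueue/dequeue inference operations (see below) ... */

       /* Teardown: stop the model, then unload it to release resources. */
       rte_ml_model_stop(dev_id, model_id);
       rte_ml_model_unload(dev_id, model_id);

       return 0;
   }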
Enqueue / Dequeue
~~~~~~~~~~~~~~~~~

The burst enqueue API uses an ML device identifier and a queue pair identifier
to specify the device queue pair to schedule the processing on.
The ``nb_ops`` parameter is the number of operations to process,
which are supplied in the ``ops`` array of ``rte_ml_op`` structures.
The enqueue function returns the number of operations it enqueued for processing;
a return value equal to ``nb_ops`` means that all operations have been enqueued.

The dequeue API uses the same format as the enqueue API,
but the ``nb_ops`` and ``ops`` parameters are now used to specify
the maximum number of processed operations the user wishes to retrieve
and the location in which to store them.
The API call returns the actual number of processed operations;
this can never be larger than ``nb_ops``.

``rte_ml_op`` provides the required information to the driver
to queue an ML inference task.
An ML op specifies the model to be used and the number of batches
to be executed in the inference task.
Input and output buffer information is specified through
the structure ``rte_ml_buff_seg``, which supports segmented data.
Input is provided through ``rte_ml_op::input``
and output through ``rte_ml_op::output``.
The data pointed to by each op must not be released
until that op has been dequeued.


Quantize and Dequantize
~~~~~~~~~~~~~~~~~~~~~~~

Inference operations performed with lower precision types can improve
the throughput and efficiency of the inference execution,
with a minimal loss of accuracy that is within the tolerance limits.
Quantization and dequantization are the processes of converting data
from a higher precision type to a lower precision type and vice versa.
The ML library provides the functions ``rte_ml_io_quantize()`` and ``rte_ml_io_dequantize()``
to enable these data type conversions.
The user needs to provide the addresses of the quantized and dequantized data buffers
to the functions, along with the number of batches in the buffers.

For quantization, the dequantized data is assumed to be
of the type ``dtype`` provided by ``rte_ml_model_info::input``
and the data is converted to the ``qtype`` provided by ``rte_ml_model_info::input``.

For dequantization, the quantized data is assumed to be
of the type ``qtype`` provided by ``rte_ml_model_info::output``
and the data is converted to the ``dtype`` provided by ``rte_ml_model_info::output``.

The sizes of the buffers required for the input and output can be calculated
using the functions ``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``.
These functions return the buffer sizes for both quantized and dequantized data
for the given number of batches.
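
Tying these pieces together, the sketch below sizes and allocates the I/O buffers,
quantizes the input, runs one synchronous inference on queue pair 0,
and dequantizes the output.
The exact prototypes of the size-get and quantize/dequantize calls
(shown here taking a batch count and raw buffer addresses)
have varied across DPDK releases,
so treat this as an illustrative outline rather than a reference implementation.

.. code-block:: c

   #include <rte_malloc.h>
   #include <rte_mldev.h>

   /* Illustrative fast-path outline for one batch; error handling
    * omitted. The size-get and quantize/dequantize prototypes used
    * here (batch count plus raw buffer addresses) are assumptions
    * that may differ between DPDK releases. */
   static void
   ml_run_one(int16_t dev_id, uint16_t model_id, struct rte_ml_op *op)
   {
       uint64_t in_qsize, in_dsize, out_qsize, out_dsize;
       void *in_d, *in_q, *out_q, *out_d;
       struct rte_ml_op *done;

       /* Buffer sizes for quantized (device) and dequantized (user)
        * data, for a single batch. */
       rte_ml_io_input_size_get(dev_id, model_id, 1, &in_qsize, &in_dsize);
       rte_ml_io_output_size_get(dev_id, model_id, 1, &out_qsize, &out_dsize);

       in_d = rte_zmalloc(NULL, in_dsize, 0);
       in_q = rte_zmalloc(NULL, in_qsize, 0);
       out_q = rte_zmalloc(NULL, out_qsize, 0);
       out_d = rte_zmalloc(NULL, out_dsize, 0);

       /* ... fill in_d with input data of the model's dtype ... */

       /* Convert the input to the model's quantized type. */
       rte_ml_io_quantize(dev_id, model_id, 1, in_d, in_q);

       /* Point op->input at in_q and op->output at out_q via
        * rte_ml_buff_seg, set op->model_id and the batch count,
        * then run the op on queue pair 0. The buffers must stay
        * valid until the op is dequeued. */
       while (rte_ml_enqueue_burst(dev_id, 0, &op, 1) == 0)
           ;
       while (rte_ml_dequeue_burst(dev_id, 0, &done, 1) == 0)
           ;

       /* Convert the quantized output back to the user's dtype. */
       rte_ml_io_dequantize(dev_id, model_id, 1, out_q, out_d);
   }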