.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2022 Marvell.

Machine Learning (ML) Device Library
====================================

The Machine Learning (ML) device library provides a framework for the management
and provisioning of hardware and software ML poll mode drivers,
defining an API which supports a number of ML operations,
including device handling and inference processing.
ML model creation and training are outside the scope of this library.

The ML framework is built on the following model:

.. _figure_mldev_work_flow:

.. figure:: img/mldev_flow.*

   Work flow of inference on MLDEV

ML Device
   A hardware or software-based implementation of the ML device API
   for running inferences using a pre-trained ML model.

ML Model
   An ML model is an algorithm trained over a dataset.
   A model consists of the procedure/algorithm and the data/pattern
   required to make predictions on live data.
   Once the model is created and trained outside of the DPDK scope,
   it can be loaded via ``rte_ml_model_load()``
   and then started using the ``rte_ml_model_start()`` API function.
   ``rte_ml_model_params_update()`` can be used to update model parameters,
   such as weights and bias, without unloading the model via ``rte_ml_model_unload()``.

ML Inference
   ML inference is the process of feeding data to the model
   via the ``rte_ml_enqueue_burst()`` API function
   and using the ``rte_ml_dequeue_burst()`` API function
   to get the calculated outputs/predictions from the started model.


Design Principles
-----------------

The MLDEV library follows the same basic principles as those used in DPDK's
Ethernet Device framework and the Crypto framework.
The MLDEV framework provides a generic Machine Learning device framework
which supports both physical (hardware) and virtual (software) ML devices,
as well as an ML API to manage and configure ML devices.
The API also supports performing ML inference operations
through ML poll mode drivers.


Device Operations
-----------------

Device Creation
~~~~~~~~~~~~~~~

Physical ML devices are discovered during the PCI probe/enumeration,
through the EAL functions which are executed at DPDK initialization,
based on their PCI device identifier, each unique PCI BDF (bus/bridge, device, function).
ML physical devices, like other physical devices in DPDK, can be allowed or blocked
using the EAL command line options.


Device Identification
~~~~~~~~~~~~~~~~~~~~~

Each device, whether virtual or physical, is uniquely designated by two identifiers:

- A unique device index used to designate the ML device
  in all functions exported by the MLDEV API.

- A device name used to designate the ML device in console messages,
  for administration or debugging purposes.


Device Features and Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ML devices may support different feature sets.
To discover the features supported by a PMD, use the ``rte_ml_dev_info_get()`` API,
which returns the information of the device and its supported features.


Device Configuration
~~~~~~~~~~~~~~~~~~~~

The configuration of each ML device includes the following operations:

- Allocation of resources, including hardware resources if a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.

The ``rte_ml_dev_configure()`` API is used to configure an ML device.

.. code-block:: c

   int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);

The ``rte_ml_dev_config`` structure is used to pass the configuration parameters
for the ML device, for example the number of queue pairs, the maximum number of models,
the maximum size of a model, and so on.

Configuration of Queue Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each ML device can be configured with a number of queue pairs.
Each queue pair is configured using ``rte_ml_dev_queue_pair_setup()``.


Logical Cores, Memory and Queue Pair Relationships
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple logical cores should never share the same queue pair
for enqueuing or dequeuing operations on the same ML device,
since this would require global locks and hinder performance.


Configuration of Machine Learning models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pre-trained ML models that are built using external ML compiler/training frameworks
are used to perform inference operations.
These models are configured on an ML device in a two-stage process
that includes loading the model on an ML device
and starting the model to accept inference operations.
Inference operations can be queued for a model
only when the model is in the started state.
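The two-stage setup described above can be sketched as follows. This is a
minimal illustration, not a complete application: it assumes ``params`` already
points at a valid pre-trained model buffer, and all variable and function names
other than the ``rte_ml_*`` API calls are hypothetical.

.. code-block:: c

   #include <rte_mldev.h>

   /* Sketch: load a model and start it so inferences can be queued.
    * 'params' is assumed to describe a valid pre-trained model. */
   static int
   load_and_start(int16_t dev_id, struct rte_ml_model_params *params,
                  uint16_t *model_id)
   {
           int ret;

           /* Stage 1: load the model; on success the driver assigns
            * a model ID, used in all later slow/fast path calls. */
           ret = rte_ml_model_load(dev_id, params, model_id);
           if (ret != 0)
                   return ret;

           /* Stage 2: start the model to accept inference operations. */
           ret = rte_ml_model_start(dev_id, *model_id);
           if (ret != 0)
                   rte_ml_model_unload(dev_id, *model_id);

           return ret;
   }

Tear-down mirrors this sequence: ``rte_ml_model_stop()`` followed by
``rte_ml_model_unload()``.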
The model load stage assigns a model ID,
which is unique for the model in a driver's context.
The model ID is used during all subsequent slow-path and fast-path operations.

Model loading and starting are done
through the ``rte_ml_model_load()`` and ``rte_ml_model_start()`` functions.

Similarly, stopping and unloading are done
through the ``rte_ml_model_stop()`` and ``rte_ml_model_unload()`` functions.

The stop and unload functions release the resources allocated for the model.
Inference tasks cannot be queued for a model that is stopped.

Detailed information related to the model can be retrieved from the driver
using the function ``rte_ml_model_info_get()``.
Model information is accessible to the application
through the ``rte_ml_model_info`` structure.
The information available to the user includes the details related to
the inputs and outputs, and the maximum batch size supported by the model.

The user can optionally update model parameters, such as weights and bias,
without unloading the model, through the ``rte_ml_model_params_update()`` function.
A model should be in the stopped state to update the parameters.
The model has to be started again in order to enqueue inference requests after a parameters update.
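The stop/update/restart rule above can be sketched as follows. The function and
buffer names other than the ``rte_ml_*`` API calls are illustrative assumptions;
``new_params`` is assumed to point at an updated weights/bias buffer in the
layout the driver expects.

.. code-block:: c

   #include <rte_mldev.h>

   /* Sketch: update the parameters of a started model.
    * The model must be stopped before the update and started
    * again before new inference requests can be enqueued. */
   static int
   update_params(int16_t dev_id, uint16_t model_id, void *new_params)
   {
           int ret;

           ret = rte_ml_model_stop(dev_id, model_id);
           if (ret != 0)
                   return ret;

           /* Update weights/bias without unloading the model. */
           ret = rte_ml_model_params_update(dev_id, model_id, new_params);
           if (ret != 0)
                   return ret;

           /* Restart so inference requests can be queued again. */
           return rte_ml_model_start(dev_id, model_id);
   }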


Enqueue / Dequeue
~~~~~~~~~~~~~~~~~

The burst enqueue API uses an ML device identifier and a queue pair identifier
to specify the device queue pair to schedule the processing on.
The ``nb_ops`` parameter is the number of operations to process,
which are supplied in the ``ops`` array of ``rte_ml_op`` structures.
The enqueue function returns the number of operations it enqueued for processing;
a return value equal to ``nb_ops`` means that all packets have been enqueued.

The dequeue API uses the same format as the enqueue API,
but the ``nb_ops`` and ``ops`` parameters are now used to specify
the maximum number of processed operations the user wishes to retrieve
and the location in which to store them.
The API call returns the actual number of processed operations returned;
this can never be larger than ``nb_ops``.

``rte_ml_op`` provides the required information to the driver
to queue an ML inference task.
The ML op specifies the model to be used and the number of batches
to be executed in the inference task.
Input and output buffer information is specified through
the structure ``rte_ml_buff_seg``, which supports segmented data.
Input is provided through ``rte_ml_op::input``
and output through ``rte_ml_op::output``.
The data pointed to in each op should not be released until the dequeue of that op.
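A typical fast-path loop following the description above might look like the
sketch below. It assumes queue pair 0 and an ``ops`` array whose ``rte_ml_op``
entries (model ID, batches, input/output segments) have already been filled in;
``NB_OPS`` and the function name are illustrative.

.. code-block:: c

   #include <rte_mldev.h>

   #define NB_OPS 32 /* illustrative burst size */

   /* Sketch: enqueue a burst of pre-filled ops on queue pair 0
    * and poll until all of them have been dequeued. */
   static void
   process_inferences(int16_t dev_id, struct rte_ml_op *ops[NB_OPS])
   {
           struct rte_ml_op *deq_ops[NB_OPS];
           uint16_t nb_enq = 0, nb_deq = 0;

           /* The return value may be less than nb_ops; retry the
            * remainder until the driver has accepted every op. */
           while (nb_enq < NB_OPS)
                   nb_enq += rte_ml_enqueue_burst(dev_id, 0, &ops[nb_enq],
                                                  NB_OPS - nb_enq);

           /* Poll for completions; never returns more than requested. */
           while (nb_deq < NB_OPS)
                   nb_deq += rte_ml_dequeue_burst(dev_id, 0,
                                                  &deq_ops[nb_deq],
                                                  NB_OPS - nb_deq);
   }

Input and output buffers referenced by each op must stay valid until that op
is dequeued.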


Quantize and Dequantize
~~~~~~~~~~~~~~~~~~~~~~~

Inference operations performed with lower precision types improve
the throughput and efficiency of the inference execution,
with a minimal loss of accuracy that is within the tolerance limits.
Quantization and dequantization are the processes of converting data
from a higher precision type to a lower precision type, and vice-versa.
The ML library provides the functions ``rte_ml_io_quantize()`` and ``rte_ml_io_dequantize()``
to enable data type conversions.
The user needs to provide the address of the quantized and dequantized data buffers
to the functions, along with the number of batches in the buffers.

For quantization, the dequantized data is assumed to be
of the type ``dtype`` provided by ``rte_ml_model_info::input``
and the data is converted to the ``qtype`` provided by ``rte_ml_model_info::input``.

For dequantization, the quantized data is assumed to be
of the type ``qtype`` provided by ``rte_ml_model_info::output``
and the data is converted to the ``dtype`` provided by ``rte_ml_model_info::output``.

The sizes of the buffers required for the input and output can be calculated
using the functions ``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``.
These functions return the buffer sizes for both quantized and dequantized data
for the given number of batches.
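The conversion flow described above can be sketched as follows. The buffers are
assumed to be pre-allocated using the sizes reported by
``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``; the function
and variable names other than the ``rte_ml_io_*`` API calls are illustrative,
and argument lists may differ between DPDK releases.

.. code-block:: c

   #include <rte_mldev.h>

   /* Sketch: quantize dtype input before enqueue, then dequantize
    * qtype output after dequeue, for a given number of batches. */
   static int
   convert_io(int16_t dev_id, uint16_t model_id, uint16_t nb_batches,
              void *d_input, void *q_input, void *q_output, void *d_output)
   {
           int ret;

           /* dtype -> qtype, per rte_ml_model_info::input. */
           ret = rte_ml_io_quantize(dev_id, model_id, nb_batches,
                                    d_input, q_input);
           if (ret != 0)
                   return ret;

           /* ... enqueue inference with q_input, dequeue q_output ... */

           /* qtype -> dtype, per rte_ml_model_info::output. */
           return rte_ml_io_dequantize(dev_id, model_id, nb_batches,
                                       q_output, d_output);
   }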