.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2022 Marvell.

Machine Learning (ML) Device Library
====================================

The Machine Learning (ML) device library provides a framework for the management
and provisioning of hardware and software ML poll mode drivers,
defining an API which supports a number of ML operations,
including device handling and inference processing.
ML model creation and training are outside the scope of this library.

The ML framework is built on the following model:

.. _figure_mldev_work_flow:

.. figure:: img/mldev_flow.*

   Work flow of inference on MLDEV

ML Device
   A hardware or software-based implementation of the ML device API
   for running inferences using a pre-trained ML model.

ML Model
   An ML model is an algorithm trained over a dataset.
   A model consists of the procedure/algorithm and the data/pattern
   required to make predictions on live data.
   Once the model is created and trained outside of the DPDK scope,
   it can be loaded via ``rte_ml_model_load()``
   and then started using the ``rte_ml_model_start()`` API function.
   ``rte_ml_model_params_update()`` can be used to update model parameters,
   such as weights and bias, without unloading the model via ``rte_ml_model_unload()``.

ML Inference
   ML inference is the process of feeding data to the model
   via the ``rte_ml_enqueue_burst()`` API function
   and using the ``rte_ml_dequeue_burst()`` API function
   to get the calculated outputs/predictions from the started model.


Design Principles
-----------------

The MLDEV library follows the same basic principles as those used in DPDK's
Ethernet Device framework and the Crypto framework.
The MLDEV framework provides a generic Machine Learning device framework
which supports both physical (hardware) and virtual (software) ML devices,
as well as an ML API to manage and configure ML devices.
The API also supports performing ML inference operations
through ML poll mode drivers.


Device Operations
-----------------

Device Creation
~~~~~~~~~~~~~~~

Physical ML devices are discovered during the PCI probe/enumeration,
through the EAL functions which are executed at DPDK initialization,
based on their PCI device identifier, each unique PCI BDF (bus/bridge, device, function).
ML physical devices, like other physical devices in DPDK, can be allowed or blocked
using the EAL command line options.


Device Identification
~~~~~~~~~~~~~~~~~~~~~

Each device, whether virtual or physical, is uniquely designated by two identifiers:

- A unique device index used to designate the ML device
  in all functions exported by the MLDEV API.

- A device name used to designate the ML device in console messages,
  for administration or debugging purposes.


Device Features and Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ML devices may support different feature sets.
To discover the features supported by a PMD, use the ``rte_ml_dev_info_get()`` API,
which returns the information of the device and its supported features.


Device Configuration
~~~~~~~~~~~~~~~~~~~~

The configuration of each ML device includes the following operations:

- Allocation of resources, including hardware resources if a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.

The ``rte_ml_dev_configure()`` API is used to configure an ML device.

.. code-block:: c

   int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);

The ``rte_ml_dev_config`` structure is used to pass the configuration parameters
for the ML device, for example the number of queue pairs, the maximum number of models,
the maximum size of a model, and so on.

Configuration of Queue Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each ML device can be configured with a number of queue pairs.
Each queue pair is configured using ``rte_ml_dev_queue_pair_setup()``.


Logical Cores, Memory and Queue Pair Relationships
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple logical cores should never share the same queue pair
for enqueuing or dequeuing operations on the same ML device,
since this would require global locks and hinder performance.


Configuration of Machine Learning models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pre-trained ML models that are built using external ML compiler/training frameworks
are used to perform inference operations.
These models are configured on an ML device in a two-stage process
that includes loading the model on an ML device
and starting the model to accept inference operations.
Inference operations can be queued for a model
only when the model is in the started state.
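The two-stage setup described above can be sketched as follows. This is a
minimal illustration, not a complete application: it assumes ``params`` already
points at a valid pre-trained model buffer, and all variable and function names
other than the ``rte_ml_*`` API calls are hypothetical.

.. code-block:: c

   #include <rte_mldev.h>

   /* Sketch: load a model and start it so inferences can be queued.
    * 'params' is assumed to describe a valid pre-trained model. */
   static int
   load_and_start(int16_t dev_id, struct rte_ml_model_params *params,
                  uint16_t *model_id)
   {
           int ret;

           /* Stage 1: load the model; on success the driver assigns
            * a model ID, used in all later slow/fast path calls. */
           ret = rte_ml_model_load(dev_id, params, model_id);
           if (ret != 0)
                   return ret;

           /* Stage 2: start the model to accept inference operations. */
           ret = rte_ml_model_start(dev_id, *model_id);
           if (ret != 0)
                   rte_ml_model_unload(dev_id, *model_id);

           return ret;
   }

Tear-down mirrors this sequence: ``rte_ml_model_stop()`` followed by
``rte_ml_model_unload()``.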
The model load stage assigns a model ID,
which is unique for the model in a driver's context.
The model ID is used during all subsequent slow-path and fast-path operations.

Model loading and starting are done
through the ``rte_ml_model_load()`` and ``rte_ml_model_start()`` functions.

Similarly, stopping and unloading are done
through the ``rte_ml_model_stop()`` and ``rte_ml_model_unload()`` functions.

The stop and unload functions release the resources allocated for the model.
Inference tasks cannot be queued for a model that is stopped.

Detailed information related to the model can be retrieved from the driver
using the function ``rte_ml_model_info_get()``.
Model information is accessible to the application
through the ``rte_ml_model_info`` structure.
The information available to the user includes the details related to
the inputs and outputs, and the maximum batch size supported by the model.

The user can optionally update model parameters, such as weights and bias,
without unloading the model, through the ``rte_ml_model_params_update()`` function.
A model should be in the stopped state to update the parameters.
The model has to be started again in order to enqueue inference requests after a parameters update.
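The stop/update/restart rule above can be sketched as follows. The function and
buffer names other than the ``rte_ml_*`` API calls are illustrative assumptions;
``new_params`` is assumed to point at an updated weights/bias buffer in the
layout the driver expects.

.. code-block:: c

   #include <rte_mldev.h>

   /* Sketch: update the parameters of a started model.
    * The model must be stopped before the update and started
    * again before new inference requests can be enqueued. */
   static int
   update_params(int16_t dev_id, uint16_t model_id, void *new_params)
   {
           int ret;

           ret = rte_ml_model_stop(dev_id, model_id);
           if (ret != 0)
                   return ret;

           /* Update weights/bias without unloading the model. */
           ret = rte_ml_model_params_update(dev_id, model_id, new_params);
           if (ret != 0)
                   return ret;

           /* Restart so inference requests can be queued again. */
           return rte_ml_model_start(dev_id, model_id);
   }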


Enqueue / Dequeue
~~~~~~~~~~~~~~~~~

The burst enqueue API uses an ML device identifier and a queue pair identifier
to specify the device queue pair to schedule the processing on.
The ``nb_ops`` parameter is the number of operations to process,
which are supplied in the ``ops`` array of ``rte_ml_op`` structures.
The enqueue function returns the number of operations it enqueued for processing;
a return value equal to ``nb_ops`` means that all packets have been enqueued.

The dequeue API uses the same format as the enqueue API,
but the ``nb_ops`` and ``ops`` parameters are now used to specify
the maximum number of processed operations the user wishes to retrieve
and the location in which to store them.
The API call returns the actual number of processed operations returned;
this can never be larger than ``nb_ops``.

``rte_ml_op`` provides the required information to the driver
to queue an ML inference task.
The ML op specifies the model to be used and the number of batches
to be executed in the inference task.
Input and output buffer information is specified through
the structure ``rte_ml_buff_seg``, which supports segmented data.
Input is provided through ``rte_ml_op::input``
and output through ``rte_ml_op::output``.
The data pointed to in each op should not be released until the dequeue of that op.
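A typical fast-path loop following the description above might look like the
sketch below. It assumes queue pair 0 and an ``ops`` array whose ``rte_ml_op``
entries (model ID, batches, input/output segments) have already been filled in;
``NB_OPS`` and the function name are illustrative.

.. code-block:: c

   #include <rte_mldev.h>

   #define NB_OPS 32 /* illustrative burst size */

   /* Sketch: enqueue a burst of pre-filled ops on queue pair 0
    * and poll until all of them have been dequeued. */
   static void
   process_inferences(int16_t dev_id, struct rte_ml_op *ops[NB_OPS])
   {
           struct rte_ml_op *deq_ops[NB_OPS];
           uint16_t nb_enq = 0, nb_deq = 0;

           /* The return value may be less than nb_ops; retry the
            * remainder until the driver has accepted every op. */
           while (nb_enq < NB_OPS)
                   nb_enq += rte_ml_enqueue_burst(dev_id, 0, &ops[nb_enq],
                                                  NB_OPS - nb_enq);

           /* Poll for completions; never returns more than requested. */
           while (nb_deq < NB_OPS)
                   nb_deq += rte_ml_dequeue_burst(dev_id, 0,
                                                  &deq_ops[nb_deq],
                                                  NB_OPS - nb_deq);
   }

Input and output buffers referenced by each op must stay valid until that op
is dequeued.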


Quantize and Dequantize
~~~~~~~~~~~~~~~~~~~~~~~

Inference operations performed with lower precision types improve
the throughput and efficiency of the inference execution,
with a minimal loss of accuracy that is within the tolerance limits.
Quantization and dequantization are the processes of converting data
from a higher precision type to a lower precision type, and vice-versa.
The ML library provides the functions ``rte_ml_io_quantize()`` and ``rte_ml_io_dequantize()``
to enable data type conversions.
The user needs to provide the address of the quantized and dequantized data buffers
to the functions, along with the number of batches in the buffers.

For quantization, the dequantized data is assumed to be
of the type ``dtype`` provided by ``rte_ml_model_info::input``
and the data is converted to the ``qtype`` provided by ``rte_ml_model_info::input``.

For dequantization, the quantized data is assumed to be
of the type ``qtype`` provided by ``rte_ml_model_info::output``
and the data is converted to the ``dtype`` provided by ``rte_ml_model_info::output``.

The sizes of the buffers required for the input and output can be calculated
using the functions ``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``.
These functions return the buffer sizes for both quantized and dequantized data
for the given number of batches.
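The conversion flow described above can be sketched as follows. The buffers are
assumed to be pre-allocated using the sizes reported by
``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``; the function
and variable names other than the ``rte_ml_io_*`` API calls are illustrative,
and argument lists may differ between DPDK releases.

.. code-block:: c

   #include <rte_mldev.h>

   /* Sketch: quantize dtype input before enqueue, then dequantize
    * qtype output after dequeue, for a given number of batches. */
   static int
   convert_io(int16_t dev_id, uint16_t model_id, uint16_t nb_batches,
              void *d_input, void *q_input, void *q_output, void *d_output)
   {
           int ret;

           /* dtype -> qtype, per rte_ml_model_info::input. */
           ret = rte_ml_io_quantize(dev_id, model_id, nb_batches,
                                    d_input, q_input);
           if (ret != 0)
                   return ret;

           /* ... enqueue inference with q_input, dequeue q_output ... */

           /* qtype -> dtype, per rte_ml_model_info::output. */
           return rte_ml_io_dequantize(dev_id, model_id, nb_batches,
                                       q_output, d_output);
   }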