xref: /dpdk/doc/guides/prog_guide/mldev.rst (revision 41dd9a6bc2d9c6e20e139ad713cc9d172572dd43)
..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

Machine Learning (ML) Device Library
====================================

The Machine Learning (ML) Device library provides a framework for the management and
provisioning of hardware and software ML poll mode drivers,
defining an API which supports a number of ML operations,
including device handling and inference processing.
ML model creation and training are outside the scope of this library.

The ML framework is built on the following model:

.. _figure_mldev_work_flow:

.. figure:: img/mldev_flow.*

   Work flow of inference on MLDEV

ML Device
   A hardware or software-based implementation of the ML device API
   for running inferences using a pre-trained ML model.

ML Model
   An ML model is an algorithm trained over a dataset.
   A model consists of the procedure/algorithm and the data/pattern
   required to make predictions on live data.
   Once the model is created and trained outside of the DPDK scope,
   it can be loaded via ``rte_ml_model_load()``
   and then started using the ``rte_ml_model_start()`` API function.
   The ``rte_ml_model_params_update()`` function can be used to update model parameters,
   such as weights and bias, without unloading the model via ``rte_ml_model_unload()``.

ML Inference
   ML inference is the process of feeding data to the model
   via the ``rte_ml_enqueue_burst()`` API function
   and using the ``rte_ml_dequeue_burst()`` API function
   to get the calculated outputs / predictions from the started model.


Design Principles
-----------------

The MLDEV library follows the same basic principles as those used in DPDK's
Ethernet Device framework and the Crypto framework.
The MLDEV framework provides a generic Machine Learning device framework
which supports both physical (hardware) and virtual (software) ML devices,
as well as an ML API to manage and configure ML devices.
The API also supports performing ML inference operations
through an ML poll mode driver.


Device Operations
-----------------

Device Creation
~~~~~~~~~~~~~~~

Physical ML devices are discovered during the PCI probe/enumeration,
through the EAL functions which are executed at DPDK initialization,
based on their PCI device identifier: each unique PCI BDF (bus, device, function).
ML physical devices, like other physical devices in DPDK, can be allowed or blocked
using the EAL command line options.


Device Identification
~~~~~~~~~~~~~~~~~~~~~

Each device, whether virtual or physical, is uniquely designated by two identifiers:

- A unique device index used to designate the ML device
  in all functions exported by the MLDEV API.

- A device name used to designate the ML device in console messages,
  for administration or debugging purposes.


Device Features and Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ML devices may support different feature sets.
To get the features supported by a PMD, use the ``rte_ml_dev_info_get()`` API,
which returns the info of the device and its supported features.

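As an illustrative sketch, the device could be queried as below; the
``driver_name``, ``max_models`` and ``max_queue_pairs`` fields are examples of
information carried by ``struct rte_ml_dev_info`` and should be checked against
the structure definition in ``rte_mldev.h``:

.. code-block:: c

   struct rte_ml_dev_info dev_info;

   /* Query the device; returns 0 on success. */
   if (rte_ml_dev_info_get(dev_id, &dev_info) != 0)
        rte_exit(EXIT_FAILURE, "Failed to get ML device info\n");

   printf("driver: %s, max models: %u, max queue pairs: %u\n",
          dev_info.driver_name, dev_info.max_models, dev_info.max_queue_pairs);
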

Device Configuration
~~~~~~~~~~~~~~~~~~~~

The configuration of each ML device includes the following operations:

- Allocation of resources, including hardware resources in the case of a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.

The ``rte_ml_dev_configure()`` API is used to configure an ML device.

.. code-block:: c

   int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);

The ``rte_ml_dev_config`` structure is used to pass the configuration parameters
for the ML device, for example the number of queue pairs, the maximum number of models,
the maximum size of a model, and so on.

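A minimal configuration sketch; the field names ``socket_id``, ``nb_models``
and ``nb_queue_pairs`` are assumptions to be verified against the
``rte_ml_dev_config`` definition in ``rte_mldev.h``:

.. code-block:: c

   struct rte_ml_dev_config cfg = {
        .socket_id = rte_socket_id(),  /* NUMA node to allocate resources on */
        .nb_models = 1,                /* maximum number of models to load */
        .nb_queue_pairs = 1,           /* number of queue pairs to configure */
   };

   if (rte_ml_dev_configure(dev_id, &cfg) != 0)
        rte_exit(EXIT_FAILURE, "ML device configuration failed\n");
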
Configuration of Queue Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each ML device can be configured with a number of queue pairs.
Each queue pair is configured using ``rte_ml_dev_queue_pair_setup()``.

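A sketch of queue pair setup, assuming a ``rte_ml_dev_qp_conf`` structure with
an ``nb_desc`` field for the queue depth (verify against ``rte_mldev.h``):

.. code-block:: c

   struct rte_ml_dev_qp_conf qp_conf = {
        .nb_desc = 128,  /* number of descriptors in the queue pair */
   };
   uint16_t qp_id = 0;

   /* Queue pairs are set up after rte_ml_dev_configure(). */
   if (rte_ml_dev_queue_pair_setup(dev_id, qp_id, &qp_conf, rte_socket_id()) != 0)
        rte_exit(EXIT_FAILURE, "Queue pair setup failed\n");
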

Logical Cores, Memory and Queue Pair Relationships
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple logical cores should never share the same queue pair
for enqueuing or dequeuing operations on the same ML device,
since this would require global locks and hinder performance.


Configuration of Machine Learning models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pre-trained ML models that are built using external ML compiler / training frameworks
are used to perform inference operations.
These models are configured on an ML device in a two-stage process:
loading the model on the ML device,
and starting the model to accept inference operations.
Inference operations can be queued for a model
only when the model is in the started state.
The model load stage assigns a model ID,
which is unique for the model in a driver's context.
The model ID is used during all subsequent slow-path and fast-path operations.

Model loading and starting are done
through the ``rte_ml_model_load()`` and ``rte_ml_model_start()`` functions.

Similarly, stopping and unloading are done
through the ``rte_ml_model_stop()`` and ``rte_ml_model_unload()`` functions.

The stop and unload functions release the resources allocated for the model.
Inference tasks cannot be queued for a model that is stopped.

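The two-stage lifecycle described above can be sketched as follows;
``model_addr`` and ``model_size`` are placeholders for a pre-trained model
binary already present in memory, and ``rte_ml_model_params`` is assumed to
carry the model data address and size:

.. code-block:: c

   struct rte_ml_model_params params = {
        .addr = model_addr,  /* pre-trained model binary in memory */
        .size = model_size,
   };
   uint16_t model_id;

   /* Load assigns a model ID, unique within the driver's context. */
   if (rte_ml_model_load(dev_id, &params, &model_id) != 0)
        rte_exit(EXIT_FAILURE, "Model load failed\n");

   /* Inference operations can be queued only after start. */
   if (rte_ml_model_start(dev_id, model_id) != 0)
        rte_exit(EXIT_FAILURE, "Model start failed\n");

   /* ... enqueue / dequeue inference operations ... */

   /* Stop and unload release the resources allocated for the model. */
   rte_ml_model_stop(dev_id, model_id);
   rte_ml_model_unload(dev_id, model_id);
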
Detailed information related to the model can be retrieved from the driver
using the function ``rte_ml_model_info_get()``.
Model information is accessible to the application
through the ``rte_ml_model_info`` structure.
The information available to the user includes the details related to
the inputs and outputs, and the maximum batch size supported by the model.

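For example, as a sketch; the ``rte_ml_model_info`` field names used below
(``name``, ``nb_inputs``, ``nb_outputs``) are indicative only and should be
checked against the structure definition in ``rte_mldev.h``:

.. code-block:: c

   struct rte_ml_model_info info;

   if (rte_ml_model_info_get(dev_id, model_id, &info) != 0)
        rte_exit(EXIT_FAILURE, "Failed to get model info\n");

   printf("model: %s, inputs: %u, outputs: %u\n",
          info.name, info.nb_inputs, info.nb_outputs);
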
The user can optionally update the model parameters, such as weights and bias,
without unloading the model, through the ``rte_ml_model_params_update()`` function.
A model should be in the stopped state to update the parameters.
The model has to be started again in order to enqueue inference requests
after a parameters update.

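The stop / update / restart sequence can be sketched as below;
``new_params`` is a hypothetical buffer holding the updated weights and bias:

.. code-block:: c

   /* The model must be stopped before its parameters can be updated. */
   rte_ml_model_stop(dev_id, model_id);

   if (rte_ml_model_params_update(dev_id, model_id, new_params) != 0)
        rte_exit(EXIT_FAILURE, "Model params update failed\n");

   /* Restart the model to resume enqueuing inference requests. */
   rte_ml_model_start(dev_id, model_id);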

Enqueue / Dequeue
~~~~~~~~~~~~~~~~~

The burst enqueue API uses an ML device identifier and a queue pair identifier
to specify the device queue pair to schedule the processing on.
The ``nb_ops`` parameter is the number of operations to process,
which are supplied in the ``ops`` array of ``rte_ml_op`` structures.
The enqueue function returns the number of operations it enqueued for processing;
a return value equal to ``nb_ops`` means that all packets have been enqueued.

The dequeue API uses the same format as the enqueue API,
but the ``nb_ops`` and ``ops`` parameters are now used to specify
the maximum number of processed operations the user wishes to retrieve
and the location in which to store them.
The API call returns the actual number of processed operations returned;
this can never be larger than ``nb_ops``.

``rte_ml_op`` provides the required information to the driver
to queue an ML inference task.
The ML op specifies the model to be used and the number of batches
to be executed in the inference task.
Input and output buffer information is specified through
the structure ``rte_ml_buff_seg``, which supports segmented data.
Input is provided through ``rte_ml_op::input``
and output through ``rte_ml_op::output``.
The data pointed to by each op should not be released until that op is dequeued.

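A simplified enqueue / dequeue loop could look as follows; op allocation from a
mempool and input/output buffer preparation are omitted, and ``BURST_SZ`` is an
illustrative constant:

.. code-block:: c

   #define BURST_SZ 32

   struct rte_ml_op *ops[BURST_SZ];
   uint16_t nb_enq, nb_deq = 0;

   /* ops[] are prepared with model_id, nb_batches and
    * input/output rte_ml_buff_seg pointers before this point.
    */
   nb_enq = rte_ml_enqueue_burst(dev_id, qp_id, ops, BURST_SZ);

   /* Poll until all enqueued operations have been processed. */
   while (nb_deq < nb_enq)
        nb_deq += rte_ml_dequeue_burst(dev_id, qp_id, &ops[nb_deq],
                                       nb_enq - nb_deq);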

Quantize and Dequantize
~~~~~~~~~~~~~~~~~~~~~~~

Inference operations performed with lower precision types can improve
the throughput and efficiency of the inference execution,
with a minimal loss of accuracy that is within the tolerance limits.
Quantization and dequantization are the processes of converting data
from a higher precision type to a lower precision type and vice-versa.
The ML library provides the functions ``rte_ml_io_quantize()`` and ``rte_ml_io_dequantize()``
to enable data type conversions.
The user needs to provide the addresses of the quantized and dequantized data buffers
to the functions, along with the number of batches in the buffers.

For quantization, the dequantized data is assumed to be
of the type ``dtype`` provided by ``rte_ml_model_info::input``
and the data is converted to the ``qtype`` provided by ``rte_ml_model_info::input``.

For dequantization, the quantized data is assumed to be
of the type ``qtype`` provided by ``rte_ml_model_info::output``
and the data is converted to the ``dtype`` provided by ``rte_ml_model_info::output``.

The sizes of the buffers required for the input and output can be calculated
using the functions ``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``.
These functions return the buffer sizes for both quantized and dequantized data
for the given number of batches.
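
A sketch of the conversion flow for a single input and output; the exact
prototypes and argument order should be verified against ``rte_mldev.h``, and
the buffer pointers (``dbuffer``, ``qbuffer`` and so on) are placeholders:

.. code-block:: c

   uint64_t input_qsize, input_dsize;

   /* Get the quantized and dequantized buffer sizes for the input. */
   rte_ml_io_input_size_get(dev_id, model_id, nb_batches,
                            &input_qsize, &input_dsize);

   /* Convert user data (dtype) into the model's quantized format (qtype). */
   rte_ml_io_quantize(dev_id, model_id, nb_batches, dbuffer, qbuffer);

   /* ... run inference on the quantized input ... */

   /* Convert the quantized output back to the higher precision type. */
   rte_ml_io_dequantize(dev_id, model_id, nb_batches, qout_buffer, dout_buffer);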