..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

Machine Learning (ML) Device Library
====================================

The Machine Learning (ML) Device library provides a Machine Learning device framework
for the management and provisioning of hardware and software ML poll mode drivers,
defining an API which supports a number of ML operations,
including device handling and inference processing.
ML model creation and training are outside the scope of this library.

The ML framework is built on the following model:

.. _figure_mldev_work_flow:

.. figure:: img/mldev_flow.*

   Work flow of inference on MLDEV

ML Device
   A hardware or software-based implementation of the ML device API
   for running inferences using a pre-trained ML model.

ML Model
   An ML model is an algorithm trained over a dataset.
   A model consists of the procedure/algorithm and the data/pattern
   required to make predictions on live data.
   Once the model is created and trained outside of the DPDK scope,
   it can be loaded via ``rte_ml_model_load()``
   and then started using the ``rte_ml_model_start()`` API function.
   The ``rte_ml_model_params_update()`` API function can be used to update the model parameters,
   such as weights and bias, without unloading the model via ``rte_ml_model_unload()``.

ML Inference
   ML inference is the process of feeding data to the model
   via the ``rte_ml_enqueue_burst()`` API function
   and using the ``rte_ml_dequeue_burst()`` API function
   to get the calculated outputs/predictions from the started model.


Design Principles
-----------------

The MLDEV library follows the same basic principles as those used in DPDK's
Ethernet Device framework and the Crypto framework.
The MLDEV framework provides a generic Machine Learning device framework
which supports both physical (hardware) and virtual (software) ML devices,
as well as an ML API to manage and configure ML devices.
The API also supports performing ML inference operations
through an ML poll mode driver.


Device Operations
-----------------

Device Creation
~~~~~~~~~~~~~~~

Physical ML devices are discovered during PCI probe/enumeration,
through the EAL functions which are executed at DPDK initialization,
based on their PCI device identifier: each unique PCI BDF (bus, device, function).
ML physical devices, like other physical devices in DPDK, can be allowed or blocked
using the EAL command line options.
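
For instance, a specific physical ML device can be allowed or blocked
using the standard EAL ``-a`` (allow) and ``-b`` (block) options;
the PCI address and application shown below are only illustrative:

.. code-block:: console

   # Use only the ML device at the given PCI address
   dpdk-test-mldev -a 0000:01:00.0 -- [test options]

   # Exclude the same device from the probe
   dpdk-test-mldev -b 0000:01:00.0 -- [test options]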


Device Identification
~~~~~~~~~~~~~~~~~~~~~

Each device, whether virtual or physical, is uniquely designated by two identifiers:

- A unique device index used to designate the ML device
  in all functions exported by the MLDEV API.

- A device name used to designate the ML device in console messages,
  for administration or debugging purposes.
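
A minimal sketch of walking the available devices, assuming the application
has already called ``rte_eal_init()`` and includes ``rte_mldev.h``
(``driver_name`` is assumed to be a member of ``rte_ml_dev_info``):

.. code-block:: c

   int16_t dev_id;
   int16_t nb_devs = rte_ml_dev_count();

   /* Device indices run from 0 to rte_ml_dev_count() - 1. */
   for (dev_id = 0; dev_id < nb_devs; dev_id++) {
           struct rte_ml_dev_info info;

           if (rte_ml_dev_info_get(dev_id, &info) == 0)
                   printf("ML device %d: driver %s\n", dev_id, info.driver_name);
   }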


Device Features and Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ML devices may support different feature sets.
The ``rte_ml_dev_info_get()`` API function can be used to retrieve
the information of the device and its supported features.
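
For example (assuming ``rte_ml_dev_info`` exposes limits such as ``max_models``,
``max_queue_pairs`` and ``max_desc``; the exact field set is release dependent):

.. code-block:: c

   struct rte_ml_dev_info dev_info;

   if (rte_ml_dev_info_get(dev_id, &dev_info) != 0)
           rte_exit(EXIT_FAILURE, "Failed to get info of ML device %d\n", dev_id);

   /* Bound the device configuration by the reported limits. */
   printf("max models: %u, max queue pairs: %u, max descriptors: %u\n",
          dev_info.max_models, dev_info.max_queue_pairs, dev_info.max_desc);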


Device Configuration
~~~~~~~~~~~~~~~~~~~~

The configuration of each ML device includes the following operations:

- Allocation of resources, including hardware resources if a physical device.
- Resetting the device into a well-known default state.
- Initialization of statistics counters.

The ``rte_ml_dev_configure()`` API is used to configure an ML device.

.. code-block:: c

   int rte_ml_dev_configure(int16_t dev_id, const struct rte_ml_dev_config *cfg);

The ``rte_ml_dev_config`` structure is used to pass the configuration parameters
for the ML device, for example the number of queue pairs, the maximum number of models,
the maximum size of a model, and so on.
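
A minimal configuration sketch, assuming ``rte_ml_dev_config`` carries
``socket_id``, ``nb_models`` and ``nb_queue_pairs`` fields:

.. code-block:: c

   struct rte_ml_dev_config config = {
           .socket_id = rte_ml_dev_socket_id(dev_id),
           .nb_models = 1,         /* maximum number of models to be loaded */
           .nb_queue_pairs = 1,    /* number of queue pairs to be used */
   };

   if (rte_ml_dev_configure(dev_id, &config) != 0)
           rte_exit(EXIT_FAILURE, "Failed to configure ML device %d\n", dev_id);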

Configuration of Queue Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each ML device can be configured with a number of queue pairs.
Each queue pair is configured using ``rte_ml_dev_queue_pair_setup()``.
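
For example, assuming ``rte_ml_dev_qp_conf`` provides an ``nb_desc`` field
for the queue pair depth:

.. code-block:: c

   struct rte_ml_dev_qp_conf qp_conf = {
           .nb_desc = 128,    /* number of descriptors in the queue pair */
   };
   uint16_t qp_id = 0;

   if (rte_ml_dev_queue_pair_setup(dev_id, qp_id, &qp_conf,
                                   rte_ml_dev_socket_id(dev_id)) != 0)
           rte_exit(EXIT_FAILURE, "Failed to setup queue pair %u\n", qp_id);

Once the queue pairs are configured, the device can be started
with ``rte_ml_dev_start()``.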


Logical Cores, Memory and Queue Pair Relationships
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Multiple logical cores should never share the same queue pair
for enqueueing or dequeueing operations on the same ML device,
since this would require global locks and hinder performance.


Configuration of Machine Learning models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pre-trained ML models that are built using external ML compiler/training frameworks
are used to perform inference operations.
These models are configured on an ML device in a two-stage process
that includes loading the model on the ML device
and starting the model to accept inference operations.
Inference operations can be queued for a model
only when the model is in the started state.
The model load stage assigns a model ID,
which is unique for the model in a driver's context.
The model ID is used during all subsequent slow-path and fast-path operations.

Model loading and starting are done
through the ``rte_ml_model_load()`` and ``rte_ml_model_start()`` functions.

Similarly, stopping and unloading are done
through the ``rte_ml_model_stop()`` and ``rte_ml_model_unload()`` functions.

The stop and unload functions release the resources allocated for the model.
Inference tasks cannot be queued for a model that is stopped.
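
A sketch of the model life cycle, assuming the pre-trained model binary
has already been read into a memory buffer
(``model_buf`` and ``model_size`` are hypothetical placeholders):

.. code-block:: c

   struct rte_ml_model_params params = {
           .addr = model_buf,     /* buffer holding the pre-trained model */
           .size = model_size,    /* size of the model buffer in bytes */
   };
   uint16_t model_id;

   /* Load the model on the device; a model ID is assigned on success. */
   if (rte_ml_model_load(dev_id, &params, &model_id) != 0)
           rte_exit(EXIT_FAILURE, "Failed to load model\n");

   /* Start the model so that inference operations can be enqueued. */
   if (rte_ml_model_start(dev_id, model_id) != 0)
           rte_exit(EXIT_FAILURE, "Failed to start model %u\n", model_id);

   /* ... enqueue and dequeue inference operations ... */

   /* Stop the model and release the resources allocated for it. */
   rte_ml_model_stop(dev_id, model_id);
   rte_ml_model_unload(dev_id, model_id);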

Detailed information related to the model can be retrieved from the driver
using the function ``rte_ml_model_info_get()``.
Model information is accessible to the application
through the ``rte_ml_model_info`` structure.
The information available to the user includes the details related to
the model inputs and outputs, and the maximum batch size supported by the model.
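
For example (the exact set of fields in ``rte_ml_model_info``
is release dependent; ``name``, ``nb_inputs`` and ``nb_outputs`` are assumed here):

.. code-block:: c

   struct rte_ml_model_info info;

   if (rte_ml_model_info_get(dev_id, model_id, &info) != 0)
           rte_exit(EXIT_FAILURE, "Failed to get info of model %u\n", model_id);

   printf("model %s: %u input(s), %u output(s)\n",
          info.name, info.nb_inputs, info.nb_outputs);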

The user can optionally update the model parameters, such as weights and bias,
without unloading the model, through the ``rte_ml_model_params_update()`` function.
A model should be in the stopped state to update the parameters.
The model has to be started again in order to enqueue inference requests after a parameters update.
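
A sketch of a parameters update, where ``new_params`` is a hypothetical buffer
holding the updated weights and bias in the format expected by the driver:

.. code-block:: c

   /* The model must be stopped before its parameters can be updated. */
   rte_ml_model_stop(dev_id, model_id);

   if (rte_ml_model_params_update(dev_id, model_id, new_params) != 0)
           rte_exit(EXIT_FAILURE, "Failed to update params of model %u\n", model_id);

   /* Restart the model before enqueueing further inference requests. */
   rte_ml_model_start(dev_id, model_id);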


Enqueue / Dequeue
~~~~~~~~~~~~~~~~~

The burst enqueue API uses an ML device identifier and a queue pair identifier
to specify the device queue pair to schedule the processing on.
The ``nb_ops`` parameter is the number of operations to process,
which are supplied in the ``ops`` array of ``rte_ml_op`` structures.
The enqueue function returns the number of operations it enqueued for processing;
a return value equal to ``nb_ops`` means that all operations have been enqueued.

The dequeue API uses the same format as the enqueue API,
but the ``nb_ops`` and ``ops`` parameters are now used to specify
the maximum number of processed operations the user wishes to retrieve
and the location in which to store them.
The API call returns the actual number of processed operations returned;
this can never be larger than ``nb_ops``.

``rte_ml_op`` provides the required information to the driver
to queue an ML inference task.
The ML op specifies the model to be used and the number of batches
to be executed in the inference task.
Input and output buffer information is specified through
the structure ``rte_ml_buff_seg``, which supports segmented data.
Input is provided through ``rte_ml_op::input``
and output through ``rte_ml_op::output``.
The data pointed to by each op should not be released until that op is dequeued.
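
A simplified enqueue/dequeue sketch for a single op.
It assumes an op mempool created with ``rte_ml_op_pool_create()``,
and that ``input_seg`` and ``output_seg`` are pointers to already prepared
``rte_ml_buff_seg`` descriptors; the op field names follow recent releases
and may differ in older ones:

.. code-block:: c

   struct rte_ml_op *op;
   uint16_t nb_enq, nb_deq;

   if (rte_mempool_get(op_pool, (void **)&op) != 0)
           rte_exit(EXIT_FAILURE, "Failed to allocate ML op\n");

   op->model_id = model_id;
   op->nb_batches = 1;
   op->mempool = op_pool;
   op->input = &input_seg;      /* segmented input buffer(s) */
   op->output = &output_seg;    /* buffer(s) receiving the predictions */

   /* Submit the op; a return value of 1 means it was accepted. */
   nb_enq = rte_ml_enqueue_burst(dev_id, qp_id, &op, 1);
   if (nb_enq == 1) {
           /* Poll the same queue pair until the op completes. */
           do {
                   nb_deq = rte_ml_dequeue_burst(dev_id, qp_id, &op, 1);
           } while (nb_deq == 0);

           if (op->status != RTE_ML_OP_STATUS_SUCCESS)
                   printf("Inference failed for model %u\n", model_id);
   }

   rte_mempool_put(op_pool, op);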


Quantize and Dequantize
~~~~~~~~~~~~~~~~~~~~~~~

Performing inference operations with lower precision types can improve
the throughput and efficiency of the inference execution,
with a minimal loss of accuracy that is within the tolerance limits.
Quantization and dequantization are the processes of converting data
from a higher precision type to a lower precision type and vice-versa.
The ML library provides the functions ``rte_ml_io_quantize()`` and ``rte_ml_io_dequantize()``
to enable data type conversions.
The user needs to provide the addresses of the quantized and dequantized data buffers
to the functions, along with the number of batches in the buffers.

For quantization, the dequantized data is assumed to be
of the type ``dtype`` provided by ``rte_ml_model_info::input``
and the data is converted to the ``qtype`` provided by ``rte_ml_model_info::input``.

For dequantization, the quantized data is assumed to be
of the type ``qtype`` provided by ``rte_ml_model_info::output``
and the data is converted to the ``dtype`` provided by ``rte_ml_model_info::output``.

The sizes of the buffers required for the input and output can be calculated
using the functions ``rte_ml_io_input_size_get()`` and ``rte_ml_io_output_size_get()``.
These functions return the buffer sizes for both quantized and dequantized data
for the given number of batches.
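
A sketch of a quantize/dequantize round trip, following the flat-buffer form
of these functions described above (the exact signatures have changed across
DPDK releases, so they should be checked against the installed version);
``d_input``, ``q_input``, ``q_output`` and ``d_output`` are hypothetical buffers
sized using the size-query functions:

.. code-block:: c

   uint64_t in_qsize, in_dsize, out_qsize, out_dsize;

   /* Query the buffer sizes needed for the given number of batches. */
   rte_ml_io_input_size_get(dev_id, model_id, nb_batches, &in_qsize, &in_dsize);
   rte_ml_io_output_size_get(dev_id, model_id, nb_batches, &out_qsize, &out_dsize);

   /* ... allocate the buffers and fill d_input with dtype data ... */

   /* Convert user data to the lower precision qtype expected by the model. */
   rte_ml_io_quantize(dev_id, model_id, nb_batches, d_input, q_input);

   /* ... enqueue inference with q_input, dequeue results into q_output ... */

   /* Convert the quantized output back to the higher precision dtype. */
   rte_ml_io_dequantize(dev_id, model_id, nb_batches, q_output, d_output);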
210