xref: /dpdk/doc/guides/mldevs/cnxk.rst (revision 455a771fd6f1a9cb6edc8711ff278ad31709cf7c)
..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2022 Marvell.

Marvell cnxk Machine Learning Poll Mode Driver
==============================================

The cnxk ML poll mode driver provides support for offloading
Machine Learning inference operations to Machine Learning accelerator units
on the **Marvell OCTEON cnxk** SoC family.

The cnxk ML PMD code is organized into multiple files with all file names
starting with ``cn10k``, providing support for CN106XX and CN106XXS.

More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_.

Supported OCTEON cnxk SoCs
--------------------------

- CN106XX
- CN106XXS

Features
--------

The OCTEON cnxk ML PMD provides support for the following set of operations:

Slow-path device and ML model handling:

* Device probing, configuration and close
* Device start and stop
* Model loading and unloading
* Model start and stop
* Data quantization and dequantization

Fast-path Inference:

* Inference execution
* Error handling


Compilation Prerequisites
-------------------------

Support for models compiled with the Apache TVM framework is optional
and requires the following external libraries.
These dependencies are not part of DPDK and must be installed separately:

Jansson
~~~~~~~

This library provides support for parsing and reading JSON files.

DLPack
~~~~~~

This library provides headers for an open in-memory tensor structure.

.. note::

   DPDK CNXK ML driver requires DLPack version 0.7.

.. code-block:: console

   git clone https://github.com/dmlc/dlpack.git
   cd dlpack
   git checkout v0.7 -b v0.7
   cmake -S ./ -B build \
      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
      -DBUILD_MOCK=OFF
   make -C build
   make -C build install

When cross-compiling, the cross compilers must be provided to CMake:

.. code-block:: console

   -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
   -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++

DMLC
~~~~

This is a common bricks library for building scalable
and portable distributed machine learning.

.. code-block:: console

   git clone https://github.com/dmlc/dmlc-core.git
   cd dmlc-core
   git checkout main
   cmake -S ./ -B build \
      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
      -DCMAKE_C_FLAGS="-fpermissive" \
      -DCMAKE_CXX_FLAGS="-fpermissive" \
      -DUSE_OPENMP=OFF
   make -C build
   make -C build install

When cross-compiling, the cross compilers must be provided to CMake:

.. code-block:: console

   -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
   -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++

TVM
~~~

Apache TVM provides the runtime libraries used to execute models
on CPU cores or hardware accelerators.

.. note::

   DPDK CNXK ML driver requires TVM version 0.10.0.

.. code-block:: console

   git clone https://github.com/apache/tvm.git
   cd tvm
   git checkout v0.10.0 -b v0.10.0
   git submodule update --init
   cmake -S ./ -B build \
      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
      -DBUILD_STATIC_RUNTIME=OFF
   make -C build
   make -C build install

When cross-compiling, more options must be provided to CMake:

.. code-block:: console

   -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
   -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
   -DMACHINE_NAME=aarch64-linux-gnu \
   -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
   -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY

TVMDP
~~~~~

Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
works as an interface between TVM runtime and DPDK drivers.
The TVMDP library provides a simplified C interface
to TVM's C++-based runtime.

.. note::

   TVMDP library is dependent on TVM, dlpack, jansson and dmlc-core libraries.

.. code-block:: console

   git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
   cd tvmdp
   git checkout main
   cmake -S ./ -B build \
      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
      -DBUILD_SHARED_LIBS=ON
   make -C build
   make -C build install

When cross-compiling, more options must be provided to CMake:

.. code-block:: console

   -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
   -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
   -DCMAKE_FIND_ROOT_PATH=<install_prefix>

libarchive
~~~~~~~~~~

The Apache TVM framework generates compiled models as tar archives.
This library provides support for decompressing and reading archive files
in tar, xz and other formats.


Installation
------------

The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
or cross-compiled on an x86 platform.

For Meson to find the dependencies above during the configure stage,
the environment variables below must be updated:

.. code-block:: console

   CMAKE_PREFIX_PATH='<install_prefix>/lib/cmake/tvm:<install_prefix>/lib/cmake/dlpack:<install_prefix>/lib/cmake/dmlc'
   PKG_CONFIG_PATH='<install_prefix>/lib/pkgconfig'

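As a concrete sketch of this step, the variables can be exported in the build shell before configuring DPDK. The ``/opt/ml-deps`` prefix below is hypothetical, used only for illustration; substitute the actual ``<install_prefix>`` used when installing the dependencies.

```shell
# Hypothetical install prefix; replace with the actual <install_prefix>
# used when installing the dependencies above.
ML_DEPS_PREFIX=/opt/ml-deps
export CMAKE_PREFIX_PATH="${ML_DEPS_PREFIX}/lib/cmake/tvm:${ML_DEPS_PREFIX}/lib/cmake/dlpack:${ML_DEPS_PREFIX}/lib/cmake/dmlc"
export PKG_CONFIG_PATH="${ML_DEPS_PREFIX}/lib/pkgconfig"
# DPDK can then be configured and built as usual in the same shell,
# e.g. "meson setup build" followed by "ninja -C build".
echo "$PKG_CONFIG_PATH"
```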
Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.


Initialization
--------------

List the ML PF devices available on the cn10k platform:

.. code-block:: console

   lspci -d:a092

``a092`` is the ML device PF id. You should see output similar to:

.. code-block:: console

   0000:00:10.0 System peripheral: Cavium, Inc. Device a092

Bind the ML PF device to the vfio-pci driver:

.. code-block:: console

   cd <dpdk directory>
   usertools/dpdk-devbind.py -u 0000:00:10.0
   usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
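The bind step can also be scripted. The sketch below, which assumes ``lspci`` output in the format shown above, extracts the PCI address from such a line so it can be passed to ``dpdk-devbind.py``:

```shell
# Sketch only: pull the PCI address out of an lspci output line
# formatted like the example above.
line="0000:00:10.0 System peripheral: Cavium, Inc. Device a092"
addr=${line%% *}   # strip everything after the first space
echo "$addr"
```

The extracted address (`0000:00:10.0` here) is then usable in place of the hard-coded address in the ``dpdk-devbind.py`` commands above.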


VDEV support
------------

On platforms without ML hardware acceleration through a PCI device,
the Marvell ML CNXK PMD can execute inference operations on a vdev
with ML models compiled using the Apache TVM framework.

VDEV can be enabled by passing the EAL argument:

.. code-block:: console

   --vdev ml_mvtvm

VDEV can also be used on platforms with an ML HW accelerator.
However, to use a vdev in this case, the PCI device has to be unbound.
When the PCI device is bound, creation of the vdev is skipped.


Runtime Config Options
----------------------

**Firmware file path** (default ``/lib/firmware/mlip-fw.bin``)

  Path to the firmware binary to be loaded during device configuration.
  The parameter ``fw_path`` can be used by the user
  to load ML firmware from a custom path.

  This option is supported only on the PCI HW accelerator.

  For example::

     -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"

  With the above configuration, the driver loads the firmware from
  ``/home/user/ml_fw.bin``.


**Enable DPE warnings** (default ``1``)

  ML firmware can be configured during load to handle the DPE errors reported
  by the ML inference engine.
  When enabled, the firmware masks non-fatal DPE hardware errors as warnings.
  The parameter ``enable_dpe_warnings`` is used for this configuration.

  This option is supported only on the PCI HW accelerator.

  For example::

     -a 0000:00:10.0,enable_dpe_warnings=0

  With the above configuration, non-fatal DPE errors reported by HW
  are treated as errors.


**Model data caching** (default ``1``)

  Enable caching model data on ML ACC cores.
  Enabling this option executes a dummy inference request
  in synchronous mode during the model start stage.
  Caching of model data improves the inference throughput and latency for the model.
  The parameter ``cache_model_data`` is used to enable data caching.

  This option is supported on the PCI HW accelerator and vdev.

  For example::

     -a 0000:00:10.0,cache_model_data=0

  With the above configuration, model data caching is disabled on the HW accelerator.

  For example::

     --vdev ml_mvtvm,cache_model_data=0

  With the above configuration, model data caching is disabled on the vdev.


**OCM allocation mode** (default ``lowest``)

  Option to specify the method used to allocate OCM memory
  for a model during model start.
  Two modes are supported by the driver.
  The parameter ``ocm_alloc_mode`` is used to select the OCM allocation mode.

  ``lowest``
    Allocate OCM for the model from the first available free slot.
    The search for a free slot starts from the lowest tile ID and lowest page ID.
  ``largest``
    Allocate OCM for the model from the slot with the largest amount of free space.

  This option is supported only on the PCI HW accelerator.

  For example::

     -a 0000:00:10.0,ocm_alloc_mode=lowest

  With the above configuration, OCM for the model is allocated
  from the first available free slot, starting from the lowest possible tile ID.

**OCM page size** (default ``16384``)

  Option to specify the page size in bytes used for OCM management.
  Available OCM is split into multiple pages of the specified size
  and the pages are allocated to the models.
  The parameter ``ocm_page_size`` is used to specify the page size.

  Page sizes supported by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
  The default page size is 16 KB.

  This option is supported only on the PCI HW accelerator.

  For example::

     -a 0000:00:10.0,ocm_page_size=8192

  With the above configuration, the OCM page size is set to 8192 bytes (8 KB).
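To illustrate what the page size controls, the sketch below computes how many pages an OCM region would be split into. The 1 MiB OCM size is purely hypothetical, chosen only to show the arithmetic; it is not a property of the hardware.

```shell
# Illustration only: page count for a hypothetical 1 MiB OCM region
# managed with the 8 KiB page size configured above.
ocm_bytes=$((1024 * 1024))   # assumed OCM size, for illustration
page_size=8192               # ocm_page_size devarg value from the example
echo $((ocm_bytes / page_size))
```

A smaller page size yields more, finer-grained pages and can reduce wasted space per model, at the cost of more pages to manage.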


**Enable hardware queue lock** (default ``0``)

  Option to select the job request enqueue function used
  to queue requests to the hardware queue.
  The parameter ``hw_queue_lock`` is used to select the enqueue function.

  ``0``
    Disable (default), use the lock-free version of the hardware enqueue function
    for job queuing in the enqueue burst operation.
    To avoid race conditions when queuing requests to hardware,
    disabling ``hw_queue_lock`` restricts the number of queue-pairs
    supported by the cnxk driver to 1.
  ``1``
    Enable, use the spin-lock version of the hardware enqueue function for job queuing.
    Enabling the spinlock version removes the restriction on the number of queue-pairs
    that can be supported by the driver.

  This option is supported only on the PCI HW accelerator.

  For example::

     -a 0000:00:10.0,hw_queue_lock=1

  With the above configuration, the spinlock version of the hardware enqueue function
  is used in the fast path enqueue burst operation.

**Maximum queue pairs** (default ``1``)

  VDEV supports an additional EAL argument to configure the maximum number
  of queue-pairs on the ML device, through the option ``max_qps``.

  This option is supported only on vdev.

  For example::

     --vdev ml_mvtvm,max_qps=4

  With the above configuration, 4 queue-pairs are created on the vdev.


Debugging Options
-----------------

.. _table_octeon_cnxk_ml_debug_options:

.. table:: OCTEON cnxk ML PMD debug options

   +---+------------+-------------------------------------------------------+
   | # | Component  | EAL log command                                       |
   +===+============+=======================================================+
   | 1 | ML         | --log-level='pmd\.common\.cnxk\.ml,8'                 |
   +---+------------+-------------------------------------------------------+


Extended stats
--------------

The Marvell cnxk ML PMD supports reporting device and model extended statistics.

The PMD supports the below list of 4 device extended stats.

.. _table_octeon_cnxk_ml_device_xstats_names:

.. table:: OCTEON cnxk ML PMD device xstats names

   +---+---------------------+----------------------------------------------+
   | # | Type                | Description                                  |
   +===+=====================+==============================================+
   | 1 | nb_models_loaded    | Number of models loaded                      |
   +---+---------------------+----------------------------------------------+
   | 2 | nb_models_unloaded  | Number of models unloaded                    |
   +---+---------------------+----------------------------------------------+
   | 3 | nb_models_started   | Number of models started                     |
   +---+---------------------+----------------------------------------------+
   | 4 | nb_models_stopped   | Number of models stopped                     |
   +---+---------------------+----------------------------------------------+


The PMD supports the below list of 6 extended stats types per model.

.. _table_octeon_cnxk_ml_model_xstats_names:

.. table:: OCTEON cnxk ML PMD model xstats names

   +---+---------------------+----------------------------------------------+
   | # | Type                | Description                                  |
   +===+=====================+==============================================+
   | 1 | Avg-HW-Latency      | Average hardware latency                     |
   +---+---------------------+----------------------------------------------+
   | 2 | Min-HW-Latency      | Minimum hardware latency                     |
   +---+---------------------+----------------------------------------------+
   | 3 | Max-HW-Latency      | Maximum hardware latency                     |
   +---+---------------------+----------------------------------------------+
   | 4 | Avg-FW-Latency      | Average firmware latency                     |
   +---+---------------------+----------------------------------------------+
   | 5 | Min-FW-Latency      | Minimum firmware latency                     |
   +---+---------------------+----------------------------------------------+
   | 6 | Max-FW-Latency      | Maximum firmware latency                     |
   +---+---------------------+----------------------------------------------+

Latency values reported by the PMD through xstats are expressed
in either cycles or nanoseconds.
The unit of the latencies is determined during DPDK initialization
and depends on the availability of SCLK.
Latencies are reported in nanoseconds when the SCLK is available and in cycles otherwise.
The application needs to initialize at least one RVU for the clock to be available.

xstats names are dynamically generated by the PMD and have the format
``Model-<model_id>-Type-<units>``.

For example::

   Model-1-Avg-FW-Latency-ns

The above xstat name reports the average firmware latency in nanoseconds
for model ID 1.
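Since the names follow the fixed ``Model-<model_id>-Type-<units>`` pattern, an application script can recover the model ID from a name with standard shell tools; a minimal sketch:

```shell
# Sketch: recover the model ID from a generated xstat name,
# relying on the fixed "Model-<model_id>-Type-<units>" pattern.
name="Model-1-Avg-FW-Latency-ns"
model_id=$(printf '%s\n' "$name" | cut -d- -f2)
echo "$model_id"
```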

The number of xstats made available by the PMD changes dynamically.
The number increases when a model is loaded and decreases when a model is unloaded.
The application needs to update the xstats map after a model is loaded or unloaded.
456