.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
There are two ways to get these headers:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

Meson must be told where the CUDA header files are through the C compiler flags.
There are three ways to do this:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add ``CFLAGS`` to the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Pass ``-Dc_args`` on the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download and build the GDRCopy library and load its driver:

.. code-block:: console

  $ git clone https://github.com/NVIDIA/gdrcopy.git
  $ cd gdrcopy
  $ make
  $ # make install to install GDRCopy library system wide
  $ # Launch gdrdrv kernel module on the system
  $ sudo ./insmod.sh

Meson must be told where the GDRCopy header files are, in the same way as for the CUDA headers.
For example:

.. code-block:: console

  $ meson build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and returns an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time, because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
to build the CUDA driver library.
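
The runtime loading scheme described above can be sketched in a few lines of C.
This is a minimal illustration and not the driver's own code:
the helper names ``libcuda_path`` and ``load_libcuda`` are hypothetical,
and the sketch assumes that ``CUDA_PATH_L`` names the directory containing **libcuda.so**.

.. code-block:: c

  #include <dlfcn.h>
  #include <limits.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Build the path handed to dlopen(): "<CUDA_PATH_L>/libcuda.so" when the
   * variable is set, the bare library name otherwise.
   * Returns 0 on success, -1 if the path does not fit in buf. */
  static int
  libcuda_path(char *buf, size_t len)
  {
      const char *dir = getenv("CUDA_PATH_L");
      int n;

      if (dir == NULL)
          n = snprintf(buf, len, "%s", "libcuda.so");
      else
          n = snprintf(buf, len, "%s/%s", dir, "libcuda.so");

      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }

  /* Open libcuda.so and resolve one Driver API symbol at runtime.
   * Returns the dlopen() handle, or NULL if the library or symbol is missing. */
  static void *
  load_libcuda(void **cu_init_sym)
  {
      char path[PATH_MAX];
      void *handle;

      if (libcuda_path(path, sizeof(path)) != 0)
          return NULL;

      handle = dlopen(path, RTLD_LAZY);
      if (handle == NULL) {
          fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
          return NULL;
      }

      *cu_init_sym = dlsym(handle, "cuInit");
      if (*cu_init_sym == NULL) {
          dlclose(handle);
          return NULL;
      }

      return handle;
  }

When ``CUDA_PATH_L`` is unset, ``dlopen`` receives the bare library name
and falls back to the default search path of the dynamic linker.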

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

As with the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can use the ``GDRCOPY_PATH_L`` environment variable to specify where it is.

As an example, to run the CPU map feature sanity check,
launch the ``app/test-gpudev`` application with:

.. code-block:: console

  $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

  $ lsmod | egrep gdrdrv
  gdrdrv                 20480  0
  nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)`` call
in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, each associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.
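
The bookkeeping described above can be modelled as follows.
This is a simplified sketch under stated assumptions, not the driver's data structure:
it uses a fixed-size array, and every name in it
(``mem_table_add``, ``mem_table_remove``, ``MEM_TABLE_MAX``, the ``mem_kind`` values) is hypothetical.

.. code-block:: c

  #include <errno.h>
  #include <stddef.h>
  #include <stdint.h>

  #define MEM_TABLE_MAX 64 /* hypothetical fixed capacity */

  enum mem_kind { MEM_UNUSED = 0, MEM_GPU_ALLOC, MEM_CPU_REGISTERED };

  struct mem_entry {
      uintptr_t addr;     /* address handed back to the application */
      uint64_t ctx;       /* CUDA context the address belongs to */
      enum mem_kind kind; /* allocated on the GPU or registered from the CPU */
  };

  static struct mem_entry mem_table[MEM_TABLE_MAX];

  /* Record an address. Returns 0, or -ENOMEM when the table is full. */
  static int
  mem_table_add(uint64_t ctx, uintptr_t addr, enum mem_kind kind)
  {
      for (size_t i = 0; i < MEM_TABLE_MAX; i++) {
          if (mem_table[i].kind == MEM_UNUSED) {
              mem_table[i] = (struct mem_entry){ addr, ctx, kind };
              return 0;
          }
      }
      return -ENOMEM;
  }

  /* Drop an address. Returns 0, or -EINVAL if the address was never
   * recorded for this context with this kind. */
  static int
  mem_table_remove(uint64_t ctx, uintptr_t addr, enum mem_kind kind)
  {
      if (kind == MEM_UNUSED)
          return -EINVAL;

      for (size_t i = 0; i < MEM_TABLE_MAX; i++) {
          struct mem_entry *e = &mem_table[i];

          if (e->kind == kind && e->ctx == ctx && e->addr == addr) {
              e->kind = MEM_UNUSED;
              return 0;
          }
      }
      return -EINVAL;
  }

In this model, freeing an address twice, or freeing an address under a different context,
fails on the lookup, which mirrors the error the driver library reports for unknown addresses.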

Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.
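
The three features listed above map onto gpudev calls roughly as sketched below.
The sketch assumes the prototypes declared in ``rte_gpudev.h`` as of DPDK 22.03
(in particular the ``align`` argument of ``rte_gpu_mem_alloc``),
an EAL that is already initialized, and a CUDA context created beforehand by the application
and passed in as ``cuda_ctx``.
The function name ``gpudev_features_demo``, the child name and the buffer size are hypothetical.

.. code-block:: c

  #include <stdint.h>
  #include <stdlib.h>

  #include <rte_gpudev.h>

  #define BUF_SIZE 4096 /* hypothetical size, one 4 KiB page */

  /* Exercise the listed features on physical GPU dev_id.
   * cuda_ctx is a CUDA context created by the application, cast to uint64_t.
   * Returns 0 on success, a negative value otherwise. */
  static int
  gpudev_features_demo(int16_t dev_id, uint64_t cuda_ctx)
  {
      int16_t child;
      void *gpu_buf;
      void *cpu_buf;
      int ret = -1;

      /* 1. Register the CUDA context as a child of the physical GPU. */
      child = rte_gpu_add_child("cuda-child0", dev_id, cuda_ctx);
      if (child < 0)
          return -1;

      /* 2. Allocate GPU memory (align 0: no specific alignment requested). */
      gpu_buf = rte_gpu_mem_alloc(child, BUF_SIZE, 0);
      if (gpu_buf == NULL)
          goto close_child;

      /* 3. Register page-aligned CPU memory to make it visible from the GPU. */
      cpu_buf = aligned_alloc(4096, BUF_SIZE);
      if (cpu_buf == NULL)
          goto free_gpu;
      if (rte_gpu_mem_register(child, BUF_SIZE, cpu_buf) < 0)
          goto free_cpu;

      /* ... launch GPU work that uses gpu_buf and cpu_buf here ... */

      ret = rte_gpu_mem_unregister(child, cpu_buf);
  free_cpu:
      free(cpu_buf);
  free_gpu:
      if (rte_gpu_mem_free(child, gpu_buf) < 0)
          ret = -1;
  close_child:
      rte_gpu_close(child);
      return ret;
  }

Running this requires an NVIDIA GPU, a DPDK build that includes the CUDA driver library,
and a valid CUDA context, so it is meant as a reading aid rather than a ready-made test.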

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory, avoiding additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the l2fwd-nv application that can be found `here <https://github.com/NVIDIA/l2fwd-nv>`_.

The application is based on the vanilla DPDK example l2fwd
and is enhanced with GPU memory managed through the gpudev library
and with CUDA to launch the packet MAC address swap workload on the GPU.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is a better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on GPU device using gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.