.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
To get these headers, there are two options:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download the two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You need to tell Meson where the CUDA header files are through the CFLAGS variable.
There are three ways:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add CFLAGS to the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Pass ``-Dc_args`` on the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.
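For instance, assuming the CUDA Toolkit headers are in the default
``/usr/local/cuda/include`` location, a complete configure and build sequence
could look like:

.. code-block:: console

  $ export CFLAGS=-I/usr/local/cuda/include
  $ meson build
  $ ninja -C build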

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. the ``rte_gpu_mem_cpu_map`` implementation),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

  $ git clone https://github.com/NVIDIA/gdrcopy.git
  $ cd gdrcopy
  $ make
  $ # make install to install the GDRCopy library system wide
  $ # Load the gdrdrv kernel module on the system
  $ sudo ./insmod.sh

You need to tell Meson where the GDRCopy header files are, as in the case of the CUDA headers.
An example would be:

.. code-block:: console

  $ meson build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and will return an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time, because of a Meson bug that looks
for the `cudart` module even if the `meson.build` file only requires the default `cuda` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
in order to build the CUDA driver library.
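This loading scheme can be illustrated with a small standalone sketch.
The helper ``resolve_libcuda`` below is hypothetical (it is not part of the
driver); it only mimics the documented behaviour of building the ``dlopen``
path from ``CUDA_PATH_L``:

.. code-block:: c

  #include <dlfcn.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical helper: build the path passed to dlopen(), mimicking
   * the CUDA_PATH_L lookup described above. */
  static void resolve_libcuda(char *buf, size_t len)
  {
      const char *dir = getenv("CUDA_PATH_L");

      if (dir != NULL)
          snprintf(buf, len, "%s/libcuda.so", dir);
      else
          snprintf(buf, len, "libcuda.so"); /* default linker search path */
  }

  int main(void)
  {
      char path[256];
      void *handle;

      setenv("CUDA_PATH_L", "/opt/cuda/lib64", 1);
      resolve_libcuda(path, sizeof(path));
      printf("trying %s\n", path);

      handle = dlopen(path, RTLD_LAZY);
      if (handle == NULL)
          printf("dlopen failed: %s\n", dlerror()); /* expected without CUDA */
      else
          dlclose(handle);
      return 0;
  }

In the real driver, all CUDA symbols are then resolved from the returned
handle, which is why the CUDA library is not needed at build time.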

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can use the environment variable ``GDRCOPY_PATH_L``.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

  $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

  $ lsmod | egrep gdrdrv
  gdrdrv                 20480  0
  nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (there is no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
through the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects NVIDIA physical GPUs
on the system, or those specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)``
call in the CUDA driver library has no effect.
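For instance, reusing the example PCI address above (``b6:00.0`` is a
placeholder; substitute your GPU's address), a specific GPU can be passed
through the EAL allow option when launching the test application:

.. code-block:: console

  $ ./build/app/dpdk-test-gpudev -a b6:00.0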

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices having a GPU physical device as parent.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address,
if the address is not in the table the CUDA driver library returns an error.
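This bookkeeping rule can be sketched with a standalone toy table
(hypothetical code, not the driver's actual data structure; real entries
also carry sizes and the owning CUDA context):

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdio.h>

  #define TABLE_SIZE 32

  /* Toy per-context table: one slot per tracked address. */
  static void *mem_table[TABLE_SIZE];

  /* Record an allocated/registered address; -1 if the table is full. */
  static int table_add(void *ptr)
  {
      for (size_t i = 0; i < TABLE_SIZE; i++) {
          if (mem_table[i] == NULL) {
              mem_table[i] = ptr;
              return 0;
          }
      }
      return -1;
  }

  /* Drop an address; -1 (error) if it was never recorded, mirroring
   * the error returned on deallocating an unknown address. */
  static int table_del(void *ptr)
  {
      for (size_t i = 0; i < TABLE_SIZE; i++) {
          if (mem_table[i] == ptr) {
              mem_table[i] = NULL;
              return 0;
          }
      }
      return -1;
  }

  int main(void)
  {
      int x, y;

      assert(table_add(&x) == 0);
      assert(table_del(&y) == -1); /* unknown address: error */
      assert(table_del(&x) == 0);
      assert(table_del(&x) == -1); /* already removed: error */
      printf("address table sketch OK\n");
      return 0;
  }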

Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.
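As a quick sanity check, you can verify that the module is loaded with
``lsmod`` (output varies per system; this is only an illustrative command):

.. code-block:: console

  $ lsmod | grep nvidia_peermem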

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the `l2fwd-nv <https://github.com/NVIDIA/l2fwd-nv>`_ application.

The application is based on the vanilla DPDK l2fwd example
and is enhanced with GPU memory managed through the gpudev library
and CUDA kernels that swap the MAC addresses of packets on the GPU.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is the better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.