.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
To get these headers, there are two options:

- Install `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You can point to CUDA header files either with the ``CFLAGS`` environment variable,
or with the ``c_args`` Meson option. Examples:

- ``CFLAGS=-I/usr/local/cuda/include meson setup build``
- ``meson setup build -Dc_args=-I/usr/local/cuda/include``

If headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. to support the ``rte_gpu_mem_cpu_map`` function),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

  $ git clone https://github.com/NVIDIA/gdrcopy.git
  $ cd gdrcopy
  $ make
  $ # make install to install GDRCopy library system wide
  $ # Launch gdrdrv kernel module on the system
  $ sudo ./insmod.sh

You need to indicate to Meson where the GDRCopy header files are, as with the CUDA headers.
An example would be:

.. code-block:: console

  $ meson setup build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If headers are not found, the CUDA GPU driver library is built without the CPU map capability,
and will return an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.
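
In that case, a minimal sketch of how an application could detect the missing capability
at runtime, assuming the gpudev prototypes declared in ``rte_gpudev.h`` of your DPDK release
(``try_cpu_map`` is just an illustrative helper name):

.. code-block:: c

  #include <stdint.h>
  #include <stdio.h>

  #include <rte_errno.h>
  #include <rte_gpudev.h>

  /* Try to expose a GPU memory chunk to the CPU. If the CUDA driver library
   * was built without the GDRCopy headers, the call fails and rte_errno is set. */
  static void *
  try_cpu_map(int16_t dev_id, void *gpu_ptr, size_t size)
  {
      void *cpu_ptr = rte_gpu_mem_cpu_map(dev_id, size, gpu_ptr);

      if (cpu_ptr == NULL)
          printf("CPU map not available: %s\n", rte_strerror(rte_errno));

      return cpu_ptr;
  }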


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
to build the CUDA driver library.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in default locations (e.g. /usr/local/lib),
you can use the environment variable ``GDRCOPY_PATH_L``.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

  $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

  $ lsmod | egrep gdrdrv
  gdrdrv                 20480  0
  nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and does not create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
on the system, or those specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)``
call in the CUDA driver library has no effect.
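
A minimal initialization sketch based on the behaviour described above
(``CUDA_VISIBLE_DEVICES`` is only one example of a standard CUDA environment variable):

.. code-block:: c

  #include <stdio.h>
  #include <stdlib.h>

  #include <rte_eal.h>
  #include <rte_gpudev.h>

  int
  main(int argc, char **argv)
  {
      /* Any CUDA environment configuration must happen before rte_eal_init(),
       * because cuInit(0) is executed while the GPUs are probed. */
      setenv("CUDA_VISIBLE_DEVICES", "0", 1);

      if (rte_eal_init(argc, argv) < 0) {
          fprintf(stderr, "EAL initialization failed\n");
          return 1;
      }

      printf("GPUs detected by gpudev: %d\n", (int)rte_gpu_count_avail());

      rte_eal_cleanup();
      return 0;
  }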

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.
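
A sketch of how a CUDA application could register such a sub-context,
assuming the gpudev ``rte_gpu_add_child`` helper with a (parent device, child context, flags)
prototype; check ``rte_gpudev.h`` of your release, and note that ``register_sub_context``
is just an illustrative helper name:

.. code-block:: c

  #include <stdint.h>

  #include <cuda.h>
  #include <rte_gpudev.h>

  /* Create a new CUDA context and expose it to DPDK as a child device
   * of the physical GPU already registered by the CUDA driver library. */
  static int16_t
  register_sub_context(int16_t parent_dev_id, int cuda_ordinal)
  {
      CUdevice cu_dev;
      CUcontext cu_ctx;

      if (cuDeviceGet(&cu_dev, cuda_ordinal) != CUDA_SUCCESS)
          return -1;
      if (cuCtxCreate(&cu_ctx, 0, cu_dev) != CUDA_SUCCESS)
          return -1;

      /* The returned value is the gpudev ID of the new child device. */
      return rte_gpu_add_child(parent_dev_id, (uint64_t)(uintptr_t)cu_ctx, 0);
  }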

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.
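
A sketch of this behaviour, assuming the memory prototypes declared in ``rte_gpudev.h``
(the alignment argument of ``rte_gpu_mem_alloc`` is not present in every DPDK release,
so check the header of your version):

.. code-block:: c

  #include <stdint.h>
  #include <stdio.h>

  #include <rte_errno.h>
  #include <rte_gpudev.h>

  static void
  gpu_mem_table_demo(int16_t dev_id)
  {
      void *gpu_ptr = rte_gpu_mem_alloc(dev_id, 4096, 4096);

      if (gpu_ptr == NULL) {
          printf("GPU allocation failed: %s\n", rte_strerror(rte_errno));
          return;
      }

      /* The address is in the table: the free succeeds. */
      rte_gpu_mem_free(dev_id, gpu_ptr);

      /* The address is no longer in the table: the driver returns an error. */
      if (rte_gpu_mem_free(dev_id, gpu_ptr) < 0)
          printf("Second free rejected: %s\n", rte_strerror(rte_errno));
  }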

Features
--------

- Register new child devices, aka CUDA driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU (see the sketch below).
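
A minimal sketch of registering CPU memory so that it becomes visible from the GPU,
assuming the ``rte_gpu_mem_register``/``rte_gpu_mem_unregister`` prototypes from ``rte_gpudev.h``
(``share_cpu_buffer`` is just an illustrative helper name):

.. code-block:: c

  #include <stdint.h>
  #include <stdio.h>

  #include <rte_errno.h>
  #include <rte_gpudev.h>
  #include <rte_malloc.h>

  /* Allocate a CPU buffer from DPDK memory and register it for GPU visibility. */
  static void *
  share_cpu_buffer(int16_t dev_id, size_t size)
  {
      void *cpu_buf = rte_zmalloc(NULL, size, 4096);

      if (cpu_buf == NULL)
          return NULL;

      if (rte_gpu_mem_register(dev_id, size, cpu_buf) < 0) {
          printf("CPU memory registration failed: %s\n", rte_strerror(rte_errno));
          rte_free(cpu_buf);
          return NULL;
      }

      /* Call rte_gpu_mem_unregister(dev_id, cpu_buf) before freeing the buffer. */
      return cpu_buf;
  }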

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. ConnectX) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- NVIDIA network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the CUDA GPU driver library through the gpudev library
is the `l2fwd-nv <https://github.com/NVIDIA/l2fwd-nv>`_ application.

The application is based on the DPDK example l2fwd,
with GPU memory managed through the gpudev library.
It includes a CUDA workload that swaps the MAC addresses
of packets received in the GPU.

l2fwd-nv is not intended to be used for performance benchmarking
(testpmd is the better candidate for that).
The goal is to show different use cases where a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool (see the sketch after this list).
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.

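A condensed sketch of the mempool part of that flow, following the pattern used by l2fwd-nv.
The exact prototypes (the alignment argument of ``rte_gpu_mem_alloc``, the ``rte_extmem_register``
parameters, the ``GPU_PAGE_SIZE`` constant) are assumptions to verify against your DPDK release,
and the area must also be DMA-mapped to the NIC with ``rte_dev_dma_map`` before use:

.. code-block:: c

  #include <stdint.h>

  #include <rte_common.h>
  #include <rte_gpudev.h>
  #include <rte_lcore.h>
  #include <rte_mbuf.h>
  #include <rte_memory.h>

  #define GPU_PAGE_SIZE 4096 /* assumption: use the GPU page size of your platform */

  static struct rte_mempool *
  gpu_pktmbuf_pool(int16_t gpu_id, unsigned int nb_mbufs, uint16_t mbuf_size)
  {
      struct rte_pktmbuf_extmem ext_mem;

      /* Describe the GPU memory area as external buffer space for mbufs. */
      ext_mem.elt_size = mbuf_size;
      ext_mem.buf_len = RTE_ALIGN_CEIL((size_t)nb_mbufs * mbuf_size, GPU_PAGE_SIZE);
      ext_mem.buf_iova = RTE_BAD_IOVA;

      ext_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id, ext_mem.buf_len, 0);
      if (ext_mem.buf_ptr == NULL)
          return NULL;

      /* Make the GPU memory area known to the DPDK memory subsystem. */
      if (rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL,
              ext_mem.buf_iova, GPU_PAGE_SIZE) < 0)
          return NULL;

      /* Mempool whose mbuf data buffers live in GPU memory. */
      return rte_pktmbuf_pool_create_extbuf("gpu_mbuf_pool", nb_mbufs, 0, 0,
              mbuf_size, rte_socket_id(), &ext_mem, 1);
  }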