..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
To get these headers, there are two options:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either the regular or the stubs installation).
- Download the two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You can point to the CUDA header files either with the ``CFLAGS`` environment variable
or with the ``c_args`` Meson option. Examples:

- ``CFLAGS=-I/usr/local/cuda/include meson setup build``
- ``meson setup build -Dc_args=-I/usr/local/cuda/include``

If the headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. to implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

   $ git clone https://github.com/NVIDIA/gdrcopy.git
   $ cd gdrcopy
   $ make
   $ # make install to install the GDRCopy library system wide
   $ # Load the gdrdrv kernel module on the system
   $ sudo ./insmod.sh

You need to indicate to Meson where the GDRCopy header files are,
as in the case of the CUDA headers. An example would be:

.. code-block:: console

   $ meson setup build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and will return an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time, because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
to build the CUDA driver library.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can use the environment variable ``GDRCOPY_PATH_L``.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

   $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

   $ lsmod | egrep gdrdrv
   gdrdrv                 20480  0
   nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset
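
Once the library and the kernel module are in place, an application can access GPU memory
directly from the CPU through the gpudev API.
The snippet below is a minimal sketch of that flow and is not part of this driver:
device ID 0 is assumed to be the first detected GPU, the buffer size is arbitrary,
and the gpudev function signatures (e.g. ``rte_gpu_mem_alloc``) have varied slightly
across DPDK releases, so check them against the ``rte_gpudev.h`` of the version in use.

.. code-block:: c

   #include <stdio.h>
   #include <string.h>

   #include <rte_eal.h>
   #include <rte_gpudev.h>

   #define BUF_SIZE 65536 /* arbitrary example size */

   int
   main(int argc, char **argv)
   {
       int16_t dev_id = 0; /* assumption: first GPU registered by the CUDA driver library */
       void *gpu_ptr;
       void *cpu_ptr;

       if (rte_eal_init(argc, argv) < 0)
           return -1;

       if (rte_gpu_count_avail() == 0) {
           printf("No GPU detected\n");
           return -1;
       }

       /* Allocate a buffer in GPU memory. */
       gpu_ptr = rte_gpu_mem_alloc(dev_id, BUF_SIZE);
       if (gpu_ptr == NULL)
           return -1;

       /* Map the GPU buffer into the CPU address space (requires GDRCopy). */
       cpu_ptr = rte_gpu_mem_cpu_map(dev_id, BUF_SIZE, gpu_ptr);
       if (cpu_ptr == NULL) {
           printf("CPU map capability not available\n");
           rte_gpu_mem_free(dev_id, gpu_ptr);
           return -1;
       }

       /* The CPU can now write directly into GPU memory. */
       memset(cpu_ptr, 0, BUF_SIZE);

       /* Unmap and release, using the GPU pointer known to the driver. */
       rte_gpu_mem_cpu_unmap(dev_id, gpu_ptr);
       rte_gpu_mem_free(dev_id, gpu_ptr);

       return rte_eal_cleanup();
   }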

Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (there is no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
on the system, or those specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)``
call in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices having a GPU physical device as parent.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address,
if the address is not in the table, the CUDA driver library returns an error.
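
To make this bookkeeping concrete, the hedged sketch below allocates GPU memory,
registers a CPU buffer for GPU visibility, and then releases both;
freeing an address that was never allocated through gpudev is expected to fail
because it is not in the driver's table.
``BUF_SIZE`` and the fake address are illustrative only,
and return-value conventions should be checked against ``rte_gpudev.h``.

.. code-block:: c

   #include <stdint.h>
   #include <stdio.h>
   #include <stdlib.h>

   #include <rte_gpudev.h>

   #define BUF_SIZE 65536 /* example size only */

   /* Assumes rte_eal_init() has already been called and dev_id is valid. */
   static int
   gpu_mem_example(int16_t dev_id)
   {
       void *gpu_buf;
       void *cpu_buf;

       /* The GPU address is stored in the driver's internal table. */
       gpu_buf = rte_gpu_mem_alloc(dev_id, BUF_SIZE);
       if (gpu_buf == NULL)
           return -1;

       /* Register CPU memory so the GPU can access it;
        * the CPU address is tracked in the same table.
        */
       cpu_buf = malloc(BUF_SIZE);
       if (cpu_buf == NULL || rte_gpu_mem_register(dev_id, BUF_SIZE, cpu_buf) < 0) {
           rte_gpu_mem_free(dev_id, gpu_buf);
           free(cpu_buf);
           return -1;
       }

       /* An address not present in the table is rejected with an error. */
       if (rte_gpu_mem_free(dev_id, (void *)(uintptr_t)0xdeadbeef) < 0)
           printf("Unknown address rejected, as expected\n");

       /* Addresses known to the table are released normally. */
       rte_gpu_mem_unregister(dev_id, cpu_buf);
       rte_gpu_mem_free(dev_id, gpu_buf);
       free(cpu_buf);

       return 0;
   }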

Features
--------

- Register new child devices, aka CUDA driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA Ampere or Volta GPU
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. ConnectX) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, the system requirements are:

- the `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module loaded on the system;
- an NVIDIA network card ConnectX-5 or newer (BlueField models included);
- the DPDK mlx5 PMD enabled;
- to reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the `l2fwd-nv <https://github.com/NVIDIA/l2fwd-nv>`_ application.

The application is based on the DPDK example l2fwd,
with GPU memory managed through the gpudev library.
It includes a CUDA workload swapping the MAC addresses
of packets received in GPU memory.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is a better candidate for that).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.
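
As a hedged illustration of the first two items above, the sketch below allocates
a chunk of GPU memory with the gpudev library and wraps it into an external-buffer mbuf pool.
It is intentionally incomplete: l2fwd-nv additionally registers the area as external DPDK memory
and DMA-maps it to the NIC before setting up the Rx queues, and sizes buffers according to
the port MTU. The constants, the pool name ``gpu_mbuf_pool`` and the IOVA-as-VA assumption
are illustrative only.

.. code-block:: c

   #include <stdint.h>

   #include <rte_common.h>
   #include <rte_gpudev.h>
   #include <rte_lcore.h>
   #include <rte_mbuf.h>
   #include <rte_mempool.h>

   #define NB_MBUFS  8192 /* example values only */
   #define MBUF_SIZE 2048

   /* Create a pktmbuf pool whose data buffers live in GPU memory.
    * Assumes rte_eal_init() has run, dev_id is a valid gpudev device
    * and the system runs in IOVA-as-VA mode.
    */
   static struct rte_mempool *
   gpu_pktmbuf_pool_create(int16_t dev_id)
   {
       struct rte_pktmbuf_extmem ext_mem;

       ext_mem.elt_size = MBUF_SIZE;
       ext_mem.buf_len = RTE_ALIGN_CEIL((size_t)NB_MBUFS * MBUF_SIZE, 65536);

       /* Packet payloads are placed in GPU memory through gpudev. */
       ext_mem.buf_ptr = rte_gpu_mem_alloc(dev_id, ext_mem.buf_len);
       if (ext_mem.buf_ptr == NULL)
           return NULL;

       /* With IOVA as VA, the device can use the virtual address directly;
        * a complete application (see l2fwd-nv) also calls rte_extmem_register()
        * and rte_dev_dma_map() on this area.
        */
       ext_mem.buf_iova = (rte_iova_t)(uintptr_t)ext_mem.buf_ptr;

       /* mbuf headers stay in CPU memory; only the data room is external. */
       return rte_pktmbuf_pool_create_extbuf("gpu_mbuf_pool", NB_MBUFS,
               256, 0, MBUF_SIZE, rte_socket_id(), &ext_mem, 1);
   }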