..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
To get these headers, there are two options:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either the regular or the stubs installation).
- Download the two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You need to indicate to Meson where the CUDA header files are through the ``CFLAGS`` variable.
There are three ways to do this:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add CFLAGS to the Meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Add ``-Dc_args`` to the Meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.

CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time.
This works around a Meson bug that looks for the ``cudart`` module
even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, the CUDA driver library can be built
without installing the CUDA library on the build system.

Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (there is no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, its scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
with the CUDA driver library is straightforward
and does not create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
on the system, or those specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set
before calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized,
the ``cuInit(0)`` call in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.
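As an illustration, the sketch below registers the calling thread's current CUDA
context as a child device of the first physical GPU detected by DPDK.
It is a minimal sketch, not the driver's own code:
``rte_gpu_find_next`` and ``rte_gpu_add_child`` belong to the experimental gpudev API,
so their exact signatures should be checked against the ``rte_gpudev.h`` header
of the DPDK release in use.

.. code-block:: c

   #include <cuda.h>
   #include <rte_gpudev.h>

   /*
    * Register the current CUDA context as a child device of the first
    * detected physical GPU. EAL (and therefore cuInit(0)) is assumed
    * to have run already.
    */
   static int16_t
   register_cuda_child_device(void)
   {
       CUcontext ctx;
       int16_t parent_id;

       /* First physical GPU (a device without a parent) probed by the driver. */
       parent_id = rte_gpu_find_next(0, RTE_GPU_ID_NONE);
       if (parent_id < 0)
           return -1;

       /* Sub-context previously created by the application,
        * e.g. with cuCtxCreate().
        */
       if (cuCtxGetCurrent(&ctx) != CUDA_SUCCESS)
           return -1;

       /* Register the sub-context as a child of the physical GPU;
        * returns the new child device ID, or a negative value on error.
        */
       return rte_gpu_add_child(parent_id, (uint64_t)(uintptr_t)ctx, 0);
   }

The returned child device ID can then be used with the gpudev memory functions
exactly like a physical device ID.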
GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address,
if the address is not in the table, the CUDA driver library returns an error.

Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

The minimal requirements to enable the CUDA driver library are:

- an NVIDIA GPU of Ampere or Volta generation;
- CUDA 11.4 Driver API or newer.

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, the system requirements are:

- the `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module loaded on the system;
- a Mellanox network card, ConnectX-5 or newer (BlueField models included);
- the DPDK mlx5 PMD enabled;
- for best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the `l2fwd-nv <https://github.com/NVIDIA/l2fwd-nv>`_ application.

The application is based on the vanilla DPDK example l2fwd.
It is enhanced with GPU memory managed through the gpudev library
and with CUDA to launch the packet MAC address swap workload on the GPU.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is the better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library
  (see the sketch after this list).
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.
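The sketch below illustrates the first of these steps together with the memory
management features listed earlier: allocating GPU memory and registering CPU
memory through the gpudev API.
It is a hedged sketch rather than l2fwd-nv code:
the signatures shown match recent DPDK releases
(in particular, the alignment argument of ``rte_gpu_mem_alloc``
is not present in the earliest gpudev releases),
and ``dev_id`` is assumed to be a valid device identifier,
e.g. obtained as in the child-device example above.

.. code-block:: c

   #include <rte_gpudev.h>
   #include <rte_malloc.h>

   #define EXAMPLE_BUF_SIZE (1 << 20) /* 1 MiB, arbitrary for the example */

   /*
    * Allocate a GPU buffer, make a CPU buffer visible to the GPU,
    * then release both. Deallocation must use the same addresses,
    * otherwise the lookup in the driver's address table fails
    * and an error is returned.
    */
   static int
   gpu_memory_example(int16_t dev_id)
   {
       void *gpu_buf;
       void *cpu_buf;

       /* Memory allocated on the GPU device (0 = default alignment). */
       gpu_buf = rte_gpu_mem_alloc(dev_id, EXAMPLE_BUF_SIZE, 0);
       if (gpu_buf == NULL)
           return -1;

       /* CPU memory registered to be visible from the GPU. */
       cpu_buf = rte_malloc(NULL, EXAMPLE_BUF_SIZE, 0);
       if (cpu_buf == NULL ||
           rte_gpu_mem_register(dev_id, EXAMPLE_BUF_SIZE, cpu_buf) < 0) {
           rte_free(cpu_buf);
           rte_gpu_mem_free(dev_id, gpu_buf);
           return -1;
       }

       /* ... launch CUDA work using gpu_buf and cpu_buf ... */

       rte_gpu_mem_unregister(dev_id, cpu_buf);
       rte_free(cpu_buf);
       rte_gpu_mem_free(dev_id, gpu_buf);
       return 0;
   }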