..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
There are two ways to get these headers:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You need to tell meson where the CUDA header files are through the CFLAGS variable.
There are three ways to do this:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add CFLAGS to the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Add ``-Dc_args`` to the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

   $ git clone https://github.com/NVIDIA/gdrcopy.git
   $ cd gdrcopy
   $ make
   $ # make install to install GDRCopy library system wide
   $ # Launch gdrdrv kernel module on the system
   $ sudo ./insmod.sh

You need to tell meson where the GDRCopy header files are, as for the CUDA headers.
For example:

.. code-block:: console

   $ meson build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and returns an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time because of a Meson bug that looks
for the `cudart` module even if the `meson.build` file only requires the default `cuda` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
to build the CUDA driver library.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can use the variable ``GDRCOPY_PATH_L``.

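
The runtime lookup described in this section can be sketched in a few lines of C.
This is only an illustration under stated assumptions, not the driver's own code:
the helper names ``lib_path_from_env`` and ``load_lib`` are hypothetical,
and the sketch assumes that ``CUDA_PATH_L`` and ``GDRCOPY_PATH_L`` hold a directory
that is joined with the library name before calling ``dlopen``.

.. code-block:: c

   #include <dlfcn.h>
   #include <stdio.h>
   #include <stdlib.h>

   /*
    * Hypothetical helper (not part of DPDK): build the name passed to dlopen().
    * If the environment variable env_name is set, the result is "<dir>/<lib>";
    * otherwise it is the bare library name, so that the dynamic loader
    * searches its default locations.
    * Returns 0 on success, -1 if the result does not fit in buf.
    */
   int
   lib_path_from_env(const char *env_name, const char *lib, char *buf, size_t len)
   {
       const char *dir = getenv(env_name);
       int n;

       if (dir == NULL || dir[0] == '\0')
           n = snprintf(buf, len, "%s", lib);
       else
           n = snprintf(buf, len, "%s/%s", dir, lib);

       return (n < 0 || (size_t)n >= len) ? -1 : 0;
   }

   /*
    * Hypothetical helper (not part of DPDK): load a shared library at runtime.
    * Returns the dlopen() handle, or NULL if the path is too long
    * or the library cannot be loaded.
    */
   void *
   load_lib(const char *env_name, const char *lib)
   {
       char path[1024];
       void *handle;

       if (lib_path_from_env(env_name, lib, path, sizeof(path)) < 0)
           return NULL;

       handle = dlopen(path, RTLD_LAZY);
       if (handle == NULL)
           fprintf(stderr, "Cannot load %s: %s\n", path, dlerror());

       return handle;
   }

With these helpers, ``load_lib("CUDA_PATH_L", "libcuda.so")`` and
``load_lib("GDRCOPY_PATH_L", "libgdrapi.so")`` would each return a handle
from which individual API symbols can be resolved with ``dlsym``.
On older glibc versions the program has to be linked with ``-ldl``.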

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

   $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

   $ lsmod | egrep gdrdrv
   gdrdrv                 20480  0
   nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)``
call in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices having a GPU physical device as parent.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of the GPU memory addresses allocated
and the CPU memory addresses registered, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.

Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the `l2fwd-nv <https://github.com/NVIDIA/l2fwd-nv>`_ application.

The application is based on the vanilla DPDK example l2fwd
and is enhanced with GPU memory managed through the gpudev library
and with CUDA to launch the workload that swaps the packets' MAC addresses on the GPU.

l2fwd-nv is not intended to be used for performance measurement
(testpmd is a better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.