..  SPDX-License-Identifier: BSD-3-Clause
    Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
To get these headers, there are two options:

- Install `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You can point to CUDA header files either with the ``CFLAGS`` environment variable,
or with the ``c_args`` Meson option. Examples:

- ``CFLAGS=-I/usr/local/cuda/include meson setup build``
- ``meson setup build -Dc_args=-I/usr/local/cuda/include``

If headers are not found, the CUDA GPU driver library is not built.


CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. to implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

    $ git clone https://github.com/NVIDIA/gdrcopy.git
    $ cd gdrcopy
    $ make
    $ # make install to install GDRCopy library system wide
    $ # Launch gdrdrv kernel module on the system
    $ sudo ./insmod.sh

As with the CUDA headers, you need to tell Meson where the GDRCopy header files are.
An example would be:

.. code-block:: console

    $ meson setup build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the GDRCopy headers are not found, the CUDA GPU driver library is built without the CPU map capability,
and returns an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.

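This kind of optional capability can be pictured as a compile-time switch.
The following is a minimal sketch, not the driver's actual code:
the macro ``HAVE_GDRCOPY_HEADERS`` and the function ``cpu_map_sketch`` are hypothetical names,
and ``-ENOTSUP`` is only an assumption about which error is reported.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>

    /*
     * Hypothetical macro: assumed to be defined by the build system
     * only when the GDRCopy headers were found.
     */
    #ifdef HAVE_GDRCOPY_HEADERS
    #define CPU_MAP_AVAILABLE 1
    #else
    #define CPU_MAP_AVAILABLE 0
    #endif

    /*
     * Hypothetical stand-in for a driver CPU map entry point.
     * Returns 0 on success or a negative errno value.
     */
    int
    cpu_map_sketch(void *gpu_ptr, size_t size, void **cpu_ptr)
    {
        if (gpu_ptr == NULL || size == 0 || cpu_ptr == NULL)
            return -EINVAL;
        if (!CPU_MAP_AVAILABLE)
            return -ENOTSUP; /* built without the CPU map capability */
        /* A real implementation would pin and map the GPU buffer here. */
        *cpu_ptr = gpu_ptr; /* placeholder only, not a real mapping */
        return 0;
    }

An application should therefore check the result of the CPU map call
and be prepared to fall back to another way of accessing GPU memory.
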

CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
to build the CUDA driver library.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

Similarly to the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can use the variable ``GDRCOPY_PATH_L``.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

    $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

    $ lsmod | egrep gdrdrv
    gdrdrv                 20480  0
    nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

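Initialization is also the point where **libcuda.so** is loaded with ``dlopen``,
as described in the CUDA Shared Library section.
A minimal sketch of such a lookup follows, assuming the directory given in ``CUDA_PATH_L``
is simply prepended to the library name.
The helper names ``build_lib_path`` and ``load_cuda_lib`` are hypothetical,
and this is not the driver's actual code.

.. code-block:: c

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Build the path handed to dlopen(): "<dir>/<lib>" when a directory
     * is given, otherwise the bare library name so that the dynamic
     * loader searches its default locations.
     * Returns 0 on success, -1 if the result does not fit in "out".
     */
    int
    build_lib_path(const char *dir, const char *lib, char *out, size_t len)
    {
        int n;

        if (dir == NULL || dir[0] == '\0')
            n = snprintf(out, len, "%s", lib);
        else
            n = snprintf(out, len, "%s/%s", dir, lib);
        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /*
     * Try to load libcuda.so, honoring CUDA_PATH_L when it is set.
     * Returns the dlopen() handle, or NULL if the library is not found.
     */
    void *
    load_cuda_lib(void)
    {
        char path[1024];

        if (build_lib_path(getenv("CUDA_PATH_L"), "libcuda.so",
                path, sizeof(path)) != 0)
            return NULL;
        return dlopen(path, RTLD_LAZY);
    }

CUDA API symbols can then be resolved from the returned handle at runtime,
for instance with ``dlsym``.
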

If the CUDA driver environment has already been initialized, the ``cuInit(0)`` call
in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of allocated GPU memory addresses
and registered CPU memory addresses, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.

Features
--------

- Register new child devices, aka CUDA driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from GPU.

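The bookkeeping described in the GPU memory management section can be pictured with a small table.
This is a simplified, self-contained sketch, not the driver's data structure:
the fixed-size array and the names ``mem_table_add`` and ``mem_table_remove`` are hypothetical,
``-EINVAL`` and ``-ENOMEM`` are assumed error values,
and the association with a CUDA context is left out.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MEM_TABLE_SIZE 64 /* arbitrary capacity for the sketch */

    struct mem_entry {
        uintptr_t addr; /* allocated GPU or registered CPU address */
        size_t size;    /* size of the area in bytes */
        int used;       /* non-zero when the slot is occupied */
    };

    struct mem_table {
        struct mem_entry entries[MEM_TABLE_SIZE];
    };

    /* Record a newly allocated or registered address. */
    int
    mem_table_add(struct mem_table *t, uintptr_t addr, size_t size)
    {
        size_t i;

        for (i = 0; i < MEM_TABLE_SIZE; i++) {
            if (!t->entries[i].used) {
                t->entries[i].addr = addr;
                t->entries[i].size = size;
                t->entries[i].used = 1;
                return 0;
            }
        }
        return -ENOMEM; /* table full */
    }

    /* Drop an address on free or unregister; unknown addresses are an error. */
    int
    mem_table_remove(struct mem_table *t, uintptr_t addr)
    {
        size_t i;

        for (i = 0; i < MEM_TABLE_SIZE; i++) {
            if (t->entries[i].used && t->entries[i].addr == addr) {
                t->entries[i].used = 0;
                return 0;
            }
        }
        return -EINVAL; /* address was never recorded */
    }

With such a table, freeing the same address twice or freeing an address
that was never allocated is reported as an error.
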

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. ConnectX) to directly send and receive packets
using GPU memory instead of additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- NVIDIA network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.


Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the l2fwd-nv application that can be found `here <https://github.com/NVIDIA/l2fwd-nv>`_.

The application is based on the DPDK example l2fwd,
with GPU memory managed through the gpudev library.
It includes a CUDA workload swapping MAC addresses
of packets received in the GPU.

l2fwd-nv is not intended to be used for performance measurements
(testpmd is a better candidate for this).
The goal is to show how a CUDA application can use DPDK to:

- Allocate memory on a GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.