.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
There are two options to get these headers:

- Install `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

You can point to CUDA header files either with the ``CFLAGS`` environment variable,
or with the ``c_args`` Meson option. Examples:

- ``CFLAGS=-I/usr/local/cuda/include meson setup build``
- ``meson setup build -Dc_args=-I/usr/local/cuda/include``

If the headers are not found, the CUDA GPU driver library is not built.

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. to implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download, build and run the GDRCopy library and driver:

.. code-block:: console

  $ git clone https://github.com/NVIDIA/gdrcopy.git
  $ cd gdrcopy
  $ make
  $ # make install to install GDRCopy library system wide
  $ # Launch gdrdrv kernel module on the system
  $ sudo ./insmod.sh

As with the CUDA headers, you need to tell Meson where the GDRCopy header files are.
An example would be:

.. code-block:: console

  $ meson setup build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability,
and will return an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time, because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, there is no need to install the CUDA library
in order to build the CUDA driver library.
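
The runtime loading described above can be sketched in plain C as follows.
This is a minimal illustration, not the driver's actual code:
the helper names ``build_lib_path``, ``open_libcuda`` and ``load_cuda_symbol``
are hypothetical, as are the 1024-byte path buffer and the ``RTLD_LAZY`` flag.
The sketch also assumes that ``CUDA_PATH_L`` holds the directory containing **libcuda.so**.

.. code-block:: c

  #include <dlfcn.h>
  #include <stddef.h>
  #include <stdio.h>
  #include <stdlib.h>

  /*
   * Build the path handed to dlopen(): "<dir>/<soname>" when a directory
   * is given, or the bare soname so that the dynamic linker searches its
   * default locations. Returns 0 on success, -1 if the buffer is too small.
   */
  int
  build_lib_path(char *out, size_t len, const char *dir, const char *soname)
  {
      int n;

      if (dir == NULL || dir[0] == '\0')
          n = snprintf(out, len, "%s", soname);
      else
          n = snprintf(out, len, "%s/%s", dir, soname);

      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }

  /* Open libcuda.so, honouring CUDA_PATH_L (assumed to be a directory). */
  void *
  open_libcuda(void)
  {
      char path[1024];

      if (build_lib_path(path, sizeof(path),
                         getenv("CUDA_PATH_L"), "libcuda.so") != 0)
          return NULL;

      /* Returns NULL when the library cannot be found or loaded. */
      return dlopen(path, RTLD_LAZY);
  }

  /* Resolve one CUDA Driver API symbol by name, e.g. "cuInit". */
  void *
  load_cuda_symbol(void *handle, const char *name)
  {
      return handle != NULL ? dlsym(handle, name) : NULL;
  }

A caller would invoke ``open_libcuda()`` once and then resolve each needed symbol
with ``load_cuda_symbol(handle, "cuInit")``; a ``NULL`` result from either call
means that the library or the symbol was not found.
On older glibc versions, linking this sketch may require ``-ldl``.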

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

As with the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can use the environment variable ``GDRCOPY_PATH_L``.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

  $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

  $ lsmod | egrep gdrdrv
  gdrdrv                 20480  0
  nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)``
call in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of allocated GPU memory addresses
and registered CPU memory addresses, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.
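
The bookkeeping described above can be illustrated with a small, self-contained C sketch.
It is not the driver's data structure: the fixed-size array,
the names ``mem_table``, ``mem_table_add`` and ``mem_table_del``,
and the ``-ENOMEM`` and ``-EINVAL`` return codes are assumptions made for illustration only.

.. code-block:: c

  #include <errno.h>
  #include <stddef.h>
  #include <stdint.h>

  #define MEM_TABLE_SIZE 64 /* arbitrary capacity chosen for this sketch */

  struct mem_entry {
      uintptr_t ctx; /* opaque identifier of the owning context */
      void *addr;    /* allocated or registered address */
      int used;      /* non-zero when the slot holds a live entry */
  };

  struct mem_table {
      struct mem_entry e[MEM_TABLE_SIZE];
  };

  /* Record an address for a context; -ENOMEM when the table is full. */
  int
  mem_table_add(struct mem_table *t, uintptr_t ctx, void *addr)
  {
      size_t i;

      for (i = 0; i < MEM_TABLE_SIZE; i++) {
          if (!t->e[i].used) {
              t->e[i].ctx = ctx;
              t->e[i].addr = addr;
              t->e[i].used = 1;
              return 0;
          }
      }
      return -ENOMEM;
  }

  /* Drop an address; -EINVAL when it was never recorded for this context. */
  int
  mem_table_del(struct mem_table *t, uintptr_t ctx, void *addr)
  {
      size_t i;

      for (i = 0; i < MEM_TABLE_SIZE; i++) {
          if (t->e[i].used && t->e[i].ctx == ctx && t->e[i].addr == addr) {
              t->e[i].used = 0;
              return 0;
          }
      }
      return -EINVAL;
  }

With a zero-initialized table, freeing an address twice,
or freeing it under a different context than the one it was recorded for,
is rejected by ``mem_table_del``; this mirrors the error behavior described above.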

Features
--------

- Register new child devices, aka CUDA driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. ConnectX) to directly send and receive packets
using GPU memory, avoiding additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- NVIDIA network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the l2fwd-nv application that can be found `here <https://github.com/NVIDIA/l2fwd-nv>`_.

The application is based on the DPDK example l2fwd,
with GPU memory managed through the gpudev library.
It includes a CUDA workload swapping MAC addresses
of packets received in the GPU.

l2fwd-nv is not intended for performance measurement
(testpmd is a better candidate for this).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on GPU device using gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.