.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2021 NVIDIA Corporation & Affiliates

CUDA GPU driver
===============

The CUDA GPU driver library (**librte_gpu_cuda**) provides support for NVIDIA GPUs.
Information and documentation about these devices can be found on the
`NVIDIA website <http://www.nvidia.com>`_. Help is also provided by the
`NVIDIA CUDA Toolkit developer zone <https://docs.nvidia.com/cuda>`_.

Build dependencies
------------------

The CUDA GPU driver library has a header-only dependency on ``cuda.h`` and ``cudaTypedefs.h``.
There are two ways to get these headers:

- Install the `CUDA Toolkit <https://developer.nvidia.com/cuda-toolkit>`_
  (either regular or stubs installation).
- Download these two headers from this `CUDA headers
  <https://gitlab.com/nvidia/headers/cuda-individual/cudart>`_ repository.

Meson must be told where the CUDA header files are located,
either through the ``CFLAGS`` environment variable or through the ``c_args`` option.
There are three ways to do this:

- Set ``export CFLAGS=-I/usr/local/cuda/include`` before building.
- Add ``CFLAGS`` to the meson command line: ``CFLAGS=-I/usr/local/cuda/include meson build``.
- Add ``-Dc_args`` to the meson command line: ``meson build -Dc_args=-I/usr/local/cuda/include``.

If the headers are not found, the CUDA GPU driver library is not built.
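
As a minimal sketch, the check below verifies that both headers are present
before configuring the build.
The helper name ``have_cuda_headers`` and the ``CUDA_INC`` variable are assumptions
made for this example; they are not part of DPDK or Meson.

.. code-block:: shell

  # Hypothetical helper: succeed only if both required headers exist in "$1".
  have_cuda_headers() {
      [ -f "$1/cuda.h" ] && [ -f "$1/cudaTypedefs.h" ]
  }

  # Assumed default location; override CUDA_INC for a custom installation.
  CUDA_INC="${CUDA_INC:-/usr/local/cuda/include}"
  if have_cuda_headers "$CUDA_INC"; then
      echo "CUDA headers found; configure with: meson build -Dc_args=-I$CUDA_INC"
  else
      echo "CUDA headers missing in $CUDA_INC; librte_gpu_cuda would not be built" >&2
  fi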

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

To enable this gpudev feature (i.e. to implement ``rte_gpu_mem_cpu_map``),
you need the `GDRCopy <https://github.com/NVIDIA/gdrcopy>`_ library and driver
installed on your system.

A quick recipe to download and build the GDRCopy library and load its driver:

.. code-block:: console

  $ git clone https://github.com/NVIDIA/gdrcopy.git
  $ cd gdrcopy
  $ make
  $ # make install to install GDRCopy library system wide
  $ # Launch gdrdrv kernel module on the system
  $ sudo ./insmod.sh

Meson must be told where the GDRCopy header files are located, as for the CUDA headers.
For example:

.. code-block:: console

  $ meson build -Dc_args="-I/usr/local/cuda/include -I/path/to/gdrcopy/include"

If the headers are not found, the CUDA GPU driver library is built without the CPU map capability
and returns an error if the application invokes the gpudev ``rte_gpu_mem_cpu_map`` function.
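
When several include directories are involved, the ``-Dc_args`` value can be assembled
by a small script. The sketch below is one possible way to do it;
the function name ``include_flags`` is hypothetical
and the GDRCopy path is the same placeholder used above.

.. code-block:: shell

  # Hypothetical helper: print one -I flag per directory argument.
  include_flags() {
      flags=""
      for dir in "$@"; do
          # Add a separating space only when flags is already non-empty.
          flags="${flags:+$flags }-I$dir"
      done
      printf '%s\n' "$flags"
  }

  C_ARGS=$(include_flags /usr/local/cuda/include /path/to/gdrcopy/include)
  echo "meson build -Dc_args=\"$C_ARGS\""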


CUDA Shared Library
-------------------

To avoid any system configuration issue, the CUDA API **libcuda.so** shared library
is not linked at build time because of a Meson bug that looks
for the ``cudart`` module even if the ``meson.build`` file only requires the default ``cuda`` module.

**libcuda.so** is loaded at runtime in the ``cuda_gpu_probe`` function through ``dlopen``
when the very first GPU is detected.
If the CUDA installation resides in a custom directory,
the environment variable ``CUDA_PATH_L`` should specify where ``dlopen``
can look for **libcuda.so**.

All CUDA API symbols are loaded at runtime as well.
For this reason, the CUDA library does not need to be installed
to build the CUDA driver library.
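
As an illustration, a launcher script could confirm that the library is present
before exporting the variable and starting the application.
This sketch assumes that ``CUDA_PATH_L`` names the directory containing **libcuda.so**;
the helper ``lib_in_dir`` and the fallback path are hypothetical.

.. code-block:: shell

  # Hypothetical helper: succeed if library "$2" exists in directory "$1".
  lib_in_dir() {
      [ -e "$1/$2" ]
  }

  # Placeholder path; replace with the directory of your CUDA installation.
  CUDA_PATH_L="${CUDA_PATH_L:-/path/to/libcuda}"
  if lib_in_dir "$CUDA_PATH_L" libcuda.so; then
      export CUDA_PATH_L
  else
      echo "libcuda.so not found in $CUDA_PATH_L" >&2
  fi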

CPU map GPU memory
~~~~~~~~~~~~~~~~~~

As with the CUDA shared library, if the **libgdrapi.so** shared library
is not installed in a default location (e.g. /usr/local/lib),
you can specify where to find it through the ``GDRCOPY_PATH_L`` environment variable.

As an example, to enable the CPU map feature sanity check,
run the ``app/test-gpudev`` application with:

.. code-block:: console

  $ sudo CUDA_PATH_L=/path/to/libcuda GDRCOPY_PATH_L=/path/to/libgdrapi ./build/app/dpdk-test-gpudev

Additionally, the ``gdrdrv`` kernel module built with the GDRCopy project
has to be loaded on the system:

.. code-block:: console

  $ lsmod | grep gdrdrv
  gdrdrv                 20480  0
  nvidia              35307520  19 nvidia_uvm,nv_peer_mem,gdrdrv,nvidia_modeset
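
The same check can be scripted, for instance before launching an application.
The sketch below searches ``/proc/modules``, the file that ``lsmod`` formats;
the helper ``module_loaded`` and its optional second argument
(an alternative file, convenient for testing) are assumptions of this example.

.. code-block:: shell

  # Hypothetical helper: succeed if module "$1" is listed in file "$2"
  # (default: /proc/modules). Each line there starts with "<name> ".
  module_loaded() {
      grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
  }

  if module_loaded gdrdrv; then
      echo "gdrdrv is loaded"
  else
      echo "gdrdrv is not loaded; run insmod.sh from the GDRCopy tree" >&2
  fi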


Design
------

**librte_gpu_cuda** relies on the CUDA Driver API (no need for the CUDA Runtime API).

The goal of this driver library is not to provide a wrapper for the whole CUDA Driver API.
Instead, the scope is to implement the generic features of the gpudev API.
For a CUDA application, integrating the gpudev library functions
using the CUDA driver library is quite straightforward
and doesn't create any compatibility problem.

Initialization
~~~~~~~~~~~~~~

During initialization, the CUDA driver library detects the NVIDIA physical GPUs
present on the system or specified via EAL device options (e.g. ``-a b6:00.0``).
The driver initializes the CUDA driver environment through the ``cuInit(0)`` function.
For this reason, any CUDA environment configuration must be set before
calling the ``rte_eal_init`` function in the DPDK application.

If the CUDA driver environment has already been initialized, the ``cuInit(0)`` call
in the CUDA driver library has no effect.

CUDA Driver sub-contexts
~~~~~~~~~~~~~~~~~~~~~~~~

After initialization, a CUDA application can create multiple sub-contexts
on GPU physical devices.
Through the gpudev library, it is possible to register these sub-contexts
in the CUDA driver library as child devices whose parent is a GPU physical device.

The CUDA driver library also supports `MPS
<https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf>`__.

GPU memory management
~~~~~~~~~~~~~~~~~~~~~

The CUDA driver library maintains a table of allocated GPU memory addresses
and registered CPU memory addresses, associated with the input CUDA context.
Whenever the application tries to deallocate or deregister a memory address
that is not in the table, the CUDA driver library returns an error.

Features
--------

- Register new child devices, i.e. new CUDA Driver contexts.
- Allocate memory on the GPU.
- Register CPU memory to make it visible from the GPU.

Minimal requirements
--------------------

Minimal requirements to enable the CUDA driver library are:

- NVIDIA GPU Ampere or Volta
- CUDA 11.4 Driver API or newer

`GPUDirect RDMA Technology <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html>`_
allows compatible network cards (e.g. Mellanox) to directly send and receive packets
using GPU memory, avoiding additional memory copies through the CPU system memory.
To enable this technology, system requirements are:

- `nvidia-peermem <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_
  module running on the system;
- Mellanox network card ConnectX-5 or newer (BlueField models included);
- DPDK mlx5 PMD enabled;
- To reach the best performance, an additional PCIe switch between GPU and NIC is recommended.

Limitations
-----------

Supported only on Linux.

Supported GPUs
--------------

The following NVIDIA GPU devices are supported by this CUDA driver library:

- NVIDIA A100 80GB PCIe
- NVIDIA A100 40GB PCIe
- NVIDIA A30 24GB
- NVIDIA A10 24GB
- NVIDIA V100 32GB PCIe
- NVIDIA V100 16GB PCIe

External references
-------------------

A good example of how to use the GPU CUDA driver library through the gpudev library
is the l2fwd-nv application that can be found `here <https://github.com/NVIDIA/l2fwd-nv>`_.

The application is based on the vanilla DPDK example l2fwd
and is enhanced with GPU memory managed through the gpudev library
and with CUDA to launch the packet MAC address swap workload on the GPU.

l2fwd-nv is not intended for performance testing
(testpmd is the right candidate for that).
The goal is to show different use cases of how a CUDA application can use DPDK to:

- Allocate memory on the GPU device using the gpudev library.
- Use that memory to create an external GPU memory mempool.
- Receive packets directly in GPU memory.
- Coordinate the workload on the GPU with the network and CPU activity to receive packets.
- Send modified packets directly from the GPU memory.