..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. include:: <isonum.txt>

.. _Enabling_Additional_Functionality:

Enabling Additional Functionality
=================================

.. _Running_Without_Root_Privileges:

Running DPDK Applications Without Root Privileges
-------------------------------------------------

The following sections describe generic requirements and configuration
for running DPDK applications as non-root.
Some drivers may document additional requirements.

Hugepages
~~~~~~~~~

Hugepages must be reserved as root before running the application as non-root,
for example::

    sudo dpdk-hugepages.py --reserve 1G

If multi-process is not required, running with ``--in-memory``
bypasses the need to access the hugepage mount point and the files within it.
Otherwise, the hugepage directory must be made writable
by the unprivileged user.
A good way to manage multiple applications that use hugepages
is to mount the filesystem with group permissions
and add a supplementary group to each application or container.
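
The group-based approach might look roughly as follows.
This is a minimal sketch, not something shipped with DPDK:
the helper ``hugetlbfs_group_opts``, the group ``dpdk`` and the mount point ``/mnt/huge-1G``
are hypothetical, while ``pagesize=``, ``gid=`` and ``mode=``
are standard ``hugetlbfs`` mount options described in the kernel documentation.

.. code-block:: shell

    #!/bin/bash
    # Hypothetical helper (not part of DPDK): print hugetlbfs mount options
    # that give a group access to the root of the mount point.
    #   $1 - huge page size, e.g. 1G or 2M
    #   $2 - numeric group ID that will own the mount point
    hugetlbfs_group_opts() {
        local pagesize=$1 gid=$2
        # mode=0770: owner and group may create files, others have no access
        printf 'pagesize=%s,gid=%s,mode=0770\n' "$pagesize" "$gid"
    }

    # Example usage as root (group "dpdk" and /mnt/huge-1G are assumptions):
    #   gid=$(getent group dpdk | cut -d: -f3)
    #   mkdir -p /mnt/huge-1G
    #   mount -t hugetlbfs -o "$(hugetlbfs_group_opts 1G "$gid")" nodev /mnt/huge-1G

Each application or container would then be started with ``dpdk`` as a supplementary group,
and, if EAL does not pick up the mount point on its own, given ``--huge-dir /mnt/huge-1G``.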

One option is to use the script provided by this project::

    export HUGEDIR=$HOME/huge-1G
    mkdir -p $HUGEDIR
    sudo dpdk-hugepages.py --mount --directory $HUGEDIR --user `id -u` --group `id -g`

In a production environment, the OS can manage mount points
(`systemd example <https://github.com/systemd/systemd/blob/main/units/dev-hugepages.mount>`_).

The ``hugetlbfs`` filesystem has additional mount options to guarantee or limit
the amount of memory that can be allocated through the mount point.
Refer to the `documentation <https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt>`_.

.. note::

    Using the ``vfio-pci`` kernel driver, if applicable, can eliminate the need
    for physical addresses and therefore eliminate the permission requirements
    described below.

If the driver requires physical addresses (PA),
the executable file must be granted additional capabilities:

* ``DAC_READ_SEARCH`` and ``SYS_ADMIN`` to read ``/proc/self/pagemap``
* ``IPC_LOCK`` to lock hugepages in memory

.. code-block:: console

    setcap cap_dac_read_search,cap_ipc_lock,cap_sys_admin+ep <executable>

If physical addresses are not accessible,
the following message will appear during EAL initialization::

    EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied

This message is harmless if PA are not needed.

Resource Limits
~~~~~~~~~~~~~~~

When running as a non-root user, the system may impose additional resource limits.
Specifically, the following limits may need to be raised
to ensure normal DPDK operation:

* ``RLIMIT_LOCKS`` (number of file locks that can be held by a process)

* ``RLIMIT_NOFILE`` (number of file descriptors that a process can hold open)

* ``RLIMIT_MEMLOCK`` (amount of memory that a process may lock, i.e. pin, in RAM)

These limits can usually be adjusted by editing
the ``/etc/security/limits.conf`` file and rebooting.

See the :ref:`Hugepage Mapping <hugepage_mapping>` section to learn how these limits affect EAL.

Device Control
~~~~~~~~~~~~~~

If the HPET is to be used, ``/dev/hpet`` permissions must be adjusted.
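
One way to make a device node such as ``/dev/hpet`` accessible
is to hand it to a dedicated group.
The following is a minimal sketch under that assumption;
the helper ``grant_group_rw`` and the group ``dpdk`` are hypothetical.

.. code-block:: shell

    #!/bin/bash
    # Hypothetical helper: let members of a group read and write a device node.
    #   $1 - path of the node, e.g. /dev/hpet
    #   $2 - group name or numeric group ID
    # Changing the group of a root-owned node requires root privileges.
    grant_group_rw() {
        local node=$1 group=$2
        chgrp "$group" "$node" && chmod g+rw "$node"
    }

    # Example usage as root (group "dpdk" is an assumption):
    #   grant_group_rw /dev/hpet dpdk

Permissions set this way are lost when ``/dev`` is repopulated at boot.
A persistent setup would more likely use a udev rule,
for example ``KERNEL=="hpet", GROUP="dpdk", MODE="0660"``
in a file under ``/etc/udev/rules.d/``.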

For the ``vfio-pci`` kernel driver, the permissions of the following
Linux file system objects should be adjusted:

* The VFIO device file, ``/dev/vfio/vfio``

* The files under ``/dev/vfio`` that correspond to the IOMMU group numbers of
  devices intended to be used by DPDK, for example, ``/dev/vfio/50``

Power Management and Power Saving Functionality
-----------------------------------------------

Enhanced Intel SpeedStep\ |reg| Technology must be enabled in the platform BIOS if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory ``/sys/devices/system/cpu/cpu0/cpufreq`` will not exist, and CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings can be accessed.

For example, on some Intel reference platform BIOS variants, the path to Enhanced Intel SpeedStep\ |reg| Technology is::

    Advanced
      -> Processor Configuration
      -> Enhanced Intel SpeedStep(R) Tech

C3 and C6 should also be enabled for power management.
The paths to C3 and C6 on the same platform BIOS are::

    Advanced
      -> Processor Configuration
      -> Processor C3

    Advanced
      -> Processor Configuration
      -> Processor C6

Using Linux Core Isolation to Reduce Context Switches
-----------------------------------------------------

While the threads used by a DPDK application are pinned to logical cores on the system,
the Linux scheduler can still run other tasks on those cores.
To help prevent additional workloads, timers, RCU processing and IRQs
from running on those cores, the Linux kernel parameters
``isolcpus``, ``nohz_full`` and ``irqaffinity`` can be used
to isolate them from general Linux scheduler tasks.

For example, if a system has logical cores 0-7
and DPDK applications are to run on logical cores 2, 4 and 6,
the following should be added to the kernel parameter list,
with ``irqaffinity`` listing the remaining cores so that interrupts are handled there:

.. code-block:: console

    isolcpus=2,4,6 nohz_full=2,4,6 irqaffinity=0,1,3,5,7

.. note::

    More detailed information about the above parameters can be found at
    `NO_HZ <https://www.kernel.org/doc/html/latest/timers/no_hz.html>`_,
    `IRQ <https://www.kernel.org/doc/html/latest/core-api/irq/>`_,
    and `kernel parameters
    <https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html>`_.

For more fine-grained control over resource management and performance tuning,
one can look into Linux cgroups,
`cpusets <https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/cpusets.html>`_,
`cpuset man pages <https://man7.org/linux/man-pages/man7/cpuset.7.html>`_, and
`systemd CPU affinity <https://www.freedesktop.org/software/systemd/man/systemd.exec.html>`_.

Also see
`CPU isolation example <https://www.suse.com/c/cpu-isolation-practical-example-part-5/>`_
and `systemd core isolation example <https://www.rcannings.com/systemd-core-isolation/>`_.

.. _High_Precision_Event_Timer:

High Precision Event Timer (HPET) Functionality
-----------------------------------------------

DPDK can use the system HPET as a timer source instead of the system default timer,
such as the core Time-Stamp Counter (TSC) on x86 systems.
To enable HPET support in DPDK:

#. Ensure that HPET is enabled in BIOS settings.
#. Enable ``HPET_MMAP`` support in the kernel configuration.
   Note that this may involve rebuilding the kernel,
   as many common Linux distributions do *not* enable this setting
   by default in their kernel builds.
#. Enable DPDK support for HPET by using the build-time meson option ``use_hpet``,
   for example, ``meson configure -Duse_hpet=true``.

For an application to use the ``rte_get_hpet_cycles()`` and ``rte_get_hpet_hz()`` API calls,
and optionally to make the HPET the default time source for the ``rte_timer`` library,
the application should call ``rte_eal_hpet_init()`` at initialization.
This call ensures that the HPET is accessible,
returning an error to the application if it is not.

For applications that require timing APIs, but not the HPET timer specifically,
it is recommended to use the ``rte_get_timer_cycles()`` and ``rte_get_timer_hz()``
API calls instead of the HPET-specific APIs.
These generic APIs can work with either TSC or HPET time sources,
depending on what is requested by an application call to ``rte_eal_hpet_init()``,
if any, and on what is available on the system at runtime.
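
Before rebuilding anything, it may help to check whether the running kernel
already enables ``HPET_MMAP`` and exposes ``/dev/hpet``.
The sketch below assumes the kernel configuration is installed as ``/boot/config-$(uname -r)``,
as on many distributions; systems that only provide ``/proc/config.gz`` would need ``zcat`` instead.
The helper names are hypothetical.

.. code-block:: shell

    #!/bin/bash
    # Hypothetical helper: succeed if the given kernel config enables HPET_MMAP.
    #   $1 - kernel config file (defaults to the one for the running kernel)
    hpet_mmap_enabled() {
        local config=${1:-/boot/config-$(uname -r)}
        # Anchored match, so CONFIG_HPET_MMAP_DEFAULT does not count
        grep -q '^CONFIG_HPET_MMAP=y$' "$config"
    }

    # Hypothetical helper: succeed if an HPET character device is present.
    #   $1 - device path (defaults to /dev/hpet)
    hpet_device_present() {
        [ -c "${1:-/dev/hpet}" ]
    }

    # Example usage:
    #   hpet_mmap_enabled && hpet_device_present && echo "HPET looks usable"

Even with ``HPET_MMAP`` built in, whether mapping is allowed by default
may depend on ``CONFIG_HPET_MMAP_DEFAULT`` and the ``hpet_mmap`` kernel parameter,
so the kernel parameter documentation is worth checking as well.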