..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

What does "EAL: map_all_hugepages(): open failed: Permission denied Cannot init memory" mean?
---------------------------------------------------------------------------------------------

This is most likely due to the test application not being run with sudo to promote the user to a superuser.
Alternatively, applications can also be run as a regular user.
For more information, please refer to the :ref:`DPDK Getting Started Guide <linux_gsg>`.


If I want to change the number of hugepages allocated, how do I remove the original pages allocated?
----------------------------------------------------------------------------------------------------

The number of pages allocated can be seen by executing the following command::

   grep Huge /proc/meminfo

Once all the pages are mmapped by an application, they stay that way.
If you start a test application with less than the maximum, then you have free pages.
When you stop and restart the test application, it looks to see if the pages are available in the ``/dev/hugepages`` directory and mmaps them.
If you look in the directory, you will see ``n`` number of 2M page files. If you specified 1024, you will see 1024 page files.
These are then placed in memory segments to get contiguous memory.

If you need to change the number of pages, it is easier to first remove the pages, as shown below.
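
For example, assuming 2 MB pages mounted at the default ``/dev/hugepages`` location, the old page files can be removed and a new number of pages reserved as follows (the sysfs path below is for 2 MB pages; adjust it for other page sizes):

.. code-block:: console

   # Remove hugepage files left over from previous runs.
   rm -f /dev/hugepages/*

   # Release the currently reserved pages, then reserve the new amount.
   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
   echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages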


If I execute "l2fwd -l 0-3 -m 64 -n 3 -- -p 3", I get the following output, indicating that there are no socket 0 hugepages to allocate the mbuf and ring structures to?
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I have set up a total of 1024 hugepages (that is, allocated 512 2M pages to each NUMA node).

The ``-m`` command line parameter does not guarantee that huge pages will be reserved on specific sockets. Therefore, allocated huge pages may not be on socket 0.
To request memory to be reserved on a specific socket, please use the ``--socket-mem`` command-line parameter instead of ``-m``.


I am running a 32-bit DPDK application on a NUMA system, and sometimes the application initializes fine but cannot allocate memory. Why is that happening?
----------------------------------------------------------------------------------------------------------------------------------------------------------

32-bit applications have limitations in terms of how much virtual memory is available, hence the amount of hugepage memory they are able to allocate is also limited (to 1 GB).
If your system has more than 1 GB of hugepage memory, not all of it will be allocated.
Because hugepages are typically allocated on the local NUMA node, the hugepage allocation the application gets during initialization depends on which
NUMA node it is running on (the EAL does not affinitize cores until much later in the initialization process).
Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from the DPDK main core, and
therefore all the hugepages are allocated on the wrong socket.

To avoid this scenario, either lower the amount of hugepage memory available to 1 GB or less, or run the application with taskset,
affinitizing the application to a would-be main core.

For example, if your EAL coremask is 0xff0, the main core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::

   taskset 0x10 ./l2fwd -l 4-11 -n 2

.. note::

   Instead of ``-c 0xff0``, use ``-l 4-11`` as a cleaner way to define lcores.

In this way, the hugepages have a greater chance of being allocated to the correct socket.
Additionally, a ``--socket-mem`` option could be used to ensure the availability of memory for each socket, so that if hugepages were allocated on
the wrong socket, the application simply will not start.


On application startup, there is a lot of EAL information printed. Is there any way to reduce this?
---------------------------------------------------------------------------------------------------

Yes, the option ``--log-level=`` accepts either a symbolic name or a number:

1. emergency
2. alert
3. critical
4. error
5. warning
6. notice
7. info
8. debug
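
For example, to print only warnings and more severe messages, either of the forms below could be used (the application and the other EAL arguments are illustrative):

.. code-block:: console

   # Symbolic level name.
   ./l2fwd -l 0-3 -n 4 --log-level=warning -- -p 3

   # Equivalent numeric level.
   ./l2fwd -l 0-3 -n 4 --log-level=5 -- -p 3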


How can I tune my network application to achieve lower latency?
---------------------------------------------------------------

Traditionally, there is a trade-off between throughput and latency. An application can be tuned to achieve a high throughput,
but the end-to-end latency of an average packet typically increases as a result.
Similarly, the application can be tuned to have, on average, a low end-to-end latency at the cost of lower throughput.

To achieve higher throughput, the DPDK amortizes the cost of processing each packet individually by processing packets in bursts.
Using the testpmd application as an example, the "burst" size can be set on the command line to a value of 32 (also the default value).
This allows the application to request 32 packets at a time from the PMD.
The testpmd application then immediately attempts to transmit all the packets that were received, in this case, all 32 packets.
The packets are not transmitted until the tail pointer is updated on the corresponding TX queue of the network port.
This behavior is desirable when tuning for high throughput because the cost of tail pointer updates to both the RX and TX queues
can be spread across 32 packets, effectively hiding the relatively slow MMIO cost of writing to the PCIe* device.

However, this is not very desirable when tuning for low latency, because the first packet that was received must also wait for the other 31 packets to be received.
It cannot be transmitted until the other 31 packets have also been processed, because the NIC will not know to transmit the packets until the TX tail pointer has been updated,
which is not done until all 32 packets have been processed for transmission.

To consistently achieve low latency, even under heavy system load, the application developer should avoid processing packets in large bursts.
The testpmd application can be configured from the command line to use a burst value of 1.
This allows a single packet to be processed at a time, providing lower latency, but with the added cost of lower throughput.
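
For example, testpmd's burst size can be set with its ``--burst`` option (the core list and memory channel count below are illustrative):

.. code-block:: console

   # Default burst of 32 packets, tuned for throughput.
   ./dpdk-testpmd -l 0-3 -n 4 -- --burst=32 -i

   # Burst of 1 packet, tuned for low latency at the cost of throughput.
   ./dpdk-testpmd -l 0-3 -n 4 -- --burst=1 -i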


Without NUMA enabled, my network throughput is low, why?
--------------------------------------------------------

I have a system with dual Intel® Xeon® E5645 processors at 2.40 GHz and four Intel® 82599 10 Gigabit Ethernet NICs.
I am using eight logical cores on each processor, with RSS set to distribute the network load from two 10 GbE interfaces to the cores on each processor.

Without NUMA enabled, memory is allocated from both sockets, since memory is interleaved.
Therefore, each 64B chunk is interleaved across both memory domains.

The first 64B chunk is mapped to node 0, the second 64B chunk is mapped to node 1, the third to node 0, the fourth to node 1.
If you allocated 256B, you would get memory that looks like this:

.. code-block:: console

   256B buffer
   Offset 0x00 - Node 0
   Offset 0x40 - Node 1
   Offset 0x80 - Node 0
   Offset 0xc0 - Node 1

Therefore, packet buffers and descriptor rings are allocated from both memory domains, thus incurring QPI bandwidth when accessing the other memory domain, and much higher latency.
For best performance with NUMA disabled, only one socket should be populated.


I am getting errors about not being able to open files. Why?
------------------------------------------------------------

As the DPDK operates, it opens a lot of files, which can result in reaching the open file limit, which is set using the ulimit command or in the limits.conf file.
This is especially true when using a large number (>512) of 2 MB huge pages. Please increase the open file limit if your application is not able to open files.
This can be done either by issuing a ulimit command or by editing the limits.conf file. Please consult the Linux manpages for usage information.


VF driver for IXGBE devices cannot be initialized
-------------------------------------------------

Some versions of the Linux IXGBE driver do not assign a random MAC address to VF devices at initialization.
In this case, this has to be done manually on the VM host, using the following command:

.. code-block:: console

   ip link set <interface> vf <VF function> mac <MAC address>

where <interface> is the interface providing the virtual functions (for example, eth0), <VF function> is the virtual function number (for example, 0),
and <MAC address> is the desired MAC address.


Is it safe to add an entry to the hash table while running?
------------------------------------------------------------

Currently the hash table implementation is not thread safe and assumes that locking between threads and processes is handled by the user's application.
This is likely to be supported in future releases.


What is the purpose of setting iommu=pt?
----------------------------------------

DPDK uses a 1:1 mapping and does not support IOMMU. IOMMU allows for simpler VM physical address translation.
The second role of IOMMU is to allow protection from unwanted memory access by an unsafe device that has DMA privileges.
Unfortunately, the protection comes with an extremely high performance cost for high speed NICs.

Setting ``iommu=pt`` disables IOMMU support for the hypervisor.
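
For example, on a system that boots with GRUB, the option is typically added to the kernel command line alongside ``intel_iommu=on`` (the file location and the command to regenerate the GRUB configuration vary by distribution):

.. code-block:: console

   # In /etc/default/grub, append the options to the kernel command line:
   GRUB_CMDLINE_LINUX="... intel_iommu=on iommu=pt"

   # Then regenerate the GRUB configuration and reboot, for example:
   update-grub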


When trying to send packets from an application to itself, meaning smac==dmac, using Intel(R) 82599 VF, packets are lost.
--------------------------------------------------------------------------------------------------------------------------

Check the register ``LLE(PFVMTXSSW[n])``, which allows an individual pool to send traffic and have it looped back to itself.


Can I split packet RX to use DPDK and have an application's higher order functions continue using Linux pthread?
----------------------------------------------------------------------------------------------------------------

The DPDK's lcore threads are Linux pthreads bound onto specific cores. Configure the DPDK to do work on the same
cores and run the application's other work on other cores, using the DPDK's "coremask" setting to specify which
cores it should launch itself on.


Is it possible to exchange data between DPDK processes and regular userspace processes via some shared memory or IPC mechanism?
-------------------------------------------------------------------------------------------------------------------------------

Yes, DPDK processes are regular Linux/BSD processes, and can use all OS-provided IPC mechanisms.


Can the multiple queues in Intel(R) I350 be used with DPDK?
-----------------------------------------------------------

The I350 has RSS support, and 8 queue pairs can be used in RSS mode. It should work with multi-queue DPDK applications using RSS.


How can hugepage-backed memory be shared among multiple processes?
------------------------------------------------------------------

See the Primary and Secondary process examples in the :doc:`../sample_app_ug/multi_process` guide.


Why can't my application receive packets on my system with UEFI Secure Boot enabled?
------------------------------------------------------------------------------------

If UEFI secure boot is enabled, the Linux kernel may disallow the use of UIO on the system.
Therefore, devices for use by DPDK should be bound to the ``vfio-pci`` kernel module rather than ``igb_uio`` or ``uio_pci_generic``.
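
For example, a device could be bound to ``vfio-pci`` using the ``dpdk-devbind.py`` tool shipped with DPDK (the PCI address below is a placeholder for your device):

.. code-block:: console

   # Load the vfio-pci module if it is not already loaded.
   modprobe vfio-pci

   # Bind the device to vfio-pci (replace with your device's PCI address).
   ./usertools/dpdk-devbind.py --bind=vfio-pci 0000:02:00.0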