#
a0dede62 |
| 17-Nov-2019 |
Vamsi Attunuru <vattunuru@marvell.com> |
eal/linux: remove KNI restriction on IOVA
Now that KNI supports VA (with kernel versions starting 4.6.0), we can accept IOVA as VA, but KNI must be configured for this. Pass iova_mode when creating KNI netdevs.
So far, the IOVA detection policy forced IOVA as PA when KNI was loaded, whatever the buses' IOVA requirements were.
We can now use IOVA as VA, but this comes with a cost in KNI. When no constraint is expressed by the buses, keep the current behavior of choosing PA.
Note: this change assumes that DPDK is built against the same kernel as the target system's kernel; no objection has been expressed on this topic.
Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com> Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com>
|
#
f43d3dbb |
| 12-Nov-2019 |
David Marchand <david.marchand@redhat.com> |
doc/guides: clean repeated words
Shoot repeated words in all our guides.
Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com>
|
#
7911ba04 |
| 18-Oct-2019 |
Phil Yang <phil.yang@arm.com> |
stack: enable lock-free implementation for aarch64
Enable both C11 atomic and non C11 atomic lock-free stack for aarch64.
Introduced a new header to reduce the ifdef clutter across generic and C11 files. The rte_stack_lf_stubs.h contains stub implementations of __rte_stack_lf_count, __rte_stack_lf_push_elems and __rte_stack_lf_pop_elems.
Suggested-by: Gage Eads <gage.eads@intel.com> Suggested-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Phil Yang <phil.yang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Tested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: Jerin Jacob <jerinj@marvell.com>
|
#
79a0bbe5 |
| 29-Jul-2019 |
Anatoly Burakov <anatoly.burakov@intel.com> |
eal: pick IOVA as PA if IOMMU is not available
When IOMMU is not available, /sys/kernel/iommu_groups will not be populated. This has been the case since at least kernel 3.6, when VFIO support was added. If the directory is empty, EAL should not pick IOVA as VA as the default IOVA mode.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Tested-by: Jerin Jacob <jerinj@marvell.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Reviewed-by: David Marchand <david.marchand@redhat.com>
|
#
bbe29a9b |
| 22-Jul-2019 |
Jerin Jacob <jerinj@marvell.com> |
eal/linux: select IOVA as VA mode for default case
When the bus layer reports the preferred mode as RTE_IOVA_DC then select the RTE_IOVA_VA mode:
- All drivers work in RTE_IOVA_VA mode, irrespective of physical address availability.
- By default, a mempool asks for IOVA-contiguous memory using RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it may affect the application boot time.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Signed-off-by: David Marchand <david.marchand@redhat.com>
|
#
b76fafb1 |
| 22-Jul-2019 |
David Marchand <david.marchand@redhat.com> |
eal: fix IOVA mode selection as VA for PCI drivers
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which was intended to mean "driver only supports VA" but had been understood as "driver supports both PA and VA" by most net drivers, and was used to let dpdk processes run as non-root (which do not have access to physical addresses on recent kernels).
The check on physical addresses actually closed the gap for those drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this flag can retain its intended meaning. Document explicitly its meaning.
We can check that a driver's requirement with regard to IOVA mode is fulfilled before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
3855b415 |
| 03-May-2019 |
Anatoly Burakov <anatoly.burakov@intel.com> |
ipc: add warnings about not using IPC with memory API
IPC and memory-related API's should not be mixed because memory relies on IPC internally. Add explicit warnings to IPC API and to the documentation about this.
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
d629b7b5 |
| 26-Apr-2019 |
John McNamara <john.mcnamara@intel.com> |
doc: fix spelling reported by aspell in guides
Fix spelling errors in the guide docs.
Signed-off-by: John McNamara <john.mcnamara@intel.com> Acked-by: Rami Rosen <ramirose@gmail.com>
|
#
e75bc77f |
| 03-Apr-2019 |
Gage Eads <gage.eads@intel.com> |
mempool/stack: add lock-free stack mempool handler
This commit adds support for lock-free (linked list based) stack mempool handler.
In mempool_perf_autotest the lock-based stack outperforms the lock-free handler for certain lcore/alloc count/free count combinations*, however:
- For applications with preemptible pthreads, a standard (lock-based) stack's worst-case performance (i.e. one thread being preempted while holding the spinlock) is much worse than the lock-free stack's.
- Using per-thread mempool caches will largely mitigate the performance difference.
*Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4, running on isolcpus cores with a tickless scheduler. The lock-based stack's rate_persec was 0.6x-3.5x the lock-free stack's.
Signed-off-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
|
#
1e3380a2 |
| 29-Mar-2019 |
Anatoly Burakov <anatoly.burakov@intel.com> |
mem: do not use lockfiles for single file segments mode
Due to internal glibc limitations [1], DPDK may exhaust internal file descriptor limits when using smaller page sizes, which results in inability to use system calls such as select() by user applications.
Single file segments option stores lock files per page to ensure that pages are deleted when there are no more users, however this is not necessary because the processes will be holding onto the pages anyway because of mmap(). Thus, removing pages from the filesystem is safe even though they may be used by some other secondary process. As a result, single file segments mode no longer stores inordinate amounts of segment fd's, and the above issue with fd limits is solved.
However, this will not work for legacy mem mode. For that, simply document that using bigger page sizes is the only option.
[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
c33a675b |
| 10-Mar-2019 |
Shahaf Shuler <shahafs@mellanox.com> |
bus: introduce device level DMA memory mapping
The DPDK APIs expose 3 different modes to work with memory used for DMA:
1. Use the DPDK owned memory (backed by the DPDK provided hugepages). This memory is allocated by the DPDK libraries, included in the DPDK memory system (memseg lists) and automatically DMA mapped by the DPDK layers.
2. Use memory allocated by the user and register to the DPDK memory systems. Upon registration of memory, the DPDK layers will DMA map it to all needed devices. After registration, allocation of this memory will be done with rte_*malloc APIs.
3. Use memory allocated by the user and not registered to the DPDK memory system. This is for users who want tight control over this memory (e.g. avoid the rte_malloc header). The user should create the memory, register it through the rte_extmem_register API, and call the DMA map function in order to register such memory to the different devices.
The scope of the patch focus on #3 above.
Currently the only way to map external memory is through VFIO (rte_vfio_dma_map). While VFIO is common, there are other vendors which use different ways to map memory (e.g. Mellanox and NXP).
The work in this patch moves the DMA mapping to vendor agnostic APIs. Device level DMA map and unmap APIs were added. Implementation of those APIs was done currently only for PCI devices.
For PCI bus devices, the pci driver can expose its own map and unmap functions to be used for the mapping. In case the driver doesn't provide any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk
* take a device, and query its rte_device
* call the device specific mapping function for this device
Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap APIs, leaving the rte device APIs as the preferred option for the user.
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
|
#
218c4e68 |
| 06-Mar-2019 |
Bruce Richardson <bruce.richardson@intel.com> |
mk: use linux and freebsd in config names
Rather than using linuxapp and bsdapp everywhere, we can change things to use the more readable terms "linux" and "freebsd" in our build configs. Rather than renaming the configs, we can just duplicate the existing ones under the new names using symlinks, and use the new names exclusively internally. ["make showconfigs" also only shows the new names to keep the list short.] The result is that backward compatibility is kept fully, but any new builds or development can be done using the newer names, i.e. both "make config T=x86_64-native-linuxapp-gcc" and "T=x86_64-native-linux-gcc" work.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
|
#
91d7846c |
| 06-Mar-2019 |
Bruce Richardson <bruce.richardson@intel.com> |
eal/linux: rename linuxapp to linux
The term "linuxapp" is a legacy one; simply calling the subdirectory "linux" is clearer for all concerned.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
|
#
25c99fbd |
| 06-Mar-2019 |
Bruce Richardson <bruce.richardson@intel.com> |
eal/bsd: rename bsdapp to freebsd
The term "bsdapp" is a legacy one; simply calling the subdirectory "freebsd" is clearer for all concerned.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
|
#
c3568ea3 |
| 19-Feb-2019 |
David Marchand <david.marchand@redhat.com> |
eal: restrict control threads to startup CPU affinity
Spawning the ctrl threads on anything that is not part of the eal coremask is not that polite to the rest of the system, especially when you took good care to pin your processes on cpu resources with tools like taskset (linux) / cpuset (freebsd).
Rather than introduce yet another EAL option to control which CPUs those ctrl threads are created on, let's take the startup CPU affinity as a reference and remove the EAL coremask from it. If no CPU is left, then we default to the master core.
The cpuset is computed once at init before the original cpu affinity is lost.
Introduced a RTE_CPU_AND macro to abstract the differences between linux and freebsd respective macros.
Examples in a 4 cores FreeBSD vm:
$ ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
      -- -i --total-num-mbufs=2048
$ procstat -S 1057
  PID    TID COMM             TDNAME           CPU CSID CPU MASK
 1057 100131 testpmd          -                  2    1 2
 1057 100140 testpmd          eal-intr-thread    1    1 0-1
 1057 100141 testpmd          rte_mp_handle      1    1 0-1
 1057 100142 testpmd          lcore-slave-3      3    1 3
$ cpuset -l 1,2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
      -- -i --total-num-mbufs=2048
$ procstat -S 1061
  PID    TID COMM             TDNAME           CPU CSID CPU MASK
 1061 100131 testpmd          -                  2    2 2
 1061 100144 testpmd          eal-intr-thread    1    2 1
 1061 100145 testpmd          rte_mp_handle      1    2 1
 1061 100147 testpmd          lcore-slave-3      3    2 3
$ cpuset -l 2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \
      -- -i --total-num-mbufs=2048
$ procstat -S 1065
  PID    TID COMM             TDNAME           CPU CSID CPU MASK
 1065 100131 testpmd          -                  2    2 2
 1065 100148 testpmd          eal-intr-thread    2    2 2
 1065 100149 testpmd          rte_mp_handle      2    2 2
 1065 100150 testpmd          lcore-slave-3      3    2 3
Fixes: d651ee4919cd ("eal: set affinity for control threads") Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
|
#
41328c40 |
| 06-Dec-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
doc: remove note on memory mode limitation in multi-process
Memory mode flags are now shared between primary and secondary processes, so the note in the documentation about this limitation is no longer necessary.
Fixes: 64cdfc35aaad ("mem: store memory mode flags in shared config") Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
bed79418 |
| 20-Dec-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
mem: allow usage of non-heap external memory in multiprocess
Add multiprocess support for externally allocated memory areas that are not added to DPDK heap (and add relevant doc sections).
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
|
#
950e8fb4 |
| 20-Dec-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
mem: allow registering external memory areas
The general use-case of using external memory is well covered by existing external memory API's. However, certain use cases require manual management of externally allocated memory areas, so this memory should not be added to the heap. It should, however, be added to DPDK's internal structures, so that API's like ``rte_virt2memseg`` would work on such external memory segments.
This commit adds such an API to DPDK. The new functions allow registering and unregistering externally allocated memory areas, and documentation for them is added as well.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Yongseok Koh <yskoh@mellanox.com>
|
#
476c847a |
| 14-Dec-2018 |
Jim Harris <james.r.harris@intel.com> |
malloc: add option --match-allocations
SPDK uses the rte_mem_event_callback_register API to create RDMA memory regions (MRs) for newly allocated regions of memory. This is used in both the SPDK NVMe-oF target and the NVMe-oF host driver.
DPDK creates internal malloc_elem structures for these allocated regions. As users malloc and free memory, DPDK will sometimes merge malloc_elems that originated from different allocations that were notified through the registered mem_event callback routine. This results in subsequent allocations that can span across multiple RDMA MRs. This requires SPDK to check each DPDK buffer to see if it crosses an MR boundary, and if so, it would have to add considerable logic and complexity to describe that buffer before it can be accessed by the RNIC. It is somewhat analogous to rte_malloc returning a buffer that is not IOVA-contiguous.
As a malloc_elem gets split and some of these elements get freed, it can also result in DPDK sending an RTE_MEM_EVENT_FREE notification for a subset of the original RTE_MEM_EVENT_ALLOC notification. This is also problematic for RDMA memory regions, since unregistering the memory region is all-or-nothing. It is not possible to unregister part of a memory region.
To support these types of applications, this patch adds a new --match-allocations EAL init flag. When this flag is specified, malloc elements from different hugepage allocations will never be merged. Memory will also only be freed back to the system (with the requisite memory event callback) exactly as it was originally allocated.
Since part of this patch is extending the size of struct malloc_elem, we also fix up the malloc autotests so they do not assume its size exactly fits in one cacheline.
Signed-off-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
e3e363a2 |
| 22-Nov-2018 |
Thomas Monjalon <thomas@monjalon.net> |
doc: remove PCI-specific details from EAL guide
The PCI bus is an independent driver and not part of EAL as it was in the early days. EAL must be understood as a generic layer.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: John McNamara <john.mcnamara@intel.com>
|
#
075b182b |
| 03-Oct-2018 |
Eric Zhang <eric.zhang@windriver.com> |
eal: force IOVA to a particular mode
This patch uses EAL option "--iova-mode" to force the IOVA mode to a particular value. There exist virtual devices that are not directly attached to the PCI bus, and therefore the auto detection of the IOVA mode based on probing the PCI bus and IOMMU configuration may not report the required addressing mode. Using the EAL option permits the mode to be explicitly configured in this scenario.
Signed-off-by: Eric Zhang <eric.zhang@windriver.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Marko Kovacevic <marko.kovacevic@intel.com>
|
#
66498f0f |
| 02-Oct-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
doc: add external memory feature
Document the addition of external memory support to DPDK.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
4a6e683c |
| 17-Jul-2018 |
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> |
ring: clarify preemptible nature of algorithm
rte_ring implementation is not preemptible only under certain circumstances. This clarification is helpful for data plane and control plane communication using rte_ring.
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Reviewed-by: Gavin Hu <gavin.hu@arm.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
|
#
e4348122 |
| 31-May-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
eal: add option to limit memory allocation on sockets
Previously, it was possible to limit maximum amount of memory allowed for allocation by creating validator callbacks. Although a powerful tool, it's a bit of a hassle and requires modifying the application for it to work with DPDK example applications.
Fix this by adding a new parameter "--socket-limit", with syntax similar to "--socket-mem", which would set per-socket memory allocation limits, and set up a default validator callback to deny all allocations above the limit.
This option is incompatible with legacy mode, as validator callbacks are not supported there.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
b3173932 |
| 25-May-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
doc: update guides for memory subsystem
Document new command-line switches and the principles behind the new memory subsystem. Also, replace outdated malloc heap picture.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
|