#
66d3724b |
| 05-Aug-2019 |
David Marchand <david.marchand@redhat.com> |
bus/pci: always check IOMMU capabilities
IOMMU capabilities won't change and must be checked even if no PCI device seems to be supported yet when EAL is initialised.
This accommodates SPDK, which registers its drivers after rte_eal_init(), especially on the PPC platform where the IOMMU does not support VA.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com> Acked-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Takeshi Yoshimura <tyos@jp.ibm.com>
|
#
b62f3aff |
| 02-Aug-2019 |
David Marchand <david.marchand@redhat.com> |
bus/pci: remove unused x86 Linux constant
This macro is unused after a previous fix.
Fixes: fe822eb8c565 ("bus/pci: use IOVA DMA mask check when setting IOVA mode") Cc: stable@dpdk.org
Signed-off-by: David Marchand <david.marchand@redhat.com>
|
#
d622cad8 |
| 22-Jul-2019 |
Jerin Jacob <jerinj@marvell.com> |
bus/pci: change IOVA as VA flag name
In order to align the name with other PCI driver flags such as RTE_PCI_DRV_NEED_MAPPING and to reflect its purpose, rename the RTE_PCI_DRV_IOVA_AS_VA flag to RTE_PCI_DRV_NEED_IOVA_AS_VA.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David Marchand <david.marchand@redhat.com>
|
#
b76fafb1 |
| 22-Jul-2019 |
David Marchand <david.marchand@redhat.com> |
eal: fix IOVA mode selection as VA for PCI drivers
The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA, which was intended to mean "driver only supports VA" but had been understood as "driver supports both PA and VA" by most net drivers and used to let DPDK processes run as non-root (non-root processes do not have access to physical addresses on recent kernels).
The check on physical addresses actually closed the gap for those drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this flag can retain its intended meaning. Document its meaning explicitly.
We can check that a driver's requirement with respect to IOVA mode is fulfilled before trying to probe a device.
Finally, document the heuristic used to select the IOVA mode and hope that we won't break it again.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com> Tested-by: Jerin Jacob <jerinj@marvell.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
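The pre-probe check described in this message can be pictured with a minimal sketch. All names below (`iova_mode`, `DRV_NEED_IOVA_AS_VA`, `driver_iova_ok`) are hypothetical stand-ins chosen for illustration, not the identifiers used in DPDK, and the logic only reflects the stated meaning "driver only supports VA".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EAL IOVA modes (DC = "don't care"). */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical flag: the driver works only when IOVAs are virtual addresses. */
#define DRV_NEED_IOVA_AS_VA 0x0001u

/* Return true if a driver with the given flags may probe under `mode`. */
static bool driver_iova_ok(uint32_t drv_flags, enum iova_mode mode)
{
    if ((drv_flags & DRV_NEED_IOVA_AS_VA) && mode != IOVA_VA)
        return false; /* driver requires VA, but another mode was selected */
    return true;      /* no requirement: the driver accepts PA or VA */
}
```

A driver without the flag passes the check in any mode, which matches the message's point that drivers supporting both PA and VA do not need to be marked.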
|
#
62f8f5ac |
| 22-Jul-2019 |
David Marchand <david.marchand@redhat.com> |
bus/pci: remove Mellanox kernel driver type
This reverts commit 0cb86518db57d35e0abc14d6703fad561a0310e2.
The PCI bus now reports DC when faced with a device bound to an unknown driver and, in such a case, the IOVA mode is selected against physical address availability.
As a consequence, there is no reason for this special case for Mellanox drivers.
Fixes: 703458e19c16 ("bus/pci: consider only usable devices for IOVA mode")
Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Jerin Jacob <jerinj@marvell.com>
|
#
703458e1 |
| 14-Jun-2019 |
Ben Walker <benjamin.walker@intel.com> |
bus/pci: consider only usable devices for IOVA mode
When selecting the preferred IOVA mode of the PCI bus, the current heuristic ("are devices bound?", "are devices bound to UIO?", "are PMD drivers supporting IOVA as VA?", etc.) should honor the device white/blacklist so that an unwanted device does not impact the decision.
There is no reason to consider a device which has no driver available.
This applies to all OSes, so implement this in common code, then call an OS-specific callback.
On the Linux side:
- the VFIO special considerations should be evaluated only if VFIO support is built,
- there is no strong requirement on using VA rather than PA if a driver supports VA, so default to DC in such a case.
Signed-off-by: Ben Walker <benjamin.walker@intel.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
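The filtering idea in this message (skip unwanted devices and devices without a driver, then ask an OS-specific callback) can be sketched as follows. The struct, the field names and the policy for conflicting preferences are assumptions made for this example; they are not taken from the DPDK sources.

```c
#include <stdbool.h>
#include <stddef.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/* Hypothetical per-device summary used only for this sketch. */
struct pci_dev_info {
    bool blacklisted;       /* excluded by the user's white/blacklist */
    bool has_driver;        /* some registered driver matches the device */
    enum iova_mode os_pref; /* what the OS-specific callback would report */
};

/* Combine the preferences of the usable devices into one bus-level mode. */
static enum iova_mode bus_iova_mode(const struct pci_dev_info *devs, size_t n)
{
    bool want_pa = false, want_va = false;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].blacklisted || !devs[i].has_driver)
            continue; /* unusable devices must not influence the result */
        if (devs[i].os_pref == IOVA_PA)
            want_pa = true;
        else if (devs[i].os_pref == IOVA_VA)
            want_va = true;
    }
    if (want_va && !want_pa)
        return IOVA_VA;
    if (want_pa && !want_va)
        return IOVA_PA;
    return IOVA_DC; /* no preference, or a conflict (assumed policy) */
}
```

With this shape, a blacklisted device that would prefer PA cannot prevent the bus from reporting VA when every usable device prefers VA.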
|
#
0cb86518 |
| 02-May-2019 |
Yongseok Koh <yskoh@mellanox.com> |
bus/pci: add Mellanox kernel driver type
When checking RTE_PCI_DRV_IOVA_AS_VA flag to determine IOVA mode, pci_one_device_has_iova_va() returns true only if kernel driver of the device is vfio. However, Mellanox mlx4/5 PMD doesn't need to be detached from kernel driver and attached to VFIO/UIO. Control path still goes through the existing kernel driver, which is mlx4_core/mlx5_core. In order to make RTE_PCI_DRV_IOVA_AS_VA effective for mlx4/mlx5 PMD, a new kernel driver type has to be introduced.
Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
|
#
a38eafed |
| 01-Nov-2018 |
Fan Zhang <roy.fan.zhang@intel.com> |
bus/pci: fix config r/w access
The recent change to rte_pci_read/write_config() missed uio_pci_generic case.
Fixes: 630deed612ca ("bus/pci: compare kernel driver instead of interrupt handler") Cc: stable@dpdk.org
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
|
#
ec200687 |
| 01-Nov-2018 |
Alejandro Lucero <alejandro.lucero@netronome.com> |
bus/pci: avoid call to DMA mask check
Calling rte_mem_check_dma_mask when memory has not been initialized yet is wrong. This patch uses rte_mem_set_dma_mask instead.
Once memory initialization is done, the DMA mask that was set will be used to check that mapped memory is within the specified mask.
Fixes: fe822eb8c565 ("bus/pci: use IOVA DMA mask check when setting IOVA mode")
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
|
#
0de9eb61 |
| 01-Nov-2018 |
Alejandro Lucero <alejandro.lucero@netronome.com> |
mem: rename DMA mask check with proper prefix
Current name rte_eal_check_dma_mask does not follow the naming used in the rest of the file.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
|
#
55e411b3 |
| 30-Oct-2018 |
Qi Zhang <qi.z.zhang@intel.com> |
bus/pci: fix resource mapping override
When scanning an already plugged device, the virtual address of the mapped PCI resource in rte_pci_device will be overridden with 0, which may cause the driver to not work correctly. The fix is to not update any field of rte_pci_device if the driver of the device being scanned is already probed.
Bugzilla ID: 85 Fixes: c752998b5e2e ("pci: introduce library and driver") Cc: stable@dpdk.org
Reported-by: Geoffrey Lv <geoffrey.lv@gmail.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
|
#
630deed6 |
| 25-Oct-2018 |
Alejandro Lucero <alejandro.lucero@netronome.com> |
bus/pci: compare kernel driver instead of interrupt handler
Invoking the right PCI read/write functions is based on the interrupt handler type. However, this is not configured for secondary processes, which prevents them from using those functions.
This patch fixes the issue by using the name of the driver the device is bound to instead.
Fixes: 632b2d1deeed ("eal: provide functions to access PCI config") Cc: stable@dpdk.org
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
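As an illustration of dispatching on the bound kernel driver rather than on the interrupt handler type, here is a small sketch. The enums and the function are hypothetical names invented for this example; the grouping of both UIO flavours reflects the later fix in this history, which notes that the uio_pci_generic case had been missed.

```c
/* Hypothetical kernel driver kinds a device may be bound to. */
enum kdrv { KDRV_UNKNOWN, KDRV_IGB_UIO, KDRV_UIO_GENERIC, KDRV_VFIO };

/* Hypothetical config-space access backends. */
enum cfg_backend { CFG_NONE, CFG_UIO, CFG_VFIO };

/* Pick the config read/write backend from the bound kernel driver. */
static enum cfg_backend config_backend(enum kdrv kdrv)
{
    switch (kdrv) {
    case KDRV_IGB_UIO:
    case KDRV_UIO_GENERIC: /* both UIO flavours share one access path */
        return CFG_UIO;
    case KDRV_VFIO:
        return CFG_VFIO;
    default:
        return CFG_NONE;   /* unknown driver: no config access available */
    }
}
```

Because the bound driver is a property of the device, this selection gives the same answer in primary and secondary processes.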
|
#
fe822eb8 |
| 05-Oct-2018 |
Alejandro Lucero <alejandro.lucero@netronome.com> |
bus/pci: use IOVA DMA mask check when setting IOVA mode
Currently the code precludes IOVA mode if the IOMMU hardware reports fewer addressing bits than necessary for the full virtual memory range.
Although VT-d emulation currently only supports 39 bits, the IOVAs of allocated memory could still lie within that supported range. This patch allows IOVA mode in such a case by adding a call to rte_eal_check_dma_mask with the number of addressing bits reported by the IOMMU hardware.
Indeed, memory initialization code has been modified for using lower virtual addresses than those used by the kernel for 64 bits processes by default, and therefore memsegs iovas can use 39 bits or less for most systems. And this is likely 100% true for VMs.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
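The core of a DMA mask check is a range test against the number of addressing bits. The helper below is a minimal sketch under that assumption; `fits_dma_mask` is a hypothetical name and this is not the body of rte_eal_check_dma_mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every byte of [iova, iova + len) is addressable with `maskbits` bits. */
static bool fits_dma_mask(uint64_t iova, uint64_t len, unsigned int maskbits)
{
    assert(maskbits >= 1 && maskbits <= 64);

    if (len == 0)
        return true;

    uint64_t last = iova + len - 1;
    if (last < iova)
        return false; /* the range wraps around the 64-bit space */
    if (maskbits == 64)
        return true;  /* avoid an undefined shift by 64 below */

    /* Bits above the addressing limit must all be zero in the last byte's
     * address; the first byte is lower, so it then fits as well. */
    uint64_t high_bits = ~((UINT64_C(1) << maskbits) - 1);
    return (last & high_bits) == 0;
}
```

For a 39-bit limit, a page ending at 0x7FFFFFFFFF passes while one starting at 0x8000000000 does not, which is the situation the message describes for emulated VT-d.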
|
#
f74d50a7 |
| 05-Oct-2018 |
Alejandro Lucero <alejandro.lucero@netronome.com> |
bus/pci: check IOMMU addressing limitation just once
The current code checks whether the IOMMU hardware reports enough addressing bits for using IOVA mode, but it repeats the same check for every PCI device present. This is not necessary because the IOMMU hardware is the same for all of them.
This patch only checks the IOMMU using the first PCI device found.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
|
#
6844d146 |
| 02-Oct-2018 |
Thomas Monjalon <thomas@monjalon.net> |
eal: add bus pointer in device structure
When a device is added with a devargs (hotplug or whitelist), the bus pointer can be retrieved via its devargs. But there is no such devargs.bus in case of standard scan.
A pointer to the rte_bus handle is added to rte_device. When a device is allocated (during a scan), the pointer to its bus is assigned.
This makes it possible to remove an rte_device using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless, and may be removed later.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
|
#
4104b2a4 |
| 02-Oct-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
mem: add length to memseg list
Previously, to calculate length of memory area covered by a memseg list, we would've needed to multiply page size by length of fbarray backing that memseg list. This is not obvious and unnecessarily low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the change. Also, while we're breaking ABI, pack the members a little better.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
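The change amounts to computing the covered length once and storing it. A minimal sketch, using a hypothetical `seg_list` struct rather than the real memseg list definition:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified list descriptor, for illustration only. */
struct seg_list {
    uint64_t page_sz;    /* size of each page in bytes */
    unsigned int n_segs; /* capacity of the backing array */
    size_t len;          /* bytes covered, stored instead of recomputed */
};

/* Initialise the list and cache page_sz * n_segs in `len`. */
static void seg_list_init(struct seg_list *l, uint64_t page_sz,
                          unsigned int n_segs)
{
    l->page_sz = page_sz;
    l->n_segs = n_segs;
    l->len = (size_t)(page_sz * n_segs);
}
```

Callers then read `len` directly instead of reaching into the backing array to multiply its length by the page size.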
|
#
95b01d8c |
| 08-Aug-2018 |
Rami Rosen <rami.rosen@intel.com> |
bus/pci: remove unneeded EAL private include
This trivial patch removes an unneeded include from drivers/bus/pci/linux/pci.c.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
|
#
52f711f7 |
| 15-May-2018 |
Andy Green <andy@warmcat.com> |
bus/pci: fix size of driver name buffer
Variable dri_name is a pointer and it is incorrect to use its size as the buffer size. Caller knows the buffer size and it is safer to pass it explicitly.
Fixes: fe5f777b5383 ("bus/pci: replace strncpy by strlcpy") Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com> Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
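The pitfall is that `sizeof` applied to a pointer parameter yields the pointer size, not the size of the caller's buffer. The sketch below passes the size explicitly; `driver_name_from_path` is a hypothetical helper written for this example (using `snprintf` for the bounded copy), not the function touched by the commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the last path component of `path` into `dst`.
 * `dst_len` is supplied by the caller, who knows the real buffer size;
 * sizeof(dst) here would only give the size of a pointer.
 * Returns 0 on success, -1 if there is no '/' or the name was truncated.
 */
static int driver_name_from_path(const char *path, char *dst, size_t dst_len)
{
    const char *name = strrchr(path, '/');

    if (name == NULL || dst_len == 0)
        return -1;

    int n = snprintf(dst, dst_len, "%s", name + 1);
    return (n < 0 || (size_t)n >= dst_len) ? -1 : 0;
}
```

A caller with `char buf[16]` would pass `sizeof(buf)`, which is evaluated where the array type is still visible.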
|
#
fe5f777b |
| 14-May-2018 |
Andy Green <andy@warmcat.com> |
bus/pci: replace strncpy by strlcpy
In function ‘pci_get_kernel_driver_by_path’,
    inlined from ‘pci_scan_one.isra.1’ at drivers/bus/pci/linux/pci.c:317:8:
drivers/bus/pci/linux/pci.c:57:3: error: ‘strncpy’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
Fixes: d9a8cd9595f2 ("pci: add kernel driver type") Cc: stable@dpdk.org
Signed-off-by: Andy Green <andy@warmcat.com> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
|
#
66cc45e2 |
| 11-Apr-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
mem: replace memseg with memseg lists
Before, we were aggregating multiple pages into one memseg, so the number of memsegs was small. Now, each page gets its own memseg, so the list of memsegs is huge. To accommodate the new memseg list size and to keep the under-the-hood workings sane, the memseg list is now not just a single list, but multiple lists. To be precise, each hugepage size available on the system gets one or more memseg lists, per socket.
In order to support dynamic memory allocation, we reserve all memory in advance (unless we're in 32-bit legacy mode, in which case we do not preallocate memory). As in, we do an anonymous mmap() of the entire maximum size of memory per hugepage size, per socket (which is limited to either RTE_MAX_MEMSEG_PER_TYPE pages or RTE_MAX_MEM_MB_PER_TYPE megabytes worth of memory, whichever is the smaller one), split over multiple lists (which are limited to either RTE_MAX_MEMSEG_PER_LIST memsegs or RTE_MAX_MEM_MB_PER_LIST megabytes per list, whichever is the smaller one). There is also a global limit of CONFIG_RTE_MAX_MEM_MB megabytes, which is mainly used for 32-bit targets to limit amounts of preallocated memory, but can be used to place an upper limit on total amount of VA memory that can be allocated by DPDK application.
So, for each hugepage size, we get (by default) up to 128G worth of memory, per socket, split into chunks of up to 32G in size. The address space is claimed at the start, in eal_common_memory.c. The actual page allocation code is in eal_memalloc.c (Linux-only), and largely consists of copied EAL memory init code.
Pages in the list are also indexed by address. That is, in order to figure out where the page belongs, one can simply look at base address for a memseg list. Similarly, figuring out IOVA address of a memzone is a matter of finding the right memseg list, getting offset and dividing by page size to get the appropriate memseg.
This commit also removes rte_eal_dump_physmem_layout() call, according to deprecation notice [1], and removes that deprecation notice as well.
On 32-bit targets due to limited VA space, DPDK will no longer spread memory to different sockets like before. Instead, it will (by default) allocate all of the memory on socket where master lcore is. To override this behavior, --socket-mem must be used.
The rest of the changes are really ripple effects from the memseg change - heap changes, compile fixes, and rewrites to support fbarray-backed memseg lists. Due to earlier switch to _walk() functions, most of the changes are simple fixes, however some of the _walk() calls were switched to memseg list walk, where it made sense to do so.
Additionally, we are also switching locks from flock() to fcntl(). Down the line, we will be introducing single-file segments option, and we cannot use flock() locks to lock parts of the file. Therefore, we will use fcntl() locks for legacy mem as well, in case someone is unfortunate enough to accidentally start legacy mem primary process alongside an already working non-legacy mem-based primary process.
[1] http://dpdk.org/dev/patchwork/patch/34002/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
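Two pieces of arithmetic from this message can be written down directly: locating a page inside an address-indexed list, and taking the smaller of a page-count limit and a megabyte limit. The function names are hypothetical and the sketch ignores everything else the real allocator does.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the page that contains `addr` in a list starting at `base`. */
static uint64_t seg_index(uint64_t base, uint64_t page_sz, uint64_t addr)
{
    assert(page_sz != 0 && addr >= base);
    return (addr - base) / page_sz;
}

/* Bytes reserved for one page size on one socket: the smaller of the
 * page-count limit and the megabyte limit. */
static uint64_t max_type_bytes(uint64_t page_sz, uint64_t max_pages,
                               uint64_t max_mb)
{
    uint64_t by_pages = page_sz * max_pages;
    uint64_t by_mb = max_mb << 20;

    return by_pages < by_mb ? by_pages : by_mb;
}
```

As an illustrative usage with assumed limits of 32768 pages and 131072 MB, 2 MB pages are capped by the page count (64 GB) while 1 GB pages are capped by the megabyte limit (128 GB), the latter matching the "up to 128G" figure in the message.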
|
#
7411d032 |
| 11-Apr-2018 |
Anatoly Burakov <anatoly.burakov@intel.com> |
bus/pci: use memseg walk instead of iteration
Reduce dependency on internal details of EAL memory subsystem, and simplify code.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Tested-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Tested-by: Hemant Agrawal <hemant.agrawal@nxp.com> Tested-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
|
#
54a328f5 |
| 12-Jan-2018 |
Maxime Coquelin <maxime.coquelin@redhat.com> |
bus/pci: forbid IOVA mode if IOMMU address width too small
Intel VT-d supports different address widths for the IOVAs, from 39 bits to 56 bits.
While recent processors support at least 48 bits, VT-d emulation currently only supports 39 bits. This makes DMA mapping fail in this case when using VA as IOVA mode, as user-space virtual addresses use up to 47 bits (see kernel's Documentation/x86/x86_64/mm.txt).
This patch parses the VT-d CAP register value available in sysfs, and forbids VA as IOVA mode if the GAW is 39 bits or unknown.
Fixes: f37dfab21c98 ("drivers/net: enable IOVA mode for Intel PMDs") Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Chas Williams <chas3@att.com>
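A rough sketch of how an address width can be derived from a VT-d capability register value follows. It assumes the Maximum Guest Address Width field occupies bits 21:16 and encodes the width minus one, as laid out in the VT-d specification; the macro and function names are hypothetical, reading the value from sysfs is left out, and this is not claimed to be the exact field or comparison the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: MGAW in bits 21:16 of the capability register. */
#define CAP_MGAW_SHIFT 16
#define CAP_MGAW_MASK  (UINT64_C(0x3f) << CAP_MGAW_SHIFT)

/* User-space virtual addresses use up to 47 bits on x86-64. */
#define USER_VA_WIDTH 47

/* Guest address width in bits (the field stores width - 1). */
static unsigned int vtd_mgaw_bits(uint64_t cap)
{
    return (unsigned int)((cap & CAP_MGAW_MASK) >> CAP_MGAW_SHIFT) + 1;
}

/* VA as IOVA is only safe if the IOMMU can address every user VA. */
static bool vtd_allows_va(uint64_t cap)
{
    return vtd_mgaw_bits(cap) >= USER_VA_WIDTH;
}
```

With a 39-bit width the check fails, so VA as IOVA mode would be refused, while a 48-bit width passes.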
|
#
6c700148 |
| 10-Jan-2018 |
Zhiyong Yang <zhiyong.yang@intel.com> |
bus/pci: fix interrupt handler type
For virtio legacy device, testpmd startup fails when using uio_pci_generic.
The issue is caused by invoking the function pci_ioport_map. The correct value of intr_handle.type is already set before calling it, so we should avoid overwriting it with the default value "RTE_INTR_HANDLE_UNKNOWN" in this function. Besides, removing the assignment does not harm other cases because the type is set to 0 by a memset on the whole struct during allocation in the function pci_scan_one.
Such assignments are also removed from pci_uio_map_resource(), pci_vfio_map_resource_primary() and pci_vfio_map_resource_secondary() in order to keep consistency and avoid future questions.
Fixes: 756ce64b1ecd ("eal: introduce PCI ioport API") Cc: stable@dpdk.org
Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com> Reviewed-by: Thomas Monjalon <thomas@monjalon.net>
|
#
5566a3e3 |
| 19-Dec-2017 |
Bruce Richardson <bruce.richardson@intel.com> |
drivers: use SPDX tag for Intel copyright files
Replace the BSD license header with the SPDX tag for files with only an Intel copyright on them.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
|
#
f1b7c6b7 |
| 07-Nov-2017 |
Jonas Pfefferle <jpf@zurich.ibm.com> |
bus/pci: fix PPC condition for IOMMU class
This fixes the use of a never-defined PPC64 macro in rte_pci_get_iommu_class.
Fixes: b48e0e2d9cb4 ("bus/pci: fix IOMMU class for sPAPR")
Signed-off-by: Jonas Pfefferle <jpf@zurich.ibm.com>
|