#
151cbe3a |
| 12-Apr-2021 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: separate Rx function declarations to another file
The mlx5_rxtx.c file contains a lot of Tx burst functions, each of them performance-optimized for a specific set of requested offloads. They are generated from a template function, and compiling them takes significant time simply because of the large number of giant functions generated in the same file, a compilation that cannot be parallelized with multithreading.
Therefore we can split the mlx5_rxtx.c file into several separate files to allow different functions to be compiled simultaneously. In this patch, we separate the Rx function declarations into a different header file, in preparation for removing them from the source file and as an optional preparation step for further consolidation of Rx burst functions.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
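A minimal sketch of the resulting layout, assuming illustrative file and declaration choices (the exact prototypes moved by the patch may differ): the Rx burst prototypes live in their own header so the Rx and Tx sources can later become independent compilation units.
    /* mlx5_rx.h - illustrative header holding only the Rx-path declarations. */
    #ifndef RTE_PMD_MLX5_RX_H_
    #define RTE_PMD_MLX5_RX_H_

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* Rx burst entry points, previously declared next to the Tx ones. */
    uint16_t mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
    uint16_t mlx5_rx_burst_vec(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
    uint16_t mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);

    #endif /* RTE_PMD_MLX5_RX_H_ */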
|
#
d61381ad |
| 14-Mar-2021 |
Viacheslav Ovsiienko <viacheslavo@nvidia.com> |
net/mlx5: support timestamp format
This patch adds support for the timestamp format settings for the receive and send queues. If the firmware version x.30.1000 or above is installed and the NIC timestamps are configured with the real-time format, the default zero values for newly added fields cause the queue creation to fail.
The patch queries the timestamp formats supported by the hardware and sets the configuration values in queue context accordingly.
Fixes: 86fc67fc9315 ("net/mlx5: create advanced RxQ object via DevX")
Fixes: ae18a1ae9692 ("net/mlx5: support Tx hairpin queues")
Fixes: 15c3807e86ab ("common/mlx5: support DevX QP operations")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>
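A hedged sketch of the selection logic, using stand-in constants rather than the firmware-defined encodings, to show how a supported format is chosen instead of leaving the new fields at zero:
    #include <stdbool.h>

    /* Stand-ins for the firmware-defined timestamp format encodings. */
    enum ts_format { TS_FORMAT_FREE_RUNNING = 0, TS_FORMAT_REAL_TIME = 1 };

    /* Pick a timestamp format the queue can actually be created with. */
    static enum ts_format
    pick_ts_format(bool hw_real_time_capable, bool real_time_configured)
    {
        if (real_time_configured && hw_real_time_capable)
            return TS_FORMAT_REAL_TIME;
        /* Fall back to free-running instead of leaving the field zeroed. */
        return TS_FORMAT_FREE_RUNNING;
    }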
|
#
e6988afd |
| 25-Feb-2021 |
Matan Azrad <matan@nvidia.com> |
net/mlx5: fix imissed statistics
The imissed port statistic counts packets that were dropped by the device Rx queues.
In mlx5, the imissed counter summarizes 2 counters:
- packets dropped by the SW queue handling, counted by SW.
- packets dropped by the HW queues due to "out of buffer" events detected when no SW buffer is available for the incoming packets.
There is a HW counter object that should be created per device, and all the Rx queues should be assigned to this counter at configuration time.
This part was missed when the Rx queues were created by DevX, which left the "out of buffer" counter permanently clean (zero) in this case.
Add 2 options to assign the DevX Rx queues to a queue counter:
- Create a queue counter per device by DevX and assign all the queues to it.
- Query the kernel counter and assign all the queues to it.
Use the first option by default and, if it fails, fall back to the second option.
Fixes: e79c9be91515 ("net/mlx5: support Rx hairpin queues")
Fixes: dc9ceff73c99 ("net/mlx5: create advanced RxQ via DevX")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
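A hedged sketch of the fallback, with hypothetical helper names standing in for the DevX queue-counter command and the kernel counter query:
    #include <stdint.h>

    /* Hypothetical primitives standing in for the real DevX/kernel calls. */
    void *devx_create_queue_counter(void *ctx);
    int   kernel_query_queue_counter(void *ctx, uint32_t *counter_id);

    /* Preferred: one DevX counter per device; otherwise reuse the kernel one.
     * All Rx queues are later created pointing at the obtained counter. */
    static int
    obtain_oob_counter(void *ctx, void **devx_obj, uint32_t *counter_id)
    {
        *devx_obj = devx_create_queue_counter(ctx);
        if (*devx_obj != NULL)
            return 0;
        return kernel_query_queue_counter(ctx, counter_id);
    }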
|
#
00984de5 |
| 04-Feb-2021 |
Viacheslav Ovsiienko <viacheslavo@nvidia.com> |
net/mlx5: fix Tx queue size created with DevX
The number of descriptors specified for queue creation implies the queue should be able to contain the specified number of packets being sent. Typically one packet takes one queue descriptor (WQE) to be handled. If the inline data option is enabled, one packet might require more WQEs to embrace the inline data, and the overall queue size (the number of queue descriptors) should be adjusted accordingly.
In the mlx5 PMD the queues can be created either via Verbs, using the rdma-core library, or via DevX as a direct kernel/firmware call. rdma-core does the queue size adjustment internally, depending on the TSO and inline settings. The DevX approach missed this point, which caused a queue size discrepancy and performance variations.
The patch adjusts the Tx queue size for the DevX approach in the same way as it is done in the rdma-core implementation.
Fixes: 86d259cec852 ("net/mlx5: separate Tx queue object creations") Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
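A hedged sizing sketch under simplified assumptions (64B WQEBBs, a fixed inline budget per packet); the actual adjustment also accounts for TSO and the selected inline mode:
    #include <rte_common.h>

    /* With inline data enabled a packet can span several 64B WQEBBs, so the
     * ring must be scaled for 'desc' packets to still fit. */
    static uint32_t
    txq_wqebb_count(uint16_t desc, unsigned int inline_bytes)
    {
        const unsigned int wqebb = 64;
        unsigned int wqebbs_per_pkt = 1 + (inline_bytes + wqebb - 1) / wqebb;

        return rte_align32pow2(desc * wqebbs_per_pkt);
    }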
|
#
6e0a3637 |
| 06-Jan-2021 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: move Rx RQ creation to common
Use a common function for Rx RQ creation.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
74e91860 |
| 06-Jan-2021 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: move Tx SQ creation to common
Use a common function for Tx SQ creation.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
5cd33796 |
| 06-Jan-2021 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: move Rx CQ creation to common
Use a common function for Rx CQ creation.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
5f04f70c |
| 06-Jan-2021 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: move Tx CQ creation to common
Use a common function for Tx CQ creation.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
a2521c8f |
| 06-Jan-2021 |
Michael Baum <michaelba@nvidia.com> |
common/mlx5: fix completion queue entry size configuration
According to the current data-path implementation in the PMD, the CQE size must follow the cache-line size. So the configuration of the CQE size should depend on RTE_CACHE_LINE_SIZE.
Wrongly, some of the CQE creations didn't follow this exactly, which caused an incompatibility between HW and SW in the data-path when working on systems with a 128B cache-line size.
Adjust the rule for every CQE creation. Remove the cqe_size attribute from the DevX CQ creation command and set it inside the command translation according to the cache-line size.
Fixes: 79a7e409a2f6 ("common/mlx5: prepare support of packet pacing")
Fixes: 5cd0a83f413e ("common/mlx5: support more fields in DevX CQ create")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
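A hedged sketch of the rule, assuming the usual 0/1 encoding for 64B/128B CQEs (the real code writes the value while translating the CQ creation command):
    #include <rte_common.h>

    /* Keep the CQE size in lockstep with the CPU cache-line size. */
    static unsigned int
    cq_cqe_size_encoding(void)
    {
        return RTE_CACHE_LINE_SIZE == 128 ? 1 /* 128B CQE */ : 0 /* 64B CQE */;
    }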
|
#
f1ae0b35 |
| 28-Dec-2020 |
Ophir Munk <ophirmu@nvidia.com> |
net/mlx5: enable more shared code on Windows
Use the macro HAVE_INFINIBAND_VERBS_H to successfully compile files both under Linux and Windows (or any non-Linux OS in general). Under Windows this macro:
1. Hides Verbs references.
2. Exposes required DV structs that are under ifdefs related to rdma-core.
Linux code under definitions such as #ifdef HAVE_IBV_FLOW_DV_SUPPORT is required unconditionally under Windows; however, those definitions are never effective without rdma-core presence. Therefore, update the #ifdef condition to also consider HAVE_INFINIBAND_VERBS_H (a macro which is undefined when building without an rdma-core library).
For example:
-#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+#if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
88019723 |
| 28-Dec-2020 |
Ophir Munk <ophirmu@nvidia.com> |
net/mlx5: fix flow operation wrapper per OS
Wrap the glue call dv_create_flow_action_dest_devx_tir() with an OS API.
Fixes: b293fbf9672b ("net/mlx5: add OS specific flow actions operations") Cc: stable@dpdk.org
Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
98174626 |
| 28-Dec-2020 |
Tal Shnaiderman <talshn@nvidia.com> |
common/mlx5: wrap event channel functions per OS
Wrap the API to create/destroy an event channel and to subscribe to an event with OS calls. In Linux, those calls are implemented by glue functions, while in Windows they are not supported.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
07a99de8 |
| 28-Dec-2020 |
Tal Shnaiderman <talshn@nvidia.com> |
net/mlx5: wrap glue reg/dereg UMEM per OS
Wrap the glue calls for UMEM registration and deregistration with generic OS calls, since each OS (Linux or Windows) has different glue API parameters.
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Signed-off-by: Ophir Munk <ophirmu@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
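A hedged sketch of the wrapper pattern, with illustrative names and signatures: common code calls one OS-neutral symbol, and each OS directory supplies its own definition, the Linux one forwarding to the rdma-core glue.
    #include <stddef.h>
    #include <stdint.h>

    /* OS-neutral prototype used by the shared code. */
    void *os_umem_reg(void *ctx, void *addr, size_t size, uint32_t access);

    #ifdef __linux__
    /* Hypothetical glue entry standing in for the rdma-core DevX UMEM call. */
    void *glue_devx_umem_reg(void *ctx, void *addr, size_t size, uint32_t access);

    void *
    os_umem_reg(void *ctx, void *addr, size_t size, uint32_t access)
    {
        return glue_devx_umem_reg(ctx, addr, size, access);
    }
    #endif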
|
#
6e09c7bb |
| 24-Nov-2020 |
Gregory Etelson <getelson@nvidia.com> |
net/mlx5: fix DevX resources freeing
Invalid memory release order of DevX resources caused a PMD crash.
1. SQ and CQ memory must be unregistered with DevX before it is freed.
2. SQ objects reference CQ objects. Hence, an SQ should be destroyed before the CQ it references.
Fixes: 6deb19e1b2d2 ("net/mlx5: separate Rx queue object creations") Fixes: 88f2e3f18cc7 ("net/mlx5: rearrange SQ and CQ creation in DevX module")
Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
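A hedged teardown sketch with hypothetical types and helpers, showing the two rules: destroy the SQ before the CQ it references, and deregister DevX memory before freeing it.
    #include <stdlib.h>

    /* Hypothetical handles and primitives for the sketch. */
    struct q_res { void *sq; void *cq; void *umem; void *buf; };
    void devx_obj_destroy(void *obj);
    void devx_umem_dereg(void *umem);

    static void
    q_res_free(struct q_res *r)
    {
        if (r->sq != NULL)
            devx_obj_destroy(r->sq);   /* SQ first: it references the CQ */
        if (r->cq != NULL)
            devx_obj_destroy(r->cq);
        if (r->umem != NULL)
            devx_umem_dereg(r->umem);  /* deregister before freeing */
        free(r->buf);
    }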
|
#
fa7ad49e |
| 22-Nov-2020 |
Andrey Vesnovaty <andreyv@nvidia.com> |
net/mlx5: fix shared RSS action update
The shared RSS action update was not operational due to the lack of kernel driver support for TIR object modification. This commit introduces a workaround to support shared RSS action modification using an indirect queue table update instead of touching the TIR object directly. Limitation: the only supported RSS property to update is the queues; the rest of the properties are ignored.
Fixes: d2046c09aa64 ("net/mlx5: support shared action for RSS")
Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
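A hedged sketch of the workaround shape, with illustrative names: the queue list is changed by rewriting the indirection table behind the TIR, because the TIR object itself cannot be modified on these kernels.
    #include <stdint.h>

    /* Hypothetical handle and primitive for the sketch. */
    struct hrxq { void *ind_table; };
    int ind_table_modify(void *ind_table, const uint16_t *queues, uint32_t n);

    static int
    shared_rss_update_queues(struct hrxq *h, const uint16_t *queues, uint32_t n)
    {
        /* Only the queue set can change; hash key and fields stay untouched. */
        return ind_table_modify(h->ind_table, queues, n);
    }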
|
#
ff2deada |
| 15-Nov-2020 |
Alexander Kozyrev <akozyrev@nvidia.com> |
net/mlx5: fix Rx packet padding config via DevX
Received packets can be aligned to the size of the cache line on PCI transactions. This could improve performance by avoiding partial cache line writes in exchange for increased PCI bandwidth.
This feature is supposed to be controlled by the rxq_pkt_pad_en devarg, and that is true for an RxQ created via the Verbs API. But in the DevX API case, it is erroneously controlled by the rxq_cqe_pad_en devarg instead, which is in charge of the CQE padding and should not control the RxQ creation.
Fix DevX RxQ creation by using the proper configuration flag for Rx packet padding that is being set by the rxq_pkt_pad_en devarg.
Fixes: dc9ceff73c99 ("net/mlx5: create advanced RxQ via DevX") Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
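A hedged sketch of the corrected wiring, with illustrative attribute names: the RQ end-padding mode now follows the value parsed from rxq_pkt_pad_en rather than the CQE-padding devarg.
    #include <stdbool.h>

    /* Illustrative RQ attribute selection driven by rxq_pkt_pad_en. */
    enum rq_pad_mode { RQ_PAD_NONE = 0, RQ_PAD_TO_CACHE_LINE = 1 };

    static enum rq_pad_mode
    rq_padding_mode(bool hw_padding /* parsed from rxq_pkt_pad_en */)
    {
        return hw_padding ? RQ_PAD_TO_CACHE_LINE : RQ_PAD_NONE;
    }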
|
#
876b5d52 |
| 03-Nov-2020 |
Matan Azrad <matan@nvidia.com> |
net/mlx5: fix Tx queue stop state
The Tx queue stop API doesn't call the PMD callback when the state of the queue is stopped. The drivers should update the state to be stopped when the queue stop callback is done successfully or when the port is stopped. The drivers should update the state to be started when the queue start callback is done successfully or when the port is started.
The driver wrongly didn't update the state to started when the port start callback was done, which kept the state as stopped. A following call to the queue stop API was not completed by the ethdev layer because the state was already stopped.
Move the state update from the Tx queue setup to the port start callback.
Fixes: 161d103b231c ("net/mlx5: add queue start and stop") Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
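A hedged sketch of the fix location, using the generic ethdev state fields: the started state is published from the port-start callback so a later queue-stop request actually reaches the PMD.
    #include <rte_ethdev_driver.h>  /* driver-side header; name varies by DPDK release */

    /* Mark every Tx queue as started from the port-start callback. */
    static void
    dev_start_mark_txqs(struct rte_eth_dev *dev)
    {
        uint16_t i;

        for (i = 0; i < dev->data->nb_tx_queues; i++)
            dev->data->tx_queue_state[i] = RTE_ETH_QUEUE_STATE_STARTED;
    }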
|
#
8178d9be |
| 28-Oct-2020 |
Tal Shnaiderman <talshn@nvidia.com> |
net/mlx5: fix SQ resources release in error flow
Fix an error flow in which the function mlx5_txq_release_devx_sq_resources is called twice, by setting the released objects to NULL after the first call.
The incorrect flow was introduced in the work done on generic object creation.
Once an error flow inside mlx5_txq_create_devx_sq_resources occurs, the function calls mlx5_txq_release_devx_sq_resources; however, the released pointers are not set to NULL after the release calls, and undefined memory is released again in mlx5_txq_release_devx_resources.
This results in calls to MLX5_FREE with already released memory addresses and an assert in mlx5_release_dbr:
EAL: Error: Invalid memory EAL: Error: Invalid memory
PANIC in mlx5_txq_release_devx_sq_resources(): assert "(mlx5_release_dbr(&txq_obj->txq_ctrl->priv->dbrpgs, mlx5_os_get_umem_id (txq_obj->sq_dbrec_page->umem), txq_obj->sq_dbrec_offset)) == 0" failed
The fix is setting the released pointers to NULL after the first release calls.
Fixes: 86d259cec852 ("net/mlx5: separate Tx queue object creations") Cc: stable@dpdk.org
Signed-off-by: Tal Shnaiderman <talshn@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
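A hedged sketch of the fix pattern with hypothetical field names: each pointer is cleared immediately after its release call, so a second pass through the release path becomes a no-op.
    #include <stdlib.h>

    /* Hypothetical resource holder and release primitive. */
    struct sq_res { void *umem; void *buf; };
    void devx_umem_dereg(void *umem);

    static void
    sq_res_release(struct sq_res *r)
    {
        if (r->umem != NULL) {
            devx_umem_dereg(r->umem);
            r->umem = NULL;     /* prevent a double deregistration */
        }
        if (r->buf != NULL) {
            free(r->buf);
            r->buf = NULL;      /* prevent a double free */
        }
    }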
|
#
54c2d46b |
| 01-Nov-2020 |
Alexander Kozyrev <akozyrev@nvidia.com> |
net/mlx5: support flow tag and packet header miniCQEs
CQE compression allows us to save the PCI bandwidth and improve the performance by compressing several CQEs together to a miniCQE. But the miniCQE size is only 8 bytes and this limits the ability to successfully keep the compression session in case of various traffic patterns.
The current miniCQE format only keeps the compression session alive in case of uniform traffic with the Hash RSS as the only difference. There are requests to keep the compression session in case of tagged traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic. Add 2 new miniCQE formats in order to achieve the best performance for these traffic patterns: Flow Tag and Packet Header miniCQEs.
The existing rxq_cqe_comp_en devarg is modified to specify the desired miniCQE format:
- 2 selects the Flow Tag format for a better compression rate in case of RTE Flow Mark traffic.
- 3 selects the Checksum format (the existing format for MPRQ).
- 4 selects the L3/L4 Header format for a better compression rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
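A hedged usage sketch: an application selecting the Flow Tag miniCQE format through the devargs of the probed port (the PCI address is a placeholder).
    #include <stdio.h>
    #include <rte_eal.h>

    int
    main(void)
    {
        /* "-a" allowlists the device and passes its devargs. */
        char *argv[] = { "app", "-a", "0000:03:00.0,rxq_cqe_comp_en=2", NULL };

        if (rte_eal_init(3, argv) < 0) {
            printf("EAL init failed\n");
            return -1;
        }
        return 0;
    }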
|
#
b8cc58c1 |
| 23-Oct-2020 |
Andrey Vesnovaty <andreyv@nvidia.com> |
net/mlx5: modify hash Rx queue objects
Implement modification for the hashed table of the Rx queue object (see mlx5_hrxq_modify()). This implementation relies on the capability to modify the TIR object via the DevX API, i.e. the current implementation doesn't support Verbs HW object operations. The functionality to modify the hashed table of the Rx queue object is a prerequisite to implement rte_flow_shared_action_update() for the shared RSS action in the mlx5 PMD.
Signed-off-by: Andrey Vesnovaty <andreyv@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
|
#
0f20acbf |
| 21-Oct-2020 |
Alexander Kozyrev <akozyrev@nvidia.com> |
net/mlx5: implement vectorized MPRQ burst
MPRQ (Multi-Packet Rx Queue) processes one packet at a time using simple scalar instructions. MPRQ works by posting a single large buffer (consisting of multiple fixed-size strides) in order to receive multiple packets at once on this buffer. An Rx packet is then copied to a user-provided mbuf, or the PMD attaches the Rx packet to the mbuf via a pointer to an external buffer.
There is an opportunity to speed up the packet receiving by processing 4 packets simultaneously using SIMD (single instruction, multiple data) extensions. Allocate mbufs in batches for every MPRQ buffer and process the packets in groups of 4 until all the strides are exhausted. Then switch to another MPRQ buffer and repeat the process over again.
The vectorized MPRQ burst routine is engaged automatically if the mprq_en=1 devarg is specified and vectorization is not disabled explicitly by providing the rx_vec_en=0 devarg. There is a limitation: LRO is not supported, and scalar MPRQ is selected if it is enabled.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
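A hedged sketch of the batching idea only, with hypothetical fields and helpers (the real routine performs the copy/attach step with SIMD intrinsics): strides of the current MPRQ buffer are consumed four packets at a time before moving to the next buffer.
    #include <stdint.h>
    #include <rte_common.h>
    #include <rte_mbuf.h>

    /* Hypothetical per-queue state and helper for the sketch. */
    struct mprq_rxq { uint32_t strd_idx; uint32_t strd_n; };
    uint16_t copy_or_attach(struct mprq_rxq *q, struct rte_mbuf **out, uint16_t n);

    static uint16_t
    mprq_drain_buffer(struct mprq_rxq *q, struct rte_mbuf **out, uint16_t max)
    {
        uint16_t done = 0;

        while (q->strd_idx < q->strd_n && done < max) {
            uint32_t left = q->strd_n - q->strd_idx;
            uint16_t n = (uint16_t)RTE_MIN(left, 4u);

            if (n > max - done)
                n = max - done;
            done += copy_or_attach(q, out + done, n);  /* SIMD inside */
            q->strd_idx += n;
        }
        return done;
    }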
|
#
e96242ef |
| 01-Oct-2020 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: remove Rx queue object type field
Once the separation between Verbs and DevX is done using function pointers, the type field of the Rx queue object structure becomes redundant and is no longer used. Remove the unnecessary field from the structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
4c6d80f1 |
| 01-Oct-2020 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: separate Rx queue state modification
Separate Rx state modification to the Verbs and DevX modules.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
354cc08a |
| 01-Oct-2020 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: remove Tx queue object type field
Once the separation between Verbs and DevX is done using function pointers, the type field of the Tx queue object structure becomes redundant and is no longer used. Remove the unnecessary field from the structure.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|
#
a9c79306 |
| 01-Oct-2020 |
Michael Baum <michaelba@nvidia.com> |
net/mlx5: share Tx queue object modification
Use new modify_qp functions for Tx object creation in DevX and Verbs modules.
Signed-off-by: Michael Baum <michaelba@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
|