43fd3624 | 21-Jan-2025 | Andre Muezerie <andremue@linux.microsoft.com>
drivers: replace GCC pragma with cast
"GCC diagnostic ignored" pragmas have been commonly sprinkled over the code. Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html).
Now that an effort is being made to make the code compatible with MSVC, these expressions would become more complex. It makes sense to hide this complexity behind macros, which also eases maintenance since the macros are defined in a single place. As a plus, the code becomes more readable as well.
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
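
An illustrative sketch (not the committed diff) of the two approaches the message contrasts: silencing a warning such as -Wcast-qual at the use site with a compiler pragma, versus avoiding the warning altogether with a cast through uintptr_t, which GCC, Clang and MSVC all accept:

    #include <stdint.h>

    /* Before: compiler-specific pragmas sprinkled at every use site. */
    static void *drop_const_pragma(const void *p)
    {
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wcast-qual"
            return (void *)p;
    #pragma GCC diagnostic pop
    }

    /* After: a cast through uintptr_t drops the qualifier without
     * any pragma. */
    static void *drop_const_cast(const void *p)
    {
            return (void *)(uintptr_t)p;
    }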

a7ae9ba1 | 13-Nov-2024 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: fix miniCQEs number calculation
Use the information from the CQE, not from the title packet, to get the number of miniCQEs in the compressed CQE array. This way we can avoid segfaults in rxq_cq_decompress_v() in case of mbuf corruption (due to a double mbuf free, for example).
Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

3638f431 | 28-Oct-2024 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: fix shared queue port number in vector Rx
The wrong CQE was used to get the shared Rx queue port number in the vectorized Rx burst routine. Fix the CQE indexing.
Fixes: 25ed2ebff131 ("net/mlx5: support shared Rx queue port data path")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

191128d7 | 25-Oct-2024 | David Marchand <david.marchand@redhat.com>
drivers: use bitops API instead of compiler builtins
Stop using __builtin_ bit operations directly; prefer the existing DPDK wrappers.
Note: this is a brute sed all over drivers (skipping base drivers) for the __builtin_* calls that have a direct replacement in the EAL bitops. There is more work to do, like adding some missing macros inspired by the kernel FIELD_* macros, but this is left for later.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
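
A hedged sketch of the kind of one-to-one replacement the sed performed; rte_ctz32() and rte_popcount32() are the EAL bitops wrappers from <rte_bitops.h> (names as found in recent DPDK releases):

    #include <stdint.h>
    #include <rte_bitops.h>

    static inline unsigned int
    first_set_bit(uint32_t mask)
    {
            /* Before: __builtin_ctz(mask). */
            return rte_ctz32(mask);
    }

    static inline unsigned int
    bits_set(uint32_t mask)
    {
            /* Before: __builtin_popcount(mask). */
            return rte_popcount32(mask);
    }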

90ec9b0d | 01-Nov-2023 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: replenish MPRQ buffers for miniCQEs
In the rxq_cq_decompress_v() routine, keep unzipping while the next CQE is a miniCQE array only in the non-MPRQ scenario; MPRQ requires buffer replenishment between the miniCQEs.
Restore the check for the initial compressed CQE for SPRQ and check that the current CQE is not compressed before copying it as a possible title CQE.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

7ac7450b | 30-May-2023 | Ruifeng Wang <ruifeng.wang@arm.com>
net/mlx5: fix risk in NEON Rx descriptor read
In the NEON vector PMD, a vector load fetches two contiguous 8B words of descriptor data into a vector register. Since the vector load guarantees no 16B atomicity, the read of the word that includes the op_own field could be reordered after the reads of the other words; in that case, those words could contain invalid data.
Reload qword0 after the read barrier to update the vector register. This ensures that the fetched data is correct.
Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.
Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
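
A hedged sketch of the pattern described above (descriptor layout simplified; not the exact mlx5 code): the word that does not hold op_own is re-read after a read barrier, so its contents cannot be older than the op_own snapshot that declared the descriptor valid:

    #include <arm_neon.h>
    #include <rte_atomic.h>

    static inline uint64x2_t
    load_descriptor(const uint64_t *desc)
    {
            /* 16B vector load with no 16B atomicity: the two 8B words
             * may be observed out of order. */
            uint64x2_t d = vld1q_u64(desc);

            rte_io_rmb(); /* Order the reload after the first load. */

            /* Reload qword0 into lane 0; lane 1 keeps the previously
             * observed word containing op_own. */
            d = vld1q_lane_u64(desc, d, 0);
            return d;
    }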

fca8cba4 | 21-Jun-2023 | David Marchand <david.marchand@redhat.com>
ethdev: advertise flow restore in mbuf
As reported by Ilya [1], unconditionally calling rte_flow_get_restore_info() impacts application performance for drivers that do not provide this op. It can also impact the processing of packets that require no call to rte_flow_get_restore_info() at all.
Register a dynamic mbuf flag when an application negotiates tunnel metadata delivery (calling rte_eth_rx_metadata_negotiate() with RTE_ETH_RX_METADATA_TUNNEL_ID).
Drivers then advertise that metadata can be extracted by setting this dynamic flag in each mbuf.
The application then calls rte_flow_get_restore_info() only when required.
Link: http://inbox.dpdk.org/dev/5248c2ca-f2a6-3fb0-38b8-7f659bfa40de@ovn.org/
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
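
An application-side sketch of the negotiated fast path, assuming the rte_flow_restore_info_dynflag() helper exposed by this series for querying the registered flag:

    #include <rte_ethdev.h>
    #include <rte_flow.h>
    #include <rte_mbuf.h>

    static uint64_t restore_flag;

    static int
    negotiate_tunnel_metadata(uint16_t port_id)
    {
            uint64_t features = RTE_ETH_RX_METADATA_TUNNEL_ID;
            int ret;

            /* Negotiating tunnel metadata delivery registers the
             * dynamic mbuf flag. */
            ret = rte_eth_rx_metadata_negotiate(port_id, &features);
            if (ret != 0)
                    return ret;
            restore_flag = rte_flow_restore_info_dynflag();
            return 0;
    }

    static void
    handle_packet(uint16_t port_id, struct rte_mbuf *m)
    {
            struct rte_flow_restore_info info;
            struct rte_flow_error error;

            /* Only pay for the op when the driver flagged the mbuf. */
            if ((m->ol_flags & restore_flag) != 0 &&
                rte_flow_get_restore_info(port_id, m, &info, &error) == 0) {
                    /* ... use info.tunnel ... */
            }
    }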

fc3e1798 | 28-Feb-2023 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: support enhanced CQE zipping in vector Rx burst
Add Enhanced CQE compression support to the vectorized Rx burst routines, adopting the same algorithm the scalar Rx burst routines use today:
1. Retrieve the validity_iteration_count from CQEs and use it, instead of the owner_bit, to check whether a CQE is ready to be processed.
2. Do not invalidate reserved CQEs between miniCQE arrays.
3. Copy the title packet from the last processed uncompressed CQE, since it is needed later to build packets from zipped CQEs.
4. Skip the regular CQE processing and go straight to the CQE unzip function when the very first CQE is compressed, to save CPU time.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

1f903ebe | 27-Jan-2023 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: check compressed CQE opcode in vectorized Rx
The CQE opcode is never checked for a compressed CQE in the vectorized Rx burst routines: compressed CQEs were assumed to always be valid, and error checking was skipped.
This is obviously not the case, and error CQEs may be compressed together as well. Check for the MLX5_CQE_RESP_ERR opcode and, if it is there, mark all the packets in the compression session as bad.
Note that this issue is not applicable to the scalar Rx burst.
Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
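
A simplified sketch of the check being added (the CQE layout is reduced to the relevant byte, and the MLX5_CQE_RESP_ERR value is shown for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define MLX5_CQE_RESP_ERR 0xe /* illustrative value */

    struct cqe_min {
            uint8_t op_own; /* opcode in the upper nibble */
    };

    static inline bool
    compressed_cqe_ok(const struct cqe_min *cqe)
    {
            /* An error opcode invalidates the whole compression
             * session: mark all its packets as bad. */
            return (cqe->op_own >> 4) != MLX5_CQE_RESP_ERR;
    }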

7be78d02 | 29-Nov-2021 | Josh Soref <jsoref@gmail.com>
fix spelling in comments and strings
The tool comes from https://github.com/jsoref
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

25ed2ebf | 04-Nov-2021 | Viacheslav Ovsiienko <viacheslavo@nvidia.com>
net/mlx5: support shared Rx queue port data path
When receiving a packet, the mlx5 PMD saves the mbuf port number from the RxQ data.
To support shared RxQ, save the port number into the RQ context as the user index. A received packet then resolves its port number from the CQE user index, which is derived from the RQ context.
The legacy Verbs API doesn't support setting the RQ user index, so in that case the port number is still read from the RxQ.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

daa02b5c | 15-Oct-2021 | Olivier Matz <olivier.matz@6wind.com>
mbuf: add namespace to offload flags
Fix the mbuf offload flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, but a deprecation warning is issued at compilation.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
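
A minimal before/after sketch of the rename; RTE_MBUF_F_RX_VLAN is the prefixed counterpart of the old PKT_RX_VLAN:

    #include <rte_mbuf.h>

    static inline int
    has_vlan(const struct rte_mbuf *m)
    {
            /* Before: m->ol_flags & PKT_RX_VLAN (now deprecated). */
            return (m->ol_flags & RTE_MBUF_F_RX_VLAN) != 0;
    }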

6d5735c1 | 20-Jul-2021 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: fix meta register conversion for extensive mode
Register C is used in extensive metadata mode 1, and its width can vary from 0 to 32 bits depending on the kernel's usage of it.
There are several issues associated with this mode (dv_xmeta_en=1):
1. The metadata setting assumes that the width is always 16 bits, which is only the most common case in this mode. Use the proper mask.
2. The same is true for the modify_field Flow API: a 16-bit width is hardcoded for dv_xmeta_en=1. Switch to the register C mask width.
3. Metadata is stored in the most significant bits of the CQE in this mode because the register copy code was not updated during the metadata conversion to the big-endian format. Update this code to avoid shifting the metadata in the datapath.
Fixes: b57e414b48 ("net/mlx5: convert meta register to big-endian")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

6e695b0c | 08-Jun-2021 | Sarosh Arif <sarosh.arif@emumba.com>
net/mlx5: fix typo in vectorized Rx comments
Change "returing" to "returning".
Fixes: 2e542da70937 ("net/mlx5: add Altivec Rx")
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Fixes: 3c2ddbd413e3 ("net/mlx5: separate shareable vector functions")
Cc: stable@dpdk.org
Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>

ff6fcd41 | 07-Jul-2021 | Ruifeng Wang <ruifeng.wang@arm.com>
net/mlx5: remove redundant operations in NEON Rx
The mask of entries after the compressed CQE is covered by the invalid mask of non-compressed valid CQEs. Hence, remove the redundant mask calculation. The change showed a slight performance uplift on N1SDP.
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

b57e414b | 16-Jun-2021 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: convert meta register to big-endian
Metadata was stored in CPU order (little-endian format on x86), while all the packet header fields are stored in network order. That caused wrong results whenever we tried to use the metadata value in the modify_field action: bytes were swapped as a result.
Convert the metadata value into big-endian format before storing it in the Mellanox NIC to achieve consistent behaviour.
Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
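
A minimal sketch of the idea, with a hypothetical helper name: convert the metadata to big-endian before programming it into the NIC, so it matches the network-order header fields it may be copied to or from:

    #include <rte_byteorder.h>

    static inline rte_be32_t
    meta_to_nic(uint32_t meta)
    {
            return rte_cpu_to_be_32(meta);
    }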

4eefb20f | 07-Mar-2021 | Viacheslav Ovsiienko <viacheslavo@nvidia.com>
net/mlx5: fix Rx metadata leftovers
The Rx metadata might use metadata register C0 to keep the value. The same register C0 might be used by the kernel for source vport value handling; the kernel uses the upper half of the register, leaving the lower half for application usage.
In extended metadata mode 1 (the dv_xmeta_en devarg is assigned the value 1), the metadata width is only 16 bits, but the Rx datapath code fetched the entire 32-bit value of the metadata register and presented it to the application. The patch provides data masking depending on the chosen metadata mode.
Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
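
A hedged sketch of the per-mode masking (mask values illustrative): with dv_xmeta_en=1 only the lower 16 bits of register C0 belong to the application, so the fetched value is masked before delivery:

    #include <stdint.h>

    static inline uint32_t
    rx_meta_from_reg_c0(uint32_t reg_c0, uint32_t meta_mask)
    {
            /* meta_mask would be 0xffff for dv_xmeta_en=1 and
             * 0xffffffff when the full register is available. */
            return reg_c0 & meta_mask;
    }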

71094ae3 | 29-Oct-2020 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: fix CQE decompression for Arm and PowerPC
The recent Rx code refactoring moved the incrementing of the CQ completion index out of the rxq_cq_decompress_v() function to the rxq_burst_v() function.
The advancing of the CQ completion index was removed from the SSE version only, causing the Neon and Altivec Rx bursts to stall.
Remove the incrementing of the CQ completion index for all the architectures in order to fix the stall.
Fixes: 1ded26239aa0 ("net/mlx5: refactor vectorized Rx")
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

54c2d46b | 01-Nov-2020 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: support flow tag and packet header miniCQEs
CQE compression allows us to save PCI bandwidth and improve performance by compressing several CQEs together into a miniCQE. But the miniCQE size is only 8 bytes, and this limits the ability to keep the compression session alive across various traffic patterns.
The current miniCQE format only keeps the compression session alive in case of uniform traffic with the Hash RSS as the only difference. There are requests to keep the compression session in case of tagged traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic. Add 2 new miniCQE formats in order to achieve the best performance for these traffic patterns: Flow Tag and Packet Header miniCQEs.
The existing rxq_cqe_comp_en devarg is modified to specify the desired miniCQE format: specifying 2 selects the Flow Tag format for a better compression rate with RTE Flow Mark traffic; specifying 3 selects the Checksum format (the existing format used for MPRQ); specifying 4 selects the L3/L4 Header format for a better compression rate with mixed TCP/UDP and IPv4/IPv6 traffic.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
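
A hedged usage sketch: selecting the L3/L4 Header miniCQE format through the rxq_cqe_comp_en devarg when probing a device (the PCI address is hypothetical):

    #include <rte_dev.h>

    static int
    probe_with_l3l4_minicqes(void)
    {
            /* Equivalent to passing rxq_cqe_comp_en=4 for this device
             * on the EAL command line. */
            return rte_dev_probe("0000:03:00.0,rxq_cqe_comp_en=4");
    }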

1ded2623 | 21-Oct-2020 | Alexander Kozyrev <akozyrev@nvidia.com>
net/mlx5: refactor vectorized Rx
Move the main processing cycle into a separate function: rxq_cq_process_v. Put the regular rxq_burst_v function into a non-arch-specific file. Having all SIMD instructions in a single reusable block is a first preparatory step to implement vectorized Rx burst for the MPRQ feature.
Pass a pointer to the storage of mbufs directly to rxq_copy_mbuf_v instead of calculating the pointer inside this function. This is needed for the future vectorized Rx routine, which is going to pass a different pointer here.
Calculate the number of packets to replenish inside mlx5_rx_replenish_bulk_mbuf. Containing this logic in one place allows us to do the same for the MPRQ case.
Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

04840ecb | 29-Oct-2020 | Thomas Monjalon <thomas@monjalon.net>
net/mlx5: switch Rx timestamp to dynamic mbuf field
The mbuf timestamp is moved to a dynamic field in order to allow removal of the deprecated static field. The related mbuf flag is also replaced.
The dynamic offset and flag are stored in struct mlx5_rxq_data to favor cache locality.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
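
A hedged sketch of the dynamic-field pattern this change adopts: register the Rx timestamp field and flag once, then read the timestamp through the returned offset instead of a static mbuf field:

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    static int ts_offset;
    static uint64_t ts_flag;

    static int
    register_rx_timestamp(void)
    {
            return rte_mbuf_dyn_rx_timestamp_register(&ts_offset, &ts_flag);
    }

    static inline uint64_t
    read_rx_timestamp(const struct rte_mbuf *m)
    {
            if ((m->ol_flags & ts_flag) == 0)
                    return 0; /* Driver did not set a timestamp. */
            return *RTE_MBUF_DYNFIELD(m, ts_offset, rte_mbuf_timestamp_t *);
    }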

f0f5d844 | 23-Sep-2020 | Phil Yang <phil.yang@arm.com>
eal: remove deprecated coherent IO memory barriers
The 20.08 release deprecated the rte_cio_*mb APIs because these APIs provide the same functionality as the rte_io_*mb APIs on all platforms, so remove them and use rte_io_*mb instead.
Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>

4ffab7b9 | 23-Jul-2020 | Viacheslav Ovsiienko <viacheslavo@mellanox.com>
net/mlx5: fix metadata storing for NEON Rx
There was a typo introducing the bug, which affected the mlx5 vectorized rx_burst on Arm architectures in case CQE compression was enabled.
Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

6f52bd33 | 22-Jul-2020 | Alexander Kozyrev <akozyrev@mellanox.com>
net/mlx5: fix vectorized mini-CQE prefetching
There was optimization work to prefetch all the CQEs before their invalidation. It allowed us to speed up the mini-CQE decompression process by preheating the cache in the vectorized Rx routine.
Prefetching of the next mini-CQE, on the other hand, showed no difference in performance on the x86 platform, so it was removed. Unfortunately, this caused a performance drop on Arm.
Prefetch the mini-CQE as well as all the soon-to-be-invalidated CQEs to get both the CQE and the mini-CQE on the hot path.
Fixes: 28a4b96321a3 ("net/mlx5: prefetch CQEs for a faster decompression")
Cc: stable@dpdk.org
Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

a2854c4d | 16-Jul-2020 | Viacheslav Ovsiienko <viacheslavo@mellanox.com>
net/mlx5: convert Rx timestamps in real-time format
The ConnectX-6DX supports timestamps in various formats. A new realtime format is introduced: the upper 32-bit word of the timestamp contains the UTC seconds and the lower 32-bit word contains the nanoseconds. This patch detects which format is configured in the NIC and performs the conversion accordingly.
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
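
A minimal sketch of the conversion the realtime format requires (helper name hypothetical): the upper 32 bits carry UTC seconds, the lower 32 bits nanoseconds:

    #include <stdint.h>

    #define NSEC_PER_SEC 1000000000ULL

    static inline uint64_t
    realtime_ts_to_ns(uint64_t ts)
    {
            uint64_t secs = ts >> 32;
            uint64_t nsec = ts & UINT32_MAX;

            return secs * NSEC_PER_SEC + nsec;
    }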