mlx5_rxtx_vec_neon.h - OpenGrok history log for /dpdk/drivers/net/mlx5/mlx5_rxtx_vec

Revision	Date	Author	Comments
# 43fd3624	21-Jan-2025	Andre Muezerie <andremue@linux.microsoft.com>	drivers: replace GCC pragma with cast "GCC diagnostic ignored" pragmas have been commonly sprinkled over the code. Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> Acked-by: Morten Brørup <mb@smartsharesystems.com>
# a7ae9ba1	13-Nov-2024	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix miniCQEs number calculation Use the information from the CQE, not from the title packet, for getting the number of miniCQEs in the compressed CQEs array. This way we can avoid segfaults in the rxq_cq_decompress_v() in case of mbuf corruption (due to double mbuf free, for example). Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 3638f431	28-Oct-2024	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix shared queue port number in vector Rx Wrong CQE is used to get the shared Rx queue port number in vectorized Rx burst routine. Fix the CQE indexing. Fixes: 25ed2ebff131 ("net/mlx5: support shared Rx queue port data path") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 191128d7	25-Oct-2024	David Marchand <david.marchand@redhat.com>	drivers: use bitops API instead of compiler builtins Stop using directly __builtin_ bit operations, prefer existing DPDK wrappers. Note: this is a brute sed all over drivers (skipping base drivers) for __builtin_* that have a direct replacement in EAL bitops. There is more work to do, like adding some missing macros inspired from kernel (FIELD_*) macros but this is left for later. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
# 90ec9b0d	01-Nov-2023	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: replenish MPRQ buffers for miniCQEs Keep unzipping if the next CQE is the miniCQE array in rxq_cq_decompress_v() routine only for non-MPRQ scenario, MPRQ requires buffer replenishment between the miniCQEs. Restore the check for the initial compressed CQE for SPRQ and check that the current CQE is not compressed before copying it as a possible title CQE. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
# 7ac7450b	30-May-2023	Ruifeng Wang <ruifeng.wang@arm.com>	net/mlx5: fix risk in NEON Rx descriptor read In NEON vector PMD, vector load loads two contiguous 8B of descriptor data into vector register. Given vector load ensures no 16B atomicity, read of the word that includes op_own field could be reordered after read of other words. In this case, some words could contain invalid data. Reloaded qword0 after read barrier to update vector register. This ensures that the fetched data is correct. Testpmd single core test on N1SDP/ThunderX2 showed no performance drop. Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Tested-by: Ali Alnubani <alialnu@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
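The ordering problem in 7ac7450b can be illustrated with a minimal sketch in portable C11 atomics rather than NEON intrinsics. It assumes a hypothetical 16-byte descriptor split into two 8-byte words with the ownership bit in qword1; the struct, bit position and `desc_read()` helper are illustrative only and are not the driver's code. The point it shows: two unordered 8-byte reads can return a payload word older than the ownership bit, so after the ownership check and a read barrier the payload word is loaded again. (The C11 model does not formally describe device DMA writes; this only sketches the ordering idea.)

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor split into two 8-byte words.
 * For this sketch the ownership bit is assumed to live in qword1. */
struct desc16 {
	_Atomic uint64_t qword0; /* payload fields */
	_Atomic uint64_t qword1; /* payload + ownership bit */
};

#define DESC_OWN_BIT UINT64_C(1) /* hypothetical bit position */

/* Returns true and fills out[] once software owns the descriptor. */
static bool
desc_read(struct desc16 *d, uint64_t out[2], uint64_t sw_owner)
{
	/* Mirror a 16B vector load: two 8B reads with no ordering between
	 * them, so w0 may be older than the ownership bit seen in w1. */
	uint64_t w0 = atomic_load_explicit(&d->qword0, memory_order_relaxed);
	uint64_t w1 = atomic_load_explicit(&d->qword1, memory_order_relaxed);

	if ((w1 & DESC_OWN_BIT) != sw_owner)
		return false; /* still owned by the producer */

	/* Read barrier: later loads cannot be satisfied before the
	 * ownership check above. */
	atomic_thread_fence(memory_order_acquire);

	/* Reload the word that may have been read too early. */
	w0 = atomic_load_explicit(&d->qword0, memory_order_relaxed);

	out[0] = w0;
	out[1] = w1;
	return true;
}
```

Reloading one word after the barrier, instead of issuing a fully ordered pair of loads up front, keeps the common path to a single vector load plus one extra 8-byte read, which fits the entry's note that no performance drop was measured.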
# fca8cba4	21-Jun-2023	David Marchand <david.marchand@redhat.com>	ethdev: advertise flow restore in mbuf As reported by Ilya [1], unconditionally calling rte_flow_get_restore_info() impacts an application performance for drivers that do not provide this ops. It could also impact processing of packets that require no call to rte_flow_get_restore_info() at all. Register a dynamic mbuf flag when an application negotiates tunnel metadata delivery (calling rte_eth_rx_metadata_negotiate() with RTE_ETH_RX_METADATA_TUNNEL_ID). Drivers then advertise that metadata can be extracted by setting this dynamic flag in each mbuf. The application then calls rte_flow_get_restore_info() only when required. Link: http://inbox.dpdk.org/dev/5248c2ca-f2a6-3fb0-38b8-7f659bfa40de@ovn.org/ Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Tested-by: Ali Alnubani <alialnu@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>
# fc3e1798	28-Feb-2023	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: support enhanced CQE zipping in vector Rx burst Add Enhanced CQE compression support to vectorized Rx burst routines. Adopt the same algorithm as scalar Rx burst routines have today. 1. Retrieve the validity_iteration_count from CQEs and use it to check if the CQE is ready to be processed instead of the owner_bit. 2. Do not invalidate reserved CQEs between miniCQE arrays. 3. Copy the title packet from the last processed uncompressed CQE since we will need it later to build packets from zipped CQEs. 4. Skip the regular CQE processing and go straight to the CQE unzip function in case the very first CQE is compressed to save CPU time. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 1f903ebe	27-Jan-2023	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: check compressed CQE opcode in vectorized Rx The CQE opcode is never checked for a compressed CQE in the vectorized Rx burst routines. It is assumed that compressed CQEs are always valid and skipped error checking. This is obviously not the case and error CQEs may be compressed together as well. Need to check for the MLX5_CQE_RESP_ERR opcode and mark all the packets as bad ones in the compression session if it is there. Note that this issue is not applicable to the scalar Rx burst. Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
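The rule in 1f903ebe (if the CQE that starts a compression session carries the error opcode, every packet decompressed from that session is bad) can be sketched as below. This is a hypothetical illustration: `CQE_OPCODE_RESP_ERR`, `struct pkt_info` and `mark_compressed_session()` are invented names, and the opcode value is a placeholder standing in for the driver's `MLX5_CQE_RESP_ERR` constant, not a value taken from this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder standing in for MLX5_CQE_RESP_ERR (assumed value). */
#define CQE_OPCODE_RESP_ERR 0xe

struct pkt_info {
	uint8_t bad; /* 1 if the packet must be reported as erroneous */
};

/* Hypothetical helper: apply the opcode of the CQE that opens a
 * compression session to every packet decompressed from it.
 * Returns the number of packets marked bad. */
static size_t
mark_compressed_session(uint8_t opcode, struct pkt_info *pkts, size_t n)
{
	uint8_t err = (opcode == CQE_OPCODE_RESP_ERR);
	size_t nbad = 0;

	for (size_t i = 0; i < n; i++) {
		pkts[i].bad = err;
		nbad += err;
	}
	return nbad;
}
```

Checking the opcode once per session, rather than per miniCQE, matches the entry's observation that miniCQEs themselves carry no separate error status in this path.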
# 7be78d02	29-Nov-2021	Josh Soref <jsoref@gmail.com>	fix spelling in comments and strings The tool comes from https://github.com/jsoref Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
# 25ed2ebf	04-Nov-2021	Viacheslav Ovsiienko <viacheslavo@nvidia.com>	net/mlx5: support shared Rx queue port data path When receive packet, mlx5 PMD saves mbuf port number from RxQ data. To support shared RxQ, save port number into RQ context as user index. Received packet resolve port number from CQE user index which derived from RQ context. Legacy Verbs API doesn't support RQ user index setting, still read from RxQ port number. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# daa02b5c	15-Oct-2021	Olivier Matz <olivier.matz@6wind.com>	mbuf: add namespace to offload flags Fix the mbuf offload flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, but a deprecation warning is issued at compilation. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
# 6d5735c1	20-Jul-2021	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix meta register conversion for extensive mode Register C is used in the extensive metadata mode number 1 and its width can vary from 0 to 32 bits depending on the kernel usage of it. There are several issues associated with this mode (dv_xmeta_en=1): 1. The metadata setting assumes that the width is always 16 bits, which is the most common case in this mode. Use the proper mask. 2. The same is true for the modify_field Flow API. 16-bits width is hardcoded for dv_xmeta_en=1. Switch to the register C mask width. 3. Metadata is stored in the most significant bits in CQE in this mode because the registers copy code was not updated during the metadata conversion to the big-endian format. Update this code to avoid shifting the metadata in the datapath. Fixes: b57e414b48 ("net/mlx5: convert meta register to big-endian") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
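Items 1 and 2 of 6d5735c1 replace a hard-coded 16-bit assumption with a mask derived from the actual register C width, which may be anywhere from 0 to 32 bits. A minimal sketch of building such a mask is below; `meta_mask()` and `meta_extract()` are hypothetical names, and the sketch assumes the application bits occupy the low-order bits of the value (it does not model the most-significant-bit placement described in item 3). The one subtlety worth showing is that `(1u << 32) - 1` is undefined behaviour in C, so the full-width case needs its own branch.

```c
#include <stdint.h>

/* Hypothetical helper: build a mask for a metadata field that is
 * `width` bits wide (0..32). A plain (1u << width) - 1 would be
 * undefined for width == 32, so that case is handled explicitly. */
static inline uint32_t
meta_mask(unsigned int width)
{
	if (width >= 32)
		return UINT32_MAX;
	return (UINT32_C(1) << width) - 1;
}

/* Keep only the bits the application owns, assuming they occupy the
 * low-order `width` bits of the value read from the register. */
static inline uint32_t
meta_extract(uint32_t reg, unsigned int width)
{
	return reg & meta_mask(width);
}
```

For example, with the 16-bit width that the older 4eefb20f entry describes for dv_xmeta_en=1, `meta_extract(0xabcd1234, 16)` keeps only `0x1234`.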
# 6e695b0c	08-Jun-2021	Sarosh Arif <sarosh.arif@emumba.com>	net/mlx5: fix typo in vectorized Rx comments Change "returing" to "returning". Fixes: 2e542da70937 ("net/mlx5: add Altivec Rx") Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM") Fixes: 3c2ddbd413e3 ("net/mlx5: separate shareable vector functions") Cc: stable@dpdk.org Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
# ff6fcd41	07-Jul-2021	Ruifeng Wang <ruifeng.wang@arm.com>	net/mlx5: remove redundant operations in NEON Rx Mask of entries after the compressed CQE is covered by invalid mask of non-compressed valid CQEs. Hence remove redundant calculation on mask. The change showed slight performance uplift on N1SDP. Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM") Cc: stable@dpdk.org Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# b57e414b	16-Jun-2021	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: convert meta register to big-endian Metadata were stored in the CPU order (little-endian format on x86), while all the packet header fields are stored in the network order. That caused wrong results whenever we tried to use metadata value in the modify_field action: bytes were swapped as a result. Convert the metadata value into big-endian format before storing it in the Mellanox NIC to achieve consistent behaviour. Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
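The byte-order issue in b57e414b can be illustrated with a small host-independent sketch: storing a 32-bit value most-significant byte first gives the same bytes in memory on any CPU, so the metadata lines up with packet header fields that are already in network order. DPDK code would normally use `rte_cpu_to_be_32()` from `rte_byteorder.h` for this; the hand-written `meta_store_be32()` and `meta_load_be32()` below are hypothetical helpers used only to keep the example self-contained.

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit metadata value in network
 * (big-endian) byte order, independent of the host's endianness. */
static inline void
meta_store_be32(uint8_t dst[4], uint32_t meta)
{
	dst[0] = (uint8_t)(meta >> 24);
	dst[1] = (uint8_t)(meta >> 16);
	dst[2] = (uint8_t)(meta >> 8);
	dst[3] = (uint8_t)meta;
}

/* Hypothetical helper: read the value back into CPU order. */
static inline uint32_t
meta_load_be32(const uint8_t src[4])
{
	return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
	       ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}
```

On a little-endian x86 host, copying 0x12345678 with a plain `memcpy` would put 0x78 first in memory; the big-endian store puts 0x12 first, which is the byte swap the entry describes as the source of wrong modify_field results.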
# 4eefb20f	07-Mar-2021	Viacheslav Ovsiienko <viacheslavo@nvidia.com>	net/mlx5: fix Rx metadata leftovers The Rx metadata might use the metadata register C0 to keep the values. The same register C0 might be used by kernel for source vport value handling, kernel uses upper half of the register, leaving the lower half for application usage. In the extended metadata mode 1 (dv_xmeta_en devarg is assigned with value 1) the metadata width is 16 bits only, the Rx datapath code fetched the entire 32-bit value of the metadata register and presented one to application. The patch provides data masking depending on the chosen metadata mode. Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
# 71094ae3	29-Oct-2020	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix CQE decompression for Arm and PowerPC The recent Rx code refactoring moved the incrementing of the CQ completion index out of the rxq_cq_decompress_v() function to the rxq_burst_v() function. The advancing of CQ completion index was removed in SSE version only causing Neon and Altivec Rx bursts to stall. Remove the incrementation of CQ completion index for all the architectures in order to fix the stall. Fixes: 1ded26239aa0 ("net/mlx5: refactor vectorized Rx") Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 54c2d46b	01-Nov-2020	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: support flow tag and packet header miniCQEs CQE compression allows us to save the PCI bandwidth and improve the performance by compressing several CQEs together to a miniCQE. But the miniCQE size is only 8 bytes and this limits the ability to successfully keep the compression session in case of various traffic patterns. The current miniCQE format only keeps the compression session alive in case of uniform traffic with the Hash RSS as the only difference. There are requests to keep the compression session in case of tagged traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic. Add 2 new miniCQE formats in order to achieve the best performance for these traffic patterns: Flow Tag and Packet Header miniCQEs. The existing rxq_cqe_comp_en devarg is modified to specify the desired miniCQE format. Specifying 2 selects Flow Tag format for better compression rate in case of RTE Flow Mark traffic. Specifying 3 selects Checksum format (existing format for MPRQ). Specifying 4 selects L3/L4 Header format for better compression rate in case of mixed TCP/UDP and IPv4/IPv6 traffic. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 1ded2623	21-Oct-2020	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: refactor vectorized Rx Move the main processing cycle into a separate function: rxq_cq_process_v. Put the regular rxq_burst_v function to a non-arch specific file. Having all SIMD instructions in a single reusable block is a first preparatory step to implement vectorized Rx burst for MPRQ feature. Pass a pointer to the storage of mbufs directly to the rxq_copy_mbuf_v instead of calculating the pointer inside this function. This is needed for the future vectorized Rx routine which is going to pass a different pointer here. Calculate the number of packets to replenish inside the mlx5_rx_replenish_bulk_mbuf. Containing this logic in one place allows us to do the same for MPRQ case. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 04840ecb	29-Oct-2020	Thomas Monjalon <thomas@monjalon.net>	net/mlx5: switch Rx timestamp to dynamic mbuf field The mbuf timestamp is moved to a dynamic field in order to allow removal of the deprecated static field. The related mbuf flag is also replaced. The dynamic offset and flag are stored in struct mlx5_rxq_data to favor cache locality. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
# f0f5d844	23-Sep-2020	Phil Yang <phil.yang@arm.com>	eal: remove deprecated coherent IO memory barriers Since the 20.08 release deprecated rte_cio_mb APIs because these APIs provide the same functionality as rte_io_mb APIs on all platforms, so remove them and use rte_io_*mb instead. Signed-off-by: Phil Yang <phil.yang@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: David Marchand <david.marchand@redhat.com>
# 4ffab7b9	23-Jul-2020	Viacheslav Ovsiienko <viacheslavo@mellanox.com>	net/mlx5: fix metadata storing for NEON Rx There was the typo introducing the bug, affected the mlx5 vectorized rx_burst on ARM architectures in case if CQE compression was enabled. Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
# 6f52bd33	22-Jul-2020	Alexander Kozyrev <akozyrev@mellanox.com>	net/mlx5: fix vectorized mini-CQE prefetching There was an optimization work to prefetch all the CQEs before their invalidation. It allowed us to speed up the mini-CQE decompression process by preheating the cache in the vectorized Rx routine. Prefetching of the next mini-CQE, on the other hand, showed no difference in the performance on x86 platform. So, that was removed. Unfortunately this caused the performance drop on ARM. Prefetch the mini-CQE as well as all the soon to be invalidated CQEs to get both CQE and mini-CQE on the hot path. Fixes: 28a4b96321a3 ("net/mlx5: prefetch CQEs for a faster decompression") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
# a2854c4d	16-Jul-2020	Viacheslav Ovsiienko <viacheslavo@mellanox.com>	net/mlx5: convert Rx timestamps in real-time format The ConnectX-6DX supports the timestamps in various formats, the new realtime format is introduced - the upper 32-bit word of timestamp contains the UTC seconds and the lower 32-bit word contains the nanoseconds. This patch detects what format is configured in the NIC and performs the conversion accordingly. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
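The real-time timestamp layout described in a2854c4d (UTC seconds in the upper 32-bit word, nanoseconds in the lower 32-bit word) can be flattened into a single nanosecond count with one shift, one mask and one multiply. The sketch below assumes a nanosecond count is the desired target representation; the entry only says the driver "performs the conversion accordingly", so `realtime_ts_to_ns()` is a hypothetical helper and not the driver's function.

```c
#include <stdint.h>

#define NS_PER_S UINT64_C(1000000000)

/* Hypothetical helper: convert a 64-bit real-time format timestamp
 * (upper 32 bits = UTC seconds, lower 32 bits = nanoseconds) into a
 * flat count of nanoseconds since the epoch. */
static inline uint64_t
realtime_ts_to_ns(uint64_t ts)
{
	uint64_t sec = ts >> 32;          /* upper word: seconds */
	uint64_t nsec = ts & UINT32_MAX;  /* lower word: nanoseconds */

	return sec * NS_PER_S + nsec;
}
```

For example, a raw value with 2 in the upper word and 500 in the lower word converts to 2000000500 ns. The multiply stays within 64 bits for any 32-bit seconds value, since (2^32 - 1) x 10^9 is below 2^64.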