mlx5_rxtx_vec_sse.h - OpenGrok history log for /dpdk/drivers/net/mlx5/mlx5_rxtx_vec

Revision	Date	Author	Comments
# 43fd3624	21-Jan-2025	Andre Muezerie <andremue@linux.microsoft.com>	drivers: replace GCC pragma with cast "GCC diagnostic ignored" pragmas have been commonly sprinkled over the code. Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> Acked-by: Morten Brørup <mb@smartsharesystems.com>
# a7ae9ba1	13-Nov-2024	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix miniCQEs number calculation Use the information from the CQE, not from the title packet, for getting the number of miniCQEs in the compressed CQEs array. This way we can avoid segfaults in the rxq_cq_decompress_v() in case of mbuf corruption (due to double mbuf free, for example). Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 3638f431	28-Oct-2024	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix shared queue port number in vector Rx Wrong CQE is used to get the shared Rx queue port number in vectorized Rx burst routine. Fix the CQE indexing. Fixes: 25ed2ebff131 ("net/mlx5: support shared Rx queue port data path") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 90ec9b0d	01-Nov-2023	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: replenish MPRQ buffers for miniCQEs Keep unzipping if the next CQE is the miniCQE array in rxq_cq_decompress_v() routine only for non-MPRQ scenario, MPRQ requires buffer replenishment between the miniCQEs. Restore the check for the initial compressed CQE for SPRQ and check that the current CQE is not compressed before copying it as a possible title CQE. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
# 393ff728	20-Jun-2024	Bruce Richardson <bruce.richardson@intel.com>	drivers/net: replace intrinsic header include with rte_vect Rather than having the SSE code in each driver include tmmintrin.h, which often does not contain all needed intrinsics, e.g. _mm_cvtsi128_si64() for 32-bit x86 builds, we can just replace the include of ?mmintrin.h with rte_vect.h for all network drivers. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
# 3d4e27fd	25-Aug-2023	David Marchand <david.marchand@redhat.com>	use abstracted bit count functions Now that DPDK provides such bit count functions, make use of them. This patch was prepared with a "brutal" commandline: $ old=__builtin_clzll; new=rte_clz64; git grep -lw $old :^lib/eal/include/rte_bitops.h | xargs sed -i -e "s#\<$old\>#$new#g" $ old=__builtin_clz; new=rte_clz32; git grep -lw $old :^lib/eal/include/rte_bitops.h | xargs sed -i -e "s#\<$old\>#$new#g" $ old=__builtin_ctzll; new=rte_ctz64; git grep -lw $old :^lib/eal/include/rte_bitops.h | xargs sed -i -e "s#\<$old\>#$new#g" $ old=__builtin_ctz; new=rte_ctz32; git grep -lw $old :^lib/eal/include/rte_bitops.h | xargs sed -i -e "s#\<$old\>#$new#g" $ old=__builtin_popcountll; new=rte_popcount64; git grep -lw $old :^lib/eal/include/rte_bitops.h | xargs sed -i -e "s#\<$old\>#$new#g" $ old=__builtin_popcount; new=rte_popcount32; git grep -lw $old :^lib/eal/include/rte_bitops.h | xargs sed -i -e "s#\<$old\>#$new#g" Then inclusion of rte_bitops.h was added were necessary. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com>
# fca8cba4	21-Jun-2023	David Marchand <david.marchand@redhat.com>	ethdev: advertise flow restore in mbuf As reported by Ilya [1], unconditionally calling rte_flow_get_restore_info() impacts an application performance for drivers that do not provide this ops. It could also impact processing of packets that require no call to rte_flow_get_restore_info() at all. Register a dynamic mbuf flag when an application negotiates tunnel metadata delivery (calling rte_eth_rx_metadata_negotiate() with RTE_ETH_RX_METADATA_TUNNEL_ID). Drivers then advertise that metadata can be extracted by setting this dynamic flag in each mbuf. The application then calls rte_flow_get_restore_info() only when required. Link: http://inbox.dpdk.org/dev/5248c2ca-f2a6-3fb0-38b8-7f659bfa40de@ovn.org/ Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Tested-by: Ali Alnubani <alialnu@nvidia.com> Acked-by: Ori Kam <orika@nvidia.com>
# fc3e1798	28-Feb-2023	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: support enhanced CQE zipping in vector Rx burst Add Enhanced CQE compression support to vectorized Rx burst routines. Adopt the same algorithm as scalar Rx burst routines have today. 1. Retrieve the validity_iteration_count from CQEs and use it to check if the CQE is ready to be processed instead of the owner_bit. 2. Do not invalidate reserved CQEs between miniCQE arrays. 3. Copy the title packet from the last processed uncompressed CQE since we will need it later to build packets from zipped CQEs. 4. Skip the regular CQE processing and go straight to the CQE unzip function in case the very first CQE is compressed to sace CPU time. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 1f903ebe	27-Jan-2023	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: check compressed CQE opcode in vectorized Rx The CQE opcode is never checked for a compressed CQE in the vectorized Rx burst routines. It is assumed that compressed CQEs are always valid and skipped error checking. This is obviously not the case and error CQEs may be compressed together as well. Need to check for the MLX5_CQE_RESP_ERR opcode and mark all the packets as bad ones in the compression session if it is there. Note that this issue is not applicable to the scalar Rx burst. Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
# 7be78d02	29-Nov-2021	Josh Soref <jsoref@gmail.com>	fix spelling in comments and strings The tool comes from https://github.com/jsoref Signed-off-by: Josh Soref <jsoref@gmail.com> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
# 25ed2ebf	04-Nov-2021	Viacheslav Ovsiienko <viacheslavo@nvidia.com>	net/mlx5: support shared Rx queue port data path When receive packet, mlx5 PMD saves mbuf port number from RxQ data. To support shared RxQ, save port number into RQ context as user index. Received packet resolve port number from CQE user index which derived from RQ context. Legacy Verbs API doesn't support RQ user index setting, still read from RxQ port number. Signed-off-by: Xueming Li <xuemingl@nvidia.com> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# daa02b5c	15-Oct-2021	Olivier Matz <olivier.matz@6wind.com>	mbuf: add namespace to offload flags Fix the mbuf offload flags namespace by adding an RTE_ prefix to the name. The old flags remain usable, but a deprecation warning is issued at compilation. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
# 6d5735c1	20-Jul-2021	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix meta register conversion for extensive mode Register C is used in the extensive metadata mode number 1 and its width can vary from 0 to 32 bits depending on the kernel usage of it. There are several issues associated with this mode (dv_xmeta_en=1): 1. The metadata setting assumes that the width is always 16 bits, which is the most common case in this mode. Use the proper mask. 2. The same is true for the modify_field Flow API. 16-bits width is hardcoded for dv_xmeta_en=1. Switch to the register C mask width. 3. Metadata is stored in the most significant bits in CQE in this mode because the registers copy code was not updated during the metadata conversion to the big-endian format. Update this code to avoid shifting the metadata in the datapath. Fixes: b57e414b48 ("net/mlx5: convert meta register to big-endian") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 6e695b0c	08-Jun-2021	Sarosh Arif <sarosh.arif@emumba.com>	net/mlx5: fix typo in vectorized Rx comments Change "returing" to "returning". Fixes: 2e542da70937 ("net/mlx5: add Altivec Rx") Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM") Fixes: 3c2ddbd413e3 ("net/mlx5: separate shareable vector functions") Cc: stable@dpdk.org Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
# b57e414b	16-Jun-2021	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: convert meta register to big-endian Metadata were stored in the CPU order (little-endian format on x86), while all the packet header fields are stored in the network order. That caused wrong results whenever we tried to use metadata value in the modify_field action: bytes were swapped as a result. Convert the metadata value into big-endian format before storing it in the Mellanox NIC to achieve consistent behaviour. Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 4eefb20f	07-Mar-2021	Viacheslav Ovsiienko <viacheslavo@nvidia.com>	net/mlx5: fix Rx metadata leftovers The Rx metadata might use the metadata register C0 to keep the values. The same register C0 might be used by kernel for source vport value handling, kernel uses upper half of the register, leaving the lower half for application usage. In the extended metadata mode 1 (dv_xmeta_en devarg is assigned with value 1) the metadata width is 16 bits only, the Rx datapath code fetched the entire 32-bit value of the metadata register and presented one to application. The patch provides data masking depending on the chosen metadata mode. Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues") Cc: stable@dpdk.org Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: Matan Azrad <matan@nvidia.com>
# bd0940a5	14-Jan-2021	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: fix flow tag decompression Packets can get a wrong Flow Tag on x86 architecture with the Flow Tag compression format (rxq_cqe_comp_en=2) enabled inside the SSE Rx burst. The shuffle mask that extracts a Flow Tag from the pair of compressed CQEs is reversed. This leads to the wrong Flow Tag assignment. Correct the shuffle mask to get proper bytes for a Flow Tag from miniCQEs. Fixes: 54c2d46b160f ("net/mlx5: support flow tag and packet header miniCQEs") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 54c2d46b	01-Nov-2020	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: support flow tag and packet header miniCQEs CQE compression allows us to save the PCI bandwidth and improve the performance by compressing several CQEs together to a miniCQE. But the miniCQE size is only 8 bytes and this limits the ability to successfully keep the compression session in case of various traffic patterns. The current miniCQE format only keeps the compression session alive in case of uniform traffic with the Hash RSS as the only difference. There are requests to keep the compression session in case of tagged traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic. Add 2 new miniCQE formats in order to achieve the best performance for these traffic patterns: Flow Tag and Packet Header miniCQEs. The existing rxq_cqe_comp_en devarg is modified to specify the desired miniCQE format. Specifying 2 selects Flow Tag format for better compression rate in case of RTE Flow Mark traffic. Specifying 3 selects Checksum format (existing format for MPRQ). Specifying 4 selects L3/L4 Header format for better compression rate in case of mixed TCP/UDP and IPv4/IPv6 traffic. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 1ded2623	21-Oct-2020	Alexander Kozyrev <akozyrev@nvidia.com>	net/mlx5: refactor vectorized Rx Move the main processing cycle into a separate function: rxq_cq_process_v. Put the regular rxq_burst_v function to a non-arch specific file. Having all SIMD instructions in a single reusable block is a first preparatory step to implement vectorized Rx burst for MPRQ feature. Pass a pointer to the storage of mbufs directly to the rxq_copy_mbuf_v instead of calculating the pointer inside this function. This is needed for the future vectorized Rx routing which is going to pass a different pointer here. Calculate the number of packets to replenish inside the mlx5_rx_replenish_bulk_mbuf. Containing this logic in one place allows us to do the same for MPRQ case. Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
# 04840ecb	29-Oct-2020	Thomas Monjalon <thomas@monjalon.net>	net/mlx5: switch Rx timestamp to dynamic mbuf field The mbuf timestamp is moved to a dynamic field in order to allow removal of the deprecated static field. The related mbuf flag is also replaced. The dynamic offset and flag are stored in struct mlx5_rxq_data to favor cache locality. Signed-off-by: Thomas Monjalon <thomas@monjalon.net> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: David Christensen <drc@linux.vnet.ibm.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com> Acked-by: David Marchand <david.marchand@redhat.com> Acked-by: Olivier Matz <olivier.matz@6wind.com>
# f0f5d844	23-Sep-2020	Phil Yang <phil.yang@arm.com>	eal: remove deprecated coherent IO memory barriers Since the 20.08 release deprecated rte_cio_mb APIs because these APIs provide the same functionality as rte_io_mb APIs on all platforms, so remove them and use rte_io_*mb instead. Signed-off-by: Phil Yang <phil.yang@arm.com> Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Acked-by: David Marchand <david.marchand@redhat.com>
# 6f52bd33	22-Jul-2020	Alexander Kozyrev <akozyrev@mellanox.com>	net/mlx5: fix vectorized mini-CQE prefetching There was an optimization work to prefetch all the CQEs before their invalidation. It allowed us to speed up the mini-CQE decompression process by preheating the cache in the vectorized Rx routine. Prefetching of the next mini-CQE, on the other hand, showed no difference in the performance on x86 platform. So, that was removed. Unfortunately this caused the performance drop on ARM. Prefetch the mini-CQE as well as all the soon to be invalidated CQEs to get both CQE and mini-CQE on the hot path. Fixes: 28a4b96321a3 ("net/mlx5: prefetch CQEs for a faster decompression") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
# a2854c4d	16-Jul-2020	Viacheslav Ovsiienko <viacheslavo@mellanox.com>	net/mlx5: convert Rx timestamps in real-time format The ConnectX-6DX supports the timestamps in various formats, the new realtime format is introduced - the upper 32-bit word of timestamp contains the UTC seconds and the lower 32-bit word contains the nanoseconds. This patch detects what format is configured in the NIC and performs the conversion accordingly. Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
# c9cc554b	02-Jun-2020	Alexander Kozyrev <akozyrev@mellanox.com>	net/mlx5: fix vectorized Rx burst termination Maximum burst size of Vectorized Rx burst routine is set to MLX5_VPMD_RX_MAX_BURST(64). This limits the performance of any application that would like to gather more than 64 packets from the single Rx burst for batch processing (i.e. VPP). The situation gets worse with a mix of zipped and unzipped CQEs. They are processed separately and the Rx burst function returns small number of packets every call. Repeat the cycle of gathering packets from the vectorized Rx routine until a requested number of packets are collected or there are no more CQEs left to process. Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com> Acked-by: Matan Azrad <matan@mellanox.com>
# 0c555915	27-Apr-2020	Alexander Kozyrev <akozyrev@mellanox.com>	net/mlx5: fix assert in dynamic metadata handling The assert in dynamic flow metadata handling is wrong after the fix for the performance degradation. The assert meant to check the metadata mask but was updated with the metadata offset instead. Fix this assert and restore proper metadata mask checking. Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues") Cc: stable@dpdk.org Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>