History log of /dpdk/drivers/net/mlx5/mlx5_rxtx_vec.h (Results 1 – 25 of 25)
Revision Date Author Comments
# 295968d1 22-Oct-2021 Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: add namespace

Add the 'RTE_ETH' namespace to all enums & macros in a backward-compatible
way. The macros for backward compatibility can be removed in the next LTS.
Also update some struct names to have the 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
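
As a sketch of the backward-compatibility pattern described above (the macro below is one real example; the patch applies the same pattern across all ethdev enums and macros):

    /* New namespaced name introduced by the patch. */
    #define RTE_ETH_LINK_FULL_DUPLEX 1
    /* Deprecated alias kept for backward compatibility;
     * can be removed in the next LTS. */
    #define ETH_LINK_FULL_DUPLEX RTE_ETH_LINK_FULL_DUPLEX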


# 9f1d636f 19-Oct-2021 Michael Baum <michaelba@nvidia.com>

common/mlx5: share MR management

Add global shared MR cache as a field of common device structure.
Move MR management to use this global cache for all drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>


# 0f20acbf 21-Oct-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: implement vectorized MPRQ burst

MPRQ (Multi-Packet Rx Queue) processes one packet at a time using
simple scalar instructions. MPRQ works by posting a single large buffer
(consisting of multiple fixed-size strides) in order to receive multiple
packets at once on this buffer. An Rx packet is then either copied to a
user-provided mbuf, or the PMD attaches the Rx packet to the mbuf via a
pointer to an external buffer.

There is an opportunity to speed up the packet receiving by processing
4 packets simultaneously using SIMD (single instruction, multiple data)
extensions. Allocate mbufs in batches for every MPRQ buffer and process
the packets in groups of 4 until all the strides are exhausted. Then
switch to another MPRQ buffer and repeat the process over again.

The vectorized MPRQ burst routine is engaged automatically when the
mprq_en=1 devarg is specified and vectorization is not explicitly
disabled with the rx_vec_en=0 devarg. One limitation: LRO is not
supported, and scalar MPRQ is selected if it is enabled.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
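
A rough sketch of the control flow described above; every helper name and field here is hypothetical, standing in for the driver's internals:

    #include <stdint.h>
    #include <rte_mbuf.h>

    struct rxq_sketch {                 /* hypothetical queue state */
        uint16_t consumed_strides;
        uint16_t strides_n;
    };
    int replenish_mbufs_bulk(struct rxq_sketch *);
    uint16_t process_strides_x4(struct rxq_sketch *, struct rte_mbuf **,
                                uint16_t);
    void switch_mprq_buf(struct rxq_sketch *);

    static uint16_t
    mprq_burst_vec_sketch(struct rxq_sketch *rxq, struct rte_mbuf **pkts,
                          uint16_t n)
    {
        uint16_t done = 0;

        while (done < n) {
            /* Allocate mbufs in a batch for the current MPRQ buffer. */
            if (replenish_mbufs_bulk(rxq) < 0)
                break;
            /* SIMD path: handle completions 4 packets at a time. */
            done += process_strides_x4(rxq, &pkts[done], n - done);
            /* All strides exhausted: switch to the next MPRQ buffer. */
            if (rxq->consumed_strides >= rxq->strides_n)
                switch_mprq_buf(rxq);
        }
        return done;
    }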


# 1ded2623 21-Oct-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: refactor vectorized Rx

Move the main processing cycle into a separate function:
rxq_cq_process_v. Put the regular rxq_burst_v function
in a non-arch-specific file. Having all SIMD instructions
in a single reusable block is the first preparatory step to
implement vectorized Rx burst for the MPRQ feature.

Pass a pointer to the storage of mbufs directly to
rxq_copy_mbuf_v instead of calculating the pointer inside
this function. This is needed for the future vectorized Rx
routine, which is going to pass a different pointer here.

Calculate the number of packets to replenish inside
mlx5_rx_replenish_bulk_mbuf. Containing this logic in one
place allows us to do the same for the MPRQ case.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
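
A sketch of the resulting split; the three function names are the ones quoted above, but the signatures and types are assumed for illustration:

    #include <stdint.h>
    #include <rte_mbuf.h>

    struct rxq;  /* driver queue state, opaque here */
    /* Arch-specific: the single reusable SIMD block. */
    uint16_t rxq_cq_process_v(struct rxq *, struct rte_mbuf **, uint16_t);
    /* Non-arch helper; now computes the replenish count internally. */
    void mlx5_rx_replenish_bulk_mbuf(struct rxq *);

    /* Non-arch-specific burst driver, reusable by the MPRQ variant. */
    static uint16_t
    rxq_burst_v_sketch(struct rxq *rxq, struct rte_mbuf **pkts, uint16_t n)
    {
        mlx5_rx_replenish_bulk_mbuf(rxq);
        /* The mbuf storage pointer is passed straight through, so a
         * future MPRQ caller can supply a different one. */
        return rxq_cq_process_v(rxq, pkts, n);
    }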


# f0f5d844 23-Sep-2020 Phil Yang <phil.yang@arm.com>

eal: remove deprecated coherent IO memory barriers

The 20.08 release deprecated the rte_cio_*mb APIs because they provide
the same functionality as the rte_io_*mb APIs on all platforms, so
remove them and use rte_io_*mb instead.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
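
The change at call sites is mechanical; a minimal sketch:

    #include <stdint.h>
    #include <rte_atomic.h>

    static inline void
    doorbell_write_sketch(volatile uint64_t *db, uint64_t val)
    {
        /* Was rte_cio_wmb(); rte_io_wmb() gives the same ordering
         * guarantees on all supported platforms. */
        rte_io_wmb();
        *db = val;
    }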


# b8dc6b0e 13-Apr-2020 Vu Pham <vuhuong@mellanox.com>

common/mlx5: refactor memory management

Refactor the common memory btree and cache management into the common
driver. Replace some input parameters of the MR APIs with more common
data structures like PD, port_id, share_cache,... so that multiple PMD
drivers can use those MR APIs.

Modify the mlx5 net PMD driver to use the MR management APIs from the
common driver.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>


# 8e46d4e1 30-Jan-2020 Alexander Kozyrev <akozyrev@mellanox.com>

common/mlx5: improve assert control

Use the MLX5_ASSERT macros instead of the standard assert clause.
MLX5_ASSERT depends on the RTE_LIBRTE_MLX5_DEBUG configuration option.
If RTE_LIBRTE_MLX5_DEBUG is enabled, MLX5_ASSERT is equal to RTE_VERIFY,
bypassing the global CONFIG_RTE_ENABLE_ASSERT option.
If RTE_LIBRTE_MLX5_DEBUG is disabled, the global CONFIG_RTE_ENABLE_ASSERT
can still make the assert active, since RTE_ASSERT calls RTE_VERIFY
internally.

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
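
The described behavior boils down to roughly the following (a sketch, with the configuration macro spellings taken from the message):

    #include <rte_debug.h>

    #ifdef RTE_LIBRTE_MLX5_DEBUG
    /* Debug build: always verify, bypassing RTE_ENABLE_ASSERT. */
    #define MLX5_ASSERT(exp) RTE_VERIFY(exp)
    #else
    /* Non-debug build: active only when the global assert option
     * is enabled, since RTE_ASSERT then expands to RTE_VERIFY. */
    #define MLX5_ASSERT(exp) RTE_ASSERT(exp)
    #endif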


# 7b4f1e6b 29-Jan-2020 Matan Azrad <matan@mellanox.com>

common/mlx5: introduce common library

A new Mellanox vdpa PMD will be added to support vdpa operations by
Mellanox adapters.

This vdpa PMD design includes mlx5_glue and mlx5_devx operations and
large parts of them are shared with the net/mlx5 PMD.

Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.

The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.

Share the log mechanism macros.
Also separate the log mechanism, to allow a different log level control
for the common library.

Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>


# bdb8e5b1 20-Jan-2020 Viacheslav Ovsiienko <viacheslavo@mellanox.com>

net/mlx5: allow allocated mbuf with external buffer

In the Rx datapath the flags in the newly allocated mbufs
are all explicitly cleared, but EXT_ATTACHED_MBUF must be
preserved. This allows the use of mbuf pools with pre-attached
external data buffers.

The vectorized rx_burst routines are updated to inherit
EXT_ATTACHED_MBUF from the mbuf pool's private
RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF flag.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
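
A one-line sketch of the flag handling described above (illustrative, not the exact driver code):

    #include <rte_mbuf.h>

    static inline void
    rx_mbuf_reset_flags_sketch(struct rte_mbuf *m)
    {
        /* Clear all flags but keep EXT_ATTACHED_MBUF, which pools
         * with pinned external buffers pre-set on their mbufs. */
        m->ol_flags &= EXT_ATTACHED_MBUF;
    }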


# 9bf26e13 05-Nov-2019 Viacheslav Ovsiienko <viacheslavo@mellanox.com>

ethdev: move egress metadata to dynamic field

The dynamic mbuf fields were introduced by [1]. The egress metadata is a
good candidate to be moved from the statically allocated field tx_metadata
to a dynamic one. Because mbufs are used in half-duplex fashion only, it
is safe to share this dynamic field with ingress metadata.

The shared dynamic field contains either egress metadata (if the
application is going to transmit the mbuf with tx_burst) or ingress
metadata (if the mbuf was received with rx_burst) and can be accessed by
the RTE_FLOW_DYNF_METADATA() macro or with the rte_flow_dynf_metadata_set()
and rte_flow_dynf_metadata_get() helper routines. The
PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag is set along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to accessing the data.

The availability of the dynamic mbuf metadata field can be checked with
the rte_flow_dynf_metadata_avail() routine.

The DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is
removed. Metadata support in PMDs is engaged on dynamic field
registration.

The metadata feature is getting complex. We might have some set of actions
and items that might be supported by PMDs in multiple combinations; the
supported values and masks are subject to query by performing trials
(with rte_flow_validate).

[1] http://patches.dpdk.org/patch/62040/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
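
Application-side usage of the helpers named above could look like this (a sketch):

    #include <rte_mbuf.h>
    #include <rte_flow.h>

    static void
    metadata_usage_sketch(struct rte_mbuf *m)
    {
        uint32_t meta = 0;

        /* Register the dynamic field once, before any access. */
        if (rte_flow_dynf_metadata_register() < 0)
            return; /* metadata dynamic field is not available */
        /* Egress: set metadata and mark the mbuf before tx_burst. */
        rte_flow_dynf_metadata_set(m, 0xcafe);
        m->ol_flags |= PKT_TX_DYNF_METADATA;
        /* Ingress: read metadata delivered by rx_burst. */
        if (m->ol_flags & PKT_RX_DYNF_METADATA)
            meta = rte_flow_dynf_metadata_get(m);
        (void)meta;
    }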


# 8b8f7994 22-Jul-2019 Matan Azrad <matan@mellanox.com>

net/mlx5: update LRO fields in completion entry

Update the CQE structure to include LRO fields.

Some reserved values were changed; hence the data-path code that used
the reserved values was updated accordingly.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>


# 9c55c6bd 25-Mar-2019 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: revert mbuf address calculation for x86

When replenishing mbufs on Rx, the buffer address (mbuf->buf_addr) should
be loaded. Non-x86 processors (mostly RISC, such as ARM and Power) are
more vulnerable to load stalls. For x86, reducing the number of
instructions seems to matter most.

For x86 this is therefore a simple load, but for other architectures the
address is calculated from the address of the mbuf structure by
rte_mbuf_buf_addr(), without having to load the first cacheline of the
mbuf.

Fixes: 12d468a62bc1 ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
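
A sketch of the architecture split described above; the helper is illustrative, and the queue's mempool is assumed to be passed in so the mbuf's first cacheline need not be read:

    #include <rte_mbuf.h>

    static inline void *
    rx_buf_addr_sketch(struct rte_mbuf *m, struct rte_mempool *mp)
    {
    #ifdef RTE_ARCH_X86
        /* x86: a plain load is the fewest instructions. */
        (void)mp;
        return m->buf_addr;
    #else
        /* RISC (ARM/Power): compute the address rather than load the
         * mbuf's first cacheline, avoiding the load stall. */
        return rte_mbuf_buf_addr(m, mp);
    #endif
    }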


# 12d468a6 14-Jan-2019 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix instruction hotspot on replenishing Rx buffer

On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr does not
need to be accessed, as it is static and easily calculated from the mbuf
address. Accessing the mbuf content causes an unnecessary load stall,
which is worse on ARM.

Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>


# 6bd7fbd0 23-Oct-2018 Dekel Peled <dekelp@mellanox.com>

net/mlx5: support metadata as flow rule criteria

As described in the series starting at [1], this adds the option to set
a metadata value as a match pattern when creating a new flow rule.

This patch adds metadata support in the mlx5 driver, in two parts:
- Add the validation and setting of the metadata value in the matcher,
when creating a new flow rule.
- Add the passing of the metadata value from the mbuf to the WQE, when
indicated by ol_flag, in the different burst functions.

[1] "ethdev: support metadata as flow rule criteria"
http://mails.dpdk.org/archives/dev/2018-September/113269.html

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
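
On the matching side, a pattern using the metadata item could be built like this (a sketch; the item layout follows the ethdev series referenced above):

    #include <rte_flow.h>
    #include <rte_byteorder.h>

    /* Sketch: match packets whose metadata equals 0xcafe. */
    static const struct rte_flow_item_meta meta_spec = {
        .data = RTE_BE32(0xcafe),
    };
    static const struct rte_flow_item pattern_sketch[] = {
        { .type = RTE_FLOW_ITEM_TYPE_META, .spec = &meta_spec },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };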


# e10245a1 26-Jun-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix Rx buffer replenishment threshold

The threshold for buffer replenishment in the vectorized Rx burst is a
constant value (64). If the size of the Rx queue is comparatively small,
the device can run out of buffers. For example, if the size of the Rx
queue is 128, buffers are replenished only twice per wraparound. This can
cause jitter in receiving packets, and the jitter can cause unnecessary
retransmissions on TCP connections.

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
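
The arithmetic above suggests the direction of the fix: scale the threshold with the ring size instead of using a constant. A sketch under that assumption (the macro name and scaling factor are not taken from the patch):

    #include <rte_common.h>

    /* Was a flat 64; sketch: replenish once a quarter of the ring is
     * consumed, still capped by the previous constant. */
    #define RPLNSH_THRESH_SKETCH(q_n) \
        RTE_MIN(64u, (unsigned int)(q_n) / 4)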


# 7d6bf6b8 09-May-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add Multi-Packet Rx support

Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead
of posting a buffer per packet, one large buffer is posted in order to
receive multiple packets on it. An MPRQ buffer consists of multiple
fixed-size strides, and each stride receives one packet.

An Rx packet is mem-copied to a user-provided mbuf if it is comparatively
small, or the PMD attaches the Rx packet to the mbuf by external buffer
attachment - rte_pktmbuf_attach_extbuf(). A mempool for external buffers
is allocated and managed by the PMD.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
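
A sketch of the per-packet decision described above; the threshold parameter and helper are illustrative:

    #include <string.h>
    #include <rte_mbuf.h>

    static void
    mprq_deliver_sketch(struct rte_mbuf *pkt, void *stride, rte_iova_t iova,
                        uint16_t len, uint16_t buf_len, uint16_t memcpy_max,
                        struct rte_mbuf_ext_shared_info *shinfo)
    {
        if (len <= memcpy_max) {
            /* Small packet: copy into the user-provided mbuf. */
            memcpy(rte_pktmbuf_mtod(pkt, void *), stride, len);
        } else {
            /* Large packet: attach the stride as an external buffer;
             * the stride mempool is managed by the PMD. */
            rte_pktmbuf_attach_extbuf(pkt, stride, iova, buf_len, shinfo);
        }
        pkt->pkt_len = len;
        pkt->data_len = len;
    }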


# 974f1e7e 09-May-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add new memory region support

This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 looks up the last-hit entry, which is pointed to by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 looks up the address in a fixed-size
array by linear search. L0/L1 live in an inline function -
mlx5_mr_lookup_cache().

If L1 misses, the bottom-half function is called to look up the address
in the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
- and it is not an inline function. The data structure for L2 is a binary
tree.

If L2 misses, the search falls into the slowest path, which takes locks
in order to access the global device cache (priv->mr.cache); this is also
a B-tree and caches the original MR list (priv->mr.mr_list) of the device.
Unless the global cache overflows, it is all-inclusive of the MR list.
This is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is
limited and can't be expanded on the fly due to deadlock. Refer to the
comments in the code for the details - mr_lookup_dev(). If L3 overflows,
the list has to be searched directly, bypassing the cache, although this
is slower.

If L3 misses, a new MR for the address has to be created -
mlx5_mr_create(). When it creates a new MR, it tries to register as many
adjacent memsegs as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
in the MR list and the corresponding bits are cleared from the bitmap of
MRs. This can fragment an MR, and the MR will then have multiple search
entries in the caches. Once there's a change by the event, the global
cache must be rebuilt and all the per-queue caches are flushed as well.
If memory is frequently freed at run time, that may cause jitter on
dataplane processing in the worst case by incurring MR cache flush and
rebuild, but this is the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to
use the EAL option '--socket-mem'; the reserved memory will then be
pinned and won't be freed dynamically. It is also recommended to
configure a per-lcore cache for the mempool. Even if there are many MRs
for a device or the MRs are highly fragmented, the mempool cache will
help reduce misses on the per-queue caches.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
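
A condensed sketch of the lookup cascade; the real entry points are quoted in the comments, while the types and helpers here are placeholders:

    #include <stdint.h>

    #define LKEY_MISS UINT32_MAX

    struct mr_ctrl;                       /* per-queue MR cache */
    uint32_t lookup_mru(struct mr_ctrl *, uintptr_t);           /* L0 */
    uint32_t lookup_linear(struct mr_ctrl *, uintptr_t);        /* L1 */
    uint32_t lookup_btree(struct mr_ctrl *, uintptr_t);         /* L2 */
    uint32_t lookup_dev_or_create(struct mr_ctrl *, uintptr_t); /* L3+ */

    static uint32_t
    addr2lkey_sketch(struct mr_ctrl *ctrl, uintptr_t addr)
    {
        uint32_t lkey;

        /* L0/L1: inline fast path - mlx5_mr_lookup_cache(). */
        if ((lkey = lookup_mru(ctrl, addr)) != LKEY_MISS)
            return lkey;
        if ((lkey = lookup_linear(ctrl, addr)) != LKEY_MISS)
            return lkey;
        /* L2: per-queue B-tree bottom half - mlx5_mr_addr2mr_bh(). */
        if ((lkey = lookup_btree(ctrl, addr)) != LKEY_MISS)
            return lkey;
        /* L3: global cache under lock - mlx5_mr_lookup_dev(); on a
         * miss, a new MR is registered - mlx5_mr_create(). */
        return lookup_dev_or_create(ctrl, addr);
    }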


# 5feecc57 20-Mar-2018 Shahaf Shuler <shahafs@mellanox.com>

align SPDX Mellanox copyrights

Aligning Mellanox SPDX copyrights to a single format.
In addition, add SPDX license tags to the files where they were missed.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>


# 8fd92a66 29-Jan-2018 Olivier Matz <olivier.matz@6wind.com>

net/mlx5: use SPDX tags in 6WIND copyrighted files

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>


# 4fe7f662 25-Jan-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: replace I/O memory barrier with coherent version

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>


# dbccb4cd 10-Jan-2018 Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: convert to new Tx offloads API

Ethdev Tx offloads API has changed since:

commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")

This commit adds support for the new Tx offloads API.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
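
From the application side, the new API means requesting Tx offloads at configure time. A sketch (the offload selection is illustrative; the flag names are the pre-21.11 ones current for this commit):

    #include <rte_ethdev.h>

    static int
    configure_port_sketch(uint16_t port_id)
    {
        struct rte_eth_conf conf = {
            .txmode = {
                /* Per-port Tx offloads under the new API. */
                .offloads = DEV_TX_OFFLOAD_IPV4_CKSUM |
                            DEV_TX_OFFLOAD_TCP_CKSUM,
            },
        };

        return rte_eth_dev_configure(port_id, 1, 1, &conf);
    }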


# 03e0868b 10-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix deadlock due to buffered slots in Rx SW ring

When replenishing the Rx ring, there are always buffered slots reserved
between the consumed entries and the HW-owned entries. These have to be
filled with fake mbufs to protect against possible overflow, rather than
optimistically expecting successful replenishment, which can cause a
deadlock with a small-sized queue.

Fixes: fc048bd52cb7 ("net/mlx5: fix overflow of Rx SW ring")
Cc: stable@dpdk.org

Reported-by: Martin Weiser <martin.weiser@allegro-packets.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Martin Weiser <martin.weiser@allegro-packets.com>
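
A sketch of the protective fill described above (field and variable names illustrative):

    #include <stdint.h>
    #include <rte_mbuf.h>

    struct rxq_sketch {
        struct rte_mbuf **elts;     /* SW ring */
        struct rte_mbuf fake_mbuf;  /* static placeholder entry */
        uint16_t q_mask;            /* ring size - 1, power of two */
    };

    /* Point the reserved gap between consumed and HW-owned entries at
     * the fake mbuf, so an overflow cannot hit stale pointers. */
    static void
    fill_buffered_slots_sketch(struct rxq_sketch *rxq, uint16_t start,
                               uint16_t n_buffered)
    {
        uint16_t i;

        for (i = 0; i < n_buffered; ++i)
            rxq->elts[(start + i) & rxq->q_mask] = &rxq->fake_mbuf;
    }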


# 570acdb1 09-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add vectorized Rx/Tx burst for ARM

Brings vectorization through NEON instructions.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>


# 3c2ddbd4 09-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: separate shareable vector functions

Considering that more architectures (e.g. ARM and PowerPC) will be added
for vectorized Rx/Tx burst, all the shareable functions which don't use
any vector intrinsics need to be separated from the
architecture-dependent functions. All the vector functions for x86 SSE
are moved to a new header file - mlx5_rxtx_vec_sse.h - and the shareable
common functions are now in mlx5_rxtx_vec.c.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>


# 5bfc9fc1 09-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: use static assert for compile-time sanity checks

Replace the compile-time sanity checks with static_assert(), as the C11
standard has been adopted. Add mlx5_rxtx_vec.h and move the sanity checks
to that file.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
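
In the C11 style the commit adopts, such a check looks like the following (a representative example, not necessarily one of the assertions that were moved):

    #include <assert.h>
    #include <rte_common.h>
    #include <rte_mbuf.h>

    /* Build-time check: fails compilation, not the run, if the
     * layout assumption ever breaks. */
    static_assert(sizeof(struct rte_mbuf) == RTE_CACHE_LINE_MIN_SIZE * 2,
                  "invalid rte_mbuf size");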
