History log of /dpdk/drivers/net/mlx5/mlx5_rxtx_vec.h (Results 1 – 25 of 25)
Revision Date Author Comments
# 295968d1 22-Oct-2021 Ferruh Yigit <ferruh.yigit@intel.com>

ethdev: add namespace

Add the 'RTE_ETH' namespace to all enums & macros in a backward-compatible
way. The macros for backward compatibility can be removed in the next LTS.
Also update some struct names to have the 'rte_eth' prefix.

All internal components switched to using new names.

Syntax fixed on lines that this patch touches.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
Acked-by: Rosen Xu <rosen.xu@intel.com>
Acked-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
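
As a sketch of the backward-compatibility pattern described above (the macro below is one real example; the patch applies the same pattern across all ethdev enums and macros):

    /* New namespaced name introduced by the patch. */
    #define RTE_ETH_LINK_FULL_DUPLEX 1
    /* Deprecated alias kept for backward compatibility;
     * can be removed in the next LTS. */
    #define ETH_LINK_FULL_DUPLEX RTE_ETH_LINK_FULL_DUPLEX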


# 9f1d636f 19-Oct-2021 Michael Baum <michaelba@nvidia.com>

common/mlx5: share MR management

Add global shared MR cache as a field of common device structure.
Move MR management to use this global cache for all drivers.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>


# 0f20acbf 21-Oct-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: implement vectorized MPRQ burst

MPRQ (Multi-Packet Rx Queue) processes one packet at a time using
simple scalar instructions. MPRQ works by posting a single large buffer
(consisting of multiple fixed-size strides) in order to receive multiple
packets at once on this buffer. An Rx packet is then either copied to a
user-provided mbuf, or the PMD attaches the Rx packet to the mbuf via a
pointer to an external buffer.

There is an opportunity to speed up the packet receiving by processing
4 packets simultaneously using SIMD (single instruction, multiple data)
extensions. Allocate mbufs in batches for every MPRQ buffer and process
the packets in groups of 4 until all the strides are exhausted. Then
switch to another MPRQ buffer and repeat the process over again.

The vectorized MPRQ burst routine is engaged automatically when the
mprq_en=1 devarg is specified and vectorization is not explicitly
disabled with the rx_vec_en=0 devarg. One limitation: LRO is not
supported, and scalar MPRQ is selected if it is enabled.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
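
A rough sketch of the control flow described above; every helper name and field here is hypothetical, standing in for the driver's internals:

    #include <stdint.h>
    #include <rte_mbuf.h>

    struct rxq_sketch {                 /* hypothetical queue state */
        uint16_t consumed_strides;
        uint16_t strides_n;
    };
    int replenish_mbufs_bulk(struct rxq_sketch *);
    uint16_t process_strides_x4(struct rxq_sketch *, struct rte_mbuf **,
                                uint16_t);
    void switch_mprq_buf(struct rxq_sketch *);

    static uint16_t
    mprq_burst_vec_sketch(struct rxq_sketch *rxq, struct rte_mbuf **pkts,
                          uint16_t n)
    {
        uint16_t done = 0;

        while (done < n) {
            /* Allocate mbufs in a batch for the current MPRQ buffer. */
            if (replenish_mbufs_bulk(rxq) < 0)
                break;
            /* SIMD path: handle completions 4 packets at a time. */
            done += process_strides_x4(rxq, &pkts[done], n - done);
            /* All strides exhausted: switch to the next MPRQ buffer. */
            if (rxq->consumed_strides >= rxq->strides_n)
                switch_mprq_buf(rxq);
        }
        return done;
    }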


# 1ded2623 21-Oct-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: refactor vectorized Rx

Move the main processing cycle into a separate function:
rxq_cq_process_v. Put the regular rxq_burst_v function
in a non-arch-specific file. Having all SIMD instructions
in a single reusable block is the first preparatory step to
implement vectorized Rx burst for the MPRQ feature.

Pass a pointer to the storage of mbufs directly to
rxq_copy_mbuf_v instead of calculating the pointer inside
this function. This is needed for the future vectorized Rx
routine, which is going to pass a different pointer here.

Calculate the number of packets to replenish inside
mlx5_rx_replenish_bulk_mbuf. Containing this logic in one
place allows us to do the same for the MPRQ case.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
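
A sketch of the resulting split; the three function names are the ones quoted above, but the signatures and types are assumed for illustration:

    #include <stdint.h>
    #include <rte_mbuf.h>

    struct rxq;  /* driver queue state, opaque here */
    /* Arch-specific: the single reusable SIMD block. */
    uint16_t rxq_cq_process_v(struct rxq *, struct rte_mbuf **, uint16_t);
    /* Non-arch helper; now computes the replenish count internally. */
    void mlx5_rx_replenish_bulk_mbuf(struct rxq *);

    /* Non-arch-specific burst driver, reusable by the MPRQ variant. */
    static uint16_t
    rxq_burst_v_sketch(struct rxq *rxq, struct rte_mbuf **pkts, uint16_t n)
    {
        mlx5_rx_replenish_bulk_mbuf(rxq);
        /* The mbuf storage pointer is passed straight through, so a
         * future MPRQ caller can supply a different one. */
        return rxq_cq_process_v(rxq, pkts, n);
    }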


# f0f5d844 23-Sep-2020 Phil Yang <phil.yang@arm.com>

eal: remove deprecated coherent IO memory barriers

The 20.08 release deprecated the rte_cio_*mb APIs because they provide
the same functionality as the rte_io_*mb APIs on all platforms, so
remove them and use rte_io_*mb instead.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>
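
The change at call sites is mechanical; a minimal sketch:

    #include <stdint.h>
    #include <rte_atomic.h>

    static inline void
    doorbell_write_sketch(volatile uint64_t *db, uint64_t val)
    {
        /* Was rte_cio_wmb(); rte_io_wmb() gives the same ordering
         * guarantees on all supported platforms. */
        rte_io_wmb();
        *db = val;
    }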


# b8dc6b0e 13-Apr-2020 Vu Pham <vuhuong@mellanox.com>

common/mlx5: refactor memory management

Refactor the common memory btree and cache management into the common
driver. Replace some input parameters of the MR APIs with more common
data structures like PD, port_id, share_cache,... so that multiple PMD
drivers can use those MR APIs.

Modify the mlx5 net PMD driver to use the MR management APIs from the
common driver.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>


# 8e46d4e1 30-Jan-2020 Alexander Kozyrev <akozyrev@mellanox.com>

common/mlx5: improve assert control

Use the MLX5_ASSERT macros instead of the standard assert clause.
MLX5_ASSERT depends on the RTE_LIBRTE_MLX5_DEBUG configuration option.
If RTE_LIBRTE_MLX5_DEBUG is enabled, MLX5_ASSERT is equal to RTE_VERIFY,
bypassing the global CONFIG_RTE_ENABLE_ASSERT option.
If RTE_LIBRTE_MLX5_DEBUG is disabled, the global CONFIG_RTE_ENABLE_ASSERT
can still make the assert active, since RTE_ASSERT calls RTE_VERIFY
internally.

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
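
The described behavior boils down to roughly the following (a sketch, with the configuration macro spellings taken from the message):

    #include <rte_debug.h>

    #ifdef RTE_LIBRTE_MLX5_DEBUG
    /* Debug build: always verify, bypassing RTE_ENABLE_ASSERT. */
    #define MLX5_ASSERT(exp) RTE_VERIFY(exp)
    #else
    /* Non-debug build: active only when the global assert option
     * is enabled, since RTE_ASSERT then expands to RTE_VERIFY. */
    #define MLX5_ASSERT(exp) RTE_ASSERT(exp)
    #endif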


# 7b4f1e6b 29-Jan-2020 Matan Azrad <matan@mellanox.com>

common/mlx5: introduce common library

A new Mellanox vdpa PMD will be added to support vdpa operations by
Mellanox adapters.

This vdpa PMD design includes mlx5_glue and mlx5_devx operations and
large parts of them are shared with the net/mlx5 PMD.

Create a new common library in drivers/common for mlx5 PMDs.
Move mlx5_glue, mlx5_devx_cmds and their dependencies to the new mlx5
common library in drivers/common.

The files mlx5_devx_cmds.c, mlx5_devx_cmds.h, mlx5_glue.c,
mlx5_glue.h and mlx5_prm.h are moved as is from drivers/net/mlx5 to
drivers/common/mlx5.

Share the log mechanism macros.
Also separate the log mechanism, to allow a different log level control
for the common library.

Build files and version files are adjusted accordingly.
Include lines are adjusted accordingly.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>


# bdb8e5b1 20-Jan-2020 Viacheslav Ovsiienko <viacheslavo@mellanox.com>

net/mlx5: allow allocated mbuf with external buffer

In the Rx datapath the flags in the newly allocated mbufs
are all explicitly cleared, but EXT_ATTACHED_MBUF must be
preserved. This allows the use of mbuf pools with pre-attached
external data buffers.

The vectorized rx_burst routines are updated to inherit
EXT_ATTACHED_MBUF from the mbuf pool's private
RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF flag.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
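
A one-line sketch of the flag handling described above (illustrative, not the exact driver code):

    #include <rte_mbuf.h>

    static inline void
    rx_mbuf_reset_flags_sketch(struct rte_mbuf *m)
    {
        /* Clear all flags but keep EXT_ATTACHED_MBUF, which pools
         * with pinned external buffers pre-set on their mbufs. */
        m->ol_flags &= EXT_ATTACHED_MBUF;
    }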


# 9bf26e13 05-Nov-2019 Viacheslav Ovsiienko <viacheslavo@mellanox.com>

ethdev: move egress metadata to dynamic field

The dynamic mbuf fields were introduced by [1]. The egress metadata is a
good candidate to be moved from the statically allocated field tx_metadata
to a dynamic one. Because mbufs are used in half-duplex fashion only, it
is safe to share this dynamic field with ingress metadata.

The shared dynamic field contains either egress metadata (if the
application is going to transmit the mbuf with tx_burst) or ingress
metadata (if the mbuf was received with rx_burst) and can be accessed by
the RTE_FLOW_DYNF_METADATA() macro or with the rte_flow_dynf_metadata_set()
and rte_flow_dynf_metadata_get() helper routines. The
PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag is set along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to accessing the data.

The availability of the dynamic mbuf metadata field can be checked with
the rte_flow_dynf_metadata_avail() routine.

The DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is
removed. Metadata support in PMDs is engaged on dynamic field
registration.

The metadata feature is getting complex. We might have some set of actions
and items that might be supported by PMDs in multiple combinations; the
supported values and masks are subject to query by performing trials
(with rte_flow_validate).

[1] http://patches.dpdk.org/patch/62040/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
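
Application-side usage of the helpers named above could look like this (a sketch):

    #include <rte_mbuf.h>
    #include <rte_flow.h>

    static void
    metadata_usage_sketch(struct rte_mbuf *m)
    {
        uint32_t meta = 0;

        /* Register the dynamic field once, before any access. */
        if (rte_flow_dynf_metadata_register() < 0)
            return; /* metadata dynamic field is not available */
        /* Egress: set metadata and mark the mbuf before tx_burst. */
        rte_flow_dynf_metadata_set(m, 0xcafe);
        m->ol_flags |= PKT_TX_DYNF_METADATA;
        /* Ingress: read metadata delivered by rx_burst. */
        if (m->ol_flags & PKT_RX_DYNF_METADATA)
            meta = rte_flow_dynf_metadata_get(m);
        (void)meta;
    }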


# 8b8f7994 22-Jul-2019 Matan Azrad <matan@mellanox.com>

net/mlx5: update LRO fields in completion entry

Update the CQE structure to include LRO fields.

Some reserved values were changed; hence the data-path code that used
the reserved values was updated accordingly.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>


# 9c55c6bd 25-Mar-2019 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: revert mbuf address calculation for x86

When replenishing mbufs on Rx, the buffer address (mbuf->buf_addr) should
be loaded. Non-x86 processors (mostly RISC, such as ARM and Power) are
more vulnerable to load stalls. For x86, reducing the number of
instructions seems to matter most.

For x86 this is therefore a simple load, but for other architectures the
address is calculated from the address of the mbuf structure by
rte_mbuf_buf_addr(), without having to load the first cacheline of the
mbuf.

Fixes: 12d468a62bc1 ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
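
A sketch of the architecture split described above; the helper is illustrative, and the queue's mempool is assumed to be passed in so the mbuf's first cacheline need not be read:

    #include <rte_mbuf.h>

    static inline void *
    rx_buf_addr_sketch(struct rte_mbuf *m, struct rte_mempool *mp)
    {
    #ifdef RTE_ARCH_X86
        /* x86: a plain load is the fewest instructions. */
        (void)mp;
        return m->buf_addr;
    #else
        /* RISC (ARM/Power): compute the address rather than load the
         * mbuf's first cacheline, avoiding the load stall. */
        return rte_mbuf_buf_addr(m, mp);
    #endif
    }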


# 12d468a6 14-Jan-2019 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix instruction hotspot on replenishing Rx buffer

On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr does not
need to be accessed, as it is static and easily calculated from the mbuf
address. Accessing the mbuf content causes an unnecessary load stall,
which is worse on ARM.

Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>


# 6bd7fbd0 23-Oct-2018 Dekel Peled <dekelp@mellanox.com>

net/mlx5: support metadata as flow rule criteria

As described in the series starting at [1], this adds the option to set
a metadata value as a match pattern when creating a new flow rule.

This patch adds metadata support in the mlx5 driver, in two parts:
- Add the validation and setting of the metadata value in the matcher,
when creating a new flow rule.
- Add the passing of the metadata value from the mbuf to the WQE, when
indicated by ol_flag, in the different burst functions.

[1] "ethdev: support metadata as flow rule criteria"
http://mails.dpdk.org/archives/dev/2018-September/113269.html

Signed-off-by: Dekel Peled <dekelp@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
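
On the matching side, a pattern using the metadata item could be built like this (a sketch; the item layout follows the ethdev series referenced above):

    #include <rte_flow.h>
    #include <rte_byteorder.h>

    /* Sketch: match packets whose metadata equals 0xcafe. */
    static const struct rte_flow_item_meta meta_spec = {
        .data = RTE_BE32(0xcafe),
    };
    static const struct rte_flow_item pattern_sketch[] = {
        { .type = RTE_FLOW_ITEM_TYPE_META, .spec = &meta_spec },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };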


# e10245a1 26-Jun-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix Rx buffer replenishment threshold

The threshold for buffer replenishment in the vectorized Rx burst is a
constant value (64). If the size of the Rx queue is comparatively small,
the device can run out of buffers. For example, if the size of the Rx
queue is 128, buffers are replenished only twice per wraparound. This can
cause jitter in receiving packets, and the jitter can cause unnecessary
retransmissions on TCP connections.

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
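
The arithmetic above suggests the direction of the fix: scale the threshold with the ring size instead of using a constant. A sketch under that assumption (the macro name and scaling factor are not taken from the patch):

    #include <rte_common.h>

    /* Was a flat 64; sketch: replenish once a quarter of the ring is
     * consumed, still capped by the previous constant. */
    #define RPLNSH_THRESH_SKETCH(q_n) \
        RTE_MIN(64u, (unsigned int)(q_n) / 4)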


# 7d6bf6b8 09-May-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add Multi-Packet Rx support

Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe
bandwidth by posting a single large buffer for multiple packets. Instead
of posting a buffer per packet, one large buffer is posted in order to
receive multiple packets on it. An MPRQ buffer consists of multiple
fixed-size strides, and each stride receives one packet.

An Rx packet is mem-copied to a user-provided mbuf if it is comparatively
small, or the PMD attaches the Rx packet to the mbuf by external buffer
attachment - rte_pktmbuf_attach_extbuf(). A mempool for external buffers
is allocated and managed by the PMD.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Shahaf Shuler <shahafs@mellanox.com>
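
A sketch of the per-packet decision described above; the threshold parameter and helper are illustrative:

    #include <string.h>
    #include <rte_mbuf.h>

    static void
    mprq_deliver_sketch(struct rte_mbuf *pkt, void *stride, rte_iova_t iova,
                        uint16_t len, uint16_t buf_len, uint16_t memcpy_max,
                        struct rte_mbuf_ext_shared_info *shinfo)
    {
        if (len <= memcpy_max) {
            /* Small packet: copy into the user-provided mbuf. */
            memcpy(rte_pktmbuf_mtod(pkt, void *), stride, len);
        } else {
            /* Large packet: attach the stride as an external buffer;
             * the stride mempool is managed by the PMD. */
            rte_pktmbuf_attach_extbuf(pkt, stride, iova, buf_len, shinfo);
        }
        pkt->pkt_len = len;
        pkt->data_len = len;
    }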


# 974f1e7e 09-May-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add new memory region support

This is the new design of Memory Region (MR) for mlx PMD, in order to:
- Accommodate the new memory hotplug model.
- Support non-contiguous Mempool.

There are multiple layers for MR search.

L0 looks up the last-hit entry, which is pointed to by mr_ctrl->mru (Most
Recently Used). If L0 misses, L1 looks up the address in a fixed-size
array by linear search. L0/L1 live in an inline function -
mlx5_mr_lookup_cache().

If L1 misses, the bottom-half function is called to look up the address
in the bigger local cache of the queue. This is L2 - mlx5_mr_addr2mr_bh()
- and it is not an inline function. The data structure for L2 is a binary
tree.

If L2 misses, the search falls into the slowest path, which takes locks
in order to access the global device cache (priv->mr.cache); this is also
a B-tree and caches the original MR list (priv->mr.mr_list) of the device.
Unless the global cache overflows, it is all-inclusive of the MR list.
This is L3 - mlx5_mr_lookup_dev(). The size of the L3 cache table is
limited and can't be expanded on the fly due to deadlock. Refer to the
comments in the code for the details - mr_lookup_dev(). If L3 overflows,
the list has to be searched directly, bypassing the cache, although this
is slower.

If L3 misses, a new MR for the address has to be created -
mlx5_mr_create(). When it creates a new MR, it tries to register as many
adjacent memsegs as possible which are virtually contiguous around the
address. This must take two locks - memory_hotplug_lock and
priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
allocation/free of memory inside.

In the free callback of the memory hotplug event, freed space is searched
in the MR list and the corresponding bits are cleared from the bitmap of
MRs. This can fragment an MR, and the MR will then have multiple search
entries in the caches. Once there's a change by the event, the global
cache must be rebuilt and all the per-queue caches are flushed as well.
If memory is frequently freed at run time, that may cause jitter on
dataplane processing in the worst case by incurring MR cache flush and
rebuild, but this is the least probable scenario.

To guarantee the most optimal performance, it is highly recommended to
use the EAL option '--socket-mem'; the reserved memory will then be
pinned and won't be freed dynamically. It is also recommended to
configure a per-lcore cache for the mempool. Even if there are many MRs
for a device or the MRs are highly fragmented, the mempool cache will
help reduce misses on the per-queue caches.

'--legacy-mem' is also supported.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
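
A condensed sketch of the lookup cascade; the real entry points are quoted in the comments, while the types and helpers here are placeholders:

    #include <stdint.h>

    #define LKEY_MISS UINT32_MAX

    struct mr_ctrl;                       /* per-queue MR cache */
    uint32_t lookup_mru(struct mr_ctrl *, uintptr_t);           /* L0 */
    uint32_t lookup_linear(struct mr_ctrl *, uintptr_t);        /* L1 */
    uint32_t lookup_btree(struct mr_ctrl *, uintptr_t);         /* L2 */
    uint32_t lookup_dev_or_create(struct mr_ctrl *, uintptr_t); /* L3+ */

    static uint32_t
    addr2lkey_sketch(struct mr_ctrl *ctrl, uintptr_t addr)
    {
        uint32_t lkey;

        /* L0/L1: inline fast path - mlx5_mr_lookup_cache(). */
        if ((lkey = lookup_mru(ctrl, addr)) != LKEY_MISS)
            return lkey;
        if ((lkey = lookup_linear(ctrl, addr)) != LKEY_MISS)
            return lkey;
        /* L2: per-queue B-tree bottom half - mlx5_mr_addr2mr_bh(). */
        if ((lkey = lookup_btree(ctrl, addr)) != LKEY_MISS)
            return lkey;
        /* L3: global cache under lock - mlx5_mr_lookup_dev(); on a
         * miss, a new MR is registered - mlx5_mr_create(). */
        return lookup_dev_or_create(ctrl, addr);
    }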


# 5feecc57 20-Mar-2018 Shahaf Shuler <shahafs@mellanox.com>

align SPDX Mellanox copyrights

Aligning Mellanox SPDX copyrights to a single format.
In addition, add SPDX license tags to the files where they were missed.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>


# 8fd92a66 29-Jan-2018 Olivier Matz <olivier.matz@6wind.com>

net/mlx5: use SPDX tags in 6WIND copyrighted files

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>


# 4fe7f662 25-Jan-2018 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: replace I/O memory barrier with coherent version

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>


# dbccb4cd 10-Jan-2018 Shahaf Shuler <shahafs@mellanox.com>

net/mlx5: convert to new Tx offloads API

Ethdev Tx offloads API has changed since:

commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")

This commit adds support for the new Tx offloads API.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
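
From the application side, the new API means requesting Tx offloads at configure time. A sketch (the offload selection is illustrative; the flag names are the pre-21.11 ones current for this commit):

    #include <rte_ethdev.h>

    static int
    configure_port_sketch(uint16_t port_id)
    {
        struct rte_eth_conf conf = {
            .txmode = {
                /* Per-port Tx offloads under the new API. */
                .offloads = DEV_TX_OFFLOAD_IPV4_CKSUM |
                            DEV_TX_OFFLOAD_TCP_CKSUM,
            },
        };

        return rte_eth_dev_configure(port_id, 1, 1, &conf);
    }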


# 03e0868b 10-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: fix deadlock due to buffered slots in Rx SW ring

When replenishing the Rx ring, there are always buffered slots reserved
between the consumed entries and the HW-owned entries. These have to be
filled with fake mbufs to protect against possible overflow, rather than
optimistically expecting successful replenishment, which can cause a
deadlock with a small-sized queue.

Fixes: fc048bd52cb7 ("net/mlx5: fix overflow of Rx SW ring")
Cc: stable@dpdk.org

Reported-by: Martin Weiser <martin.weiser@allegro-packets.com>
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Tested-by: Martin Weiser <martin.weiser@allegro-packets.com>
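
A sketch of the protective fill described above (field and variable names illustrative):

    #include <stdint.h>
    #include <rte_mbuf.h>

    struct rxq_sketch {
        struct rte_mbuf **elts;     /* SW ring */
        struct rte_mbuf fake_mbuf;  /* static placeholder entry */
        uint16_t q_mask;            /* ring size - 1, power of two */
    };

    /* Point the reserved gap between consumed and HW-owned entries at
     * the fake mbuf, so an overflow cannot hit stale pointers. */
    static void
    fill_buffered_slots_sketch(struct rxq_sketch *rxq, uint16_t start,
                               uint16_t n_buffered)
    {
        uint16_t i;

        for (i = 0; i < n_buffered; ++i)
            rxq->elts[(start + i) & rxq->q_mask] = &rxq->fake_mbuf;
    }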


# 570acdb1 09-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: add vectorized Rx/Tx burst for ARM

Brings vectorization through NEON instructions.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>


# 3c2ddbd4 09-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: separate shareable vector functions

Considering that more architectures (e.g. ARM and PowerPC) will be added
for vectorized Rx/Tx burst, all the shareable functions which don't use
any vector intrinsics need to be separated from the
architecture-dependent functions. All the vector functions for x86 SSE
are moved to a new header file - mlx5_rxtx_vec_sse.h - and the shareable
common functions are now in mlx5_rxtx_vec.c.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>


# 5bfc9fc1 09-Oct-2017 Yongseok Koh <yskoh@mellanox.com>

net/mlx5: use static assert for compile-time sanity checks

Replace the compile-time sanity checks with static_assert(), as the C11
standard has been adopted. Add mlx5_rxtx_vec.h and move the sanity checks
to that file.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
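
In the C11 style the commit adopts, such a check looks like the following (a representative example, not necessarily one of the assertions that were moved):

    #include <assert.h>
    #include <rte_common.h>
    #include <rte_mbuf.h>

    /* Build-time check: fails compilation, not the run, if the
     * layout assumption ever breaks. */
    static_assert(sizeof(struct rte_mbuf) == RTE_CACHE_LINE_MIN_SIZE * 2,
                  "invalid rte_mbuf size");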
