History log of /dpdk/drivers/net/mlx5/mlx5_rxtx_vec_neon.h (Results 1 – 25 of 62)
Revision Date Author Comments
# 43fd3624 21-Jan-2025 Andre Muezerie <andremue@linux.microsoft.com>

drivers: replace GCC pragma with cast

"GCC diagnostic ignored" pragmas have been commonly sprinkled
over the code.
Clang supports GCC's pragma for compatibility with existing
source code, so #pragma

drivers: replace GCC pragma with cast

"GCC diagnostic ignored" pragmas have been commonly sprinkled
over the code.
Clang supports GCC's pragma for compatibility with existing
source code, so #pragma GCC diagnostic and #pragma clang
diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>

show more ...


# a7ae9ba1 13-Nov-2024 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: fix miniCQEs number calculation

Use the information from the CQE, not from the title packet,
for getting the number of miniCQEs in the compressed CQEs array.
This way we can avoid segfault

net/mlx5: fix miniCQEs number calculation

Use the information from the CQE, not from the title packet,
for getting the number of miniCQEs in the compressed CQEs array.
This way we can avoid segfaults in the rxq_cq_decompress_v()
in case of mbuf corruption (due to double mbuf free, for example).

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 3638f431 28-Oct-2024 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: fix shared queue port number in vector Rx

Wrong CQE is used to get the shared Rx queue port number in
vectorized Rx burst routine. Fix the CQE indexing.

Fixes: 25ed2ebff131 ("net/mlx5: su

net/mlx5: fix shared queue port number in vector Rx

Wrong CQE is used to get the shared Rx queue port number in
vectorized Rx burst routine. Fix the CQE indexing.

Fixes: 25ed2ebff131 ("net/mlx5: support shared Rx queue port data path")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 191128d7 25-Oct-2024 David Marchand <david.marchand@redhat.com>

drivers: use bitops API instead of compiler builtins

Stop using directly __builtin_ bit operations,
prefer existing DPDK wrappers.

Note: this is a brute sed all over drivers (skipping base drivers)

drivers: use bitops API instead of compiler builtins

Stop using directly __builtin_ bit operations,
prefer existing DPDK wrappers.

Note: this is a brute sed all over drivers (skipping base drivers)
for __builtin_* that have a direct replacement in EAL bitops.
There is more work to do, like adding some missing macros inspired from
kernel (FIELD_*) macros but this is left for later.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

show more ...


# 90ec9b0d 01-Nov-2023 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: replenish MPRQ buffers for miniCQEs

Keep unzipping if the next CQE is the miniCQE array in
rxq_cq_decompress_v() routine only for non-MPRQ scenario,
MPRQ requires buffer replenishment betw

net/mlx5: replenish MPRQ buffers for miniCQEs

Keep unzipping if the next CQE is the miniCQE array in
rxq_cq_decompress_v() routine only for non-MPRQ scenario,
MPRQ requires buffer replenishment between the miniCQEs.

Restore the check for the initial compressed CQE for SPRQ
and check that the current CQE is not compressed before
copying it as a possible title CQE.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>

show more ...


# 7ac7450b 30-May-2023 Ruifeng Wang <ruifeng.wang@arm.com>

net/mlx5: fix risk in NEON Rx descriptor read

In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the

net/mlx5: fix risk in NEON Rx descriptor read

In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the word that includes op_own field could be
reordered after read of other words. In this case, some words could
contain invalid data.

Reloaded qword0 after read barrier to update vector register. This
ensures that the fetched data is correct.

Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.

Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# fca8cba4 21-Jun-2023 David Marchand <david.marchand@redhat.com>

ethdev: advertise flow restore in mbuf

As reported by Ilya [1], unconditionally calling
rte_flow_get_restore_info() impacts an application performance for drivers
that do not provide this ops.
It co

ethdev: advertise flow restore in mbuf

As reported by Ilya [1], unconditionally calling
rte_flow_get_restore_info() impacts an application performance for drivers
that do not provide this ops.
It could also impact processing of packets that require no call to
rte_flow_get_restore_info() at all.

Register a dynamic mbuf flag when an application negotiates tunnel
metadata delivery (calling rte_eth_rx_metadata_negotiate() with
RTE_ETH_RX_METADATA_TUNNEL_ID).

Drivers then advertise that metadata can be extracted by setting this
dynamic flag in each mbuf.

The application then calls rte_flow_get_restore_info() only when required.

Link: http://inbox.dpdk.org/dev/5248c2ca-f2a6-3fb0-38b8-7f659bfa40de@ovn.org/

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>

show more ...


# fc3e1798 28-Feb-2023 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: support enhanced CQE zipping in vector Rx burst

Add Enhanced CQE compression support to vectorized Rx burst routines.
Adopt the same algorithm as scalar Rx burst routines have today.
1. Re

net/mlx5: support enhanced CQE zipping in vector Rx burst

Add Enhanced CQE compression support to vectorized Rx burst routines.
Adopt the same algorithm as scalar Rx burst routines have today.
1. Retrieve the validity_iteration_count from CQEs and use it
to check if the CQE is ready to be processed instead of the owner_bit.
2. Do not invalidate reserved CQEs between miniCQE arrays.
3. Copy the title packet from the last processed uncompressed CQE
since we will need it later to build packets from zipped CQEs.
4. Skip the regular CQE processing and go straight to the CQE unzip
function in case the very first CQE is compressed to sace CPU time.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 1f903ebe 27-Jan-2023 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: check compressed CQE opcode in vectorized Rx

The CQE opcode is never checked for a compressed CQE in
the vectorized Rx burst routines. It is assumed that
compressed CQEs are always valid a

net/mlx5: check compressed CQE opcode in vectorized Rx

The CQE opcode is never checked for a compressed CQE in
the vectorized Rx burst routines. It is assumed that
compressed CQEs are always valid and skipped error checking.

This is obviously not the case and error CQEs may be
compressed together as well. Need to check for the
MLX5_CQE_RESP_ERR opcode and mark all the packets as
bad ones in the compression session if it is there.

Note that this issue is not applicable to the scalar Rx burst.

Fixes: 6cb559d67b ("net/mlx5: add vectorized Rx/Tx burst for x86")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

show more ...


# 7be78d02 29-Nov-2021 Josh Soref <jsoref@gmail.com>

fix spelling in comments and strings

The tool comes from https://github.com/jsoref

Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>


# 25ed2ebf 04-Nov-2021 Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: support shared Rx queue port data path

When receive packet, mlx5 PMD saves mbuf port number from
RxQ data.

To support shared RxQ, save port number into RQ context as user index.
Received

net/mlx5: support shared Rx queue port data path

When receive packet, mlx5 PMD saves mbuf port number from
RxQ data.

To support shared RxQ, save port number into RQ context as user index.
Received packet resolve port number from CQE user index which derived
from RQ context.

Legacy Verbs API doesn't support RQ user index setting, still read from
RxQ port number.

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# daa02b5c 15-Oct-2021 Olivier Matz <olivier.matz@6wind.com>

mbuf: add namespace to offload flags

Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.

Sign

mbuf: add namespace to offload flags

Fix the mbuf offload flags namespace by adding an RTE_ prefix to the
name. The old flags remain usable, but a deprecation warning is issued
at compilation.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>

show more ...


# 6d5735c1 20-Jul-2021 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: fix meta register conversion for extensive mode

Register C is used in the extensive metadata mode number 1 and its
width can vary from 0 to 32 bits depending on the kernel usage of it.

Th

net/mlx5: fix meta register conversion for extensive mode

Register C is used in the extensive metadata mode number 1 and its
width can vary from 0 to 32 bits depending on the kernel usage of it.

There are several issues associated with this mode (dv_xmeta_en=1):
1. The metadata setting assumes that the width is always 16 bits,
which is the most common case in this mode. Use the proper mask.
2. The same is true for the modify_field Flow API. 16-bits width
is hardcoded for dv_xmeta_en=1. Switch to the register C mask width.
3. Metadata is stored in the most significant bits in CQE in this
mode because the registers copy code was not updated during the
metadata conversion to the big-endian format. Update this code to
avoid shifting the metadata in the datapath.

Fixes: b57e414b48 ("net/mlx5: convert meta register to big-endian")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 6e695b0c 08-Jun-2021 Sarosh Arif <sarosh.arif@emumba.com>

net/mlx5: fix typo in vectorized Rx comments

Change "returing" to "returning".

Fixes: 2e542da70937 ("net/mlx5: add Altivec Rx")
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")

net/mlx5: fix typo in vectorized Rx comments

Change "returing" to "returning".

Fixes: 2e542da70937 ("net/mlx5: add Altivec Rx")
Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Fixes: 3c2ddbd413e3 ("net/mlx5: separate shareable vector functions")
Cc: stable@dpdk.org

Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>

show more ...


# ff6fcd41 07-Jul-2021 Ruifeng Wang <ruifeng.wang@arm.com>

net/mlx5: remove redundant operations in NEON Rx

Mask of entries after the compressed CQE is covered by invalid mask of
non-compressed valid CQEs. Hence remove redundant calculation on mask.
The cha

net/mlx5: remove redundant operations in NEON Rx

Mask of entries after the compressed CQE is covered by invalid mask of
non-compressed valid CQEs. Hence remove redundant calculation on mask.
The change showed slight performance uplift on N1SDP.

Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# b57e414b 16-Jun-2021 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: convert meta register to big-endian

Metadata were stored in the CPU order (little-endian format on x86),
while all the packet header fields are stored in the network order.
That caused wro

net/mlx5: convert meta register to big-endian

Metadata were stored in the CPU order (little-endian format on x86),
while all the packet header fields are stored in the network order.
That caused wrong results whenever we tried to use metadata value
in the modify_field action: bytes were swapped as a result.

Convert the metadata value into big-endian format before storing it
in the Mellanox NIC to achieve consistent behaviour.

Fixes: 641dbe4fb053 ("net/mlx5: support modify field flow action")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 4eefb20f 07-Mar-2021 Viacheslav Ovsiienko <viacheslavo@nvidia.com>

net/mlx5: fix Rx metadata leftovers

The Rx metadata might use the metadata register C0 to keep the
values. The same register C0 might be used by kernel for source
vport value handling, kernel uses u

net/mlx5: fix Rx metadata leftovers

The Rx metadata might use the metadata register C0 to keep the
values. The same register C0 might be used by kernel for source
vport value handling, kernel uses upper half of the register,
leaving the lower half for application usage.

In the extended metadata mode 1 (dv_xmeta_en devarg is
assigned with value 1) the metadata width is 16 bits only,
the Rx datapath code fetched the entire 32-bit value of the
metadata register and presented one to application. The patch
provides data masking depending on the chosen metadata mode.

Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>

show more ...


# 71094ae3 29-Oct-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: fix CQE decompression for Arm and PowerPC

The recent Rx code refactoring moved the incrementing
of the CQ completion index out of the rxq_cq_decompress_v()
function to the rxq_burst_v() fu

net/mlx5: fix CQE decompression for Arm and PowerPC

The recent Rx code refactoring moved the incrementing
of the CQ completion index out of the rxq_cq_decompress_v()
function to the rxq_burst_v() function.

The advancing of CQ completion index was removed in SSE
version only causing Neon and Altivec Rx bursts to stall.

Remove the incrementation of CQ completion index for all
the architectures in order to fix the stall.

Fixes: 1ded26239aa0 ("net/mlx5: refactor vectorized Rx")

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 54c2d46b 01-Nov-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: support flow tag and packet header miniCQEs

CQE compression allows us to save the PCI bandwidth and improve
the performance by compressing several CQEs together to a miniCQE.
But the miniC

net/mlx5: support flow tag and packet header miniCQEs

CQE compression allows us to save the PCI bandwidth and improve
the performance by compressing several CQEs together to a miniCQE.
But the miniCQE size is only 8 bytes and this limits the ability
to successfully keep the compression session in case of various
traffic patterns.

The current miniCQE format only keeps the compression session alive
in case of uniform traffic with the Hash RSS as the only difference.
There are requests to keep the compression session in case of tagged
traffic by RTE Flow Mark Id and mixed UDP/TCP and IPv4/IPv6 traffic.
Add 2 new miniCQE formats in order to achieve the best performance
for these traffic patterns: Flow Tag and Packet Header miniCQEs.

The existing rxq_cqe_comp_en devarg is modified to specify the
desired miniCQE format. Specifying 2 selects Flow Tag format
for better compression rate in case of RTE Flow Mark traffic.
Specifying 3 selects Checksum format (existing format for MPRQ).
Specifying 4 selects L3/L4 Header format for better compression
rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 1ded2623 21-Oct-2020 Alexander Kozyrev <akozyrev@nvidia.com>

net/mlx5: refactor vectorized Rx

Move the main processing cycle into a separate function:
rxq_cq_process_v. Put the regular rxq_burst_v function
to a non-arch specific file. Having all SIMD instruct

net/mlx5: refactor vectorized Rx

Move the main processing cycle into a separate function:
rxq_cq_process_v. Put the regular rxq_burst_v function
to a non-arch specific file. Having all SIMD instructions
in a single reusable block is a first preparatory step to
implement vectorized Rx burst for MPRQ feature.

Pass a pointer to the storage of mbufs directly to the
rxq_copy_mbuf_v instead of calculating the pointer inside
this function. This is needed for the future vectorized Rx
routing which is going to pass a different pointer here.

Calculate the number of packets to replenish inside the
mlx5_rx_replenish_bulk_mbuf. Containing this logic in one
place allows us to do the same for MPRQ case.

Signed-off-by: Alexander Kozyrev <akozyrev@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

show more ...


# 04840ecb 29-Oct-2020 Thomas Monjalon <thomas@monjalon.net>

net/mlx5: switch Rx timestamp to dynamic mbuf field

The mbuf timestamp is moved to a dynamic field
in order to allow removal of the deprecated static field.
The related mbuf flag is also replaced.

net/mlx5: switch Rx timestamp to dynamic mbuf field

The mbuf timestamp is moved to a dynamic field
in order to allow removal of the deprecated static field.
The related mbuf flag is also replaced.

The dynamic offset and flag are stored in struct mlx5_rxq_data
to favor cache locality.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

show more ...


# f0f5d844 23-Sep-2020 Phil Yang <phil.yang@arm.com>

eal: remove deprecated coherent IO memory barriers

Since the 20.08 release deprecated rte_cio_*mb APIs because these APIs
provide the same functionality as rte_io_*mb APIs on all platforms, so
remov

eal: remove deprecated coherent IO memory barriers

Since the 20.08 release deprecated rte_cio_*mb APIs because these APIs
provide the same functionality as rte_io_*mb APIs on all platforms, so
remove them and use rte_io_*mb instead.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: David Marchand <david.marchand@redhat.com>

show more ...


# 4ffab7b9 23-Jul-2020 Viacheslav Ovsiienko <viacheslavo@mellanox.com>

net/mlx5: fix metadata storing for NEON Rx

There was the typo introducing the bug, affected the mlx5 vectorized
rx_burst on ARM architectures in case if CQE compression was enabled.

Fixes: 6c55b622

net/mlx5: fix metadata storing for NEON Rx

There was the typo introducing the bug, affected the mlx5 vectorized
rx_burst on ARM architectures in case if CQE compression was enabled.

Fixes: 6c55b622a956 ("net/mlx5: set dynamic flow metadata in Rx queues")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

show more ...


# 6f52bd33 22-Jul-2020 Alexander Kozyrev <akozyrev@mellanox.com>

net/mlx5: fix vectorized mini-CQE prefetching

There was an optimization work to prefetch all the CQEs before
their invalidation. It allowed us to speed up the mini-CQE
decompression process by prehe

net/mlx5: fix vectorized mini-CQE prefetching

There was an optimization work to prefetch all the CQEs before
their invalidation. It allowed us to speed up the mini-CQE
decompression process by preheating the cache in the vectorized
Rx routine.

Prefetching of the next mini-CQE, on the other hand, showed
no difference in the performance on x86 platform. So, that was
removed. Unfortunately this caused the performance drop on ARM.

Prefetch the mini-CQE as well as all the soon to be
invalidated CQEs to get both CQE and mini-CQE on the hot path.

Fixes: 28a4b96321a3 ("net/mlx5: prefetch CQEs for a faster decompression")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

show more ...


# a2854c4d 16-Jul-2020 Viacheslav Ovsiienko <viacheslavo@mellanox.com>

net/mlx5: convert Rx timestamps in real-time format

The ConnectX-6DX supports the timestamps in various formats,
the new realtime format is introduced - the upper 32-bit word
of timestamp contains t

net/mlx5: convert Rx timestamps in real-time format

The ConnectX-6DX supports the timestamps in various formats,
the new realtime format is introduced - the upper 32-bit word
of timestamp contains the UTC seconds and the lower 32-bit word
contains the nanoseconds. This patch detects what format is
configured in the NIC and performs the conversion accordingly.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>

show more ...


123