..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2018 Intel Corporation.

Debug & Troubleshoot guide
==========================

DPDK applications can be designed with simple or complex pipeline processing
stages that make use of single or multiple threads. Applications can also use
poll mode drivers for hardware devices, which helps in offloading CPU cycles.
It is common to find solutions designed with

* single or multiple primary processes

* single primary and single secondary

* single primary and multiple secondaries

In all the above cases, it is tedious to isolate, debug, and understand the
various behaviors which occur randomly or periodically. The goal of this guide
is to consolidate a few commonly seen issues for reference, then to isolate
and identify the root cause through step-by-step debugging at various stages.

.. note::

   It is difficult to cover all possible issues in a single attempt. With
   feedback and suggestions from the community, more cases can be covered.


Application Overview
--------------------

By making use of the application model as a reference, we can discuss multiple
causes of issues in this guide. Let us assume the sample makes use of a single
primary process, with various processing stages running on multiple cores. The
application may also make use of Poll Mode Drivers, and libraries like service
cores, mempool, mbuf, eventdev, cryptodev, QoS, and ethdev.

The overview of an application modeled using PMD is shown in
:numref:`dtg_sample_app_model`.

.. _dtg_sample_app_model:

.. figure:: img/dtg_sample_app_model.*

   Overview of pipeline stages of an application


Bottleneck Analysis
-------------------

A few of the factors that lead to design decisions could be the platform,
scale factor, and target. These distinct preferences lead to multiple
combinations that are built using PMDs and libraries of DPDK. The compiler,
library mode, and optimization flags, even when held constant, also affect
the application.


Is there a mismatch in the packet (received < desired) rate?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RX port and associated core :numref:`dtg_rx_rate`.

.. _dtg_rx_rate:

.. figure:: img/dtg_rx_rate.*

   RX packet rate compared against received rate.

#. Is the configuration for RX set up correctly?

   * Identify if the port speed and duplex match the desired values with
     ``rte_eth_link_get``.

   * Check promiscuous mode with ``rte_eth_promiscuous_get`` if the drops
     do not occur for the unique MAC address.

#. Is the drop isolated to certain NICs only?

   * Make use of ``rte_eth_stats_get`` to identify the cause of the drops.

   * If there are mbuf drops, check if ``nb_desc`` for the RX descriptor
     ring is sufficient for the application.

   * If ``rte_eth_stats_get`` shows drops on specific RX queues, ensure the
     RX lcore threads have enough cycles for ``rte_eth_rx_burst`` on the
     port-queue pair.

   * If traffic is redirected to a specific port-queue pair, ensure the RX
     lcore thread gets enough cycles.

   * Check the RSS configuration with ``rte_eth_dev_rss_hash_conf_get`` if
     the spread is not even and is causing drops.

   * If the PMD stats are not updating, then there might be an offload or
     configuration which is dropping the incoming traffic.

#. Are drops still seen?

   * If there are multiple port-queue pairs, it might be the RX thread, RX
     distributor, or event RX adapter not having enough cycles.

   * If drops are seen for the RX adapter or RX distributor, try using
     ``rte_prefetch_non_temporal``, which informs the core that the mbuf in
     cache is only temporary.
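
The triage steps above can be folded into a small classification routine over
the NIC drop counters. This is a minimal sketch: the stats structure below is
a simplified stand-in for ``struct rte_eth_stats`` (``rte_ethdev.h``) so the
example stays self-contained; in a real application the counters would be
filled in by ``rte_eth_stats_get(port_id, &stats)``.

```c
#include <stdint.h>

/* Simplified stand-in for struct rte_eth_stats; only the counters this
 * triage sketch inspects are kept. */
struct eth_stats {
	uint64_t ipackets;   /* successfully received packets */
	uint64_t imissed;    /* dropped by HW, e.g. RX ring full */
	uint64_t ierrors;    /* erroneous received packets */
	uint64_t rx_nombuf;  /* RX mbuf allocation failures */
};

enum rx_drop_cause {
	RX_OK = 0,     /* no drops recorded at the NIC */
	RX_NOMBUF,     /* mempool exhausted: check pool size, mbuf release */
	RX_RING_FULL,  /* imissed: check nb_desc and rte_eth_rx_burst cycles */
	RX_BAD_PKTS    /* ierrors: offload or configuration dropping traffic */
};

/* Point at the most likely cause first; mempool exhaustion masks the
 * other counters, so it is checked before ring overflow. */
enum rx_drop_cause
classify_rx_drops(const struct eth_stats *st)
{
	if (st->rx_nombuf > 0)
		return RX_NOMBUF;
	if (st->imissed > 0)
		return RX_RING_FULL;
	if (st->ierrors > 0)
		return RX_BAD_PKTS;
	return RX_OK;
}
```

Running such a classifier periodically next to the datapath gives a first
pointer to whether the mempool, the RX descriptor count, or the incoming
traffic itself should be examined.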


Are there packet drops at receive or transmit?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RX-TX port and associated cores :numref:`dtg_rx_tx_drop`.

.. _dtg_rx_tx_drop:

.. figure:: img/dtg_rx_tx_drop.*

   RX-TX drops

#. At RX

   * Identify if there are multiple RX queues configured for the port by
     checking ``nb_rx_queues`` using ``rte_eth_dev_info_get``.

   * If ``rte_eth_stats_get`` shows drops in ``q_errors``, check if the RX
     thread is configured to fetch packets from the port-queue pair.

   * If ``rte_eth_stats_get`` shows drops in ``rx_nombuf``, check if the RX
     thread has enough cycles to consume the packets from the queue.

#. At TX

   * If the TX rate is falling behind the application fill rate, identify if
     there are enough descriptors with ``rte_eth_dev_info_get`` for TX.

   * Check if the ``nb_pkts`` passed to ``rte_eth_tx_burst`` covers multiple
     packets.

   * Check whether ``rte_eth_tx_burst`` invokes the vector function call for
     the PMD.

   * If ``oerrors`` is incrementing, TX packet validations are failing.
     Check if there are queue-specific offload failures.

   * If the drops occur for large packets, check the MTU and the
     multi-segment support configured for the NIC.
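
A common TX-side bug behind silent drops is treating a partial return value
from the transmit burst as success. The sketch below shows the retry pattern;
``tx_burst_fn`` is a hypothetical stand-in for ``rte_eth_tx_burst`` so the
example compiles on its own, and ``slow_tx`` is an invented stub device.

```c
#include <stdint.h>

/* Hypothetical stand-in for rte_eth_tx_burst(): returns how many of the
 * nb_pkts packets the device actually queued. */
typedef uint16_t (*tx_burst_fn)(void **pkts, uint16_t nb_pkts);

/* Drain a burst: one initial attempt plus up to max_retries retries on
 * the unsent tail. A persistent shortfall here is the "TX rate falling
 * behind the application fill rate" symptom: too few TX descriptors or
 * too few cycles on the TX lcore. */
uint16_t
tx_drain(tx_burst_fn tx, void **pkts, uint16_t nb_pkts, int max_retries)
{
	uint16_t sent = 0;

	while (sent < nb_pkts && max_retries-- >= 0)
		sent += tx(&pkts[sent], (uint16_t)(nb_pkts - sent));
	return sent; /* caller must free pkts[sent..nb_pkts) or count drops */
}

/* Example stub: a device that can queue at most 4 packets per call. */
static uint16_t
slow_tx(void **pkts, uint16_t nb_pkts)
{
	(void)pkts;
	return nb_pkts < 4 ? nb_pkts : 4;
}
```

Counting how often the retry path is taken is a cheap way to confirm that the
TX queue, not the application logic, is the bottleneck.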


Are there object drops at the producer point for the ring library?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Producer point for ring :numref:`dtg_producer_ring`.

.. _dtg_producer_ring:

.. figure:: img/dtg_producer_ring.*

   Producer point for rings

#. Performance issue isolation at the producer

   * Use ``rte_ring_dump`` to validate that the single producer flag
     ``RING_F_SP_ENQ`` is set where only one producer is expected.

   * There should be a sufficient ``rte_ring_free_count`` at any point in
     time.

   * Extreme stalls in the dequeue stage of the pipeline will cause
     ``rte_ring_full`` to be true.
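
The free-count check can be automated with a simple low-watermark rule. The
ring state below is a self-contained stand-in for what ``rte_ring_dump`` and
``rte_ring_free_count`` report, used here only so the sketch compiles without
DPDK headers.

```c
#include <stdbool.h>

/* Minimal stand-in for the ring state reported by rte_ring_dump();
 * capacity and current element count are all this check needs. */
struct ring_state {
	unsigned int capacity;
	unsigned int count; /* entries currently enqueued */
};

/* Mirror of rte_ring_free_count(): slots still available to a producer. */
unsigned int
ring_free_count(const struct ring_state *r)
{
	return r->capacity - r->count;
}

/* Producer-side health check: returns true when the free space has
 * dropped below the given low-water fraction (e.g. 10 for 10%), i.e.
 * the consumer side of the pipeline is stalling. */
bool
ring_producer_starved(const struct ring_state *r, unsigned int low_pct)
{
	return ring_free_count(r) * 100 < r->capacity * low_pct;
}
```

Sampling this predicate from a slow control thread flags a stalling consumer
before ``rte_ring_full`` starts causing actual enqueue failures.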


Are there object drops at the consumer point for the ring library?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consumer point for ring :numref:`dtg_consumer_ring`.

.. _dtg_consumer_ring:

.. figure:: img/dtg_consumer_ring.*

   Consumer point for rings

#. Performance issue isolation at the consumer

   * Use ``rte_ring_dump`` to validate that the single consumer flag
     ``RING_F_SC_DEQ`` is set where only one consumer is expected.

   * If the desired burst dequeue falls behind the actual dequeue, the
     enqueue stage is not filling up the ring as required.

   * An extreme stall in the enqueue stage will lead to ``rte_ring_empty``
     being true.


Is there a variance in packet or object processing rate in the pipeline?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Memory objects close to NUMA :numref:`dtg_mempool`.

.. _dtg_mempool:

.. figure:: img/dtg_mempool.*

   Memory objects have to be close to the device per NUMA.

#. Stalls in the processing pipeline can be attributed to MBUF release
   delays. These can be narrowed down to

   * Heavy processing cycles at single or multiple processing stages.

   * The cache working set is spread due to the increased stages in the
     pipeline.

   * The CPU thread responsible for TX is not able to keep up with the burst
     of traffic.

   * Extra cycles to linearize multi-segment buffers and software offloads
     like checksum, TSO, and VLAN strip.

   * Packet buffer copies in the fast path also result in stalls in MBUF
     release if not done selectively.

   * Application logic sets ``rte_pktmbuf_refcnt_set`` to a higher than
     desired value, frequently uses ``rte_pktmbuf_prefree_seg``, and does
     not release the MBUF back to the mempool.

#. Lower performance between the pipeline processing stages can be because

   * The NUMA instance for packets or objects from the NIC, mempool, and
     ring should be the same.

   * Drops on a specific socket are due to insufficient objects in the pool.
     Use ``rte_mempool_get_count`` or ``rte_mempool_avail_count`` to monitor
     when the drops occur.

   * Try prefetching the content in the processing pipeline logic to
     minimize the stalls.

#. Performance issues can be due to special cases

   * Check if the MBUF is contiguous with ``rte_pktmbuf_is_contiguous``, as
     certain offloads require the same.

   * Use ``rte_mempool_cache_create`` for user threads that require access
     to mempool objects.

   * If the variance is absent for larger huge pages, then try
     ``rte_mem_lock_page`` on the objects, packets, and lookup tables to
     isolate the issue.
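
The per-socket mempool monitoring above can be sketched as follows. The
snapshot structure is hypothetical: in a real application ``avail`` would be
read with ``rte_mempool_avail_count`` for the pool serving each NUMA socket.

```c
#include <stdint.h>

/* Hypothetical per-socket snapshot of what rte_mempool_avail_count()
 * would report for the pool serving each NUMA socket. */
struct pool_snapshot {
	unsigned int socket_id;
	unsigned int avail;   /* objects currently free in the pool */
	unsigned int size;    /* total objects in the pool */
};

/* Return the socket whose pool is closest to exhaustion (lowest
 * avail/size percentage below min_pct), or -1 if every pool still has
 * headroom. A recurring hit on one socket points at objects allocated
 * on the wrong NUMA node or at a leak in one pipeline branch. */
int
most_starved_socket(const struct pool_snapshot *s, int n, unsigned int min_pct)
{
	int worst = -1;
	uint64_t worst_pct = 101; /* percent, so any real value is lower */

	for (int i = 0; i < n; i++) {
		uint64_t pct = (uint64_t)s[i].avail * 100 / s[i].size;

		if (pct < min_pct && pct < worst_pct) {
			worst_pct = pct;
			worst = (int)s[i].socket_id;
		}
	}
	return worst;
}
```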


Is there a variance in cryptodev performance?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Crypto device and PMD :numref:`dtg_crypto`.

.. _dtg_crypto:

.. figure:: img/dtg_crypto.*

   CRYPTO and interaction with PMD device.

#. Performance issue isolation for enqueue

   * Ensure the cryptodev, resources, and enqueue are running on the same
     NUMA cores.

   * Isolate the cause of errors shown in the error counts using
     ``rte_cryptodev_stats_get``.

   * Parallelize the enqueue thread across multiple queue pairs.

#. Performance issue isolation for dequeue

   * Ensure the cryptodev, resources, and dequeue are running on the same
     NUMA cores.

   * Isolate the cause of errors shown in the error counts using
     ``rte_cryptodev_stats_get``.

   * Parallelize the dequeue thread across multiple queue pairs.

#. Performance issue isolation for the crypto operation

   * If cryptodev software-assist is in use, ensure the library is built
     with the right (SIMD) flags, or check if the queue pair reports the CPU
     ISA in its feature flags, such as AVX|SSE|NEON, using
     ``rte_cryptodev_info_get``.

   * If cryptodev hardware-assist is in use, ensure both the firmware and
     drivers are up to date.

#. Configuration issue isolation

   * Identify the cryptodev instances with ``rte_cryptodev_count`` and
     ``rte_cryptodev_info_get``.
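
The SIMD check above is a feature-flag mask test. The ``FF_*`` bit values
below are illustrative stand-ins for the real ``RTE_CRYPTODEV_FF_*`` flags
that ``rte_cryptodev_info_get`` reports; only the shape of the check matters
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for cryptodev feature-flag bits; the actual
 * RTE_CRYPTODEV_FF_* values differ. */
#define FF_CPU_SSE        (1ULL << 0)
#define FF_CPU_AVX        (1ULL << 1)
#define FF_CPU_AVX2       (1ULL << 2)
#define FF_CPU_NEON       (1ULL << 3)
#define FF_HW_ACCELERATED (1ULL << 4)

/* A software-assist cryptodev with no SIMD bit set falls back to scalar
 * code paths: a likely cause of an unexplained performance variance. */
bool
cryptodev_has_simd(uint64_t feature_flags)
{
	return (feature_flags &
		(FF_CPU_SSE | FF_CPU_AVX | FF_CPU_AVX2 | FF_CPU_NEON)) != 0;
}
```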


Is the user function performance not as expected?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Custom worker function :numref:`dtg_distributor_worker`.

.. _dtg_distributor_worker:

.. figure:: img/dtg_distributor_worker.*

   Custom worker function performance drops.

#. Performance issue isolation

   * Functions running on CPU cores without context switches are the best
     performing scenarios. Identify the lcore with ``rte_lcore_id`` and the
     lcore index mapping with the CPU using ``rte_lcore_index``.

   * Use ``rte_thread_get_affinity`` to isolate functions running on the
     same CPU core.

#. Configuration issue isolation

   * Identify the core role using ``rte_eal_lcore_role`` to identify RTE,
     OFF, SERVICE and NON_EAL cores. Check that the performance-critical
     functions are mapped to run on the right cores.

   * For high-performance execution logic, ensure it is running on the
     correct NUMA node and worker core.

   * Analyze the run logic with ``rte_dump_stack`` and ``rte_memdump`` for
     more insights.

   * Make use of objdump to ensure the opcode matches the desired state.
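
Overlapping pinning is the usual reason two functions end up sharing a CPU
core. The sketch below checks per-worker affinity masks for intersections;
the 64-bit mask is a simplified stand-in for the ``rte_cpuset_t`` that
``rte_thread_get_affinity`` fills in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a CPU affinity set (rte_cpuset_t): one bit
 * per CPU, limited to 64 CPUs for the sketch. */
typedef uint64_t cpu_mask_t;

/* Two functions contend for the same core when their masks intersect. */
bool
affinity_overlaps(cpu_mask_t a, cpu_mask_t b)
{
	return (a & b) != 0;
}

/* Given the per-worker masks, report the index of the first worker that
 * shares a CPU with an earlier one, or -1 if all workers are isolated. */
int
first_shared_worker(const cpu_mask_t *masks, int n)
{
	cpu_mask_t seen = 0;

	for (int i = 0; i < n; i++) {
		if (affinity_overlaps(seen, masks[i]))
			return i;
		seen |= masks[i];
	}
	return -1;
}
```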


Are the execution cycles for dynamic service functions not frequent?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Service functions on service cores :numref:`dtg_service`.

.. _dtg_service:

.. figure:: img/dtg_service.*

   Functions running on service cores

#. Performance issue isolation

   * Services configured for parallel execution should have
     ``rte_service_lcore_count`` equal to
     ``rte_service_lcore_count_services``.

   * A service intended to run in parallel on all cores should return
     ``RTE_SERVICE_CAP_MT_SAFE`` for ``rte_service_probe_capability``, and
     ``rte_service_map_lcore_get`` should return a unique lcore.

   * Check whether the execution cycles for dynamic service functions are as
     frequent as desired.

   * If services share an lcore, the overall execution should fit the cycle
     budget.

#. Configuration issue isolation

   * Check if the service is running with ``rte_service_runstate_get``.

   * Generic debug via ``rte_service_dump``.
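
Whether co-scheduled services fit the cycle budget can be estimated with a
simple accounting model. The cost structure below is hypothetical: per-call
cycles and expected call rates would come from profiling, for example from
the cycle statistics printed by ``rte_service_dump``.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-service cost model: cycles one callback invocation
 * consumes, and how many invocations per second the design expects. */
struct service_cost {
	uint64_t cycles_per_call;
	uint64_t calls_per_sec;
};

/* When several services are mapped to one service lcore, their combined
 * demand must fit that core's cycle budget (its frequency in Hz).
 * Otherwise some callbacks inevitably run less frequently than intended,
 * which is exactly the symptom this section describes. */
bool
services_fit_budget(const struct service_cost *svc, int n, uint64_t core_hz)
{
	uint64_t demand = 0;

	for (int i = 0; i < n; i++)
		demand += svc[i].cycles_per_call * svc[i].calls_per_sec;
	return demand <= core_hz;
}
```

If the check fails, either spread the services over more service lcores or
reduce the per-call work.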


Is there a bottleneck in the performance of eventdev?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Check the generic configuration

   * Ensure the event devices created are on the right NUMA node using
     ``rte_event_dev_count`` and ``rte_event_dev_socket_id``.

   * Check the event stages if the events are looped back into the same
     queue.

   * If the failure is on the enqueue stage for events, check the queue
     depth with ``rte_event_dev_info_get``.

#. If there are performance drops in the enqueue stage

   * Use ``rte_event_dev_dump`` to dump the eventdev information.

   * Periodically check the stats for the queue and port to identify
     starvation.

   * Check the in-flight events for the desired queue for enqueue and
     dequeue.
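
A stalled enqueue stage often traces back to the in-flight event limit. The
sketch models that check; the fields are stand-ins for the ``max_num_events``
device configuration limit and the in-flight count exposed via the eventdev
statistics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the eventdev limits and counters; values illustrative. */
struct evdev_state {
	uint32_t max_inflight;  /* max_num_events from the device config */
	uint32_t inflight;      /* events enqueued but not yet released */
};

/* New events are refused once the in-flight count hits the device
 * limit. A pipeline stage that forgets to release events shows up as a
 * permanently "full" device and a stalled enqueue stage. */
bool
enqueue_would_block(const struct evdev_state *d, uint32_t burst)
{
	return d->inflight + burst > d->max_inflight;
}
```

Logging how close ``inflight`` sits to the limit over time distinguishes a
transient burst from a stage that is leaking events.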


Is there a variance in the traffic manager?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Traffic Manager on the TX interface :numref:`dtg_qos_tx`.

.. _dtg_qos_tx:

.. figure:: img/dtg_qos_tx.*

   Traffic Manager just before TX.

#. Identify whether the cause of a variance from the expected behavior is
   insufficient CPU cycles. Use ``rte_tm_capabilities_get`` to fetch the
   features for hierarchies, WRED, and priority schedulers that can be
   offloaded to hardware.

#. Undesired flow drops can be narrowed down to WRED, priority, and rate
   limiters.

#. Isolate the flow in which the undesired drops occur. Use
   ``rte_tm_get_number_of_leaf_nodes`` and the flow table to pin down the
   leaf where the drops occur.

#. Check the stats using ``rte_tm_stats_update`` and
   ``rte_tm_node_stats_read`` for drops in the hierarchy, scheduler, and
   WRED configurations.


Is the packet in an unexpected format?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Packet capture before and after processing :numref:`dtg_pdump`.

.. _dtg_pdump:

.. figure:: img/dtg_pdump.*

   Capture points of traffic at RX-TX.

#. To isolate possible packet corruption in the processing pipeline,
   carefully staged packet captures are to be implemented.

   * First, isolate at the NIC entry and exit.

     Use pdump in the primary process to allow a secondary process to access
     the port-queue pair. The packets get copied over in the RX|TX callback
     by the secondary process using ring buffers.

   * Second, isolate at the pipeline entry and exit.

     Using hooks or callbacks, capture the packet in the middle of the
     pipeline stage to copy the packets, which can be shared with the
     secondary debug process via user-defined custom rings.

.. note::

   Use a similar analysis for object and metadata corruption.


Does the issue still persist?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The issue can be further narrowed down to the following causes.

#. If there is vendor- or application-specific metadata, check for errors
   due to metadata error flags. Dumping the private metadata in the objects
   can give insight into details for debugging.

#. If multiple processes are used for either data or configuration, check
   for possible errors in the secondary process where the configuration
   fails, and for possible data corruption in the data plane.

#. Random drops in RX or TX when another application is opened are an
   indication of a noisy neighbor. Try using the cache allocation technology
   to minimize the effect between applications.


How to develop custom code to debug?
-------------------------------------

#. For an application that runs as the primary process only, debug
   functionality is added in the same process. It can be invoked by a timer
   call-back, a service core, or a signal handler.

#. For an application that runs as multiple processes, add the debug
   functionality in a standalone secondary process.