..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2018 Intel Corporation.

Debug & Troubleshoot guide
==========================

DPDK applications can be designed to have simple or complex pipeline
processing stages making use of single or multiple threads. Applications can
use poll mode driver devices, which also help in offloading CPU cycles. It is
common to find solutions designed with

* single or multiple primary processes

* single primary and single secondary

* single primary and multiple secondaries

In all the above cases, it is tedious to isolate, debug, and understand
various behaviors which occur randomly or periodically. The goal of this
guide is to consolidate a few commonly seen issues for reference, and then to
isolate and identify the root cause through step-by-step debugging at various
stages.

.. note::

 It is difficult to cover all possible issues in a single attempt. With
 feedback and suggestions from the community, more cases can be covered.


Application Overview
--------------------

By making use of the application model as a reference, we can discuss
multiple causes of issues in the guide. Let us assume the sample makes use of
a single primary process, with various processing stages running on multiple
cores. The application may also make use of a Poll Mode Driver, and libraries
like service cores, mempool, mbuf, eventdev, cryptodev, QoS, and ethdev.

The overview of an application modeled using PMD is shown in
:numref:`dtg_sample_app_model`.

.. _dtg_sample_app_model:

.. figure:: img/dtg_sample_app_model.*

   Overview of pipeline stage of an application


Bottleneck Analysis
-------------------

Factors that drive the design decision could be the platform, scale factor,
and target. These distinct preferences lead to multiple combinations built
using PMDs and libraries of DPDK. While the compiler, library mode, and
optimization flags are components expected to remain constant, they affect
the application too.


Is there a mismatch in the packet (received < desired) rate?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RX port and associated core :numref:`dtg_rx_rate`.

.. _dtg_rx_rate:

.. figure:: img/dtg_rx_rate.*

   RX packet rate compared against received rate.

#. Is the configuration for RX set up correctly?

   * Identify if the port speed and duplex match the desired values with
     ``rte_eth_link_get``.

   * Check ``DEV_RX_OFFLOAD_JUMBO_FRAME`` is set with ``rte_eth_dev_info_get``.

   * If drops occur only for packets not matching the port MAC address, check
     the promiscuous mode setting with ``rte_eth_promiscuous_get``.

#. Is the drop isolated to certain NICs only?

   * Make use of ``rte_eth_dev_stats`` to identify the cause of the drops.

   * If there are mbuf drops, check ``nb_desc`` for the RX descriptor ring,
     as it might not be sufficient for the application.

   * If ``rte_eth_dev_stats`` shows drops on specific RX queues, ensure the
     RX lcore threads have enough cycles for ``rte_eth_rx_burst`` on the port
     queue pair.

   * If packets are redirected to a specific port queue pair, ensure the RX
     lcore threads get enough cycles.

   * Check the RSS configuration with ``rte_eth_dev_rss_hash_conf_get`` if
     the spread is not even and is causing drops.

   * If PMD stats are not updating, an offload or configuration might be
     dropping the incoming traffic.

#. Are drops still seen?

   * If there are multiple port queue pairs, it might be the RX thread, RX
     distributor, or event RX adapter not having enough cycles.

   * If drops are seen for the RX adapter or RX distributor, try using
     ``rte_prefetch_non_temporal``, which informs the core that the mbuf in
     the cache is temporary.

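A minimal sketch of the configuration checks above, assuming a hypothetical
``check_rx_config`` helper invoked once per started port; the counters
referred to as ``rte_eth_dev_stats`` are fetched here via
``rte_eth_stats_get``:

.. code-block:: c

   #include <inttypes.h>
   #include <stdio.h>
   #include <rte_ethdev.h>

   /* Sketch: verify RX configuration and drop counters for one port. */
   static void
   check_rx_config(uint16_t port_id)
   {
           struct rte_eth_link link;
           struct rte_eth_dev_info dev_info;
           struct rte_eth_stats stats;

           /* Port speed and duplex versus the desired values. */
           rte_eth_link_get(port_id, &link);
           printf("port %u: %u Mbps, %s duplex, link %s\n", port_id,
                  link.link_speed,
                  link.link_duplex == ETH_LINK_FULL_DUPLEX ? "full" : "half",
                  link.link_status ? "up" : "down");

           /* RX offload capabilities, e.g. jumbo frame support. */
           rte_eth_dev_info_get(port_id, &dev_info);
           if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_JUMBO_FRAME)
                   printf("port %u: jumbo frame offload available\n", port_id);

           /* Promiscuous mode state (1 = enabled). */
           printf("port %u: promiscuous %d\n", port_id,
                  rte_eth_promiscuous_get(port_id));

           /* imissed and rx_nombuf point at RX drop causes. */
           if (rte_eth_stats_get(port_id, &stats) == 0)
                   printf("port %u: imissed %" PRIu64 " rx_nombuf %" PRIu64 "\n",
                          port_id, stats.imissed, stats.rx_nombuf);
   }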

Are there packet drops at receive or transmit?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RX-TX port and associated cores :numref:`dtg_rx_tx_drop`.

.. _dtg_rx_tx_drop:

.. figure:: img/dtg_rx_tx_drop.*

   RX-TX drops

#. At RX

   * Identify if multiple RX queues are configured for the port via
     ``nb_rx_queues`` using ``rte_eth_dev_info_get``.

   * If ``rte_eth_dev_stats`` shows drops in ``q_errors``, check whether the
     RX thread is configured to fetch packets from the port queue pair.

   * If ``rte_eth_dev_stats`` shows drops in ``rx_nombuf``, check whether the
     RX thread has enough cycles to consume the packets from the queue.

#. At TX

   * If the TX rate is falling behind the application fill rate, identify if
     there are enough descriptors with ``rte_eth_dev_info_get`` for TX.

   * Check that ``nb_pkts`` in ``rte_eth_tx_burst`` is set for multiple
     packets.

   * Check whether ``rte_eth_tx_burst`` invokes the vector function call for
     the PMD.

   * If ``oerrors`` is incrementing, TX packet validations are failing. Check
     if there are queue-specific offload failures.

   * If the drops occur for large size packets, check the MTU and
     multi-segment support configured for the NIC.

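The drop counters referenced above can be polled in one place. A minimal
sketch, where ``nb_rxq`` is assumed to be the RX queue count the application
configured:

.. code-block:: c

   #include <inttypes.h>
   #include <stdio.h>
   #include <rte_ethdev.h>

   /* Sketch: poll the counters that indicate RX and TX bottlenecks. */
   static void
   check_rx_tx_drops(uint16_t port_id, uint16_t nb_rxq)
   {
           struct rte_eth_stats stats;
           uint16_t q;

           if (rte_eth_stats_get(port_id, &stats) != 0)
                   return;

           /* rx_nombuf: RX thread starved of cycles or mbufs;
            * oerrors: TX packet validation or offload failures. */
           printf("rx_nombuf %" PRIu64 ", oerrors %" PRIu64 "\n",
                  stats.rx_nombuf, stats.oerrors);

           /* q_errors: drops isolated to specific RX queues. */
           for (q = 0; q < nb_rxq && q < RTE_ETHDEV_QUEUE_STAT_CNTRS; q++)
                   if (stats.q_errors[q] != 0)
                           printf("queue %u: q_errors %" PRIu64 "\n",
                                  q, stats.q_errors[q]);
   }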

Are there object drops at the producer point for the ring library?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Producer point for ring :numref:`dtg_producer_ring`.

.. _dtg_producer_ring:

.. figure:: img/dtg_producer_ring.*

   Producer point for Rings

#. Performance issue isolation at producer

   * Use ``rte_ring_dump`` to validate that the single producer flag
     ``RING_F_SP_ENQ`` is set.

   * There should be a sufficient ``rte_ring_free_count`` at any point in
     time.

   * Extreme stalls in the dequeue stage of the pipeline will cause
     ``rte_ring_full`` to be true.

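A minimal sketch of these producer-side checks, assuming ``r`` is a ring the
application created earlier with ``rte_ring_create``:

.. code-block:: c

   #include <stdio.h>
   #include <rte_ring.h>

   /* Sketch: producer-side health check for a ring. */
   static void
   check_ring_producer(struct rte_ring *r)
   {
           /* Dumps flags (e.g. RING_F_SP_ENQ) and head/tail indexes. */
           rte_ring_dump(stdout, r);

           /* A consistently low free count means the consumer is lagging. */
           printf("free entries: %u\n", rte_ring_free_count(r));

           if (rte_ring_full(r))
                   printf("ring full: dequeue stage is stalled\n");
   }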

Are there object drops at the consumer point for the ring library?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consumer point for ring :numref:`dtg_consumer_ring`.

.. _dtg_consumer_ring:

.. figure:: img/dtg_consumer_ring.*

   Consumer point for Rings

#. Performance issue isolation at consumer

   * Use ``rte_ring_dump`` to validate that the single consumer flag
     ``RING_F_SC_DEQ`` is set.

   * If the desired burst dequeue falls behind the actual dequeue, the
     enqueue stage is not filling up the ring as required.

   * Extreme stalls in the enqueue stage will cause ``rte_ring_empty`` to be
     true.

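The consumer-side counterpart, under the same assumption that ``r`` is an
existing ring:

.. code-block:: c

   #include <stdio.h>
   #include <rte_ring.h>

   /* Sketch: consumer-side health check for a ring. */
   static void
   check_ring_consumer(struct rte_ring *r)
   {
           /* Dumps flags (e.g. RING_F_SC_DEQ) and head/tail indexes. */
           rte_ring_dump(stdout, r);

           /* An always-empty ring means the enqueue stage is stalled. */
           if (rte_ring_empty(r))
                   printf("ring empty: enqueue stage is not filling it\n");
           else
                   printf("entries in use: %u\n", rte_ring_count(r));
   }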

Is there a variance in packet or object processing rate in the pipeline?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Memory objects close to NUMA :numref:`dtg_mempool`.

.. _dtg_mempool:

.. figure:: img/dtg_mempool.*

   Memory objects have to be close to the device per NUMA.

#. Stalls in the processing pipeline can be attributed to MBUF release
   delays. These can be narrowed down to

   * Heavy processing cycles at single or multiple processing stages.

   * Cache is spread due to the increased stages in the pipeline.

   * The CPU thread responsible for TX is not able to keep up with the burst
     of traffic.

   * Extra cycles to linearize multi-segment buffers and software offloads
     like checksum, TSO, and VLAN strip.

   * Packet buffer copies in the fast path also result in stalls in MBUF
     release if not done selectively.

   * Application logic sets the reference count higher than the desired value
     with ``rte_pktmbuf_refcnt_set``, so frequent use of
     ``rte_pktmbuf_prefree_seg`` does not release the MBUF back to the
     mempool.

#. Lower performance between the pipeline processing stages can be due to

   * The NUMA instance for packets or objects from the NIC, mempool, and
     ring not being the same.

   * Drops on a specific socket caused by insufficient objects in the pool.
     Use ``rte_mempool_get_count`` or ``rte_mempool_avail_count`` to monitor
     when the drops occur.

   * Missing prefetches: try prefetching the content in the processing
     pipeline logic to minimize the stalls.

#. Performance issues can be due to special cases

   * Check if the MBUF is contiguous with ``rte_pktmbuf_is_contiguous``, as
     certain offloads require the same.

   * Use ``rte_mempool_cache_create`` for user threads that require access
     to mempool objects.

   * If the variance is absent for larger huge pages, then try
     ``rte_mem_lock_page`` on the objects, packets, and lookup tables to
     isolate the issue.

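A minimal sketch of keeping packet memory on the same NUMA node as the
device; the pool name and sizing values are illustrative assumptions:

.. code-block:: c

   #include <stdio.h>
   #include <rte_ethdev.h>
   #include <rte_mbuf.h>
   #include <rte_mempool.h>

   /* Sketch: create the RX pool on the port's NUMA node and expose a
    * counter to watch when drops occur. Sizes are illustrative. */
   static struct rte_mempool *
   create_pool_near_port(uint16_t port_id)
   {
           int socket = rte_eth_dev_socket_id(port_id);
           struct rte_mempool *mp;

           mp = rte_pktmbuf_pool_create("rx_pool", 8192, 256, 0,
                                        RTE_MBUF_DEFAULT_BUF_SIZE, socket);
           if (mp != NULL)
                   /* Monitor this value when drops are observed. */
                   printf("available objects: %u\n",
                          rte_mempool_avail_count(mp));
           return mp;
   }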

Is there a variance in cryptodev performance?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Crypto device and PMD :numref:`dtg_crypto`.

.. _dtg_crypto:

.. figure:: img/dtg_crypto.*

   CRYPTO and interaction with PMD device.

#. Performance issue isolation for enqueue

   * Ensure the cryptodev, resources, and enqueue thread are running on the
     same NUMA node.

   * Isolate the cause of errors shown in ``err_count`` using
     ``rte_cryptodev_stats``.

   * Parallelize the enqueue thread across multiple queue pairs.

#. Performance issue isolation for dequeue

   * Ensure the cryptodev, resources, and dequeue thread are running on the
     same NUMA node.

   * Isolate the cause of errors shown in ``err_count`` using
     ``rte_cryptodev_stats``.

   * Parallelize the dequeue thread across multiple queue pairs.

#. Performance issue isolation for crypto operations

   * If cryptodev software-assist is in use, ensure the library is built
     with the right (SIMD) flags, or check whether the queue pair uses the
     CPU ISA via ``feature_flags`` (AVX|SSE|NEON) from
     ``rte_cryptodev_info_get``.

   * If cryptodev hardware-assist is in use, ensure both firmware and
     drivers are up to date.

#. Configuration issue isolation

   * Identify cryptodev instances with ``rte_cryptodev_count`` and
     ``rte_cryptodev_info_get``.

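A minimal sketch of these checks, iterating over all crypto devices:

.. code-block:: c

   #include <inttypes.h>
   #include <stdio.h>
   #include <rte_cryptodev.h>

   /* Sketch: enumerate crypto devices, their NUMA placement, SIMD
    * feature flags and error counters. */
   static void
   check_cryptodevs(void)
   {
           uint8_t dev_id, nb_devs = rte_cryptodev_count();

           for (dev_id = 0; dev_id < nb_devs; dev_id++) {
                   struct rte_cryptodev_info info;
                   struct rte_cryptodev_stats stats;

                   rte_cryptodev_info_get(dev_id, &info);
                   printf("dev %u: driver %s, socket %d, AVX support %d\n",
                          dev_id, info.driver_name,
                          rte_cryptodev_socket_id(dev_id),
                          !!(info.feature_flags & RTE_CRYPTODEV_FF_CPU_AVX));

                   if (rte_cryptodev_stats_get(dev_id, &stats) == 0)
                           printf("enq errs %" PRIu64 ", deq errs %" PRIu64 "\n",
                                  stats.enqueue_err_count,
                                  stats.dequeue_err_count);
           }
   }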

Is the user function performance not as expected?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Custom worker function :numref:`dtg_distributor_worker`.

.. _dtg_distributor_worker:

.. figure:: img/dtg_distributor_worker.*

   Custom worker function performance drops.

#. Performance issue isolation

   * Functions running on CPU cores without context switches are the best
     performing scenarios. Identify the lcore with ``rte_lcore`` and the
     lcore-to-CPU index mapping with ``rte_lcore_index``.

   * Use ``rte_thread_get_affinity`` to isolate functions running on the
     same CPU core.

#. Configuration issue isolation

   * Identify the core role using ``rte_eal_lcore_role`` to identify RTE,
     OFF, and SERVICE. Check that performance-critical functions are mapped
     to run on the right cores.

   * For high-performance execution logic, ensure it runs on the correct
     NUMA node and a non-master core.

   * Analyze the run logic with ``rte_dump_stack``, ``rte_dump_registers``,
     and ``rte_memdump`` for more insights.

   * Make use of objdump to ensure the opcodes match the desired state.

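A minimal sketch of the placement checks above, intended to be called from
inside the worker function under investigation:

.. code-block:: c

   #include <stdio.h>
   #include <rte_lcore.h>

   /* Sketch: report where the calling function actually runs. The role
    * value maps to ROLE_RTE, ROLE_OFF or ROLE_SERVICE. */
   static void
   report_lcore_placement(void)
   {
           unsigned int lcore = rte_lcore_id();
           rte_cpuset_t cpuset;
           unsigned int cpu;

           printf("lcore %u, index %d, socket %u, role %d\n",
                  lcore, rte_lcore_index(lcore), rte_socket_id(),
                  (int)rte_eal_lcore_role(lcore));

           /* Overlapping affinity sets reveal functions sharing a CPU. */
           rte_thread_get_affinity(&cpuset);
           for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                   if (CPU_ISSET(cpu, &cpuset))
                           printf("affinity includes CPU %u\n", cpu);
   }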

Are the execution cycles for dynamic service functions not frequent?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Service functions on service cores :numref:`dtg_service`.

.. _dtg_service:

.. figure:: img/dtg_service.*

   functions running on service cores

#. Performance issue isolation

   * For services configured for parallel execution,
     ``rte_service_lcore_count`` should be equal to
     ``rte_service_lcore_count_services``.

   * A service that runs in parallel on all cores should return
     ``RTE_SERVICE_CAP_MT_SAFE`` for ``rte_service_probe_capability``, and
     ``rte_service_map_lcore_get`` should return unique lcores.

   * If services share an lcore and the execution cycles for a dynamic
     service function are not frequent, the overall execution should still
     fit the cycle budget of that lcore.

#. Configuration issue isolation

   * Check if the service is running with ``rte_service_runstate_get``.

   * Generic debug via ``rte_service_dump``.

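A minimal sketch of these service checks, where ``service_id`` is an assumed,
already registered service:

.. code-block:: c

   #include <stdio.h>
   #include <rte_service.h>

   /* Sketch: verify parallelism, run state and cycle budget of a service. */
   static void
   check_service(uint32_t service_id)
   {
           printf("service lcores: %d\n", rte_service_lcore_count());

           /* 1 means the service may safely run on multiple lcores. */
           if (rte_service_probe_capability(service_id,
                                            RTE_SERVICE_CAP_MT_SAFE) == 1)
                   printf("service %u is MT safe\n", service_id);

           if (rte_service_runstate_get(service_id) == 1)
                   printf("service %u is set to run\n", service_id);

           /* Per-service call and cycle counts for budget analysis. */
           rte_service_dump(stdout, service_id);
   }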

Is there a bottleneck in the performance of eventdev?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#. Check for generic configuration

   * Ensure the event devices created are on the right NUMA node using
     ``rte_event_dev_count`` and ``rte_event_dev_socket_id``.

   * Check for event stages if the events are looped back into the same
     queue.

   * If the failure is on the enqueue stage for events, check the queue
     depth with ``rte_event_dev_info_get``.

#. If there are performance drops in the enqueue stage

   * Use ``rte_event_dev_dump`` to dump the eventdev information.

   * Periodically check stats for the queue and port to identify starvation.

   * Check the in-flight events for the desired queue for enqueue and
     dequeue.

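A minimal sketch of the generic eventdev checks above:

.. code-block:: c

   #include <stdio.h>
   #include <rte_eventdev.h>

   /* Sketch: NUMA placement, queue limits and state dump per eventdev. */
   static void
   check_eventdevs(void)
   {
           uint8_t dev_id, nb_devs = rte_event_dev_count();

           for (dev_id = 0; dev_id < nb_devs; dev_id++) {
                   struct rte_event_dev_info info;

                   /* NUMA node the event device is attached to. */
                   printf("eventdev %u on socket %d\n",
                          dev_id, rte_event_dev_socket_id(dev_id));

                   /* Depth limits relevant to enqueue failures. */
                   if (rte_event_dev_info_get(dev_id, &info) == 0)
                           printf("max in-flight events %d, max queues %u\n",
                                  info.max_num_events,
                                  info.max_event_queues);

                   /* Queue and port state, useful to spot starvation. */
                   rte_event_dev_dump(dev_id, stdout);
           }
   }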

Is there a variance in traffic manager?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Traffic Manager on TX interface :numref:`dtg_qos_tx`.

.. _dtg_qos_tx:

.. figure:: img/dtg_qos_tx.*

   Traffic Manager just before TX.

#. Identify whether the variance from expected behavior is due to
   insufficient CPU cycles. Use ``rte_tm_capabilities_get`` to fetch
   features for hierarchies, WRED, and priority schedulers that can be
   offloaded to hardware.

#. Undesired flow drops can be narrowed down to WRED, priority, and rate
   limiters.

#. Isolate the flow in which the undesired drops occur. Use
   ``rte_tm_get_number_of_leaf_nodes`` and the flow table to pin down the
   leaf where the drops occur.

#. Check the stats using ``rte_tm_stats_update`` and
   ``rte_tm_node_stats_read`` for drops in the hierarchy, scheduler, and
   WRED configurations.

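A minimal sketch of these capability and leaf-node checks:

.. code-block:: c

   #include <stdio.h>
   #include <rte_ethdev.h>
   #include <rte_tm.h>

   /* Sketch: query TM offload capabilities and locate leaf nodes. */
   static void
   check_tm(uint16_t port_id)
   {
           struct rte_tm_capabilities cap;
           struct rte_tm_error err;
           uint32_t n_leaves = 0;

           /* Hierarchy depth and WRED contexts available in hardware. */
           if (rte_tm_capabilities_get(port_id, &cap, &err) == 0)
                   printf("levels %u, WRED contexts %u\n",
                          cap.n_levels_max, cap.cman_wred_context_n_max);

           /* Candidate leaves to inspect with rte_tm_node_stats_read. */
           if (rte_tm_get_number_of_leaf_nodes(port_id, &n_leaves, &err) == 0)
                   printf("%u leaf nodes\n", n_leaves);
   }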

Is the packet not in the expected format?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Packet capture before and after processing :numref:`dtg_pdump`.

.. _dtg_pdump:

.. figure:: img/dtg_pdump.*

   Capture points of Traffic at RX-TX.

#. To isolate possible packet corruption in the processing pipeline,
   carefully staged capture points are to be implemented.

   * First, isolate at NIC entry and exit.

     Use pdump in the primary process to allow the secondary process to
     access the port-queue pair. The packets get copied over in the RX|TX
     callback by the secondary process using ring buffers.

   * Second, isolate at pipeline entry and exit.

     Using hooks or callbacks, capture the packets in the middle of the
     pipeline stage and copy them, so they can be shared with the secondary
     debug process via user-defined custom rings.

.. note::

   Use a similar analysis for object and metadata corruption.

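A minimal sketch of such a capture hook at NIC RX, where ``debug_ring`` is an
assumed ring shared with the secondary debug process:

.. code-block:: c

   #include <rte_ethdev.h>
   #include <rte_mbuf.h>
   #include <rte_ring.h>

   static struct rte_ring *debug_ring; /* assumed: created and shared */

   /* Sketch: RX callback that hands packet references to a debug ring.
    * The extra reference keeps the first segment of the mbuf alive
    * until the debug process has inspected it. */
   static uint16_t
   capture_rx(uint16_t port_id, uint16_t queue, struct rte_mbuf *pkts[],
              uint16_t nb_pkts, uint16_t max_pkts, void *user_param)
   {
           uint16_t i;

           for (i = 0; i < nb_pkts; i++) {
                   rte_mbuf_refcnt_update(pkts[i], 1);
                   if (rte_ring_enqueue(debug_ring, pkts[i]) != 0)
                           /* Ring full: undo the extra reference. */
                           rte_mbuf_refcnt_update(pkts[i], -1);
           }
           return nb_pkts;
   }

   /* Install on the port-queue pair under investigation. */
   static void
   install_capture(uint16_t port_id, uint16_t queue_id)
   {
           rte_eth_add_rx_callback(port_id, queue_id, capture_rx, NULL);
   }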

Does the issue still persist?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The issue can be further narrowed down to the following causes.

#. If there is vendor or application specific metadata, check for errors due
   to metadata error flags. Dumping the private metadata in the objects can
   give insight into details for debugging.

#. If multiple processes are used for either data or configuration, check for
   possible errors in the secondary process where the configuration fails and
   for possible data corruption in the data plane.

#. Random drops in RX or TX when other applications are opened indicate the
   effect of a noisy neighbor. Try using the cache allocation technique to
   minimize the effect between applications.


How to develop a custom code to debug?
--------------------------------------

#. For an application that runs as the primary process only, debug
   functionality is added in the same process. It can be invoked by a timer
   call-back, a service core, or a signal handler; a sketch of the signal
   handler approach follows.

#. For an application that runs as multiple processes, add the debug
   functionality in a standalone secondary process.
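
A minimal sketch of a signal-triggered dump in the primary process; note that
``printf`` from a signal handler is not async-signal-safe and is used here
only for debug purposes:

.. code-block:: c

   #include <inttypes.h>
   #include <signal.h>
   #include <stdio.h>
   #include <rte_ethdev.h>

   /* Sketch: dump per-port counters on SIGUSR1. A timer callback or a
    * service core can invoke the same dump function instead. */
   static void
   debug_signal_handler(int sig_num)
   {
           uint16_t port_id;

           if (sig_num != SIGUSR1)
                   return;

           RTE_ETH_FOREACH_DEV(port_id) {
                   struct rte_eth_stats stats;

                   if (rte_eth_stats_get(port_id, &stats) == 0)
                           printf("port %u: ipackets %" PRIu64
                                  " imissed %" PRIu64 "\n", port_id,
                                  stats.ipackets, stats.imissed);
           }
   }

   /* In main(), after rte_eal_init():
    *     signal(SIGUSR1, debug_signal_handler);
    */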