..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Profile Your Application
========================

The following sections describe methods of profiling DPDK applications on
different architectures.


Profiling on x86
----------------

Intel processors provide performance counters to monitor events.
Some tools provided by Intel, such as Intel® VTune™ Amplifier, can be used
to profile and benchmark an application.
See the *VTune Performance Analyzer Essentials* publication from Intel Press for more information.

For a DPDK application, this can be done in a Linux* application environment only.

The main situations that should be monitored through event counters are:

*   Cache misses

*   Branch mis-predicts

*   DTLB misses

*   Long latency instructions and exceptions

Refer to the
`Intel Performance Analysis Guide <http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf>`_
for details about application profiling.


Profiling with VTune
~~~~~~~~~~~~~~~~~~~~

To allow VTune to attach to a DPDK application, reconfigure the DPDK build
directory with the ``-Dc_args=-DRTE_ETHDEV_PROFILE_WITH_VTUNE`` meson option
and recompile DPDK:

.. code-block:: console

   meson setup build
   meson configure build -Dc_args=-DRTE_ETHDEV_PROFILE_WITH_VTUNE
   ninja -C build


Profiling on ARM64
------------------

Using Linux perf
~~~~~~~~~~~~~~~~

The ARM64 architecture provides performance counters to monitor events.  The
Linux ``perf`` tool can be used to profile and benchmark an application.  In
addition to the standard events, ``perf`` can be used to profile arm64
specific PMU (Performance Monitor Unit) events through raw events
(``-e rXX``, where ``XX`` is the hexadecimal event number).

For more details refer to the
`ARM64 specific PMU events enumeration <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100095_0002_04_en/way1382543438508.html>`_.


Low-resolution generic counter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The default ``cntvct_el0`` based ``rte_rdtsc()`` provides a portable means to
get a wall clock counter in user space. Typically it runs at a lower clock frequency than the CPU clock frequency.
Tick counts obtained using this method should be scaled to the CPU clock frequency before they are interpreted as CPU cycles.
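
As a rough illustration of that scaling, the sketch below converts generic
counter ticks into an estimate of CPU cycles. The helper name
``scale_to_cpu_cycles`` and its parameters are hypothetical and not part of
DPDK. It assumes that the counter frequency is known (in a DPDK application,
``rte_get_tsc_hz()`` should report it) and that the CPU runs at a fixed, known
frequency, which may not hold when frequency scaling is active.

.. code-block:: c

   #include <stdint.h>

   /*
    * Hypothetical helper (not a DPDK API): convert a tick delta measured
    * with the generic counter into an estimated number of CPU cycles.
    *
    * ticks    - difference between two counter reads
    * timer_hz - frequency of the generic counter (assumed known)
    * cpu_hz   - CPU clock frequency (assumed fixed and known)
    */
   static uint64_t
   scale_to_cpu_cycles(uint64_t ticks, uint64_t timer_hz, uint64_t cpu_hz)
   {
           /*
            * Split into whole seconds and a remainder so that the
            * intermediate product stays small for typical frequencies.
            */
           uint64_t whole = ticks / timer_hz;
           uint64_t rem = ticks % timer_hz;

           return whole * cpu_hz + (rem * cpu_hz) / timer_hz;
   }

For example, under these assumptions 250 ticks of a 100 MHz counter on a 2 GHz
core correspond to roughly 5000 CPU cycles. The result is only as precise as
the counter itself: one tick here spans 20 CPU cycles.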


High-resolution cycle counter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The alternative method to enable ``rte_rdtsc()`` for a high resolution wall
clock counter is through the ARMv8 PMU subsystem. The PMU cycle counter runs
at CPU frequency. However, access to the PMU cycle counter from user space is
not enabled by default in the arm64 Linux kernel. It is possible to enable the
cycle counter for user space access by configuring the PMU from privileged
mode (kernel space).

By default the ``rte_rdtsc()`` implementation uses a portable ``cntvct_el0``
scheme.

The example below shows the steps to configure the PMU based cycle counter on
an ARMv8 machine.

.. code-block:: console

    git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
    cd armv8_pmu_cycle_counter_el0
    make
    sudo insmod pmu_el0_cycle_counter.ko

Please refer to :doc:`../linux_gsg/build_dpdk` for generic details on compiling DPDK with meson.

To enable the ``PMU`` based ``rte_rdtsc()``, configure the
build with ``-Dc_args='-DRTE_ARM_EAL_RDTSC_USE_PMU'``.

Example:

.. code-block:: console

   meson setup --cross config/arm/arm64_armv8_linux_gcc -Dc_args='-DRTE_ARM_EAL_RDTSC_USE_PMU' build

.. warning::

   The PMU based scheme is useful for high accuracy performance profiling with
   ``rte_rdtsc()``. However, this method cannot be used in conjunction with
   Linux userspace profiling tools like ``perf`` as this scheme alters the PMU
   register state.