..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Profile Your Application
========================

The following sections describe methods of profiling DPDK applications on
different architectures.


Profiling on x86
----------------

Intel processors provide performance counters to monitor events.
Some tools provided by Intel, such as Intel® VTune™ Amplifier, can be used
to profile and benchmark an application.
See the *VTune Performance Analyzer Essentials* publication from Intel Press for more information.

For a DPDK application, this can be done in a Linux* application environment only.

The main situations that should be monitored through event counters are:

* Cache misses

* Branch mis-predicts

* DTLB misses

* Long latency instructions and exceptions

Refer to the
`Intel Performance Analysis Guide <http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf>`_
for details about application profiling.


Profiling with VTune
~~~~~~~~~~~~~~~~~~~~

To allow VTune to attach to the DPDK application, reconfigure the DPDK build
folder by passing the ``-Dc_args=-DRTE_ETHDEV_PROFILE_WITH_VTUNE`` meson option
and recompile DPDK:

.. code-block:: console

   meson setup build
   meson configure build -Dc_args=-DRTE_ETHDEV_PROFILE_WITH_VTUNE
   ninja -C build


Profiling on ARM64
------------------

Using Linux perf
~~~~~~~~~~~~~~~~

The ARM64 architecture provides performance counters to monitor events. The
Linux ``perf`` tool can be used to profile and benchmark an application. In
addition to the standard events, ``perf`` can be used to profile arm64
specific PMU (Performance Monitor Unit) events through raw events
(``-e rXX``, where ``XX`` is the hexadecimal event number).

For more details refer to the
`ARM64 specific PMU events enumeration <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100095_0002_04_en/way1382543438508.html>`_.


Low-resolution generic counter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The default ``cntvct_el0`` based ``rte_rdtsc()`` provides a portable means to
get a wall clock counter in user space.
Typically it runs at a lower clock frequency than the CPU clock frequency.
Cycles counted using this method should be scaled to CPU clock frequency.


High-resolution cycle counter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The alternative method to enable ``rte_rdtsc()`` for a high resolution wall
clock counter is through the ARMv8 PMU subsystem. The PMU cycle counter runs
at CPU frequency. However, access to the PMU cycle counter from user space is
not enabled by default in the arm64 Linux kernel. It is possible to enable
the cycle counter for user space access by configuring the PMU from
privileged mode (kernel space).

By default the ``rte_rdtsc()`` implementation uses a portable ``cntvct_el0``
scheme.

The example below shows the steps to configure the PMU based cycle counter on
an ARMv8 machine.

.. code-block:: console

    git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
    cd armv8_pmu_cycle_counter_el0
    make
    sudo insmod pmu_el0_cycle_counter.ko

Please refer to :doc:`../linux_gsg/build_dpdk` for generic details on compiling DPDK with meson.

In order to enable ``PMU`` based ``rte_rdtsc()``, the user needs to configure the
build with ``-Dc_args='-DRTE_ARM_EAL_RDTSC_USE_PMU'``.

Example:

.. code-block:: console

    meson setup --cross-file config/arm/arm64_armv8_linux_gcc -Dc_args='-DRTE_ARM_EAL_RDTSC_USE_PMU' build

.. warning::

   The PMU based scheme is useful for high accuracy performance profiling with
   ``rte_rdtsc()``. However, this method cannot be used in conjunction with
   Linux userspace profiling tools like ``perf`` as this scheme alters the PMU
   registers state.