1*2f2acf4eSJim Harris# Tracing Framework {#tracepoints} 2*2f2acf4eSJim Harris 3*2f2acf4eSJim Harris## Introduction {#tracepoints_intro} 4*2f2acf4eSJim Harris 5*2f2acf4eSJim HarrisSPDK has a tracing framework for capturing low-level event information at runtime. 6*2f2acf4eSJim HarrisTracepoints provide a high-performance tracing mechanism that is accessible at runtime. 7*2f2acf4eSJim HarrisThey are implemented as a circular buffer in shared memory that is accessible from other 8*2f2acf4eSJim Harrisprocesses. SPDK libraries and modules are instrumented with tracepoints to enable analysis of 9*2f2acf4eSJim Harrisboth performance and application crashes. 10*2f2acf4eSJim Harris 11*2f2acf4eSJim Harris## Enabling Tracepoints {#enable_tracepoints} 12*2f2acf4eSJim Harris 13*2f2acf4eSJim HarrisTracepoints are placed in groups. They are enabled and disabled as a group or individually 14*2f2acf4eSJim Harrisinside a group. 15*2f2acf4eSJim Harris 16*2f2acf4eSJim Harris### Enabling Tracepoints in Groups 17*2f2acf4eSJim Harris 18*2f2acf4eSJim HarrisTo enable the instrumentation of all the tracepoints groups in an SPDK target 19*2f2acf4eSJim Harrisapplication, start the target with `-e` parameter set to `all`: 20*2f2acf4eSJim Harris 21*2f2acf4eSJim Harris~~~bash 22*2f2acf4eSJim Harrisbuild/bin/nvmf_tgt -e all 23*2f2acf4eSJim Harris~~~ 24*2f2acf4eSJim Harris 25*2f2acf4eSJim HarrisTo enable the instrumentation of just the `NVMe-oF RDMA` tracepoints in an SPDK target 26*2f2acf4eSJim Harrisapplication, start the target with the `-e` parameter set to `nvmf_rdma`: 27*2f2acf4eSJim Harris 28*2f2acf4eSJim Harris~~~bash 29*2f2acf4eSJim Harrisbuild/bin/nvmf_tgt -e nvmf_rdma 30*2f2acf4eSJim Harris~~~ 31*2f2acf4eSJim Harris 32*2f2acf4eSJim Harris### Enabling Individual Tracepoints 33*2f2acf4eSJim Harris 34*2f2acf4eSJim HarrisTo enable individual tracepoints inside a group: 35*2f2acf4eSJim Harris 36*2f2acf4eSJim Harris~~~bash 37*2f2acf4eSJim Harrisbuild/bin/nvmf_tgt -e nvmf_rdma:B 38*2f2acf4eSJim Harris~~~ 39*2f2acf4eSJim Harris 40*2f2acf4eSJim Harriswhere `:` is a separator and `B` is the tracepoint mask. This will enable only the first, second and fourth (binary: 1011) tracepoint inside `NVMe-oF RDMA` group. 41*2f2acf4eSJim Harris 42*2f2acf4eSJim Harris### Combining Tracepoint Masks 43*2f2acf4eSJim Harris 44*2f2acf4eSJim HarrisIt is also possible to combine enabling whole groups of tpoints and individual ones: 45*2f2acf4eSJim Harris 46*2f2acf4eSJim Harris~~~bash 47*2f2acf4eSJim Harrisbuild/bin/nvmf_tgt -e nvmf_rdma:2,thread 48*2f2acf4eSJim Harris~~~ 49*2f2acf4eSJim Harris 50*2f2acf4eSJim HarrisThis will enable the second tracepoint inside `NVMe-oF RDMA` group (0x10) and all of the tracepoints defined by the `thread` group (0x400). 51*2f2acf4eSJim Harris 52*2f2acf4eSJim Harris### Tracepoint Group Names 53*2f2acf4eSJim Harris 54*2f2acf4eSJim HarrisThe best way to get the current list of group names is from an application's 55*2f2acf4eSJim Harrishelp message. The list of available tracepoint group's for that application 56*2f2acf4eSJim Harriswill be shown next to the `-e` option. 57*2f2acf4eSJim Harris 58*2f2acf4eSJim Harris### Starting the SPDK Target 59*2f2acf4eSJim Harris 60*2f2acf4eSJim HarrisWhen the target starts, a message is logged with the information you need to view 61*2f2acf4eSJim Harristhe tracepoints in a human-readable format using the spdk_trace application. The target 62*2f2acf4eSJim Harriswill also log information about the shared memory file. 63*2f2acf4eSJim Harris 64*2f2acf4eSJim Harris~~~bash 65*2f2acf4eSJim Harrisapp.c: 527:spdk_app_setup_trace: *NOTICE*: Tracepoint Group Mask 0xFFFF specified. 66*2f2acf4eSJim Harrisapp.c: 531:spdk_app_setup_trace: *NOTICE*: Use 'spdk_trace -s nvmf -p 24147' to capture a snapshot of events at runtime. 67*2f2acf4eSJim Harrisapp.c: 533:spdk_app_setup_trace: *NOTICE*: Or copy /dev/shm/nvmf_trace.pid24147 for offline analysis/debug. 68*2f2acf4eSJim Harris~~~ 69*2f2acf4eSJim Harris 70*2f2acf4eSJim HarrisNote that when tracepoints are enabled, the shared memory files are not deleted when the application 71*2f2acf4eSJim Harrisexits. This ensures the file can be used for analysis after the application exits. On Linux, the 72*2f2acf4eSJim Harrisshared memory files are in /dev/shm, and can be deleted manually to free shm space if needed. A system 73*2f2acf4eSJim Harrisreboot will also free all of the /dev/shm files. 74*2f2acf4eSJim Harris 75*2f2acf4eSJim Harris## Capturing a snapshot of events {#capture_tracepoints} 76*2f2acf4eSJim Harris 77*2f2acf4eSJim HarrisSend I/Os to the SPDK target application to generate events. The following is 78*2f2acf4eSJim Harrisan example usage of spdk_nvme_perf to send I/Os to the NVMe-oF target over an RDMA network 79*2f2acf4eSJim Harrisinterface for 10 minutes. 80*2f2acf4eSJim Harris 81*2f2acf4eSJim Harris~~~bash 82*2f2acf4eSJim Harrisspdk_nvme_perf -q 128 -o 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420' 83*2f2acf4eSJim Harris~~~ 84*2f2acf4eSJim Harris 85*2f2acf4eSJim HarrisThe spdk_trace program can be found in the app/trace directory. To analyze the tracepoints on the same 86*2f2acf4eSJim Harrissystem running the NVMe-oF target, simply execute the command line shown in the log: 87*2f2acf4eSJim Harris 88*2f2acf4eSJim Harris~~~bash 89*2f2acf4eSJim Harrisbuild/bin/spdk_trace -s nvmf -p 24147 90*2f2acf4eSJim Harris~~~ 91*2f2acf4eSJim Harris 92*2f2acf4eSJim HarrisNote that you can also call `build/bin/spdk_trace` without any parameters on Linux, and it will 93*2f2acf4eSJim Harrisuse the most recent SPDK trace file generated on that system. 94*2f2acf4eSJim Harris 95*2f2acf4eSJim HarrisTo analyze the tracepoints on a different system, first prepare the tracepoint file for transfer. The 96*2f2acf4eSJim Harristracepoint file can be large, but usually compresses very well. This step can also be used to prepare 97*2f2acf4eSJim Harrisa tracepoint file to attach to a GitHub issue for debugging NVMe-oF application crashes. 98*2f2acf4eSJim Harris 99*2f2acf4eSJim Harris~~~bash 100*2f2acf4eSJim Harrisbzip2 -c /dev/shm/nvmf_trace.pid24147 > /tmp/trace.bz2 101*2f2acf4eSJim Harris~~~ 102*2f2acf4eSJim Harris 103*2f2acf4eSJim HarrisAfter transferring the /tmp/trace.bz2 tracepoint file to a different system: 104*2f2acf4eSJim Harris 105*2f2acf4eSJim Harris~~~bash 106*2f2acf4eSJim Harrisbunzip2 /tmp/trace.bz2 107*2f2acf4eSJim Harrisbuild/bin/spdk_trace -f /tmp/trace 108*2f2acf4eSJim Harris~~~ 109*2f2acf4eSJim Harris 110*2f2acf4eSJim HarrisThe following is sample trace capture showing the cumulative time that each 111*2f2acf4eSJim HarrisI/O spends at each RDMA state. All the trace captures with the same id are for 112*2f2acf4eSJim Harristhe same I/O. 113*2f2acf4eSJim Harris 114*2f2acf4eSJim Harris~~~bash 115*2f2acf4eSJim Harris28: 6026.658 ( 12656064) RDMA_REQ_NEED_BUFFER id: r3622 time: 0.019 116*2f2acf4eSJim Harris28: 6026.694 ( 12656140) RDMA_REQ_RDY_TO_EXECUTE id: r3622 time: 0.055 117*2f2acf4eSJim Harris28: 6026.820 ( 12656406) RDMA_REQ_EXECUTING id: r3622 time: 0.182 118*2f2acf4eSJim Harris28: 6026.992 ( 12656766) RDMA_REQ_EXECUTED id: r3477 time: 228.510 119*2f2acf4eSJim Harris28: 6027.010 ( 12656804) RDMA_REQ_TX_PENDING_C_TO_H id: r3477 time: 228.528 120*2f2acf4eSJim Harris28: 6027.022 ( 12656828) RDMA_REQ_RDY_TO_COMPLETE id: r3477 time: 228.539 121*2f2acf4eSJim Harris28: 6027.115 ( 12657024) RDMA_REQ_COMPLETING id: r3477 time: 228.633 122*2f2acf4eSJim Harris28: 6027.471 ( 12657770) RDMA_REQ_COMPLETED id: r3518 time: 171.577 123*2f2acf4eSJim Harris28: 6028.027 ( 12658940) RDMA_REQ_NEW id: r3623 124*2f2acf4eSJim Harris28: 6028.057 ( 12659002) RDMA_REQ_NEED_BUFFER id: r3623 time: 0.030 125*2f2acf4eSJim Harris28: 6028.095 ( 12659082) RDMA_REQ_RDY_TO_EXECUTE id: r3623 time: 0.068 126*2f2acf4eSJim Harris28: 6028.216 ( 12659336) RDMA_REQ_EXECUTING id: r3623 time: 0.189 127*2f2acf4eSJim Harris28: 6028.408 ( 12659740) RDMA_REQ_EXECUTED id: r3505 time: 190.509 128*2f2acf4eSJim Harris28: 6028.441 ( 12659808) RDMA_REQ_TX_PENDING_C_TO_H id: r3505 time: 190.542 129*2f2acf4eSJim Harris28: 6028.452 ( 12659832) RDMA_REQ_RDY_TO_COMPLETE id: r3505 time: 190.553 130*2f2acf4eSJim Harris28: 6028.536 ( 12660008) RDMA_REQ_COMPLETING id: r3505 time: 190.637 131*2f2acf4eSJim Harris28: 6028.854 ( 12660676) RDMA_REQ_COMPLETED id: r3465 time: 247.000 132*2f2acf4eSJim Harris28: 6029.433 ( 12661892) RDMA_REQ_NEW id: r3624 133*2f2acf4eSJim Harris28: 6029.452 ( 12661932) RDMA_REQ_NEED_BUFFER id: r3624 time: 0.019 134*2f2acf4eSJim Harris28: 6029.482 ( 12661996) RDMA_REQ_RDY_TO_EXECUTE id: r3624 time: 0.050 135*2f2acf4eSJim Harris28: 6029.591 ( 12662224) RDMA_REQ_EXECUTING id: r3624 time: 0.158 136*2f2acf4eSJim Harris28: 6029.782 ( 12662624) RDMA_REQ_EXECUTED id: r3564 time: 96.937 137*2f2acf4eSJim Harris28: 6029.798 ( 12662658) RDMA_REQ_TX_PENDING_C_TO_H id: r3564 time: 96.953 138*2f2acf4eSJim Harris28: 6029.812 ( 12662688) RDMA_REQ_RDY_TO_COMPLETE id: r3564 time: 96.967 139*2f2acf4eSJim Harris28: 6029.899 ( 12662870) RDMA_REQ_COMPLETING id: r3564 time: 97.054 140*2f2acf4eSJim Harris28: 6030.262 ( 12663634) RDMA_REQ_COMPLETED id: r3477 time: 231.780 141*2f2acf4eSJim Harris28: 6030.786 ( 12664734) RDMA_REQ_NEW id: r3625 142*2f2acf4eSJim Harris28: 6030.804 ( 12664772) RDMA_REQ_NEED_BUFFER id: r3625 time: 0.018 143*2f2acf4eSJim Harris28: 6030.841 ( 12664848) RDMA_REQ_RDY_TO_EXECUTE id: r3625 time: 0.054 144*2f2acf4eSJim Harris28: 6030.963 ( 12665104) RDMA_REQ_EXECUTING id: r3625 time: 0.176 145*2f2acf4eSJim Harris28: 6031.139 ( 12665474) RDMA_REQ_EXECUTED id: r3552 time: 114.906 146*2f2acf4eSJim Harris28: 6031.196 ( 12665594) RDMA_REQ_TX_PENDING_C_TO_H id: r3552 time: 114.963 147*2f2acf4eSJim Harris28: 6031.210 ( 12665624) RDMA_REQ_RDY_TO_COMPLETE id: r3552 time: 114.977 148*2f2acf4eSJim Harris28: 6031.293 ( 12665798) RDMA_REQ_COMPLETING id: r3552 time: 115.060 149*2f2acf4eSJim Harris28: 6031.633 ( 12666512) RDMA_REQ_COMPLETED id: r3505 time: 193.734 150*2f2acf4eSJim Harris28: 6032.230 ( 12667766) RDMA_REQ_NEW id: r3626 151*2f2acf4eSJim Harris28: 6032.248 ( 12667804) RDMA_REQ_NEED_BUFFER id: r3626 time: 0.018 152*2f2acf4eSJim Harris28: 6032.288 ( 12667888) RDMA_REQ_RDY_TO_EXECUTE id: r3626 time: 0.058 153*2f2acf4eSJim Harris28: 6032.396 ( 12668114) RDMA_REQ_EXECUTING id: r3626 time: 0.166 154*2f2acf4eSJim Harris28: 6032.593 ( 12668528) RDMA_REQ_EXECUTED id: r3570 time: 90.443 155*2f2acf4eSJim Harris28: 6032.611 ( 12668564) RDMA_REQ_TX_PENDING_C_TO_H id: r3570 time: 90.460 156*2f2acf4eSJim Harris28: 6032.623 ( 12668590) RDMA_REQ_RDY_TO_COMPLETE id: r3570 time: 90.473 157*2f2acf4eSJim Harris28: 6032.707 ( 12668766) RDMA_REQ_COMPLETING id: r3570 time: 90.557 158*2f2acf4eSJim Harris28: 6033.056 ( 12669500) RDMA_REQ_COMPLETED id: r3564 time: 100.211 159*2f2acf4eSJim Harris~~~ 160*2f2acf4eSJim Harris 161*2f2acf4eSJim Harris## Capturing sufficient trace events {#capture_trace_events} 162*2f2acf4eSJim Harris 163*2f2acf4eSJim HarrisSince the tracepoint file generated directly by SPDK application is a circular buffer in shared memory, 164*2f2acf4eSJim Harristhe trace events captured by it may be insufficient for further analysis. 165*2f2acf4eSJim HarrisThe spdk_trace_record program can be found in the app/trace_record directory. 166*2f2acf4eSJim Harrisspdk_trace_record is used to poll the spdk tracepoint shared memory, record new entries from it, 167*2f2acf4eSJim Harrisand store all entries into specified output file at its shutdown on SIGINT or SIGTERM. 168*2f2acf4eSJim HarrisAfter SPDK nvmf target is launched, simply execute the command line shown in the log: 169*2f2acf4eSJim Harris 170*2f2acf4eSJim Harris~~~bash 171*2f2acf4eSJim Harrisbuild/bin/spdk_trace_record -q -s nvmf -p 24147 -f /tmp/spdk_nvmf_record.trace 172*2f2acf4eSJim Harris~~~ 173*2f2acf4eSJim Harris 174*2f2acf4eSJim HarrisAlso send I/Os to the SPDK target application to generate events by previous perf example for 10 minutes. 175*2f2acf4eSJim Harris 176*2f2acf4eSJim Harris~~~bash 177*2f2acf4eSJim Harrisspdk_nvme_perf -q 128 -o 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420' 178*2f2acf4eSJim Harris~~~ 179*2f2acf4eSJim Harris 180*2f2acf4eSJim HarrisAfter the completion of perf example, shut down spdk_trace_record by signal SIGINT (Ctrl + C). 181*2f2acf4eSJim HarrisTo analyze the tracepoints output file from spdk_trace_record, simply run spdk_trace program by: 182*2f2acf4eSJim Harris 183*2f2acf4eSJim Harris~~~bash 184*2f2acf4eSJim Harrisbuild/bin/spdk_trace -f /tmp/spdk_nvmf_record.trace 185*2f2acf4eSJim Harris~~~ 186*2f2acf4eSJim Harris 187*2f2acf4eSJim Harris## Adding New Tracepoints {#add_tracepoints} 188*2f2acf4eSJim Harris 189*2f2acf4eSJim HarrisSPDK applications and libraries provide several trace points. You can add new 190*2f2acf4eSJim Harristracepoints to the existing trace groups. For example, to add a new tracepoints 191*2f2acf4eSJim Harristo the SPDK RDMA library (lib/nvmf/rdma.c) trace group TRACE_GROUP_NVMF_RDMA, 192*2f2acf4eSJim Harrisdefine the tracepoints and assigning them a unique ID using the SPDK_TPOINT_ID macro: 193*2f2acf4eSJim Harris 194*2f2acf4eSJim Harris~~~c 195*2f2acf4eSJim Harris#define TRACE_GROUP_NVMF_RDMA 0x4 196*2f2acf4eSJim Harris#define TRACE_RDMA_REQUEST_STATE_NEW SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, 0x0) 197*2f2acf4eSJim Harris#define TRACE_RDMA_REQUEST_STATE_NEED_BUFFER SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, 0x1) 198*2f2acf4eSJim Harris... 199*2f2acf4eSJim Harris#define NEW_TRACE_POINT_NAME SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, UNIQUE_ID) 200*2f2acf4eSJim Harris~~~ 201*2f2acf4eSJim Harris 202*2f2acf4eSJim HarrisYou also need to register the new trace points in the SPDK_TRACE_REGISTER_FN macro call 203*2f2acf4eSJim Harriswithin the application/library using the spdk_trace_register_description function 204*2f2acf4eSJim Harrisas shown below: 205*2f2acf4eSJim Harris 206*2f2acf4eSJim Harris~~~c 207*2f2acf4eSJim Harrisstatic void 208*2f2acf4eSJim Harrisnvmf_trace(void) 209*2f2acf4eSJim Harris{ 210*2f2acf4eSJim Harris spdk_trace_register_object(OBJECT_NVMF_RDMA_IO, 'r'); 211*2f2acf4eSJim Harris spdk_trace_register_description("RDMA_REQ_NEW", TRACE_RDMA_REQUEST_STATE_NEW, 212*2f2acf4eSJim Harris OWNER_TYPE_NONE, OBJECT_NVMF_RDMA_IO, 1, 213*2f2acf4eSJim Harris SPDK_TRACE_ARG_TYPE_PTR, "qpair"); 214*2f2acf4eSJim Harris spdk_trace_register_description("RDMA_REQ_NEED_BUFFER", TRACE_RDMA_REQUEST_STATE_NEED_BUFFER, 215*2f2acf4eSJim Harris OWNER_TYPE_NONE, OBJECT_NVMF_RDMA_IO, 0, 216*2f2acf4eSJim Harris SPDK_TRACE_ARG_TYPE_PTR, "qpair"); 217*2f2acf4eSJim Harris ... 218*2f2acf4eSJim Harris} 219*2f2acf4eSJim HarrisSPDK_TRACE_REGISTER_FN(nvmf_trace, "nvmf_rdma", TRACE_GROUP_NVMF_RDMA) 220*2f2acf4eSJim Harris~~~ 221*2f2acf4eSJim Harris 222*2f2acf4eSJim HarrisFinally, use the spdk_trace_record function at the appropriate point in the 223*2f2acf4eSJim Harrisapplication/library to record the current trace state for the new trace points. 224*2f2acf4eSJim HarrisThe following example shows the usage of the spdk_trace_record function to 225*2f2acf4eSJim Harrisrecord the current trace state of several tracepoints. 226*2f2acf4eSJim Harris 227*2f2acf4eSJim Harris~~~c 228*2f2acf4eSJim Harris case RDMA_REQUEST_STATE_NEW: 229*2f2acf4eSJim Harris spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEW, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair); 230*2f2acf4eSJim Harris ... 231*2f2acf4eSJim Harris break; 232*2f2acf4eSJim Harris case RDMA_REQUEST_STATE_NEED_BUFFER: 233*2f2acf4eSJim Harris spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEED_BUFFER, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair); 234*2f2acf4eSJim Harris ... 235*2f2acf4eSJim Harris break; 236*2f2acf4eSJim Harris case RDMA_REQUEST_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING: 237*2f2acf4eSJim Harris spdk_trace_record(RDMA_REQUEST_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING, 0, 0, 238*2f2acf4eSJim Harris (uintptr_t)rdma_req, (uintptr_t)rqpair); 239*2f2acf4eSJim Harris ... 240*2f2acf4eSJim Harris~~~ 241*2f2acf4eSJim Harris 242*2f2acf4eSJim HarrisAll the tracing functions are documented in the [Tracepoint library documentation](https://spdk.io/doc/trace_8h.html) 243