1*a9fa9459Szrj\input texinfo @c -*-texinfo-*- 2*a9fa9459Szrj@setfilename gprof.info 3*a9fa9459Szrj@c Copyright (C) 1988-2016 Free Software Foundation, Inc. 4*a9fa9459Szrj@settitle GNU gprof 5*a9fa9459Szrj@setchapternewpage odd 6*a9fa9459Szrj 7*a9fa9459Szrj@c man begin INCLUDE 8*a9fa9459Szrj@include bfdver.texi 9*a9fa9459Szrj@c man end 10*a9fa9459Szrj 11*a9fa9459Szrj@ifnottex 12*a9fa9459Szrj@c This is a dir.info fragment to support semi-automated addition of 13*a9fa9459Szrj@c manuals to an info tree. zoo@cygnus.com is developing this facility. 14*a9fa9459Szrj@dircategory Software development 15*a9fa9459Szrj@direntry 16*a9fa9459Szrj* gprof: (gprof). Profiling your program's execution 17*a9fa9459Szrj@end direntry 18*a9fa9459Szrj@end ifnottex 19*a9fa9459Szrj 20*a9fa9459Szrj@copying 21*a9fa9459SzrjThis file documents the gprof profiler of the GNU system. 22*a9fa9459Szrj 23*a9fa9459Szrj@c man begin COPYRIGHT 24*a9fa9459SzrjCopyright @copyright{} 1988-2016 Free Software Foundation, Inc. 25*a9fa9459Szrj 26*a9fa9459SzrjPermission is granted to copy, distribute and/or modify this document 27*a9fa9459Szrjunder the terms of the GNU Free Documentation License, Version 1.3 28*a9fa9459Szrjor any later version published by the Free Software Foundation; 29*a9fa9459Szrjwith no Invariant Sections, with no Front-Cover Texts, and with no 30*a9fa9459SzrjBack-Cover Texts. A copy of the license is included in the 31*a9fa9459Szrjsection entitled ``GNU Free Documentation License''. 32*a9fa9459Szrj 33*a9fa9459Szrj@c man end 34*a9fa9459Szrj@end copying 35*a9fa9459Szrj 36*a9fa9459Szrj@finalout 37*a9fa9459Szrj@smallbook 38*a9fa9459Szrj 39*a9fa9459Szrj@titlepage 40*a9fa9459Szrj@title GNU gprof 41*a9fa9459Szrj@subtitle The @sc{gnu} Profiler 42*a9fa9459Szrj@ifset VERSION_PACKAGE 43*a9fa9459Szrj@subtitle @value{VERSION_PACKAGE} 44*a9fa9459Szrj@end ifset 45*a9fa9459Szrj@subtitle Version @value{VERSION} 46*a9fa9459Szrj@author Jay Fenlason and Richard Stallman 47*a9fa9459Szrj 48*a9fa9459Szrj@page 49*a9fa9459Szrj 50*a9fa9459SzrjThis manual describes the @sc{gnu} profiler, @code{gprof}, and how you 51*a9fa9459Szrjcan use it to determine which parts of a program are taking most of the 52*a9fa9459Szrjexecution time. We assume that you know how to write, compile, and 53*a9fa9459Szrjexecute programs. @sc{gnu} @code{gprof} was written by Jay Fenlason. 54*a9fa9459SzrjEric S. Raymond made some minor corrections and additions in 2003. 55*a9fa9459Szrj 56*a9fa9459Szrj@vskip 0pt plus 1filll 57*a9fa9459SzrjCopyright @copyright{} 1988-2016 Free Software Foundation, Inc. 58*a9fa9459Szrj 59*a9fa9459Szrj Permission is granted to copy, distribute and/or modify this document 60*a9fa9459Szrj under the terms of the GNU Free Documentation License, Version 1.3 61*a9fa9459Szrj or any later version published by the Free Software Foundation; 62*a9fa9459Szrj with no Invariant Sections, with no Front-Cover Texts, and with no 63*a9fa9459Szrj Back-Cover Texts. A copy of the license is included in the 64*a9fa9459Szrj section entitled ``GNU Free Documentation License''. 65*a9fa9459Szrj 66*a9fa9459Szrj@end titlepage 67*a9fa9459Szrj@contents 68*a9fa9459Szrj 69*a9fa9459Szrj@ifnottex 70*a9fa9459Szrj@node Top 71*a9fa9459Szrj@top Profiling a Program: Where Does It Spend Its Time? 72*a9fa9459Szrj 73*a9fa9459SzrjThis manual describes the @sc{gnu} profiler, @code{gprof}, and how you 74*a9fa9459Szrjcan use it to determine which parts of a program are taking most of the 75*a9fa9459Szrjexecution time. We assume that you know how to write, compile, and 76*a9fa9459Szrjexecute programs. @sc{gnu} @code{gprof} was written by Jay Fenlason. 77*a9fa9459Szrj 78*a9fa9459SzrjThis manual is for @code{gprof} 79*a9fa9459Szrj@ifset VERSION_PACKAGE 80*a9fa9459Szrj@value{VERSION_PACKAGE} 81*a9fa9459Szrj@end ifset 82*a9fa9459Szrjversion @value{VERSION}. 83*a9fa9459Szrj 84*a9fa9459SzrjThis document is distributed under the terms of the GNU Free 85*a9fa9459SzrjDocumentation License version 1.3. A copy of the license is included 86*a9fa9459Szrjin the section entitled ``GNU Free Documentation License''. 87*a9fa9459Szrj 88*a9fa9459Szrj@menu 89*a9fa9459Szrj* Introduction:: What profiling means, and why it is useful. 90*a9fa9459Szrj 91*a9fa9459Szrj* Compiling:: How to compile your program for profiling. 92*a9fa9459Szrj* Executing:: Executing your program to generate profile data 93*a9fa9459Szrj* Invoking:: How to run @code{gprof}, and its options 94*a9fa9459Szrj 95*a9fa9459Szrj* Output:: Interpreting @code{gprof}'s output 96*a9fa9459Szrj 97*a9fa9459Szrj* Inaccuracy:: Potential problems you should be aware of 98*a9fa9459Szrj* How do I?:: Answers to common questions 99*a9fa9459Szrj* Incompatibilities:: (between @sc{gnu} @code{gprof} and Unix @code{gprof}.) 100*a9fa9459Szrj* Details:: Details of how profiling is done 101*a9fa9459Szrj* GNU Free Documentation License:: GNU Free Documentation License 102*a9fa9459Szrj@end menu 103*a9fa9459Szrj@end ifnottex 104*a9fa9459Szrj 105*a9fa9459Szrj@node Introduction 106*a9fa9459Szrj@chapter Introduction to Profiling 107*a9fa9459Szrj 108*a9fa9459Szrj@ifset man 109*a9fa9459Szrj@c man title gprof display call graph profile data 110*a9fa9459Szrj 111*a9fa9459Szrj@smallexample 112*a9fa9459Szrj@c man begin SYNOPSIS 113*a9fa9459Szrjgprof [ -[abcDhilLrsTvwxyz] ] [ -[ACeEfFJnNOpPqQZ][@var{name}] ] 114*a9fa9459Szrj [ -I @var{dirs} ] [ -d[@var{num}] ] [ -k @var{from/to} ] 115*a9fa9459Szrj [ -m @var{min-count} ] [ -R @var{map_file} ] [ -t @var{table-length} ] 116*a9fa9459Szrj [ --[no-]annotated-source[=@var{name}] ] 117*a9fa9459Szrj [ --[no-]exec-counts[=@var{name}] ] 118*a9fa9459Szrj [ --[no-]flat-profile[=@var{name}] ] [ --[no-]graph[=@var{name}] ] 119*a9fa9459Szrj [ --[no-]time=@var{name}] [ --all-lines ] [ --brief ] 120*a9fa9459Szrj [ --debug[=@var{level}] ] [ --function-ordering ] 121*a9fa9459Szrj [ --file-ordering @var{map_file} ] [ --directory-path=@var{dirs} ] 122*a9fa9459Szrj [ --display-unused-functions ] [ --file-format=@var{name} ] 123*a9fa9459Szrj [ --file-info ] [ --help ] [ --line ] [ --inline-file-names ] 124*a9fa9459Szrj [ --min-count=@var{n} ] [ --no-static ] [ --print-path ] 125*a9fa9459Szrj [ --separate-files ] [ --static-call-graph ] [ --sum ] 126*a9fa9459Szrj [ --table-length=@var{len} ] [ --traditional ] [ --version ] 127*a9fa9459Szrj [ --width=@var{n} ] [ --ignore-non-functions ] 128*a9fa9459Szrj [ --demangle[=@var{STYLE}] ] [ --no-demangle ] 129*a9fa9459Szrj [--external-symbol-table=name] 130*a9fa9459Szrj [ @var{image-file} ] [ @var{profile-file} @dots{} ] 131*a9fa9459Szrj@c man end 132*a9fa9459Szrj@end smallexample 133*a9fa9459Szrj 134*a9fa9459Szrj@c man begin DESCRIPTION 135*a9fa9459Szrj@code{gprof} produces an execution profile of C, Pascal, or Fortran77 136*a9fa9459Szrjprograms. The effect of called routines is incorporated in the profile 137*a9fa9459Szrjof each caller. The profile data is taken from the call graph profile file 138*a9fa9459Szrj(@file{gmon.out} default) which is created by programs 139*a9fa9459Szrjthat are compiled with the @samp{-pg} option of 140*a9fa9459Szrj@code{cc}, @code{pc}, and @code{f77}. 141*a9fa9459SzrjThe @samp{-pg} option also links in versions of the library routines 142*a9fa9459Szrjthat are compiled for profiling. @code{Gprof} reads the given object 143*a9fa9459Szrjfile (the default is @code{a.out}) and establishes the relation between 144*a9fa9459Szrjits symbol table and the call graph profile from @file{gmon.out}. 145*a9fa9459SzrjIf more than one profile file is specified, the @code{gprof} 146*a9fa9459Szrjoutput shows the sum of the profile information in the given profile files. 147*a9fa9459Szrj 148*a9fa9459Szrj@code{Gprof} calculates the amount of time spent in each routine. 149*a9fa9459SzrjNext, these times are propagated along the edges of the call graph. 150*a9fa9459SzrjCycles are discovered, and calls into a cycle are made to share the time 151*a9fa9459Szrjof the cycle. 152*a9fa9459Szrj 153*a9fa9459Szrj@c man end 154*a9fa9459Szrj 155*a9fa9459Szrj@c man begin BUGS 156*a9fa9459SzrjThe granularity of the sampling is shown, but remains 157*a9fa9459Szrjstatistical at best. 158*a9fa9459SzrjWe assume that the time for each execution of a function 159*a9fa9459Szrjcan be expressed by the total time for the function divided 160*a9fa9459Szrjby the number of times the function is called. 161*a9fa9459SzrjThus the time propagated along the call graph arcs to the function's 162*a9fa9459Szrjparents is directly proportional to the number of times that 163*a9fa9459Szrjarc is traversed. 164*a9fa9459Szrj 165*a9fa9459SzrjParents that are not themselves profiled will have the time of 166*a9fa9459Szrjtheir profiled children propagated to them, but they will appear 167*a9fa9459Szrjto be spontaneously invoked in the call graph listing, and will 168*a9fa9459Szrjnot have their time propagated further. 169*a9fa9459SzrjSimilarly, signal catchers, even though profiled, will appear 170*a9fa9459Szrjto be spontaneous (although for more obscure reasons). 171*a9fa9459SzrjAny profiled children of signal catchers should have their times 172*a9fa9459Szrjpropagated properly, unless the signal catcher was invoked during 173*a9fa9459Szrjthe execution of the profiling routine, in which case all is lost. 174*a9fa9459Szrj 175*a9fa9459SzrjThe profiled program must call @code{exit}(2) 176*a9fa9459Szrjor return normally for the profiling information to be saved 177*a9fa9459Szrjin the @file{gmon.out} file. 178*a9fa9459Szrj@c man end 179*a9fa9459Szrj 180*a9fa9459Szrj@c man begin FILES 181*a9fa9459Szrj@table @code 182*a9fa9459Szrj@item @file{a.out} 183*a9fa9459Szrjthe namelist and text space. 184*a9fa9459Szrj@item @file{gmon.out} 185*a9fa9459Szrjdynamic call graph and profile. 186*a9fa9459Szrj@item @file{gmon.sum} 187*a9fa9459Szrjsummarized dynamic call graph and profile. 188*a9fa9459Szrj@end table 189*a9fa9459Szrj@c man end 190*a9fa9459Szrj 191*a9fa9459Szrj@c man begin SEEALSO 192*a9fa9459Szrjmonitor(3), profil(2), cc(1), prof(1), and the Info entry for @file{gprof}. 193*a9fa9459Szrj 194*a9fa9459Szrj``An Execution Profiler for Modular Programs'', 195*a9fa9459Szrjby S. Graham, P. Kessler, M. McKusick; 196*a9fa9459SzrjSoftware - Practice and Experience, 197*a9fa9459SzrjVol. 13, pp. 671-685, 1983. 198*a9fa9459Szrj 199*a9fa9459Szrj``gprof: A Call Graph Execution Profiler'', 200*a9fa9459Szrjby S. Graham, P. Kessler, M. McKusick; 201*a9fa9459SzrjProceedings of the SIGPLAN '82 Symposium on Compiler Construction, 202*a9fa9459SzrjSIGPLAN Notices, Vol. 17, No 6, pp. 120-126, June 1982. 203*a9fa9459Szrj@c man end 204*a9fa9459Szrj@end ifset 205*a9fa9459Szrj 206*a9fa9459SzrjProfiling allows you to learn where your program spent its time and which 207*a9fa9459Szrjfunctions called which other functions while it was executing. This 208*a9fa9459Szrjinformation can show you which pieces of your program are slower than you 209*a9fa9459Szrjexpected, and might be candidates for rewriting to make your program 210*a9fa9459Szrjexecute faster. It can also tell you which functions are being called more 211*a9fa9459Szrjor less often than you expected. This may help you spot bugs that had 212*a9fa9459Szrjotherwise been unnoticed. 213*a9fa9459Szrj 214*a9fa9459SzrjSince the profiler uses information collected during the actual execution 215*a9fa9459Szrjof your program, it can be used on programs that are too large or too 216*a9fa9459Szrjcomplex to analyze by reading the source. However, how your program is run 217*a9fa9459Szrjwill affect the information that shows up in the profile data. If you 218*a9fa9459Szrjdon't use some feature of your program while it is being profiled, no 219*a9fa9459Szrjprofile information will be generated for that feature. 220*a9fa9459Szrj 221*a9fa9459SzrjProfiling has several steps: 222*a9fa9459Szrj 223*a9fa9459Szrj@itemize @bullet 224*a9fa9459Szrj@item 225*a9fa9459SzrjYou must compile and link your program with profiling enabled. 226*a9fa9459Szrj@xref{Compiling, ,Compiling a Program for Profiling}. 227*a9fa9459Szrj 228*a9fa9459Szrj@item 229*a9fa9459SzrjYou must execute your program to generate a profile data file. 230*a9fa9459Szrj@xref{Executing, ,Executing the Program}. 231*a9fa9459Szrj 232*a9fa9459Szrj@item 233*a9fa9459SzrjYou must run @code{gprof} to analyze the profile data. 234*a9fa9459Szrj@xref{Invoking, ,@code{gprof} Command Summary}. 235*a9fa9459Szrj@end itemize 236*a9fa9459Szrj 237*a9fa9459SzrjThe next three chapters explain these steps in greater detail. 238*a9fa9459Szrj 239*a9fa9459Szrj@c man begin DESCRIPTION 240*a9fa9459Szrj 241*a9fa9459SzrjSeveral forms of output are available from the analysis. 242*a9fa9459Szrj 243*a9fa9459SzrjThe @dfn{flat profile} shows how much time your program spent in each function, 244*a9fa9459Szrjand how many times that function was called. If you simply want to know 245*a9fa9459Szrjwhich functions burn most of the cycles, it is stated concisely here. 246*a9fa9459Szrj@xref{Flat Profile, ,The Flat Profile}. 247*a9fa9459Szrj 248*a9fa9459SzrjThe @dfn{call graph} shows, for each function, which functions called it, which 249*a9fa9459Szrjother functions it called, and how many times. There is also an estimate 250*a9fa9459Szrjof how much time was spent in the subroutines of each function. This can 251*a9fa9459Szrjsuggest places where you might try to eliminate function calls that use a 252*a9fa9459Szrjlot of time. @xref{Call Graph, ,The Call Graph}. 253*a9fa9459Szrj 254*a9fa9459SzrjThe @dfn{annotated source} listing is a copy of the program's 255*a9fa9459Szrjsource code, labeled with the number of times each line of the 256*a9fa9459Szrjprogram was executed. @xref{Annotated Source, ,The Annotated Source 257*a9fa9459SzrjListing}. 258*a9fa9459Szrj@c man end 259*a9fa9459Szrj 260*a9fa9459SzrjTo better understand how profiling works, you may wish to read 261*a9fa9459Szrja description of its implementation. 262*a9fa9459Szrj@xref{Implementation, ,Implementation of Profiling}. 263*a9fa9459Szrj 264*a9fa9459Szrj@node Compiling 265*a9fa9459Szrj@chapter Compiling a Program for Profiling 266*a9fa9459Szrj 267*a9fa9459SzrjThe first step in generating profile information for your program is 268*a9fa9459Szrjto compile and link it with profiling enabled. 269*a9fa9459Szrj 270*a9fa9459SzrjTo compile a source file for profiling, specify the @samp{-pg} option when 271*a9fa9459Szrjyou run the compiler. (This is in addition to the options you normally 272*a9fa9459Szrjuse.) 273*a9fa9459Szrj 274*a9fa9459SzrjTo link the program for profiling, if you use a compiler such as @code{cc} 275*a9fa9459Szrjto do the linking, simply specify @samp{-pg} in addition to your usual 276*a9fa9459Szrjoptions. The same option, @samp{-pg}, alters either compilation or linking 277*a9fa9459Szrjto do what is necessary for profiling. Here are examples: 278*a9fa9459Szrj 279*a9fa9459Szrj@example 280*a9fa9459Szrjcc -g -c myprog.c utils.c -pg 281*a9fa9459Szrjcc -o myprog myprog.o utils.o -pg 282*a9fa9459Szrj@end example 283*a9fa9459Szrj 284*a9fa9459SzrjThe @samp{-pg} option also works with a command that both compiles and links: 285*a9fa9459Szrj 286*a9fa9459Szrj@example 287*a9fa9459Szrjcc -o myprog myprog.c utils.c -g -pg 288*a9fa9459Szrj@end example 289*a9fa9459Szrj 290*a9fa9459SzrjNote: The @samp{-pg} option must be part of your compilation options 291*a9fa9459Szrjas well as your link options. If it is not then no call-graph data 292*a9fa9459Szrjwill be gathered and when you run @code{gprof} you will get an error 293*a9fa9459Szrjmessage like this: 294*a9fa9459Szrj 295*a9fa9459Szrj@example 296*a9fa9459Szrjgprof: gmon.out file is missing call-graph data 297*a9fa9459Szrj@end example 298*a9fa9459Szrj 299*a9fa9459SzrjIf you add the @samp{-Q} switch to suppress the printing of the call 300*a9fa9459Szrjgraph data you will still be able to see the time samples: 301*a9fa9459Szrj 302*a9fa9459Szrj@example 303*a9fa9459SzrjFlat profile: 304*a9fa9459Szrj 305*a9fa9459SzrjEach sample counts as 0.01 seconds. 306*a9fa9459Szrj % cumulative self self total 307*a9fa9459Szrj time seconds seconds calls Ts/call Ts/call name 308*a9fa9459Szrj 44.12 0.07 0.07 zazLoop 309*a9fa9459Szrj 35.29 0.14 0.06 main 310*a9fa9459Szrj 20.59 0.17 0.04 bazMillion 311*a9fa9459Szrj@end example 312*a9fa9459Szrj 313*a9fa9459SzrjIf you run the linker @code{ld} directly instead of through a compiler 314*a9fa9459Szrjsuch as @code{cc}, you may have to specify a profiling startup file 315*a9fa9459Szrj@file{gcrt0.o} as the first input file instead of the usual startup 316*a9fa9459Szrjfile @file{crt0.o}. In addition, you would probably want to 317*a9fa9459Szrjspecify the profiling C library, @file{libc_p.a}, by writing 318*a9fa9459Szrj@samp{-lc_p} instead of the usual @samp{-lc}. This is not absolutely 319*a9fa9459Szrjnecessary, but doing this gives you number-of-calls information for 320*a9fa9459Szrjstandard library functions such as @code{read} and @code{open}. For 321*a9fa9459Szrjexample: 322*a9fa9459Szrj 323*a9fa9459Szrj@example 324*a9fa9459Szrjld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p 325*a9fa9459Szrj@end example 326*a9fa9459Szrj 327*a9fa9459SzrjIf you are running the program on a system which supports shared 328*a9fa9459Szrjlibraries you may run into problems with the profiling support code in 329*a9fa9459Szrja shared library being called before that library has been fully 330*a9fa9459Szrjinitialised. This is usually detected by the program encountering a 331*a9fa9459Szrjsegmentation fault as soon as it is run. The solution is to link 332*a9fa9459Szrjagainst a static version of the library containing the profiling 333*a9fa9459Szrjsupport code, which for @code{gcc} users can be done via the 334*a9fa9459Szrj@samp{-static} or @samp{-static-libgcc} command line option. For 335*a9fa9459Szrjexample: 336*a9fa9459Szrj 337*a9fa9459Szrj@example 338*a9fa9459Szrjgcc -g -pg -static-libgcc myprog.c utils.c -o myprog 339*a9fa9459Szrj@end example 340*a9fa9459Szrj 341*a9fa9459SzrjIf you compile only some of the modules of the program with @samp{-pg}, you 342*a9fa9459Szrjcan still profile the program, but you won't get complete information about 343*a9fa9459Szrjthe modules that were compiled without @samp{-pg}. The only information 344*a9fa9459Szrjyou get for the functions in those modules is the total time spent in them; 345*a9fa9459Szrjthere is no record of how many times they were called, or from where. This 346*a9fa9459Szrjwill not affect the flat profile (except that the @code{calls} field for 347*a9fa9459Szrjthe functions will be blank), but will greatly reduce the usefulness of the 348*a9fa9459Szrjcall graph. 349*a9fa9459Szrj 350*a9fa9459SzrjIf you wish to perform line-by-line profiling you should use the 351*a9fa9459Szrj@code{gcov} tool instead of @code{gprof}. See that tool's manual or 352*a9fa9459Szrjinfo pages for more details of how to do this. 353*a9fa9459Szrj 354*a9fa9459SzrjNote, older versions of @code{gcc} produce line-by-line profiling 355*a9fa9459Szrjinformation that works with @code{gprof} rather than @code{gcov} so 356*a9fa9459Szrjthere is still support for displaying this kind of information in 357*a9fa9459Szrj@code{gprof}. @xref{Line-by-line, ,Line-by-line Profiling}. 358*a9fa9459Szrj 359*a9fa9459SzrjIt also worth noting that @code{gcc} implements a 360*a9fa9459Szrj@samp{-finstrument-functions} command line option which will insert 361*a9fa9459Szrjcalls to special user supplied instrumentation routines at the entry 362*a9fa9459Szrjand exit of every function in their program. This can be used to 363*a9fa9459Szrjimplement an alternative profiling scheme. 364*a9fa9459Szrj 365*a9fa9459Szrj@node Executing 366*a9fa9459Szrj@chapter Executing the Program 367*a9fa9459Szrj 368*a9fa9459SzrjOnce the program is compiled for profiling, you must run it in order to 369*a9fa9459Szrjgenerate the information that @code{gprof} needs. Simply run the program 370*a9fa9459Szrjas usual, using the normal arguments, file names, etc. The program should 371*a9fa9459Szrjrun normally, producing the same output as usual. It will, however, run 372*a9fa9459Szrjsomewhat slower than normal because of the time spent collecting and 373*a9fa9459Szrjwriting the profile data. 374*a9fa9459Szrj 375*a9fa9459SzrjThe way you run the program---the arguments and input that you give 376*a9fa9459Szrjit---may have a dramatic effect on what the profile information shows. The 377*a9fa9459Szrjprofile data will describe the parts of the program that were activated for 378*a9fa9459Szrjthe particular input you use. For example, if the first command you give 379*a9fa9459Szrjto your program is to quit, the profile data will show the time used in 380*a9fa9459Szrjinitialization and in cleanup, but not much else. 381*a9fa9459Szrj 382*a9fa9459SzrjYour program will write the profile data into a file called @file{gmon.out} 383*a9fa9459Szrjjust before exiting. If there is already a file called @file{gmon.out}, 384*a9fa9459Szrjits contents are overwritten. There is currently no way to tell the 385*a9fa9459Szrjprogram to write the profile data under a different name, but you can rename 386*a9fa9459Szrjthe file afterwards if you are concerned that it may be overwritten. 387*a9fa9459Szrj 388*a9fa9459SzrjIn order to write the @file{gmon.out} file properly, your program must exit 389*a9fa9459Szrjnormally: by returning from @code{main} or by calling @code{exit}. Calling 390*a9fa9459Szrjthe low-level function @code{_exit} does not write the profile data, and 391*a9fa9459Szrjneither does abnormal termination due to an unhandled signal. 392*a9fa9459Szrj 393*a9fa9459SzrjThe @file{gmon.out} file is written in the program's @emph{current working 394*a9fa9459Szrjdirectory} at the time it exits. This means that if your program calls 395*a9fa9459Szrj@code{chdir}, the @file{gmon.out} file will be left in the last directory 396*a9fa9459Szrjyour program @code{chdir}'d to. If you don't have permission to write in 397*a9fa9459Szrjthis directory, the file is not written, and you will get an error message. 398*a9fa9459Szrj 399*a9fa9459SzrjOlder versions of the @sc{gnu} profiling library may also write a file 400*a9fa9459Szrjcalled @file{bb.out}. This file, if present, contains an human-readable 401*a9fa9459Szrjlisting of the basic-block execution counts. Unfortunately, the 402*a9fa9459Szrjappearance of a human-readable @file{bb.out} means the basic-block 403*a9fa9459Szrjcounts didn't get written into @file{gmon.out}. 404*a9fa9459SzrjThe Perl script @code{bbconv.pl}, included with the @code{gprof} 405*a9fa9459Szrjsource distribution, will convert a @file{bb.out} file into 406*a9fa9459Szrja format readable by @code{gprof}. Invoke it like this: 407*a9fa9459Szrj 408*a9fa9459Szrj@smallexample 409*a9fa9459Szrjbbconv.pl < bb.out > @var{bh-data} 410*a9fa9459Szrj@end smallexample 411*a9fa9459Szrj 412*a9fa9459SzrjThis translates the information in @file{bb.out} into a form that 413*a9fa9459Szrj@code{gprof} can understand. But you still need to tell @code{gprof} 414*a9fa9459Szrjabout the existence of this translated information. To do that, include 415*a9fa9459Szrj@var{bb-data} on the @code{gprof} command line, @emph{along with 416*a9fa9459Szrj@file{gmon.out}}, like this: 417*a9fa9459Szrj 418*a9fa9459Szrj@smallexample 419*a9fa9459Szrjgprof @var{options} @var{executable-file} gmon.out @var{bb-data} [@var{yet-more-profile-data-files}@dots{}] [> @var{outfile}] 420*a9fa9459Szrj@end smallexample 421*a9fa9459Szrj 422*a9fa9459Szrj@node Invoking 423*a9fa9459Szrj@chapter @code{gprof} Command Summary 424*a9fa9459Szrj 425*a9fa9459SzrjAfter you have a profile data file @file{gmon.out}, you can run @code{gprof} 426*a9fa9459Szrjto interpret the information in it. The @code{gprof} program prints a 427*a9fa9459Szrjflat profile and a call graph on standard output. Typically you would 428*a9fa9459Szrjredirect the output of @code{gprof} into a file with @samp{>}. 429*a9fa9459Szrj 430*a9fa9459SzrjYou run @code{gprof} like this: 431*a9fa9459Szrj 432*a9fa9459Szrj@smallexample 433*a9fa9459Szrjgprof @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}] 434*a9fa9459Szrj@end smallexample 435*a9fa9459Szrj 436*a9fa9459Szrj@noindent 437*a9fa9459SzrjHere square-brackets indicate optional arguments. 438*a9fa9459Szrj 439*a9fa9459SzrjIf you omit the executable file name, the file @file{a.out} is used. If 440*a9fa9459Szrjyou give no profile data file name, the file @file{gmon.out} is used. If 441*a9fa9459Szrjany file is not in the proper format, or if the profile data file does not 442*a9fa9459Szrjappear to belong to the executable file, an error message is printed. 443*a9fa9459Szrj 444*a9fa9459SzrjYou can give more than one profile data file by entering all their names 445*a9fa9459Szrjafter the executable file name; then the statistics in all the data files 446*a9fa9459Szrjare summed together. 447*a9fa9459Szrj 448*a9fa9459SzrjThe order of these options does not matter. 449*a9fa9459Szrj 450*a9fa9459Szrj@menu 451*a9fa9459Szrj* Output Options:: Controlling @code{gprof}'s output style 452*a9fa9459Szrj* Analysis Options:: Controlling how @code{gprof} analyzes its data 453*a9fa9459Szrj* Miscellaneous Options:: 454*a9fa9459Szrj* Deprecated Options:: Options you no longer need to use, but which 455*a9fa9459Szrj have been retained for compatibility 456*a9fa9459Szrj* Symspecs:: Specifying functions to include or exclude 457*a9fa9459Szrj@end menu 458*a9fa9459Szrj 459*a9fa9459Szrj@node Output Options 460*a9fa9459Szrj@section Output Options 461*a9fa9459Szrj 462*a9fa9459Szrj@c man begin OPTIONS 463*a9fa9459SzrjThese options specify which of several output formats 464*a9fa9459Szrj@code{gprof} should produce. 465*a9fa9459Szrj 466*a9fa9459SzrjMany of these options take an optional @dfn{symspec} to specify 467*a9fa9459Szrjfunctions to be included or excluded. These options can be 468*a9fa9459Szrjspecified multiple times, with different symspecs, to include 469*a9fa9459Szrjor exclude sets of symbols. @xref{Symspecs, ,Symspecs}. 470*a9fa9459Szrj 471*a9fa9459SzrjSpecifying any of these options overrides the default (@samp{-p -q}), 472*a9fa9459Szrjwhich prints a flat profile and call graph analysis 473*a9fa9459Szrjfor all functions. 474*a9fa9459Szrj 475*a9fa9459Szrj@table @code 476*a9fa9459Szrj 477*a9fa9459Szrj@item -A[@var{symspec}] 478*a9fa9459Szrj@itemx --annotated-source[=@var{symspec}] 479*a9fa9459SzrjThe @samp{-A} option causes @code{gprof} to print annotated source code. 480*a9fa9459SzrjIf @var{symspec} is specified, print output only for matching symbols. 481*a9fa9459Szrj@xref{Annotated Source, ,The Annotated Source Listing}. 482*a9fa9459Szrj 483*a9fa9459Szrj@item -b 484*a9fa9459Szrj@itemx --brief 485*a9fa9459SzrjIf the @samp{-b} option is given, @code{gprof} doesn't print the 486*a9fa9459Szrjverbose blurbs that try to explain the meaning of all of the fields in 487*a9fa9459Szrjthe tables. This is useful if you intend to print out the output, or 488*a9fa9459Szrjare tired of seeing the blurbs. 489*a9fa9459Szrj 490*a9fa9459Szrj@item -C[@var{symspec}] 491*a9fa9459Szrj@itemx --exec-counts[=@var{symspec}] 492*a9fa9459SzrjThe @samp{-C} option causes @code{gprof} to 493*a9fa9459Szrjprint a tally of functions and the number of times each was called. 494*a9fa9459SzrjIf @var{symspec} is specified, print tally only for matching symbols. 495*a9fa9459Szrj 496*a9fa9459SzrjIf the profile data file contains basic-block count records, specifying 497*a9fa9459Szrjthe @samp{-l} option, along with @samp{-C}, will cause basic-block 498*a9fa9459Szrjexecution counts to be tallied and displayed. 499*a9fa9459Szrj 500*a9fa9459Szrj@item -i 501*a9fa9459Szrj@itemx --file-info 502*a9fa9459SzrjThe @samp{-i} option causes @code{gprof} to display summary information 503*a9fa9459Szrjabout the profile data file(s) and then exit. The number of histogram, 504*a9fa9459Szrjcall graph, and basic-block count records is displayed. 505*a9fa9459Szrj 506*a9fa9459Szrj@item -I @var{dirs} 507*a9fa9459Szrj@itemx --directory-path=@var{dirs} 508*a9fa9459SzrjThe @samp{-I} option specifies a list of search directories in 509*a9fa9459Szrjwhich to find source files. Environment variable @var{GPROF_PATH} 510*a9fa9459Szrjcan also be used to convey this information. 511*a9fa9459SzrjUsed mostly for annotated source output. 512*a9fa9459Szrj 513*a9fa9459Szrj@item -J[@var{symspec}] 514*a9fa9459Szrj@itemx --no-annotated-source[=@var{symspec}] 515*a9fa9459SzrjThe @samp{-J} option causes @code{gprof} not to 516*a9fa9459Szrjprint annotated source code. 517*a9fa9459SzrjIf @var{symspec} is specified, @code{gprof} prints annotated source, 518*a9fa9459Szrjbut excludes matching symbols. 519*a9fa9459Szrj 520*a9fa9459Szrj@item -L 521*a9fa9459Szrj@itemx --print-path 522*a9fa9459SzrjNormally, source filenames are printed with the path 523*a9fa9459Szrjcomponent suppressed. The @samp{-L} option causes @code{gprof} 524*a9fa9459Szrjto print the full pathname of 525*a9fa9459Szrjsource filenames, which is determined 526*a9fa9459Szrjfrom symbolic debugging information in the image file 527*a9fa9459Szrjand is relative to the directory in which the compiler 528*a9fa9459Szrjwas invoked. 529*a9fa9459Szrj 530*a9fa9459Szrj@item -p[@var{symspec}] 531*a9fa9459Szrj@itemx --flat-profile[=@var{symspec}] 532*a9fa9459SzrjThe @samp{-p} option causes @code{gprof} to print a flat profile. 533*a9fa9459SzrjIf @var{symspec} is specified, print flat profile only for matching symbols. 534*a9fa9459Szrj@xref{Flat Profile, ,The Flat Profile}. 535*a9fa9459Szrj 536*a9fa9459Szrj@item -P[@var{symspec}] 537*a9fa9459Szrj@itemx --no-flat-profile[=@var{symspec}] 538*a9fa9459SzrjThe @samp{-P} option causes @code{gprof} to suppress printing a flat profile. 539*a9fa9459SzrjIf @var{symspec} is specified, @code{gprof} prints a flat profile, 540*a9fa9459Szrjbut excludes matching symbols. 541*a9fa9459Szrj 542*a9fa9459Szrj@item -q[@var{symspec}] 543*a9fa9459Szrj@itemx --graph[=@var{symspec}] 544*a9fa9459SzrjThe @samp{-q} option causes @code{gprof} to print the call graph analysis. 545*a9fa9459SzrjIf @var{symspec} is specified, print call graph only for matching symbols 546*a9fa9459Szrjand their children. 547*a9fa9459Szrj@xref{Call Graph, ,The Call Graph}. 548*a9fa9459Szrj 549*a9fa9459Szrj@item -Q[@var{symspec}] 550*a9fa9459Szrj@itemx --no-graph[=@var{symspec}] 551*a9fa9459SzrjThe @samp{-Q} option causes @code{gprof} to suppress printing the 552*a9fa9459Szrjcall graph. 553*a9fa9459SzrjIf @var{symspec} is specified, @code{gprof} prints a call graph, 554*a9fa9459Szrjbut excludes matching symbols. 555*a9fa9459Szrj 556*a9fa9459Szrj@item -t 557*a9fa9459Szrj@itemx --table-length=@var{num} 558*a9fa9459SzrjThe @samp{-t} option causes the @var{num} most active source lines in 559*a9fa9459Szrjeach source file to be listed when source annotation is enabled. The 560*a9fa9459Szrjdefault is 10. 561*a9fa9459Szrj 562*a9fa9459Szrj@item -y 563*a9fa9459Szrj@itemx --separate-files 564*a9fa9459SzrjThis option affects annotated source output only. 565*a9fa9459SzrjNormally, @code{gprof} prints annotated source files 566*a9fa9459Szrjto standard-output. If this option is specified, 567*a9fa9459Szrjannotated source for a file named @file{path/@var{filename}} 568*a9fa9459Szrjis generated in the file @file{@var{filename}-ann}. If the underlying 569*a9fa9459Szrjfile system would truncate @file{@var{filename}-ann} so that it 570*a9fa9459Szrjoverwrites the original @file{@var{filename}}, @code{gprof} generates 571*a9fa9459Szrjannotated source in the file @file{@var{filename}.ann} instead (if the 572*a9fa9459Szrjoriginal file name has an extension, that extension is @emph{replaced} 573*a9fa9459Szrjwith @file{.ann}). 574*a9fa9459Szrj 575*a9fa9459Szrj@item -Z[@var{symspec}] 576*a9fa9459Szrj@itemx --no-exec-counts[=@var{symspec}] 577*a9fa9459SzrjThe @samp{-Z} option causes @code{gprof} not to 578*a9fa9459Szrjprint a tally of functions and the number of times each was called. 579*a9fa9459SzrjIf @var{symspec} is specified, print tally, but exclude matching symbols. 580*a9fa9459Szrj 581*a9fa9459Szrj@item -r 582*a9fa9459Szrj@itemx --function-ordering 583*a9fa9459SzrjThe @samp{--function-ordering} option causes @code{gprof} to print a 584*a9fa9459Szrjsuggested function ordering for the program based on profiling data. 585*a9fa9459SzrjThis option suggests an ordering which may improve paging, tlb and 586*a9fa9459Szrjcache behavior for the program on systems which support arbitrary 587*a9fa9459Szrjordering of functions in an executable. 588*a9fa9459Szrj 589*a9fa9459SzrjThe exact details of how to force the linker to place functions 590*a9fa9459Szrjin a particular order is system dependent and out of the scope of this 591*a9fa9459Szrjmanual. 592*a9fa9459Szrj 593*a9fa9459Szrj@item -R @var{map_file} 594*a9fa9459Szrj@itemx --file-ordering @var{map_file} 595*a9fa9459SzrjThe @samp{--file-ordering} option causes @code{gprof} to print a 596*a9fa9459Szrjsuggested .o link line ordering for the program based on profiling data. 597*a9fa9459SzrjThis option suggests an ordering which may improve paging, tlb and 598*a9fa9459Szrjcache behavior for the program on systems which do not support arbitrary 599*a9fa9459Szrjordering of functions in an executable. 600*a9fa9459Szrj 601*a9fa9459SzrjUse of the @samp{-a} argument is highly recommended with this option. 602*a9fa9459Szrj 603*a9fa9459SzrjThe @var{map_file} argument is a pathname to a file which provides 604*a9fa9459Szrjfunction name to object file mappings. The format of the file is similar to 605*a9fa9459Szrjthe output of the program @code{nm}. 606*a9fa9459Szrj 607*a9fa9459Szrj@smallexample 608*a9fa9459Szrj@group 609*a9fa9459Szrjc-parse.o:00000000 T yyparse 610*a9fa9459Szrjc-parse.o:00000004 C yyerrflag 611*a9fa9459Szrjc-lang.o:00000000 T maybe_objc_method_name 612*a9fa9459Szrjc-lang.o:00000000 T print_lang_statistics 613*a9fa9459Szrjc-lang.o:00000000 T recognize_objc_keyword 614*a9fa9459Szrjc-decl.o:00000000 T print_lang_identifier 615*a9fa9459Szrjc-decl.o:00000000 T print_lang_type 616*a9fa9459Szrj@dots{} 617*a9fa9459Szrj 618*a9fa9459Szrj@end group 619*a9fa9459Szrj@end smallexample 620*a9fa9459Szrj 621*a9fa9459SzrjTo create a @var{map_file} with @sc{gnu} @code{nm}, type a command like 622*a9fa9459Szrj@kbd{nm --extern-only --defined-only -v --print-file-name program-name}. 623*a9fa9459Szrj 624*a9fa9459Szrj@item -T 625*a9fa9459Szrj@itemx --traditional 626*a9fa9459SzrjThe @samp{-T} option causes @code{gprof} to print its output in 627*a9fa9459Szrj``traditional'' BSD style. 628*a9fa9459Szrj 629*a9fa9459Szrj@item -w @var{width} 630*a9fa9459Szrj@itemx --width=@var{width} 631*a9fa9459SzrjSets width of output lines to @var{width}. 632*a9fa9459SzrjCurrently only used when printing the function index at the bottom 633*a9fa9459Szrjof the call graph. 634*a9fa9459Szrj 635*a9fa9459Szrj@item -x 636*a9fa9459Szrj@itemx --all-lines 637*a9fa9459SzrjThis option affects annotated source output only. 638*a9fa9459SzrjBy default, only the lines at the beginning of a basic-block 639*a9fa9459Szrjare annotated. If this option is specified, every line in 640*a9fa9459Szrja basic-block is annotated by repeating the annotation for the 641*a9fa9459Szrjfirst line. This behavior is similar to @code{tcov}'s @samp{-a}. 642*a9fa9459Szrj 643*a9fa9459Szrj@item --demangle[=@var{style}] 644*a9fa9459Szrj@itemx --no-demangle 645*a9fa9459SzrjThese options control whether C++ symbol names should be demangled when 646*a9fa9459Szrjprinting output. The default is to demangle symbols. The 647*a9fa9459Szrj@code{--no-demangle} option may be used to turn off demangling. Different 648*a9fa9459Szrjcompilers have different mangling styles. The optional demangling style 649*a9fa9459Szrjargument can be used to choose an appropriate demangling style for your 650*a9fa9459Szrjcompiler. 651*a9fa9459Szrj@end table 652*a9fa9459Szrj 653*a9fa9459Szrj@node Analysis Options 654*a9fa9459Szrj@section Analysis Options 655*a9fa9459Szrj 656*a9fa9459Szrj@table @code 657*a9fa9459Szrj 658*a9fa9459Szrj@item -a 659*a9fa9459Szrj@itemx --no-static 660*a9fa9459SzrjThe @samp{-a} option causes @code{gprof} to suppress the printing of 661*a9fa9459Szrjstatically declared (private) functions. (These are functions whose 662*a9fa9459Szrjnames are not listed as global, and which are not visible outside the 663*a9fa9459Szrjfile/function/block where they were defined.) Time spent in these 664*a9fa9459Szrjfunctions, calls to/from them, etc., will all be attributed to the 665*a9fa9459Szrjfunction that was loaded directly before it in the executable file. 666*a9fa9459Szrj@c This is compatible with Unix @code{gprof}, but a bad idea. 667*a9fa9459SzrjThis option affects both the flat profile and the call graph. 668*a9fa9459Szrj 669*a9fa9459Szrj@item -c 670*a9fa9459Szrj@itemx --static-call-graph 671*a9fa9459SzrjThe @samp{-c} option causes the call graph of the program to be 672*a9fa9459Szrjaugmented by a heuristic which examines the text space of the object 673*a9fa9459Szrjfile and identifies function calls in the binary machine code. 674*a9fa9459SzrjSince normal call graph records are only generated when functions are 675*a9fa9459Szrjentered, this option identifies children that could have been called, 676*a9fa9459Szrjbut never were. Calls to functions that were not compiled with 677*a9fa9459Szrjprofiling enabled are also identified, but only if symbol table 678*a9fa9459Szrjentries are present for them. 679*a9fa9459SzrjCalls to dynamic library routines are typically @emph{not} found 680*a9fa9459Szrjby this option. 681*a9fa9459SzrjParents or children identified via this heuristic 682*a9fa9459Szrjare indicated in the call graph with call counts of @samp{0}. 683*a9fa9459Szrj 684*a9fa9459Szrj@item -D 685*a9fa9459Szrj@itemx --ignore-non-functions 686*a9fa9459SzrjThe @samp{-D} option causes @code{gprof} to ignore symbols which 687*a9fa9459Szrjare not known to be functions. This option will give more accurate 688*a9fa9459Szrjprofile data on systems where it is supported (Solaris and HPUX for 689*a9fa9459Szrjexample). 690*a9fa9459Szrj 691*a9fa9459Szrj@item -k @var{from}/@var{to} 692*a9fa9459SzrjThe @samp{-k} option allows you to delete from the call graph any arcs from 693*a9fa9459Szrjsymbols matching symspec @var{from} to those matching symspec @var{to}. 694*a9fa9459Szrj 695*a9fa9459Szrj@item -l 696*a9fa9459Szrj@itemx --line 697*a9fa9459SzrjThe @samp{-l} option enables line-by-line profiling, which causes 698*a9fa9459Szrjhistogram hits to be charged to individual source code lines, 699*a9fa9459Szrjinstead of functions. This feature only works with programs compiled 700*a9fa9459Szrjby older versions of the @code{gcc} compiler. Newer versions of 701*a9fa9459Szrj@code{gcc} are designed to work with the @code{gcov} tool instead. 702*a9fa9459Szrj 703*a9fa9459SzrjIf the program was compiled with basic-block counting enabled, 704*a9fa9459Szrjthis option will also identify how many times each line of 705*a9fa9459Szrjcode was executed. 706*a9fa9459SzrjWhile line-by-line profiling can help isolate where in a large function 707*a9fa9459Szrja program is spending its time, it also significantly increases 708*a9fa9459Szrjthe running time of @code{gprof}, and magnifies statistical 709*a9fa9459Szrjinaccuracies. 710*a9fa9459Szrj@xref{Sampling Error, ,Statistical Sampling Error}. 711*a9fa9459Szrj 712*a9fa9459Szrj@item --inline-file-names 713*a9fa9459SzrjThis option causes @code{gprof} to print the source file after each 714*a9fa9459Szrjsymbol in both the flat profile and the call graph. The full path to the 715*a9fa9459Szrjfile is printed if used with the @samp{-L} option. 716*a9fa9459Szrj 717*a9fa9459Szrj@item -m @var{num} 718*a9fa9459Szrj@itemx --min-count=@var{num} 719*a9fa9459SzrjThis option affects execution count output only. 720*a9fa9459SzrjSymbols that are executed less than @var{num} times are suppressed. 721*a9fa9459Szrj 722*a9fa9459Szrj@item -n@var{symspec} 723*a9fa9459Szrj@itemx --time=@var{symspec} 724*a9fa9459SzrjThe @samp{-n} option causes @code{gprof}, in its call graph analysis, 725*a9fa9459Szrjto only propagate times for symbols matching @var{symspec}. 726*a9fa9459Szrj 727*a9fa9459Szrj@item -N@var{symspec} 728*a9fa9459Szrj@itemx --no-time=@var{symspec} 729*a9fa9459SzrjThe @samp{-n} option causes @code{gprof}, in its call graph analysis, 730*a9fa9459Szrjnot to propagate times for symbols matching @var{symspec}. 731*a9fa9459Szrj 732*a9fa9459Szrj@item -S@var{filename} 733*a9fa9459Szrj@itemx --external-symbol-table=@var{filename} 734*a9fa9459SzrjThe @samp{-S} option causes @code{gprof} to read an external symbol table 735*a9fa9459Szrjfile, such as @file{/proc/kallsyms}, rather than read the symbol table 736*a9fa9459Szrjfrom the given object file (the default is @code{a.out}). This is useful 737*a9fa9459Szrjfor profiling kernel modules. 738*a9fa9459Szrj 739*a9fa9459Szrj@item -z 740*a9fa9459Szrj@itemx --display-unused-functions 741*a9fa9459SzrjIf you give the @samp{-z} option, @code{gprof} will mention all 742*a9fa9459Szrjfunctions in the flat profile, even those that were never called, and 743*a9fa9459Szrjthat had no time spent in them. This is useful in conjunction with the 744*a9fa9459Szrj@samp{-c} option for discovering which routines were never called. 745*a9fa9459Szrj 746*a9fa9459Szrj@end table 747*a9fa9459Szrj 748*a9fa9459Szrj@node Miscellaneous Options 749*a9fa9459Szrj@section Miscellaneous Options 750*a9fa9459Szrj 751*a9fa9459Szrj@table @code 752*a9fa9459Szrj 753*a9fa9459Szrj@item -d[@var{num}] 754*a9fa9459Szrj@itemx --debug[=@var{num}] 755*a9fa9459SzrjThe @samp{-d @var{num}} option specifies debugging options. 756*a9fa9459SzrjIf @var{num} is not specified, enable all debugging. 757*a9fa9459Szrj@xref{Debugging, ,Debugging @code{gprof}}. 758*a9fa9459Szrj 759*a9fa9459Szrj@item -h 760*a9fa9459Szrj@itemx --help 761*a9fa9459SzrjThe @samp{-h} option prints command line usage. 762*a9fa9459Szrj 763*a9fa9459Szrj@item -O@var{name} 764*a9fa9459Szrj@itemx --file-format=@var{name} 765*a9fa9459SzrjSelects the format of the profile data files. Recognized formats are 766*a9fa9459Szrj@samp{auto} (the default), @samp{bsd}, @samp{4.4bsd}, @samp{magic}, and 767*a9fa9459Szrj@samp{prof} (not yet supported). 768*a9fa9459Szrj 769*a9fa9459Szrj@item -s 770*a9fa9459Szrj@itemx --sum 771*a9fa9459SzrjThe @samp{-s} option causes @code{gprof} to summarize the information 772*a9fa9459Szrjin the profile data files it read in, and write out a profile data 773*a9fa9459Szrjfile called @file{gmon.sum}, which contains all the information from 774*a9fa9459Szrjthe profile data files that @code{gprof} read in. The file @file{gmon.sum} 775*a9fa9459Szrjmay be one of the specified input files; the effect of this is to 776*a9fa9459Szrjmerge the data in the other input files into @file{gmon.sum}. 777*a9fa9459Szrj 778*a9fa9459SzrjEventually you can run @code{gprof} again without @samp{-s} to analyze the 779*a9fa9459Szrjcumulative data in the file @file{gmon.sum}. 780*a9fa9459Szrj 781*a9fa9459Szrj@item -v 782*a9fa9459Szrj@itemx --version 783*a9fa9459SzrjThe @samp{-v} flag causes @code{gprof} to print the current version 784*a9fa9459Szrjnumber, and then exit. 785*a9fa9459Szrj 786*a9fa9459Szrj@end table 787*a9fa9459Szrj 788*a9fa9459Szrj@node Deprecated Options 789*a9fa9459Szrj@section Deprecated Options 790*a9fa9459Szrj 791*a9fa9459SzrjThese options have been replaced with newer versions that use symspecs. 792*a9fa9459Szrj 793*a9fa9459Szrj@table @code 794*a9fa9459Szrj 795*a9fa9459Szrj@item -e @var{function_name} 796*a9fa9459SzrjThe @samp{-e @var{function}} option tells @code{gprof} to not print 797*a9fa9459Szrjinformation about the function @var{function_name} (and its 798*a9fa9459Szrjchildren@dots{}) in the call graph. The function will still be listed 799*a9fa9459Szrjas a child of any functions that call it, but its index number will be 800*a9fa9459Szrjshown as @samp{[not printed]}. More than one @samp{-e} option may be 801*a9fa9459Szrjgiven; only one @var{function_name} may be indicated with each @samp{-e} 802*a9fa9459Szrjoption. 803*a9fa9459Szrj 804*a9fa9459Szrj@item -E @var{function_name} 805*a9fa9459SzrjThe @code{-E @var{function}} option works like the @code{-e} option, but 806*a9fa9459Szrjtime spent in the function (and children who were not called from 807*a9fa9459Szrjanywhere else), will not be used to compute the percentages-of-time for 808*a9fa9459Szrjthe call graph. More than one @samp{-E} option may be given; only one 809*a9fa9459Szrj@var{function_name} may be indicated with each @samp{-E} option. 810*a9fa9459Szrj 811*a9fa9459Szrj@item -f @var{function_name} 812*a9fa9459SzrjThe @samp{-f @var{function}} option causes @code{gprof} to limit the 813*a9fa9459Szrjcall graph to the function @var{function_name} and its children (and 814*a9fa9459Szrjtheir children@dots{}). More than one @samp{-f} option may be given; 815*a9fa9459Szrjonly one @var{function_name} may be indicated with each @samp{-f} 816*a9fa9459Szrjoption. 817*a9fa9459Szrj 818*a9fa9459Szrj@item -F @var{function_name} 819*a9fa9459SzrjThe @samp{-F @var{function}} option works like the @code{-f} option, but 820*a9fa9459Szrjonly time spent in the function and its children (and their 821*a9fa9459Szrjchildren@dots{}) will be used to determine total-time and 822*a9fa9459Szrjpercentages-of-time for the call graph. More than one @samp{-F} option 823*a9fa9459Szrjmay be given; only one @var{function_name} may be indicated with each 824*a9fa9459Szrj@samp{-F} option. The @samp{-F} option overrides the @samp{-E} option. 825*a9fa9459Szrj 826*a9fa9459Szrj@end table 827*a9fa9459Szrj 828*a9fa9459Szrj@c man end 829*a9fa9459Szrj 830*a9fa9459SzrjNote that only one function can be specified with each @code{-e}, 831*a9fa9459Szrj@code{-E}, @code{-f} or @code{-F} option. To specify more than one 832*a9fa9459Szrjfunction, use multiple options. For example, this command: 833*a9fa9459Szrj 834*a9fa9459Szrj@example 835*a9fa9459Szrjgprof -e boring -f foo -f bar myprogram > gprof.output 836*a9fa9459Szrj@end example 837*a9fa9459Szrj 838*a9fa9459Szrj@noindent 839*a9fa9459Szrjlists in the call graph all functions that were reached from either 840*a9fa9459Szrj@code{foo} or @code{bar} and were not reachable from @code{boring}. 841*a9fa9459Szrj 842*a9fa9459Szrj@node Symspecs 843*a9fa9459Szrj@section Symspecs 844*a9fa9459Szrj 845*a9fa9459SzrjMany of the output options allow functions to be included or excluded 846*a9fa9459Szrjusing @dfn{symspecs} (symbol specifications), which observe the 847*a9fa9459Szrjfollowing syntax: 848*a9fa9459Szrj 849*a9fa9459Szrj@example 850*a9fa9459Szrj filename_containing_a_dot 851*a9fa9459Szrj| funcname_not_containing_a_dot 852*a9fa9459Szrj| linenumber 853*a9fa9459Szrj| ( [ any_filename ] `:' ( any_funcname | linenumber ) ) 854*a9fa9459Szrj@end example 855*a9fa9459Szrj 856*a9fa9459SzrjHere are some sample symspecs: 857*a9fa9459Szrj 858*a9fa9459Szrj@table @samp 859*a9fa9459Szrj@item main.c 860*a9fa9459SzrjSelects everything in file @file{main.c}---the 861*a9fa9459Szrjdot in the string tells @code{gprof} to interpret 862*a9fa9459Szrjthe string as a filename, rather than as 863*a9fa9459Szrja function name. To select a file whose 864*a9fa9459Szrjname does not contain a dot, a trailing colon 865*a9fa9459Szrjshould be specified. For example, @samp{odd:} is 866*a9fa9459Szrjinterpreted as the file named @file{odd}. 867*a9fa9459Szrj 868*a9fa9459Szrj@item main 869*a9fa9459SzrjSelects all functions named @samp{main}. 870*a9fa9459Szrj 871*a9fa9459SzrjNote that there may be multiple instances of the same function name 872*a9fa9459Szrjbecause some of the definitions may be local (i.e., static). Unless a 873*a9fa9459Szrjfunction name is unique in a program, you must use the colon notation 874*a9fa9459Szrjexplained below to specify a function from a specific source file. 875*a9fa9459Szrj 876*a9fa9459SzrjSometimes, function names contain dots. In such cases, it is necessary 877*a9fa9459Szrjto add a leading colon to the name. For example, @samp{:.mul} selects 878*a9fa9459Szrjfunction @samp{.mul}. 879*a9fa9459Szrj 880*a9fa9459SzrjIn some object file formats, symbols have a leading underscore. 881*a9fa9459Szrj@code{gprof} will normally not print these underscores. When you name a 882*a9fa9459Szrjsymbol in a symspec, you should type it exactly as @code{gprof} prints 883*a9fa9459Szrjit in its output. For example, if the compiler produces a symbol 884*a9fa9459Szrj@samp{_main} from your @code{main} function, @code{gprof} still prints 885*a9fa9459Szrjit as @samp{main} in its output, so you should use @samp{main} in 886*a9fa9459Szrjsymspecs. 887*a9fa9459Szrj 888*a9fa9459Szrj@item main.c:main 889*a9fa9459SzrjSelects function @samp{main} in file @file{main.c}. 890*a9fa9459Szrj 891*a9fa9459Szrj@item main.c:134 892*a9fa9459SzrjSelects line 134 in file @file{main.c}. 893*a9fa9459Szrj@end table 894*a9fa9459Szrj 895*a9fa9459Szrj@node Output 896*a9fa9459Szrj@chapter Interpreting @code{gprof}'s Output 897*a9fa9459Szrj 898*a9fa9459Szrj@code{gprof} can produce several different output styles, the 899*a9fa9459Szrjmost important of which are described below. The simplest output 900*a9fa9459Szrjstyles (file information, execution count, and function and file ordering) 901*a9fa9459Szrjare not described here, but are documented with the respective options 902*a9fa9459Szrjthat trigger them. 903*a9fa9459Szrj@xref{Output Options, ,Output Options}. 904*a9fa9459Szrj 905*a9fa9459Szrj@menu 906*a9fa9459Szrj* Flat Profile:: The flat profile shows how much time was spent 907*a9fa9459Szrj executing directly in each function. 908*a9fa9459Szrj* Call Graph:: The call graph shows which functions called which 909*a9fa9459Szrj others, and how much time each function used 910*a9fa9459Szrj when its subroutine calls are included. 911*a9fa9459Szrj* Line-by-line:: @code{gprof} can analyze individual source code lines 912*a9fa9459Szrj* Annotated Source:: The annotated source listing displays source code 913*a9fa9459Szrj labeled with execution counts 914*a9fa9459Szrj@end menu 915*a9fa9459Szrj 916*a9fa9459Szrj 917*a9fa9459Szrj@node Flat Profile 918*a9fa9459Szrj@section The Flat Profile 919*a9fa9459Szrj@cindex flat profile 920*a9fa9459Szrj 921*a9fa9459SzrjThe @dfn{flat profile} shows the total amount of time your program 922*a9fa9459Szrjspent executing each function. Unless the @samp{-z} option is given, 923*a9fa9459Szrjfunctions with no apparent time spent in them, and no apparent calls 924*a9fa9459Szrjto them, are not mentioned. Note that if a function was not compiled 925*a9fa9459Szrjfor profiling, and didn't run long enough to show up on the program 926*a9fa9459Szrjcounter histogram, it will be indistinguishable from a function that 927*a9fa9459Szrjwas never called. 928*a9fa9459Szrj 929*a9fa9459SzrjThis is part of a flat profile for a small program: 930*a9fa9459Szrj 931*a9fa9459Szrj@smallexample 932*a9fa9459Szrj@group 933*a9fa9459SzrjFlat profile: 934*a9fa9459Szrj 935*a9fa9459SzrjEach sample counts as 0.01 seconds. 936*a9fa9459Szrj % cumulative self self total 937*a9fa9459Szrj time seconds seconds calls ms/call ms/call name 938*a9fa9459Szrj 33.34 0.02 0.02 7208 0.00 0.00 open 939*a9fa9459Szrj 16.67 0.03 0.01 244 0.04 0.12 offtime 940*a9fa9459Szrj 16.67 0.04 0.01 8 1.25 1.25 memccpy 941*a9fa9459Szrj 16.67 0.05 0.01 7 1.43 1.43 write 942*a9fa9459Szrj 16.67 0.06 0.01 mcount 943*a9fa9459Szrj 0.00 0.06 0.00 236 0.00 0.00 tzset 944*a9fa9459Szrj 0.00 0.06 0.00 192 0.00 0.00 tolower 945*a9fa9459Szrj 0.00 0.06 0.00 47 0.00 0.00 strlen 946*a9fa9459Szrj 0.00 0.06 0.00 45 0.00 0.00 strchr 947*a9fa9459Szrj 0.00 0.06 0.00 1 0.00 50.00 main 948*a9fa9459Szrj 0.00 0.06 0.00 1 0.00 0.00 memcpy 949*a9fa9459Szrj 0.00 0.06 0.00 1 0.00 10.11 print 950*a9fa9459Szrj 0.00 0.06 0.00 1 0.00 0.00 profil 951*a9fa9459Szrj 0.00 0.06 0.00 1 0.00 50.00 report 952*a9fa9459Szrj@dots{} 953*a9fa9459Szrj@end group 954*a9fa9459Szrj@end smallexample 955*a9fa9459Szrj 956*a9fa9459Szrj@noindent 957*a9fa9459SzrjThe functions are sorted first by decreasing run-time spent in them, 958*a9fa9459Szrjthen by decreasing number of calls, then alphabetically by name. The 959*a9fa9459Szrjfunctions @samp{mcount} and @samp{profil} are part of the profiling 960*a9fa9459Szrjapparatus and appear in every flat profile; their time gives a measure of 961*a9fa9459Szrjthe amount of overhead due to profiling. 962*a9fa9459Szrj 963*a9fa9459SzrjJust before the column headers, a statement appears indicating 964*a9fa9459Szrjhow much time each sample counted as. 965*a9fa9459SzrjThis @dfn{sampling period} estimates the margin of error in each of the time 966*a9fa9459Szrjfigures. A time figure that is not much larger than this is not 967*a9fa9459Szrjreliable. In this example, each sample counted as 0.01 seconds, 968*a9fa9459Szrjsuggesting a 100 Hz sampling rate. 969*a9fa9459SzrjThe program's total execution time was 0.06 970*a9fa9459Szrjseconds, as indicated by the @samp{cumulative seconds} field. Since 971*a9fa9459Szrjeach sample counted for 0.01 seconds, this means only six samples 972*a9fa9459Szrjwere taken during the run. Two of the samples occurred while the 973*a9fa9459Szrjprogram was in the @samp{open} function, as indicated by the 974*a9fa9459Szrj@samp{self seconds} field. Each of the other four samples 975*a9fa9459Szrjoccurred one each in @samp{offtime}, @samp{memccpy}, @samp{write}, 976*a9fa9459Szrjand @samp{mcount}. 977*a9fa9459SzrjSince only six samples were taken, none of these values can 978*a9fa9459Szrjbe regarded as particularly reliable. 979*a9fa9459SzrjIn another run, 980*a9fa9459Szrjthe @samp{self seconds} field for 981*a9fa9459Szrj@samp{mcount} might well be @samp{0.00} or @samp{0.02}. 982*a9fa9459Szrj@xref{Sampling Error, ,Statistical Sampling Error}, 983*a9fa9459Szrjfor a complete discussion. 984*a9fa9459Szrj 985*a9fa9459SzrjThe remaining functions in the listing (those whose 986*a9fa9459Szrj@samp{self seconds} field is @samp{0.00}) didn't appear 987*a9fa9459Szrjin the histogram samples at all. However, the call graph 988*a9fa9459Szrjindicated that they were called, so therefore they are listed, 989*a9fa9459Szrjsorted in decreasing order by the @samp{calls} field. 990*a9fa9459SzrjClearly some time was spent executing these functions, 991*a9fa9459Szrjbut the paucity of histogram samples prevents any 992*a9fa9459Szrjdetermination of how much time each took. 993*a9fa9459Szrj 994*a9fa9459SzrjHere is what the fields in each line mean: 995*a9fa9459Szrj 996*a9fa9459Szrj@table @code 997*a9fa9459Szrj@item % time 998*a9fa9459SzrjThis is the percentage of the total execution time your program spent 999*a9fa9459Szrjin this function. These should all add up to 100%. 1000*a9fa9459Szrj 1001*a9fa9459Szrj@item cumulative seconds 1002*a9fa9459SzrjThis is the cumulative total number of seconds the computer spent 1003*a9fa9459Szrjexecuting this functions, plus the time spent in all the functions 1004*a9fa9459Szrjabove this one in this table. 1005*a9fa9459Szrj 1006*a9fa9459Szrj@item self seconds 1007*a9fa9459SzrjThis is the number of seconds accounted for by this function alone. 1008*a9fa9459SzrjThe flat profile listing is sorted first by this number. 1009*a9fa9459Szrj 1010*a9fa9459Szrj@item calls 1011*a9fa9459SzrjThis is the total number of times the function was called. If the 1012*a9fa9459Szrjfunction was never called, or the number of times it was called cannot 1013*a9fa9459Szrjbe determined (probably because the function was not compiled with 1014*a9fa9459Szrjprofiling enabled), the @dfn{calls} field is blank. 1015*a9fa9459Szrj 1016*a9fa9459Szrj@item self ms/call 1017*a9fa9459SzrjThis represents the average number of milliseconds spent in this 1018*a9fa9459Szrjfunction per call, if this function is profiled. Otherwise, this field 1019*a9fa9459Szrjis blank for this function. 1020*a9fa9459Szrj 1021*a9fa9459Szrj@item total ms/call 1022*a9fa9459SzrjThis represents the average number of milliseconds spent in this 1023*a9fa9459Szrjfunction and its descendants per call, if this function is profiled. 1024*a9fa9459SzrjOtherwise, this field is blank for this function. 1025*a9fa9459SzrjThis is the only field in the flat profile that uses call graph analysis. 1026*a9fa9459Szrj 1027*a9fa9459Szrj@item name 1028*a9fa9459SzrjThis is the name of the function. The flat profile is sorted by this 1029*a9fa9459Szrjfield alphabetically after the @dfn{self seconds} and @dfn{calls} 1030*a9fa9459Szrjfields are sorted. 1031*a9fa9459Szrj@end table 1032*a9fa9459Szrj 1033*a9fa9459Szrj@node Call Graph 1034*a9fa9459Szrj@section The Call Graph 1035*a9fa9459Szrj@cindex call graph 1036*a9fa9459Szrj 1037*a9fa9459SzrjThe @dfn{call graph} shows how much time was spent in each function 1038*a9fa9459Szrjand its children. From this information, you can find functions that, 1039*a9fa9459Szrjwhile they themselves may not have used much time, called other 1040*a9fa9459Szrjfunctions that did use unusual amounts of time. 1041*a9fa9459Szrj 1042*a9fa9459SzrjHere is a sample call from a small program. This call came from the 1043*a9fa9459Szrjsame @code{gprof} run as the flat profile example in the previous 1044*a9fa9459Szrjsection. 1045*a9fa9459Szrj 1046*a9fa9459Szrj@smallexample 1047*a9fa9459Szrj@group 1048*a9fa9459Szrjgranularity: each sample hit covers 2 byte(s) for 20.00% of 0.05 seconds 1049*a9fa9459Szrj 1050*a9fa9459Szrjindex % time self children called name 1051*a9fa9459Szrj <spontaneous> 1052*a9fa9459Szrj[1] 100.0 0.00 0.05 start [1] 1053*a9fa9459Szrj 0.00 0.05 1/1 main [2] 1054*a9fa9459Szrj 0.00 0.00 1/2 on_exit [28] 1055*a9fa9459Szrj 0.00 0.00 1/1 exit [59] 1056*a9fa9459Szrj----------------------------------------------- 1057*a9fa9459Szrj 0.00 0.05 1/1 start [1] 1058*a9fa9459Szrj[2] 100.0 0.00 0.05 1 main [2] 1059*a9fa9459Szrj 0.00 0.05 1/1 report [3] 1060*a9fa9459Szrj----------------------------------------------- 1061*a9fa9459Szrj 0.00 0.05 1/1 main [2] 1062*a9fa9459Szrj[3] 100.0 0.00 0.05 1 report [3] 1063*a9fa9459Szrj 0.00 0.03 8/8 timelocal [6] 1064*a9fa9459Szrj 0.00 0.01 1/1 print [9] 1065*a9fa9459Szrj 0.00 0.01 9/9 fgets [12] 1066*a9fa9459Szrj 0.00 0.00 12/34 strncmp <cycle 1> [40] 1067*a9fa9459Szrj 0.00 0.00 8/8 lookup [20] 1068*a9fa9459Szrj 0.00 0.00 1/1 fopen [21] 1069*a9fa9459Szrj 0.00 0.00 8/8 chewtime [24] 1070*a9fa9459Szrj 0.00 0.00 8/16 skipspace [44] 1071*a9fa9459Szrj----------------------------------------------- 1072*a9fa9459Szrj[4] 59.8 0.01 0.02 8+472 <cycle 2 as a whole> [4] 1073*a9fa9459Szrj 0.01 0.02 244+260 offtime <cycle 2> [7] 1074*a9fa9459Szrj 0.00 0.00 236+1 tzset <cycle 2> [26] 1075*a9fa9459Szrj----------------------------------------------- 1076*a9fa9459Szrj@end group 1077*a9fa9459Szrj@end smallexample 1078*a9fa9459Szrj 1079*a9fa9459SzrjThe lines full of dashes divide this table into @dfn{entries}, one for each 1080*a9fa9459Szrjfunction. Each entry has one or more lines. 1081*a9fa9459Szrj 1082*a9fa9459SzrjIn each entry, the primary line is the one that starts with an index number 1083*a9fa9459Szrjin square brackets. The end of this line says which function the entry is 1084*a9fa9459Szrjfor. The preceding lines in the entry describe the callers of this 1085*a9fa9459Szrjfunction and the following lines describe its subroutines (also called 1086*a9fa9459Szrj@dfn{children} when we speak of the call graph). 1087*a9fa9459Szrj 1088*a9fa9459SzrjThe entries are sorted by time spent in the function and its subroutines. 1089*a9fa9459Szrj 1090*a9fa9459SzrjThe internal profiling function @code{mcount} (@pxref{Flat Profile, ,The 1091*a9fa9459SzrjFlat Profile}) is never mentioned in the call graph. 1092*a9fa9459Szrj 1093*a9fa9459Szrj@menu 1094*a9fa9459Szrj* Primary:: Details of the primary line's contents. 1095*a9fa9459Szrj* Callers:: Details of caller-lines' contents. 1096*a9fa9459Szrj* Subroutines:: Details of subroutine-lines' contents. 1097*a9fa9459Szrj* Cycles:: When there are cycles of recursion, 1098*a9fa9459Szrj such as @code{a} calls @code{b} calls @code{a}@dots{} 1099*a9fa9459Szrj@end menu 1100*a9fa9459Szrj 1101*a9fa9459Szrj@node Primary 1102*a9fa9459Szrj@subsection The Primary Line 1103*a9fa9459Szrj 1104*a9fa9459SzrjThe @dfn{primary line} in a call graph entry is the line that 1105*a9fa9459Szrjdescribes the function which the entry is about and gives the overall 1106*a9fa9459Szrjstatistics for this function. 1107*a9fa9459Szrj 1108*a9fa9459SzrjFor reference, we repeat the primary line from the entry for function 1109*a9fa9459Szrj@code{report} in our main example, together with the heading line that 1110*a9fa9459Szrjshows the names of the fields: 1111*a9fa9459Szrj 1112*a9fa9459Szrj@smallexample 1113*a9fa9459Szrj@group 1114*a9fa9459Szrjindex % time self children called name 1115*a9fa9459Szrj@dots{} 1116*a9fa9459Szrj[3] 100.0 0.00 0.05 1 report [3] 1117*a9fa9459Szrj@end group 1118*a9fa9459Szrj@end smallexample 1119*a9fa9459Szrj 1120*a9fa9459SzrjHere is what the fields in the primary line mean: 1121*a9fa9459Szrj 1122*a9fa9459Szrj@table @code 1123*a9fa9459Szrj@item index 1124*a9fa9459SzrjEntries are numbered with consecutive integers. Each function 1125*a9fa9459Szrjtherefore has an index number, which appears at the beginning of its 1126*a9fa9459Szrjprimary line. 1127*a9fa9459Szrj 1128*a9fa9459SzrjEach cross-reference to a function, as a caller or subroutine of 1129*a9fa9459Szrjanother, gives its index number as well as its name. The index number 1130*a9fa9459Szrjguides you if you wish to look for the entry for that function. 1131*a9fa9459Szrj 1132*a9fa9459Szrj@item % time 1133*a9fa9459SzrjThis is the percentage of the total time that was spent in this 1134*a9fa9459Szrjfunction, including time spent in subroutines called from this 1135*a9fa9459Szrjfunction. 1136*a9fa9459Szrj 1137*a9fa9459SzrjThe time spent in this function is counted again for the callers of 1138*a9fa9459Szrjthis function. Therefore, adding up these percentages is meaningless. 1139*a9fa9459Szrj 1140*a9fa9459Szrj@item self 1141*a9fa9459SzrjThis is the total amount of time spent in this function. This 1142*a9fa9459Szrjshould be identical to the number printed in the @code{seconds} field 1143*a9fa9459Szrjfor this function in the flat profile. 1144*a9fa9459Szrj 1145*a9fa9459Szrj@item children 1146*a9fa9459SzrjThis is the total amount of time spent in the subroutine calls made by 1147*a9fa9459Szrjthis function. This should be equal to the sum of all the @code{self} 1148*a9fa9459Szrjand @code{children} entries of the children listed directly below this 1149*a9fa9459Szrjfunction. 1150*a9fa9459Szrj 1151*a9fa9459Szrj@item called 1152*a9fa9459SzrjThis is the number of times the function was called. 1153*a9fa9459Szrj 1154*a9fa9459SzrjIf the function called itself recursively, there are two numbers, 1155*a9fa9459Szrjseparated by a @samp{+}. The first number counts non-recursive calls, 1156*a9fa9459Szrjand the second counts recursive calls. 1157*a9fa9459Szrj 1158*a9fa9459SzrjIn the example above, the function @code{report} was called once from 1159*a9fa9459Szrj@code{main}. 1160*a9fa9459Szrj 1161*a9fa9459Szrj@item name 1162*a9fa9459SzrjThis is the name of the current function. The index number is 1163*a9fa9459Szrjrepeated after it. 1164*a9fa9459Szrj 1165*a9fa9459SzrjIf the function is part of a cycle of recursion, the cycle number is 1166*a9fa9459Szrjprinted between the function's name and the index number 1167*a9fa9459Szrj(@pxref{Cycles, ,How Mutually Recursive Functions Are Described}). 1168*a9fa9459SzrjFor example, if function @code{gnurr} is part of 1169*a9fa9459Szrjcycle number one, and has index number twelve, its primary line would 1170*a9fa9459Szrjbe end like this: 1171*a9fa9459Szrj 1172*a9fa9459Szrj@example 1173*a9fa9459Szrjgnurr <cycle 1> [12] 1174*a9fa9459Szrj@end example 1175*a9fa9459Szrj@end table 1176*a9fa9459Szrj 1177*a9fa9459Szrj@node Callers 1178*a9fa9459Szrj@subsection Lines for a Function's Callers 1179*a9fa9459Szrj 1180*a9fa9459SzrjA function's entry has a line for each function it was called by. 1181*a9fa9459SzrjThese lines' fields correspond to the fields of the primary line, but 1182*a9fa9459Szrjtheir meanings are different because of the difference in context. 1183*a9fa9459Szrj 1184*a9fa9459SzrjFor reference, we repeat two lines from the entry for the function 1185*a9fa9459Szrj@code{report}, the primary line and one caller-line preceding it, together 1186*a9fa9459Szrjwith the heading line that shows the names of the fields: 1187*a9fa9459Szrj 1188*a9fa9459Szrj@smallexample 1189*a9fa9459Szrjindex % time self children called name 1190*a9fa9459Szrj@dots{} 1191*a9fa9459Szrj 0.00 0.05 1/1 main [2] 1192*a9fa9459Szrj[3] 100.0 0.00 0.05 1 report [3] 1193*a9fa9459Szrj@end smallexample 1194*a9fa9459Szrj 1195*a9fa9459SzrjHere are the meanings of the fields in the caller-line for @code{report} 1196*a9fa9459Szrjcalled from @code{main}: 1197*a9fa9459Szrj 1198*a9fa9459Szrj@table @code 1199*a9fa9459Szrj@item self 1200*a9fa9459SzrjAn estimate of the amount of time spent in @code{report} itself when it was 1201*a9fa9459Szrjcalled from @code{main}. 1202*a9fa9459Szrj 1203*a9fa9459Szrj@item children 1204*a9fa9459SzrjAn estimate of the amount of time spent in subroutines of @code{report} 1205*a9fa9459Szrjwhen @code{report} was called from @code{main}. 1206*a9fa9459Szrj 1207*a9fa9459SzrjThe sum of the @code{self} and @code{children} fields is an estimate 1208*a9fa9459Szrjof the amount of time spent within calls to @code{report} from @code{main}. 1209*a9fa9459Szrj 1210*a9fa9459Szrj@item called 1211*a9fa9459SzrjTwo numbers: the number of times @code{report} was called from @code{main}, 1212*a9fa9459Szrjfollowed by the total number of non-recursive calls to @code{report} from 1213*a9fa9459Szrjall its callers. 1214*a9fa9459Szrj 1215*a9fa9459Szrj@item name and index number 1216*a9fa9459SzrjThe name of the caller of @code{report} to which this line applies, 1217*a9fa9459Szrjfollowed by the caller's index number. 1218*a9fa9459Szrj 1219*a9fa9459SzrjNot all functions have entries in the call graph; some 1220*a9fa9459Szrjoptions to @code{gprof} request the omission of certain functions. 1221*a9fa9459SzrjWhen a caller has no entry of its own, it still has caller-lines 1222*a9fa9459Szrjin the entries of the functions it calls. 1223*a9fa9459Szrj 1224*a9fa9459SzrjIf the caller is part of a recursion cycle, the cycle number is 1225*a9fa9459Szrjprinted between the name and the index number. 1226*a9fa9459Szrj@end table 1227*a9fa9459Szrj 1228*a9fa9459SzrjIf the identity of the callers of a function cannot be determined, a 1229*a9fa9459Szrjdummy caller-line is printed which has @samp{<spontaneous>} as the 1230*a9fa9459Szrj``caller's name'' and all other fields blank. This can happen for 1231*a9fa9459Szrjsignal handlers. 1232*a9fa9459Szrj@c What if some calls have determinable callers' names but not all? 1233*a9fa9459Szrj@c FIXME - still relevant? 1234*a9fa9459Szrj 1235*a9fa9459Szrj@node Subroutines 1236*a9fa9459Szrj@subsection Lines for a Function's Subroutines 1237*a9fa9459Szrj 1238*a9fa9459SzrjA function's entry has a line for each of its subroutines---in other 1239*a9fa9459Szrjwords, a line for each other function that it called. These lines' 1240*a9fa9459Szrjfields correspond to the fields of the primary line, but their meanings 1241*a9fa9459Szrjare different because of the difference in context. 1242*a9fa9459Szrj 1243*a9fa9459SzrjFor reference, we repeat two lines from the entry for the function 1244*a9fa9459Szrj@code{main}, the primary line and a line for a subroutine, together 1245*a9fa9459Szrjwith the heading line that shows the names of the fields: 1246*a9fa9459Szrj 1247*a9fa9459Szrj@smallexample 1248*a9fa9459Szrjindex % time self children called name 1249*a9fa9459Szrj@dots{} 1250*a9fa9459Szrj[2] 100.0 0.00 0.05 1 main [2] 1251*a9fa9459Szrj 0.00 0.05 1/1 report [3] 1252*a9fa9459Szrj@end smallexample 1253*a9fa9459Szrj 1254*a9fa9459SzrjHere are the meanings of the fields in the subroutine-line for @code{main} 1255*a9fa9459Szrjcalling @code{report}: 1256*a9fa9459Szrj 1257*a9fa9459Szrj@table @code 1258*a9fa9459Szrj@item self 1259*a9fa9459SzrjAn estimate of the amount of time spent directly within @code{report} 1260*a9fa9459Szrjwhen @code{report} was called from @code{main}. 1261*a9fa9459Szrj 1262*a9fa9459Szrj@item children 1263*a9fa9459SzrjAn estimate of the amount of time spent in subroutines of @code{report} 1264*a9fa9459Szrjwhen @code{report} was called from @code{main}. 1265*a9fa9459Szrj 1266*a9fa9459SzrjThe sum of the @code{self} and @code{children} fields is an estimate 1267*a9fa9459Szrjof the total time spent in calls to @code{report} from @code{main}. 1268*a9fa9459Szrj 1269*a9fa9459Szrj@item called 1270*a9fa9459SzrjTwo numbers, the number of calls to @code{report} from @code{main} 1271*a9fa9459Szrjfollowed by the total number of non-recursive calls to @code{report}. 1272*a9fa9459SzrjThis ratio is used to determine how much of @code{report}'s @code{self} 1273*a9fa9459Szrjand @code{children} time gets credited to @code{main}. 1274*a9fa9459Szrj@xref{Assumptions, ,Estimating @code{children} Times}. 1275*a9fa9459Szrj 1276*a9fa9459Szrj@item name 1277*a9fa9459SzrjThe name of the subroutine of @code{main} to which this line applies, 1278*a9fa9459Szrjfollowed by the subroutine's index number. 1279*a9fa9459Szrj 1280*a9fa9459SzrjIf the caller is part of a recursion cycle, the cycle number is 1281*a9fa9459Szrjprinted between the name and the index number. 1282*a9fa9459Szrj@end table 1283*a9fa9459Szrj 1284*a9fa9459Szrj@node Cycles 1285*a9fa9459Szrj@subsection How Mutually Recursive Functions Are Described 1286*a9fa9459Szrj@cindex cycle 1287*a9fa9459Szrj@cindex recursion cycle 1288*a9fa9459Szrj 1289*a9fa9459SzrjThe graph may be complicated by the presence of @dfn{cycles of 1290*a9fa9459Szrjrecursion} in the call graph. A cycle exists if a function calls 1291*a9fa9459Szrjanother function that (directly or indirectly) calls (or appears to 1292*a9fa9459Szrjcall) the original function. For example: if @code{a} calls @code{b}, 1293*a9fa9459Szrjand @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle. 1294*a9fa9459Szrj 1295*a9fa9459SzrjWhenever there are call paths both ways between a pair of functions, they 1296*a9fa9459Szrjbelong to the same cycle. If @code{a} and @code{b} call each other and 1297*a9fa9459Szrj@code{b} and @code{c} call each other, all three make one cycle. Note that 1298*a9fa9459Szrjeven if @code{b} only calls @code{a} if it was not called from @code{a}, 1299*a9fa9459Szrj@code{gprof} cannot determine this, so @code{a} and @code{b} are still 1300*a9fa9459Szrjconsidered a cycle. 1301*a9fa9459Szrj 1302*a9fa9459SzrjThe cycles are numbered with consecutive integers. When a function 1303*a9fa9459Szrjbelongs to a cycle, each time the function name appears in the call graph 1304*a9fa9459Szrjit is followed by @samp{<cycle @var{number}>}. 1305*a9fa9459Szrj 1306*a9fa9459SzrjThe reason cycles matter is that they make the time values in the call 1307*a9fa9459Szrjgraph paradoxical. The ``time spent in children'' of @code{a} should 1308*a9fa9459Szrjinclude the time spent in its subroutine @code{b} and in @code{b}'s 1309*a9fa9459Szrjsubroutines---but one of @code{b}'s subroutines is @code{a}! How much of 1310*a9fa9459Szrj@code{a}'s time should be included in the children of @code{a}, when 1311*a9fa9459Szrj@code{a} is indirectly recursive? 1312*a9fa9459Szrj 1313*a9fa9459SzrjThe way @code{gprof} resolves this paradox is by creating a single entry 1314*a9fa9459Szrjfor the cycle as a whole. The primary line of this entry describes the 1315*a9fa9459Szrjtotal time spent directly in the functions of the cycle. The 1316*a9fa9459Szrj``subroutines'' of the cycle are the individual functions of the cycle, and 1317*a9fa9459Szrjall other functions that were called directly by them. The ``callers'' of 1318*a9fa9459Szrjthe cycle are the functions, outside the cycle, that called functions in 1319*a9fa9459Szrjthe cycle. 1320*a9fa9459Szrj 1321*a9fa9459SzrjHere is an example portion of a call graph which shows a cycle containing 1322*a9fa9459Szrjfunctions @code{a} and @code{b}. The cycle was entered by a call to 1323*a9fa9459Szrj@code{a} from @code{main}; both @code{a} and @code{b} called @code{c}. 1324*a9fa9459Szrj 1325*a9fa9459Szrj@smallexample 1326*a9fa9459Szrjindex % time self children called name 1327*a9fa9459Szrj---------------------------------------- 1328*a9fa9459Szrj 1.77 0 1/1 main [2] 1329*a9fa9459Szrj[3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3] 1330*a9fa9459Szrj 1.02 0 3 b <cycle 1> [4] 1331*a9fa9459Szrj 0.75 0 2 a <cycle 1> [5] 1332*a9fa9459Szrj---------------------------------------- 1333*a9fa9459Szrj 3 a <cycle 1> [5] 1334*a9fa9459Szrj[4] 52.85 1.02 0 0 b <cycle 1> [4] 1335*a9fa9459Szrj 2 a <cycle 1> [5] 1336*a9fa9459Szrj 0 0 3/6 c [6] 1337*a9fa9459Szrj---------------------------------------- 1338*a9fa9459Szrj 1.77 0 1/1 main [2] 1339*a9fa9459Szrj 2 b <cycle 1> [4] 1340*a9fa9459Szrj[5] 38.86 0.75 0 1 a <cycle 1> [5] 1341*a9fa9459Szrj 3 b <cycle 1> [4] 1342*a9fa9459Szrj 0 0 3/6 c [6] 1343*a9fa9459Szrj---------------------------------------- 1344*a9fa9459Szrj@end smallexample 1345*a9fa9459Szrj 1346*a9fa9459Szrj@noindent 1347*a9fa9459Szrj(The entire call graph for this program contains in addition an entry for 1348*a9fa9459Szrj@code{main}, which calls @code{a}, and an entry for @code{c}, with callers 1349*a9fa9459Szrj@code{a} and @code{b}.) 1350*a9fa9459Szrj 1351*a9fa9459Szrj@smallexample 1352*a9fa9459Szrjindex % time self children called name 1353*a9fa9459Szrj <spontaneous> 1354*a9fa9459Szrj[1] 100.00 0 1.93 0 start [1] 1355*a9fa9459Szrj 0.16 1.77 1/1 main [2] 1356*a9fa9459Szrj---------------------------------------- 1357*a9fa9459Szrj 0.16 1.77 1/1 start [1] 1358*a9fa9459Szrj[2] 100.00 0.16 1.77 1 main [2] 1359*a9fa9459Szrj 1.77 0 1/1 a <cycle 1> [5] 1360*a9fa9459Szrj---------------------------------------- 1361*a9fa9459Szrj 1.77 0 1/1 main [2] 1362*a9fa9459Szrj[3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3] 1363*a9fa9459Szrj 1.02 0 3 b <cycle 1> [4] 1364*a9fa9459Szrj 0.75 0 2 a <cycle 1> [5] 1365*a9fa9459Szrj 0 0 6/6 c [6] 1366*a9fa9459Szrj---------------------------------------- 1367*a9fa9459Szrj 3 a <cycle 1> [5] 1368*a9fa9459Szrj[4] 52.85 1.02 0 0 b <cycle 1> [4] 1369*a9fa9459Szrj 2 a <cycle 1> [5] 1370*a9fa9459Szrj 0 0 3/6 c [6] 1371*a9fa9459Szrj---------------------------------------- 1372*a9fa9459Szrj 1.77 0 1/1 main [2] 1373*a9fa9459Szrj 2 b <cycle 1> [4] 1374*a9fa9459Szrj[5] 38.86 0.75 0 1 a <cycle 1> [5] 1375*a9fa9459Szrj 3 b <cycle 1> [4] 1376*a9fa9459Szrj 0 0 3/6 c [6] 1377*a9fa9459Szrj---------------------------------------- 1378*a9fa9459Szrj 0 0 3/6 b <cycle 1> [4] 1379*a9fa9459Szrj 0 0 3/6 a <cycle 1> [5] 1380*a9fa9459Szrj[6] 0.00 0 0 6 c [6] 1381*a9fa9459Szrj---------------------------------------- 1382*a9fa9459Szrj@end smallexample 1383*a9fa9459Szrj 1384*a9fa9459SzrjThe @code{self} field of the cycle's primary line is the total time 1385*a9fa9459Szrjspent in all the functions of the cycle. It equals the sum of the 1386*a9fa9459Szrj@code{self} fields for the individual functions in the cycle, found 1387*a9fa9459Szrjin the entry in the subroutine lines for these functions. 1388*a9fa9459Szrj 1389*a9fa9459SzrjThe @code{children} fields of the cycle's primary line and subroutine lines 1390*a9fa9459Szrjcount only subroutines outside the cycle. Even though @code{a} calls 1391*a9fa9459Szrj@code{b}, the time spent in those calls to @code{b} is not counted in 1392*a9fa9459Szrj@code{a}'s @code{children} time. Thus, we do not encounter the problem of 1393*a9fa9459Szrjwhat to do when the time in those calls to @code{b} includes indirect 1394*a9fa9459Szrjrecursive calls back to @code{a}. 1395*a9fa9459Szrj 1396*a9fa9459SzrjThe @code{children} field of a caller-line in the cycle's entry estimates 1397*a9fa9459Szrjthe amount of time spent @emph{in the whole cycle}, and its other 1398*a9fa9459Szrjsubroutines, on the times when that caller called a function in the cycle. 1399*a9fa9459Szrj 1400*a9fa9459SzrjThe @code{called} field in the primary line for the cycle has two numbers: 1401*a9fa9459Szrjfirst, the number of times functions in the cycle were called by functions 1402*a9fa9459Szrjoutside the cycle; second, the number of times they were called by 1403*a9fa9459Szrjfunctions in the cycle (including times when a function in the cycle calls 1404*a9fa9459Szrjitself). This is a generalization of the usual split into non-recursive and 1405*a9fa9459Szrjrecursive calls. 1406*a9fa9459Szrj 1407*a9fa9459SzrjThe @code{called} field of a subroutine-line for a cycle member in the 1408*a9fa9459Szrjcycle's entry says how many time that function was called from functions in 1409*a9fa9459Szrjthe cycle. The total of all these is the second number in the primary line's 1410*a9fa9459Szrj@code{called} field. 1411*a9fa9459Szrj 1412*a9fa9459SzrjIn the individual entry for a function in a cycle, the other functions in 1413*a9fa9459Szrjthe same cycle can appear as subroutines and as callers. These lines show 1414*a9fa9459Szrjhow many times each function in the cycle called or was called from each other 1415*a9fa9459Szrjfunction in the cycle. The @code{self} and @code{children} fields in these 1416*a9fa9459Szrjlines are blank because of the difficulty of defining meanings for them 1417*a9fa9459Szrjwhen recursion is going on. 1418*a9fa9459Szrj 1419*a9fa9459Szrj@node Line-by-line 1420*a9fa9459Szrj@section Line-by-line Profiling 1421*a9fa9459Szrj 1422*a9fa9459Szrj@code{gprof}'s @samp{-l} option causes the program to perform 1423*a9fa9459Szrj@dfn{line-by-line} profiling. In this mode, histogram 1424*a9fa9459Szrjsamples are assigned not to functions, but to individual 1425*a9fa9459Szrjlines of source code. This only works with programs compiled with 1426*a9fa9459Szrjolder versions of the @code{gcc} compiler. Newer versions of @code{gcc} 1427*a9fa9459Szrjuse a different program - @code{gcov} - to display line-by-line 1428*a9fa9459Szrjprofiling information. 1429*a9fa9459Szrj 1430*a9fa9459SzrjWith the older versions of @code{gcc} the program usually has to be 1431*a9fa9459Szrjcompiled with a @samp{-g} option, in addition to @samp{-pg}, in order 1432*a9fa9459Szrjto generate debugging symbols for tracking source code lines. 1433*a9fa9459SzrjNote, in much older versions of @code{gcc} the program had to be 1434*a9fa9459Szrjcompiled with the @samp{-a} command line option as well. 1435*a9fa9459Szrj 1436*a9fa9459SzrjThe flat profile is the most useful output table 1437*a9fa9459Szrjin line-by-line mode. 1438*a9fa9459SzrjThe call graph isn't as useful as normal, since 1439*a9fa9459Szrjthe current version of @code{gprof} does not propagate 1440*a9fa9459Szrjcall graph arcs from source code lines to the enclosing function. 1441*a9fa9459SzrjThe call graph does, however, show each line of code 1442*a9fa9459Szrjthat called each function, along with a count. 1443*a9fa9459Szrj 1444*a9fa9459SzrjHere is a section of @code{gprof}'s output, without line-by-line profiling. 1445*a9fa9459SzrjNote that @code{ct_init} accounted for four histogram hits, and 1446*a9fa9459Szrj13327 calls to @code{init_block}. 1447*a9fa9459Szrj 1448*a9fa9459Szrj@smallexample 1449*a9fa9459SzrjFlat profile: 1450*a9fa9459Szrj 1451*a9fa9459SzrjEach sample counts as 0.01 seconds. 1452*a9fa9459Szrj % cumulative self self total 1453*a9fa9459Szrj time seconds seconds calls us/call us/call name 1454*a9fa9459Szrj 30.77 0.13 0.04 6335 6.31 6.31 ct_init 1455*a9fa9459Szrj 1456*a9fa9459Szrj 1457*a9fa9459Szrj Call graph (explanation follows) 1458*a9fa9459Szrj 1459*a9fa9459Szrj 1460*a9fa9459Szrjgranularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds 1461*a9fa9459Szrj 1462*a9fa9459Szrjindex % time self children called name 1463*a9fa9459Szrj 1464*a9fa9459Szrj 0.00 0.00 1/13496 name_too_long 1465*a9fa9459Szrj 0.00 0.00 40/13496 deflate 1466*a9fa9459Szrj 0.00 0.00 128/13496 deflate_fast 1467*a9fa9459Szrj 0.00 0.00 13327/13496 ct_init 1468*a9fa9459Szrj[7] 0.0 0.00 0.00 13496 init_block 1469*a9fa9459Szrj 1470*a9fa9459Szrj@end smallexample 1471*a9fa9459Szrj 1472*a9fa9459SzrjNow let's look at some of @code{gprof}'s output from the same program run, 1473*a9fa9459Szrjthis time with line-by-line profiling enabled. Note that @code{ct_init}'s 1474*a9fa9459Szrjfour histogram hits are broken down into four lines of source code---one hit 1475*a9fa9459Szrjoccurred on each of lines 349, 351, 382 and 385. In the call graph, 1476*a9fa9459Szrjnote how 1477*a9fa9459Szrj@code{ct_init}'s 13327 calls to @code{init_block} are broken down 1478*a9fa9459Szrjinto one call from line 396, 3071 calls from line 384, 3730 calls 1479*a9fa9459Szrjfrom line 385, and 6525 calls from 387. 1480*a9fa9459Szrj 1481*a9fa9459Szrj@smallexample 1482*a9fa9459SzrjFlat profile: 1483*a9fa9459Szrj 1484*a9fa9459SzrjEach sample counts as 0.01 seconds. 1485*a9fa9459Szrj % cumulative self 1486*a9fa9459Szrj time seconds seconds calls name 1487*a9fa9459Szrj 7.69 0.10 0.01 ct_init (trees.c:349) 1488*a9fa9459Szrj 7.69 0.11 0.01 ct_init (trees.c:351) 1489*a9fa9459Szrj 7.69 0.12 0.01 ct_init (trees.c:382) 1490*a9fa9459Szrj 7.69 0.13 0.01 ct_init (trees.c:385) 1491*a9fa9459Szrj 1492*a9fa9459Szrj 1493*a9fa9459Szrj Call graph (explanation follows) 1494*a9fa9459Szrj 1495*a9fa9459Szrj 1496*a9fa9459Szrjgranularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds 1497*a9fa9459Szrj 1498*a9fa9459Szrj % time self children called name 1499*a9fa9459Szrj 1500*a9fa9459Szrj 0.00 0.00 1/13496 name_too_long (gzip.c:1440) 1501*a9fa9459Szrj 0.00 0.00 1/13496 deflate (deflate.c:763) 1502*a9fa9459Szrj 0.00 0.00 1/13496 ct_init (trees.c:396) 1503*a9fa9459Szrj 0.00 0.00 2/13496 deflate (deflate.c:727) 1504*a9fa9459Szrj 0.00 0.00 4/13496 deflate (deflate.c:686) 1505*a9fa9459Szrj 0.00 0.00 5/13496 deflate (deflate.c:675) 1506*a9fa9459Szrj 0.00 0.00 12/13496 deflate (deflate.c:679) 1507*a9fa9459Szrj 0.00 0.00 16/13496 deflate (deflate.c:730) 1508*a9fa9459Szrj 0.00 0.00 128/13496 deflate_fast (deflate.c:654) 1509*a9fa9459Szrj 0.00 0.00 3071/13496 ct_init (trees.c:384) 1510*a9fa9459Szrj 0.00 0.00 3730/13496 ct_init (trees.c:385) 1511*a9fa9459Szrj 0.00 0.00 6525/13496 ct_init (trees.c:387) 1512*a9fa9459Szrj[6] 0.0 0.00 0.00 13496 init_block (trees.c:408) 1513*a9fa9459Szrj 1514*a9fa9459Szrj@end smallexample 1515*a9fa9459Szrj 1516*a9fa9459Szrj 1517*a9fa9459Szrj@node Annotated Source 1518*a9fa9459Szrj@section The Annotated Source Listing 1519*a9fa9459Szrj 1520*a9fa9459Szrj@code{gprof}'s @samp{-A} option triggers an annotated source listing, 1521*a9fa9459Szrjwhich lists the program's source code, each function labeled with the 1522*a9fa9459Szrjnumber of times it was called. You may also need to specify the 1523*a9fa9459Szrj@samp{-I} option, if @code{gprof} can't find the source code files. 1524*a9fa9459Szrj 1525*a9fa9459SzrjWith older versions of @code{gcc} compiling with @samp{gcc @dots{} -g 1526*a9fa9459Szrj-pg -a} augments your program with basic-block counting code, in 1527*a9fa9459Szrjaddition to function counting code. This enables @code{gprof} to 1528*a9fa9459Szrjdetermine how many times each line of code was executed. With newer 1529*a9fa9459Szrjversions of @code{gcc} support for displaying basic-block counts is 1530*a9fa9459Szrjprovided by the @code{gcov} program. 1531*a9fa9459Szrj 1532*a9fa9459SzrjFor example, consider the following function, taken from gzip, 1533*a9fa9459Szrjwith line numbers added: 1534*a9fa9459Szrj 1535*a9fa9459Szrj@smallexample 1536*a9fa9459Szrj 1 ulg updcrc(s, n) 1537*a9fa9459Szrj 2 uch *s; 1538*a9fa9459Szrj 3 unsigned n; 1539*a9fa9459Szrj 4 @{ 1540*a9fa9459Szrj 5 register ulg c; 1541*a9fa9459Szrj 6 1542*a9fa9459Szrj 7 static ulg crc = (ulg)0xffffffffL; 1543*a9fa9459Szrj 8 1544*a9fa9459Szrj 9 if (s == NULL) @{ 1545*a9fa9459Szrj10 c = 0xffffffffL; 1546*a9fa9459Szrj11 @} else @{ 1547*a9fa9459Szrj12 c = crc; 1548*a9fa9459Szrj13 if (n) do @{ 1549*a9fa9459Szrj14 c = crc_32_tab[...]; 1550*a9fa9459Szrj15 @} while (--n); 1551*a9fa9459Szrj16 @} 1552*a9fa9459Szrj17 crc = c; 1553*a9fa9459Szrj18 return c ^ 0xffffffffL; 1554*a9fa9459Szrj19 @} 1555*a9fa9459Szrj 1556*a9fa9459Szrj@end smallexample 1557*a9fa9459Szrj 1558*a9fa9459Szrj@code{updcrc} has at least five basic-blocks. 1559*a9fa9459SzrjOne is the function itself. The 1560*a9fa9459Szrj@code{if} statement on line 9 generates two more basic-blocks, one 1561*a9fa9459Szrjfor each branch of the @code{if}. A fourth basic-block results from 1562*a9fa9459Szrjthe @code{if} on line 13, and the contents of the @code{do} loop form 1563*a9fa9459Szrjthe fifth basic-block. The compiler may also generate additional 1564*a9fa9459Szrjbasic-blocks to handle various special cases. 1565*a9fa9459Szrj 1566*a9fa9459SzrjA program augmented for basic-block counting can be analyzed with 1567*a9fa9459Szrj@samp{gprof -l -A}. 1568*a9fa9459SzrjThe @samp{-x} option is also helpful, 1569*a9fa9459Szrjto ensure that each line of code is labeled at least once. 1570*a9fa9459SzrjHere is @code{updcrc}'s 1571*a9fa9459Szrjannotated source listing for a sample @code{gzip} run: 1572*a9fa9459Szrj 1573*a9fa9459Szrj@smallexample 1574*a9fa9459Szrj ulg updcrc(s, n) 1575*a9fa9459Szrj uch *s; 1576*a9fa9459Szrj unsigned n; 1577*a9fa9459Szrj 2 ->@{ 1578*a9fa9459Szrj register ulg c; 1579*a9fa9459Szrj 1580*a9fa9459Szrj static ulg crc = (ulg)0xffffffffL; 1581*a9fa9459Szrj 1582*a9fa9459Szrj 2 -> if (s == NULL) @{ 1583*a9fa9459Szrj 1 -> c = 0xffffffffL; 1584*a9fa9459Szrj 1 -> @} else @{ 1585*a9fa9459Szrj 1 -> c = crc; 1586*a9fa9459Szrj 1 -> if (n) do @{ 1587*a9fa9459Szrj 26312 -> c = crc_32_tab[...]; 1588*a9fa9459Szrj26312,1,26311 -> @} while (--n); 1589*a9fa9459Szrj @} 1590*a9fa9459Szrj 2 -> crc = c; 1591*a9fa9459Szrj 2 -> return c ^ 0xffffffffL; 1592*a9fa9459Szrj 2 ->@} 1593*a9fa9459Szrj@end smallexample 1594*a9fa9459Szrj 1595*a9fa9459SzrjIn this example, the function was called twice, passing once through 1596*a9fa9459Szrjeach branch of the @code{if} statement. The body of the @code{do} 1597*a9fa9459Szrjloop was executed a total of 26312 times. Note how the @code{while} 1598*a9fa9459Szrjstatement is annotated. It began execution 26312 times, once for 1599*a9fa9459Szrjeach iteration through the loop. One of those times (the last time) 1600*a9fa9459Szrjit exited, while it branched back to the beginning of the loop 26311 times. 1601*a9fa9459Szrj 1602*a9fa9459Szrj@node Inaccuracy 1603*a9fa9459Szrj@chapter Inaccuracy of @code{gprof} Output 1604*a9fa9459Szrj 1605*a9fa9459Szrj@menu 1606*a9fa9459Szrj* Sampling Error:: Statistical margins of error 1607*a9fa9459Szrj* Assumptions:: Estimating children times 1608*a9fa9459Szrj@end menu 1609*a9fa9459Szrj 1610*a9fa9459Szrj@node Sampling Error 1611*a9fa9459Szrj@section Statistical Sampling Error 1612*a9fa9459Szrj 1613*a9fa9459SzrjThe run-time figures that @code{gprof} gives you are based on a sampling 1614*a9fa9459Szrjprocess, so they are subject to statistical inaccuracy. If a function runs 1615*a9fa9459Szrjonly a small amount of time, so that on the average the sampling process 1616*a9fa9459Szrjought to catch that function in the act only once, there is a pretty good 1617*a9fa9459Szrjchance it will actually find that function zero times, or twice. 1618*a9fa9459Szrj 1619*a9fa9459SzrjBy contrast, the number-of-calls and basic-block figures are derived 1620*a9fa9459Szrjby counting, not sampling. They are completely accurate and will not 1621*a9fa9459Szrjvary from run to run if your program is deterministic and single 1622*a9fa9459Szrjthreaded. In multi-threaded applications, or single threaded 1623*a9fa9459Szrjapplications that link with multi-threaded libraries, the counts are 1624*a9fa9459Szrjonly deterministic if the counting function is thread-safe. (Note: 1625*a9fa9459Szrjbeware that the mcount counting function in glibc is @emph{not} 1626*a9fa9459Szrjthread-safe). @xref{Implementation, ,Implementation of Profiling}. 1627*a9fa9459Szrj 1628*a9fa9459SzrjThe @dfn{sampling period} that is printed at the beginning of the flat 1629*a9fa9459Szrjprofile says how often samples are taken. The rule of thumb is that a 1630*a9fa9459Szrjrun-time figure is accurate if it is considerably bigger than the sampling 1631*a9fa9459Szrjperiod. 1632*a9fa9459Szrj 1633*a9fa9459SzrjThe actual amount of error can be predicted. 1634*a9fa9459SzrjFor @var{n} samples, the @emph{expected} error 1635*a9fa9459Szrjis the square-root of @var{n}. For example, 1636*a9fa9459Szrjif the sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second, 1637*a9fa9459Szrj@var{n} is 100 samples (1 second/0.01 seconds), sqrt(@var{n}) is 10 samples, so 1638*a9fa9459Szrjthe expected error in @code{foo}'s run-time is 0.1 seconds (10*0.01 seconds), 1639*a9fa9459Szrjor ten percent of the observed value. 1640*a9fa9459SzrjAgain, if the sampling period is 0.01 seconds and @code{bar}'s run-time is 1641*a9fa9459Szrj100 seconds, @var{n} is 10000 samples, sqrt(@var{n}) is 100 samples, so 1642*a9fa9459Szrjthe expected error in @code{bar}'s run-time is 1 second, 1643*a9fa9459Szrjor one percent of the observed value. 1644*a9fa9459SzrjIt is likely to 1645*a9fa9459Szrjvary this much @emph{on the average} from one profiling run to the next. 1646*a9fa9459Szrj(@emph{Sometimes} it will vary more.) 1647*a9fa9459Szrj 1648*a9fa9459SzrjThis does not mean that a small run-time figure is devoid of information. 1649*a9fa9459SzrjIf the program's @emph{total} run-time is large, a small run-time for one 1650*a9fa9459Szrjfunction does tell you that that function used an insignificant fraction of 1651*a9fa9459Szrjthe whole program's time. Usually this means it is not worth optimizing. 1652*a9fa9459Szrj 1653*a9fa9459SzrjOne way to get more accuracy is to give your program more (but similar) 1654*a9fa9459Szrjinput data so it will take longer. Another way is to combine the data from 1655*a9fa9459Szrjseveral runs, using the @samp{-s} option of @code{gprof}. Here is how: 1656*a9fa9459Szrj 1657*a9fa9459Szrj@enumerate 1658*a9fa9459Szrj@item 1659*a9fa9459SzrjRun your program once. 1660*a9fa9459Szrj 1661*a9fa9459Szrj@item 1662*a9fa9459SzrjIssue the command @samp{mv gmon.out gmon.sum}. 1663*a9fa9459Szrj 1664*a9fa9459Szrj@item 1665*a9fa9459SzrjRun your program again, the same as before. 1666*a9fa9459Szrj 1667*a9fa9459Szrj@item 1668*a9fa9459SzrjMerge the new data in @file{gmon.out} into @file{gmon.sum} with this command: 1669*a9fa9459Szrj 1670*a9fa9459Szrj@example 1671*a9fa9459Szrjgprof -s @var{executable-file} gmon.out gmon.sum 1672*a9fa9459Szrj@end example 1673*a9fa9459Szrj 1674*a9fa9459Szrj@item 1675*a9fa9459SzrjRepeat the last two steps as often as you wish. 1676*a9fa9459Szrj 1677*a9fa9459Szrj@item 1678*a9fa9459SzrjAnalyze the cumulative data using this command: 1679*a9fa9459Szrj 1680*a9fa9459Szrj@example 1681*a9fa9459Szrjgprof @var{executable-file} gmon.sum > @var{output-file} 1682*a9fa9459Szrj@end example 1683*a9fa9459Szrj@end enumerate 1684*a9fa9459Szrj 1685*a9fa9459Szrj@node Assumptions 1686*a9fa9459Szrj@section Estimating @code{children} Times 1687*a9fa9459Szrj 1688*a9fa9459SzrjSome of the figures in the call graph are estimates---for example, the 1689*a9fa9459Szrj@code{children} time values and all the time figures in caller and 1690*a9fa9459Szrjsubroutine lines. 1691*a9fa9459Szrj 1692*a9fa9459SzrjThere is no direct information about these measurements in the profile 1693*a9fa9459Szrjdata itself. Instead, @code{gprof} estimates them by making an assumption 1694*a9fa9459Szrjabout your program that might or might not be true. 1695*a9fa9459Szrj 1696*a9fa9459SzrjThe assumption made is that the average time spent in each call to any 1697*a9fa9459Szrjfunction @code{foo} is not correlated with who called @code{foo}. If 1698*a9fa9459Szrj@code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came 1699*a9fa9459Szrjfrom @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s 1700*a9fa9459Szrj@code{children} time, by assumption. 1701*a9fa9459Szrj 1702*a9fa9459SzrjThis assumption is usually true enough, but for some programs it is far 1703*a9fa9459Szrjfrom true. Suppose that @code{foo} returns very quickly when its argument 1704*a9fa9459Szrjis zero; suppose that @code{a} always passes zero as an argument, while 1705*a9fa9459Szrjother callers of @code{foo} pass other arguments. In this program, all the 1706*a9fa9459Szrjtime spent in @code{foo} is in the calls from callers other than @code{a}. 1707*a9fa9459SzrjBut @code{gprof} has no way of knowing this; it will blindly and 1708*a9fa9459Szrjincorrectly charge 2 seconds of time in @code{foo} to the children of 1709*a9fa9459Szrj@code{a}. 1710*a9fa9459Szrj 1711*a9fa9459Szrj@c FIXME - has this been fixed? 1712*a9fa9459SzrjWe hope some day to put more complete data into @file{gmon.out}, so that 1713*a9fa9459Szrjthis assumption is no longer needed, if we can figure out how. For the 1714*a9fa9459Szrjnovice, the estimated figures are usually more useful than misleading. 1715*a9fa9459Szrj 1716*a9fa9459Szrj@node How do I? 1717*a9fa9459Szrj@chapter Answers to Common Questions 1718*a9fa9459Szrj 1719*a9fa9459Szrj@table @asis 1720*a9fa9459Szrj@item How can I get more exact information about hot spots in my program? 1721*a9fa9459Szrj 1722*a9fa9459SzrjLooking at the per-line call counts only tells part of the story. 1723*a9fa9459SzrjBecause @code{gprof} can only report call times and counts by function, 1724*a9fa9459Szrjthe best way to get finer-grained information on where the program 1725*a9fa9459Szrjis spending its time is to re-factor large functions into sequences 1726*a9fa9459Szrjof calls to smaller ones. Beware however that this can introduce 1727*a9fa9459Szrjartificial hot spots since compiling with @samp{-pg} adds a significant 1728*a9fa9459Szrjoverhead to function calls. An alternative solution is to use a 1729*a9fa9459Szrjnon-intrusive profiler, e.g.@: oprofile. 1730*a9fa9459Szrj 1731*a9fa9459Szrj@item How do I find which lines in my program were executed the most times? 1732*a9fa9459Szrj 1733*a9fa9459SzrjUse the @code{gcov} program. 1734*a9fa9459Szrj 1735*a9fa9459Szrj@item How do I find which lines in my program called a particular function? 1736*a9fa9459Szrj 1737*a9fa9459SzrjUse @samp{gprof -l} and lookup the function in the call graph. 1738*a9fa9459SzrjThe callers will be broken down by function and line number. 1739*a9fa9459Szrj 1740*a9fa9459Szrj@item How do I analyze a program that runs for less than a second? 1741*a9fa9459Szrj 1742*a9fa9459SzrjTry using a shell script like this one: 1743*a9fa9459Szrj 1744*a9fa9459Szrj@example 1745*a9fa9459Szrjfor i in `seq 1 100`; do 1746*a9fa9459Szrj fastprog 1747*a9fa9459Szrj mv gmon.out gmon.out.$i 1748*a9fa9459Szrjdone 1749*a9fa9459Szrj 1750*a9fa9459Szrjgprof -s fastprog gmon.out.* 1751*a9fa9459Szrj 1752*a9fa9459Szrjgprof fastprog gmon.sum 1753*a9fa9459Szrj@end example 1754*a9fa9459Szrj 1755*a9fa9459SzrjIf your program is completely deterministic, all the call counts 1756*a9fa9459Szrjwill be simple multiples of 100 (i.e., a function called once in 1757*a9fa9459Szrjeach run will appear with a call count of 100). 1758*a9fa9459Szrj 1759*a9fa9459Szrj@end table 1760*a9fa9459Szrj 1761*a9fa9459Szrj@node Incompatibilities 1762*a9fa9459Szrj@chapter Incompatibilities with Unix @code{gprof} 1763*a9fa9459Szrj 1764*a9fa9459Szrj@sc{gnu} @code{gprof} and Berkeley Unix @code{gprof} use the same data 1765*a9fa9459Szrjfile @file{gmon.out}, and provide essentially the same information. But 1766*a9fa9459Szrjthere are a few differences. 1767*a9fa9459Szrj 1768*a9fa9459Szrj@itemize @bullet 1769*a9fa9459Szrj@item 1770*a9fa9459Szrj@sc{gnu} @code{gprof} uses a new, generalized file format with support 1771*a9fa9459Szrjfor basic-block execution counts and non-realtime histograms. A magic 1772*a9fa9459Szrjcookie and version number allows @code{gprof} to easily identify 1773*a9fa9459Szrjnew style files. Old BSD-style files can still be read. 1774*a9fa9459Szrj@xref{File Format, ,Profiling Data File Format}. 1775*a9fa9459Szrj 1776*a9fa9459Szrj@item 1777*a9fa9459SzrjFor a recursive function, Unix @code{gprof} lists the function as a 1778*a9fa9459Szrjparent and as a child, with a @code{calls} field that lists the number 1779*a9fa9459Szrjof recursive calls. @sc{gnu} @code{gprof} omits these lines and puts 1780*a9fa9459Szrjthe number of recursive calls in the primary line. 1781*a9fa9459Szrj 1782*a9fa9459Szrj@item 1783*a9fa9459SzrjWhen a function is suppressed from the call graph with @samp{-e}, @sc{gnu} 1784*a9fa9459Szrj@code{gprof} still lists it as a subroutine of functions that call it. 1785*a9fa9459Szrj 1786*a9fa9459Szrj@item 1787*a9fa9459Szrj@sc{gnu} @code{gprof} accepts the @samp{-k} with its argument 1788*a9fa9459Szrjin the form @samp{from/to}, instead of @samp{from to}. 1789*a9fa9459Szrj 1790*a9fa9459Szrj@item 1791*a9fa9459SzrjIn the annotated source listing, 1792*a9fa9459Szrjif there are multiple basic blocks on the same line, 1793*a9fa9459Szrj@sc{gnu} @code{gprof} prints all of their counts, separated by commas. 1794*a9fa9459Szrj 1795*a9fa9459Szrj@ignore - it does this now 1796*a9fa9459Szrj@item 1797*a9fa9459SzrjThe function names printed in @sc{gnu} @code{gprof} output do not include 1798*a9fa9459Szrjthe leading underscores that are added internally to the front of all 1799*a9fa9459SzrjC identifiers on many operating systems. 1800*a9fa9459Szrj@end ignore 1801*a9fa9459Szrj 1802*a9fa9459Szrj@item 1803*a9fa9459SzrjThe blurbs, field widths, and output formats are different. @sc{gnu} 1804*a9fa9459Szrj@code{gprof} prints blurbs after the tables, so that you can see the 1805*a9fa9459Szrjtables without skipping the blurbs. 1806*a9fa9459Szrj@end itemize 1807*a9fa9459Szrj 1808*a9fa9459Szrj@node Details 1809*a9fa9459Szrj@chapter Details of Profiling 1810*a9fa9459Szrj 1811*a9fa9459Szrj@menu 1812*a9fa9459Szrj* Implementation:: How a program collects profiling information 1813*a9fa9459Szrj* File Format:: Format of @samp{gmon.out} files 1814*a9fa9459Szrj* Internals:: @code{gprof}'s internal operation 1815*a9fa9459Szrj* Debugging:: Using @code{gprof}'s @samp{-d} option 1816*a9fa9459Szrj@end menu 1817*a9fa9459Szrj 1818*a9fa9459Szrj@node Implementation 1819*a9fa9459Szrj@section Implementation of Profiling 1820*a9fa9459Szrj 1821*a9fa9459SzrjProfiling works by changing how every function in your program is compiled 1822*a9fa9459Szrjso that when it is called, it will stash away some information about where 1823*a9fa9459Szrjit was called from. From this, the profiler can figure out what function 1824*a9fa9459Szrjcalled it, and can count how many times it was called. This change is made 1825*a9fa9459Szrjby the compiler when your program is compiled with the @samp{-pg} option, 1826*a9fa9459Szrjwhich causes every function to call @code{mcount} 1827*a9fa9459Szrj(or @code{_mcount}, or @code{__mcount}, depending on the OS and compiler) 1828*a9fa9459Szrjas one of its first operations. 1829*a9fa9459Szrj 1830*a9fa9459SzrjThe @code{mcount} routine, included in the profiling library, 1831*a9fa9459Szrjis responsible for recording in an in-memory call graph table 1832*a9fa9459Szrjboth its parent routine (the child) and its parent's parent. This is 1833*a9fa9459Szrjtypically done by examining the stack frame to find both 1834*a9fa9459Szrjthe address of the child, and the return address in the original parent. 1835*a9fa9459SzrjSince this is a very machine-dependent operation, @code{mcount} 1836*a9fa9459Szrjitself is typically a short assembly-language stub routine 1837*a9fa9459Szrjthat extracts the required 1838*a9fa9459Szrjinformation, and then calls @code{__mcount_internal} 1839*a9fa9459Szrj(a normal C function) with two arguments---@code{frompc} and @code{selfpc}. 1840*a9fa9459Szrj@code{__mcount_internal} is responsible for maintaining 1841*a9fa9459Szrjthe in-memory call graph, which records @code{frompc}, @code{selfpc}, 1842*a9fa9459Szrjand the number of times each of these call arcs was traversed. 1843*a9fa9459Szrj 1844*a9fa9459SzrjGCC Version 2 provides a magical function (@code{__builtin_return_address}), 1845*a9fa9459Szrjwhich allows a generic @code{mcount} function to extract the 1846*a9fa9459Szrjrequired information from the stack frame. However, on some 1847*a9fa9459Szrjarchitectures, most notably the SPARC, using this builtin can be 1848*a9fa9459Szrjvery computationally expensive, and an assembly language version 1849*a9fa9459Szrjof @code{mcount} is used for performance reasons. 1850*a9fa9459Szrj 1851*a9fa9459SzrjNumber-of-calls information for library routines is collected by using a 1852*a9fa9459Szrjspecial version of the C library. The programs in it are the same as in 1853*a9fa9459Szrjthe usual C library, but they were compiled with @samp{-pg}. If you 1854*a9fa9459Szrjlink your program with @samp{gcc @dots{} -pg}, it automatically uses the 1855*a9fa9459Szrjprofiling version of the library. 1856*a9fa9459Szrj 1857*a9fa9459SzrjProfiling also involves watching your program as it runs, and keeping a 1858*a9fa9459Szrjhistogram of where the program counter happens to be every now and then. 1859*a9fa9459SzrjTypically the program counter is looked at around 100 times per second of 1860*a9fa9459Szrjrun time, but the exact frequency may vary from system to system. 1861*a9fa9459Szrj 1862*a9fa9459SzrjThis is done is one of two ways. Most UNIX-like operating systems 1863*a9fa9459Szrjprovide a @code{profil()} system call, which registers a memory 1864*a9fa9459Szrjarray with the kernel, along with a scale 1865*a9fa9459Szrjfactor that determines how the program's address space maps 1866*a9fa9459Szrjinto the array. 1867*a9fa9459SzrjTypical scaling values cause every 2 to 8 bytes of address space 1868*a9fa9459Szrjto map into a single array slot. 1869*a9fa9459SzrjOn every tick of the system clock 1870*a9fa9459Szrj(assuming the profiled program is running), the value of the 1871*a9fa9459Szrjprogram counter is examined and the corresponding slot in 1872*a9fa9459Szrjthe memory array is incremented. Since this is done in the kernel, 1873*a9fa9459Szrjwhich had to interrupt the process anyway to handle the clock 1874*a9fa9459Szrjinterrupt, very little additional system overhead is required. 1875*a9fa9459Szrj 1876*a9fa9459SzrjHowever, some operating systems, most notably Linux 2.0 (and earlier), 1877*a9fa9459Szrjdo not provide a @code{profil()} system call. On such a system, 1878*a9fa9459Szrjarrangements are made for the kernel to periodically deliver 1879*a9fa9459Szrja signal to the process (typically via @code{setitimer()}), 1880*a9fa9459Szrjwhich then performs the same operation of examining the 1881*a9fa9459Szrjprogram counter and incrementing a slot in the memory array. 1882*a9fa9459SzrjSince this method requires a signal to be delivered to 1883*a9fa9459Szrjuser space every time a sample is taken, it uses considerably 1884*a9fa9459Szrjmore overhead than kernel-based profiling. Also, due to the 1885*a9fa9459Szrjadded delay required to deliver the signal, this method is 1886*a9fa9459Szrjless accurate as well. 1887*a9fa9459Szrj 1888*a9fa9459SzrjA special startup routine allocates memory for the histogram and 1889*a9fa9459Szrjeither calls @code{profil()} or sets up 1890*a9fa9459Szrja clock signal handler. 1891*a9fa9459SzrjThis routine (@code{monstartup}) can be invoked in several ways. 1892*a9fa9459SzrjOn Linux systems, a special profiling startup file @code{gcrt0.o}, 1893*a9fa9459Szrjwhich invokes @code{monstartup} before @code{main}, 1894*a9fa9459Szrjis used instead of the default @code{crt0.o}. 1895*a9fa9459SzrjUse of this special startup file is one of the effects 1896*a9fa9459Szrjof using @samp{gcc @dots{} -pg} to link. 1897*a9fa9459SzrjOn SPARC systems, no special startup files are used. 1898*a9fa9459SzrjRather, the @code{mcount} routine, when it is invoked for 1899*a9fa9459Szrjthe first time (typically when @code{main} is called), 1900*a9fa9459Szrjcalls @code{monstartup}. 1901*a9fa9459Szrj 1902*a9fa9459SzrjIf the compiler's @samp{-a} option was used, basic-block counting 1903*a9fa9459Szrjis also enabled. Each object file is then compiled with a static array 1904*a9fa9459Szrjof counts, initially zero. 1905*a9fa9459SzrjIn the executable code, every time a new basic-block begins 1906*a9fa9459Szrj(i.e., when an @code{if} statement appears), an extra instruction 1907*a9fa9459Szrjis inserted to increment the corresponding count in the array. 1908*a9fa9459SzrjAt compile time, a paired array was constructed that recorded 1909*a9fa9459Szrjthe starting address of each basic-block. Taken together, 1910*a9fa9459Szrjthe two arrays record the starting address of every basic-block, 1911*a9fa9459Szrjalong with the number of times it was executed. 1912*a9fa9459Szrj 1913*a9fa9459SzrjThe profiling library also includes a function (@code{mcleanup}) which is 1914*a9fa9459Szrjtypically registered using @code{atexit()} to be called as the 1915*a9fa9459Szrjprogram exits, and is responsible for writing the file @file{gmon.out}. 1916*a9fa9459SzrjProfiling is turned off, various headers are output, and the histogram 1917*a9fa9459Szrjis written, followed by the call-graph arcs and the basic-block counts. 1918*a9fa9459Szrj 1919*a9fa9459SzrjThe output from @code{gprof} gives no indication of parts of your program that 1920*a9fa9459Szrjare limited by I/O or swapping bandwidth. This is because samples of the 1921*a9fa9459Szrjprogram counter are taken at fixed intervals of the program's run time. 1922*a9fa9459SzrjTherefore, the 1923*a9fa9459Szrjtime measurements in @code{gprof} output say nothing about time that your 1924*a9fa9459Szrjprogram was not running. For example, a part of the program that creates 1925*a9fa9459Szrjso much data that it cannot all fit in physical memory at once may run very 1926*a9fa9459Szrjslowly due to thrashing, but @code{gprof} will say it uses little time. On 1927*a9fa9459Szrjthe other hand, sampling by run time has the advantage that the amount of 1928*a9fa9459Szrjload due to other users won't directly affect the output you get. 1929*a9fa9459Szrj 1930*a9fa9459Szrj@node File Format 1931*a9fa9459Szrj@section Profiling Data File Format 1932*a9fa9459Szrj 1933*a9fa9459SzrjThe old BSD-derived file format used for profile data does not contain a 1934*a9fa9459Szrjmagic cookie that allows to check whether a data file really is a 1935*a9fa9459Szrj@code{gprof} file. Furthermore, it does not provide a version number, thus 1936*a9fa9459Szrjrendering changes to the file format almost impossible. @sc{gnu} @code{gprof} 1937*a9fa9459Szrjuses a new file format that provides these features. For backward 1938*a9fa9459Szrjcompatibility, @sc{gnu} @code{gprof} continues to support the old BSD-derived 1939*a9fa9459Szrjformat, but not all features are supported with it. For example, 1940*a9fa9459Szrjbasic-block execution counts cannot be accommodated by the old file 1941*a9fa9459Szrjformat. 1942*a9fa9459Szrj 1943*a9fa9459SzrjThe new file format is defined in header file @file{gmon_out.h}. It 1944*a9fa9459Szrjconsists of a header containing the magic cookie and a version number, 1945*a9fa9459Szrjas well as some spare bytes available for future extensions. All data 1946*a9fa9459Szrjin a profile data file is in the native format of the target for which 1947*a9fa9459Szrjthe profile was collected. @sc{gnu} @code{gprof} adapts automatically 1948*a9fa9459Szrjto the byte-order in use. 1949*a9fa9459Szrj 1950*a9fa9459SzrjIn the new file format, the header is followed by a sequence of 1951*a9fa9459Szrjrecords. Currently, there are three different record types: histogram 1952*a9fa9459Szrjrecords, call-graph arc records, and basic-block execution count 1953*a9fa9459Szrjrecords. Each file can contain any number of each record type. When 1954*a9fa9459Szrjreading a file, @sc{gnu} @code{gprof} will ensure records of the same type are 1955*a9fa9459Szrjcompatible with each other and compute the union of all records. For 1956*a9fa9459Szrjexample, for basic-block execution counts, the union is simply the sum 1957*a9fa9459Szrjof all execution counts for each basic-block. 1958*a9fa9459Szrj 1959*a9fa9459Szrj@subsection Histogram Records 1960*a9fa9459Szrj 1961*a9fa9459SzrjHistogram records consist of a header that is followed by an array of 1962*a9fa9459Szrjbins. The header contains the text-segment range that the histogram 1963*a9fa9459Szrjspans, the size of the histogram in bytes (unlike in the old BSD 1964*a9fa9459Szrjformat, this does not include the size of the header), the rate of the 1965*a9fa9459Szrjprofiling clock, and the physical dimension that the bin counts 1966*a9fa9459Szrjrepresent after being scaled by the profiling clock rate. The 1967*a9fa9459Szrjphysical dimension is specified in two parts: a long name of up to 15 1968*a9fa9459Szrjcharacters and a single character abbreviation. For example, a 1969*a9fa9459Szrjhistogram representing real-time would specify the long name as 1970*a9fa9459Szrj``seconds'' and the abbreviation as ``s''. This feature is useful for 1971*a9fa9459Szrjarchitectures that support performance monitor hardware (which, 1972*a9fa9459Szrjfortunately, is becoming increasingly common). For example, under DEC 1973*a9fa9459SzrjOSF/1, the ``uprofile'' command can be used to produce a histogram of, 1974*a9fa9459Szrjsay, instruction cache misses. In this case, the dimension in the 1975*a9fa9459Szrjhistogram header could be set to ``i-cache misses'' and the abbreviation 1976*a9fa9459Szrjcould be set to ``1'' (because it is simply a count, not a physical 1977*a9fa9459Szrjdimension). Also, the profiling rate would have to be set to 1 in 1978*a9fa9459Szrjthis case. 1979*a9fa9459Szrj 1980*a9fa9459SzrjHistogram bins are 16-bit numbers and each bin represent an equal 1981*a9fa9459Szrjamount of text-space. For example, if the text-segment is one 1982*a9fa9459Szrjthousand bytes long and if there are ten bins in the histogram, each 1983*a9fa9459Szrjbin represents one hundred bytes. 1984*a9fa9459Szrj 1985*a9fa9459Szrj 1986*a9fa9459Szrj@subsection Call-Graph Records 1987*a9fa9459Szrj 1988*a9fa9459SzrjCall-graph records have a format that is identical to the one used in 1989*a9fa9459Szrjthe BSD-derived file format. It consists of an arc in the call graph 1990*a9fa9459Szrjand a count indicating the number of times the arc was traversed 1991*a9fa9459Szrjduring program execution. Arcs are specified by a pair of addresses: 1992*a9fa9459Szrjthe first must be within caller's function and the second must be 1993*a9fa9459Szrjwithin the callee's function. When performing profiling at the 1994*a9fa9459Szrjfunction level, these addresses can point anywhere within the 1995*a9fa9459Szrjrespective function. However, when profiling at the line-level, it is 1996*a9fa9459Szrjbetter if the addresses are as close to the call-site/entry-point as 1997*a9fa9459Szrjpossible. This will ensure that the line-level call-graph is able to 1998*a9fa9459Szrjidentify exactly which line of source code performed calls to a 1999*a9fa9459Szrjfunction. 2000*a9fa9459Szrj 2001*a9fa9459Szrj@subsection Basic-Block Execution Count Records 2002*a9fa9459Szrj 2003*a9fa9459SzrjBasic-block execution count records consist of a header followed by a 2004*a9fa9459Szrjsequence of address/count pairs. The header simply specifies the 2005*a9fa9459Szrjlength of the sequence. In an address/count pair, the address 2006*a9fa9459Szrjidentifies a basic-block and the count specifies the number of times 2007*a9fa9459Szrjthat basic-block was executed. Any address within the basic-address can 2008*a9fa9459Szrjbe used. 2009*a9fa9459Szrj 2010*a9fa9459Szrj@node Internals 2011*a9fa9459Szrj@section @code{gprof}'s Internal Operation 2012*a9fa9459Szrj 2013*a9fa9459SzrjLike most programs, @code{gprof} begins by processing its options. 2014*a9fa9459SzrjDuring this stage, it may building its symspec list 2015*a9fa9459Szrj(@code{sym_ids.c:@-sym_id_add}), if 2016*a9fa9459Szrjoptions are specified which use symspecs. 2017*a9fa9459Szrj@code{gprof} maintains a single linked list of symspecs, 2018*a9fa9459Szrjwhich will eventually get turned into 12 symbol tables, 2019*a9fa9459Szrjorganized into six include/exclude pairs---one 2020*a9fa9459Szrjpair each for the flat profile (INCL_FLAT/EXCL_FLAT), 2021*a9fa9459Szrjthe call graph arcs (INCL_ARCS/EXCL_ARCS), 2022*a9fa9459Szrjprinting in the call graph (INCL_GRAPH/EXCL_GRAPH), 2023*a9fa9459Szrjtiming propagation in the call graph (INCL_TIME/EXCL_TIME), 2024*a9fa9459Szrjthe annotated source listing (INCL_ANNO/EXCL_ANNO), 2025*a9fa9459Szrjand the execution count listing (INCL_EXEC/EXCL_EXEC). 2026*a9fa9459Szrj 2027*a9fa9459SzrjAfter option processing, @code{gprof} finishes 2028*a9fa9459Szrjbuilding the symspec list by adding all the symspecs in 2029*a9fa9459Szrj@code{default_excluded_list} to the exclude lists 2030*a9fa9459SzrjEXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified, 2031*a9fa9459SzrjEXCL_FLAT as well. 2032*a9fa9459SzrjThese default excludes are not added to EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC. 2033*a9fa9459Szrj 2034*a9fa9459SzrjNext, the BFD library is called to open the object file, 2035*a9fa9459Szrjverify that it is an object file, 2036*a9fa9459Szrjand read its symbol table (@code{core.c:@-core_init}), 2037*a9fa9459Szrjusing @code{bfd_canonicalize_symtab} after mallocing 2038*a9fa9459Szrjan appropriately sized array of symbols. At this point, 2039*a9fa9459Szrjfunction mappings are read (if the @samp{--file-ordering} option 2040*a9fa9459Szrjhas been specified), and the core text space is read into 2041*a9fa9459Szrjmemory (if the @samp{-c} option was given). 2042*a9fa9459Szrj 2043*a9fa9459Szrj@code{gprof}'s own symbol table, an array of Sym structures, 2044*a9fa9459Szrjis now built. 2045*a9fa9459SzrjThis is done in one of two ways, by one of two routines, depending 2046*a9fa9459Szrjon whether line-by-line profiling (@samp{-l} option) has been 2047*a9fa9459Szrjenabled. 2048*a9fa9459SzrjFor normal profiling, the BFD canonical symbol table is scanned. 2049*a9fa9459SzrjFor line-by-line profiling, every 2050*a9fa9459Szrjtext space address is examined, and a new symbol table entry 2051*a9fa9459Szrjgets created every time the line number changes. 2052*a9fa9459SzrjIn either case, two passes are made through the symbol 2053*a9fa9459Szrjtable---one to count the size of the symbol table required, 2054*a9fa9459Szrjand the other to actually read the symbols. In between the 2055*a9fa9459Szrjtwo passes, a single array of type @code{Sym} is created of 2056*a9fa9459Szrjthe appropriate length. 2057*a9fa9459SzrjFinally, @code{symtab.c:@-symtab_finalize} 2058*a9fa9459Szrjis called to sort the symbol table and remove duplicate entries 2059*a9fa9459Szrj(entries with the same memory address). 2060*a9fa9459Szrj 2061*a9fa9459SzrjThe symbol table must be a contiguous array for two reasons. 2062*a9fa9459SzrjFirst, the @code{qsort} library function (which sorts an array) 2063*a9fa9459Szrjwill be used to sort the symbol table. 2064*a9fa9459SzrjAlso, the symbol lookup routine (@code{symtab.c:@-sym_lookup}), 2065*a9fa9459Szrjwhich finds symbols 2066*a9fa9459Szrjbased on memory address, uses a binary search algorithm 2067*a9fa9459Szrjwhich requires the symbol table to be a sorted array. 2068*a9fa9459SzrjFunction symbols are indicated with an @code{is_func} flag. 2069*a9fa9459SzrjLine number symbols have no special flags set. 2070*a9fa9459SzrjAdditionally, a symbol can have an @code{is_static} flag 2071*a9fa9459Szrjto indicate that it is a local symbol. 2072*a9fa9459Szrj 2073*a9fa9459SzrjWith the symbol table read, the symspecs can now be translated 2074*a9fa9459Szrjinto Syms (@code{sym_ids.c:@-sym_id_parse}). Remember that a single 2075*a9fa9459Szrjsymspec can match multiple symbols. 2076*a9fa9459SzrjAn array of symbol tables 2077*a9fa9459Szrj(@code{syms}) is created, each entry of which is a symbol table 2078*a9fa9459Szrjof Syms to be included or excluded from a particular listing. 2079*a9fa9459SzrjThe master symbol table and the symspecs are examined by nested 2080*a9fa9459Szrjloops, and every symbol that matches a symspec is inserted 2081*a9fa9459Szrjinto the appropriate syms table. This is done twice, once to 2082*a9fa9459Szrjcount the size of each required symbol table, and again to build 2083*a9fa9459Szrjthe tables, which have been malloced between passes. 2084*a9fa9459SzrjFrom now on, to determine whether a symbol is on an include 2085*a9fa9459Szrjor exclude symspec list, @code{gprof} simply uses its 2086*a9fa9459Szrjstandard symbol lookup routine on the appropriate table 2087*a9fa9459Szrjin the @code{syms} array. 2088*a9fa9459Szrj 2089*a9fa9459SzrjNow the profile data file(s) themselves are read 2090*a9fa9459Szrj(@code{gmon_io.c:@-gmon_out_read}), 2091*a9fa9459Szrjfirst by checking for a new-style @samp{gmon.out} header, 2092*a9fa9459Szrjthen assuming this is an old-style BSD @samp{gmon.out} 2093*a9fa9459Szrjif the magic number test failed. 2094*a9fa9459Szrj 2095*a9fa9459SzrjNew-style histogram records are read by @code{hist.c:@-hist_read_rec}. 2096*a9fa9459SzrjFor the first histogram record, allocate a memory array to hold 2097*a9fa9459Szrjall the bins, and read them in. 2098*a9fa9459SzrjWhen multiple profile data files (or files with multiple histogram 2099*a9fa9459Szrjrecords) are read, the memory ranges of each pair of histogram records 2100*a9fa9459Szrjmust be either equal, or non-overlapping. For each pair of histogram 2101*a9fa9459Szrjrecords, the resolution (memory region size divided by the number of 2102*a9fa9459Szrjbins) must be the same. The time unit must be the same for all 2103*a9fa9459Szrjhistogram records. If the above containts are met, all histograms 2104*a9fa9459Szrjfor the same memory range are merged. 2105*a9fa9459Szrj 2106*a9fa9459SzrjAs each call graph record is read (@code{call_graph.c:@-cg_read_rec}), 2107*a9fa9459Szrjthe parent and child addresses 2108*a9fa9459Szrjare matched to symbol table entries, and a call graph arc is 2109*a9fa9459Szrjcreated by @code{cg_arcs.c:@-arc_add}, unless the arc fails a symspec 2110*a9fa9459Szrjcheck against INCL_ARCS/EXCL_ARCS. As each arc is added, 2111*a9fa9459Szrja linked list is maintained of the parent's child arcs, and of the child's 2112*a9fa9459Szrjparent arcs. 2113*a9fa9459SzrjBoth the child's call count and the arc's call count are 2114*a9fa9459Szrjincremented by the record's call count. 2115*a9fa9459Szrj 2116*a9fa9459SzrjBasic-block records are read (@code{basic_blocks.c:@-bb_read_rec}), 2117*a9fa9459Szrjbut only if line-by-line profiling has been selected. 2118*a9fa9459SzrjEach basic-block address is matched to a corresponding line 2119*a9fa9459Szrjsymbol in the symbol table, and an entry made in the symbol's 2120*a9fa9459Szrjbb_addr and bb_calls arrays. Again, if multiple basic-block 2121*a9fa9459Szrjrecords are present for the same address, the call counts 2122*a9fa9459Szrjare cumulative. 2123*a9fa9459Szrj 2124*a9fa9459SzrjA gmon.sum file is dumped, if requested (@code{gmon_io.c:@-gmon_out_write}). 2125*a9fa9459Szrj 2126*a9fa9459SzrjIf histograms were present in the data files, assign them to symbols 2127*a9fa9459Szrj(@code{hist.c:@-hist_assign_samples}) by iterating over all the sample 2128*a9fa9459Szrjbins and assigning them to symbols. Since the symbol table 2129*a9fa9459Szrjis sorted in order of ascending memory addresses, we can 2130*a9fa9459Szrjsimple follow along in the symbol table as we make our pass 2131*a9fa9459Szrjover the sample bins. 2132*a9fa9459SzrjThis step includes a symspec check against INCL_FLAT/EXCL_FLAT. 2133*a9fa9459SzrjDepending on the histogram 2134*a9fa9459Szrjscale factor, a sample bin may span multiple symbols, 2135*a9fa9459Szrjin which case a fraction of the sample count is allocated 2136*a9fa9459Szrjto each symbol, proportional to the degree of overlap. 2137*a9fa9459SzrjThis effect is rare for normal profiling, but overlaps 2138*a9fa9459Szrjare more common during line-by-line profiling, and can 2139*a9fa9459Szrjcause each of two adjacent lines to be credited with half 2140*a9fa9459Szrja hit, for example. 2141*a9fa9459Szrj 2142*a9fa9459SzrjIf call graph data is present, @code{cg_arcs.c:@-cg_assemble} is called. 2143*a9fa9459SzrjFirst, if @samp{-c} was specified, a machine-dependent 2144*a9fa9459Szrjroutine (@code{find_call}) scans through each symbol's machine code, 2145*a9fa9459Szrjlooking for subroutine call instructions, and adding them 2146*a9fa9459Szrjto the call graph with a zero call count. 2147*a9fa9459SzrjA topological sort is performed by depth-first numbering 2148*a9fa9459Szrjall the symbols (@code{cg_dfn.c:@-cg_dfn}), so that 2149*a9fa9459Szrjchildren are always numbered less than their parents, 2150*a9fa9459Szrjthen making a array of pointers into the symbol table and sorting it into 2151*a9fa9459Szrjnumerical order, which is reverse topological 2152*a9fa9459Szrjorder (children appear before parents). 2153*a9fa9459SzrjCycles are also detected at this point, all members 2154*a9fa9459Szrjof which are assigned the same topological number. 2155*a9fa9459SzrjTwo passes are now made through this sorted array of symbol pointers. 2156*a9fa9459SzrjThe first pass, from end to beginning (parents to children), 2157*a9fa9459Szrjcomputes the fraction of child time to propagate to each parent 2158*a9fa9459Szrjand a print flag. 2159*a9fa9459SzrjThe print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH, 2160*a9fa9459Szrjwith a parent's include or exclude (print or no print) property 2161*a9fa9459Szrjbeing propagated to its children, unless they themselves explicitly appear 2162*a9fa9459Szrjin INCL_GRAPH or EXCL_GRAPH. 2163*a9fa9459SzrjA second pass, from beginning to end (children to parents) actually 2164*a9fa9459Szrjpropagates the timings along the call graph, subject 2165*a9fa9459Szrjto a check against INCL_TIME/EXCL_TIME. 2166*a9fa9459SzrjWith the print flag, fractions, and timings now stored in the symbol 2167*a9fa9459Szrjstructures, the topological sort array is now discarded, and a 2168*a9fa9459Szrjnew array of pointers is assembled, this time sorted by propagated time. 2169*a9fa9459Szrj 2170*a9fa9459SzrjFinally, print the various outputs the user requested, which is now fairly 2171*a9fa9459Szrjstraightforward. The call graph (@code{cg_print.c:@-cg_print}) and 2172*a9fa9459Szrjflat profile (@code{hist.c:@-hist_print}) are regurgitations of values 2173*a9fa9459Szrjalready computed. The annotated source listing 2174*a9fa9459Szrj(@code{basic_blocks.c:@-print_annotated_source}) uses basic-block 2175*a9fa9459Szrjinformation, if present, to label each line of code with call counts, 2176*a9fa9459Szrjotherwise only the function call counts are presented. 2177*a9fa9459Szrj 2178*a9fa9459SzrjThe function ordering code is marginally well documented 2179*a9fa9459Szrjin the source code itself (@code{cg_print.c}). Basically, 2180*a9fa9459Szrjthe functions with the most use and the most parents are 2181*a9fa9459Szrjplaced first, followed by other functions with the most use, 2182*a9fa9459Szrjfollowed by lower use functions, followed by unused functions 2183*a9fa9459Szrjat the end. 2184*a9fa9459Szrj 2185*a9fa9459Szrj@node Debugging 2186*a9fa9459Szrj@section Debugging @code{gprof} 2187*a9fa9459Szrj 2188*a9fa9459SzrjIf @code{gprof} was compiled with debugging enabled, 2189*a9fa9459Szrjthe @samp{-d} option triggers debugging output 2190*a9fa9459Szrj(to stdout) which can be helpful in understanding its operation. 2191*a9fa9459SzrjThe debugging number specified is interpreted as a sum of the following 2192*a9fa9459Szrjoptions: 2193*a9fa9459Szrj 2194*a9fa9459Szrj@table @asis 2195*a9fa9459Szrj@item 2 - Topological sort 2196*a9fa9459SzrjMonitor depth-first numbering of symbols during call graph analysis 2197*a9fa9459Szrj@item 4 - Cycles 2198*a9fa9459SzrjShows symbols as they are identified as cycle heads 2199*a9fa9459Szrj@item 16 - Tallying 2200*a9fa9459SzrjAs the call graph arcs are read, show each arc and how 2201*a9fa9459Szrjthe total calls to each function are tallied 2202*a9fa9459Szrj@item 32 - Call graph arc sorting 2203*a9fa9459SzrjDetails sorting individual parents/children within each call graph entry 2204*a9fa9459Szrj@item 64 - Reading histogram and call graph records 2205*a9fa9459SzrjShows address ranges of histograms as they are read, and each 2206*a9fa9459Szrjcall graph arc 2207*a9fa9459Szrj@item 128 - Symbol table 2208*a9fa9459SzrjReading, classifying, and sorting the symbol table from the object file. 2209*a9fa9459SzrjFor line-by-line profiling (@samp{-l} option), also shows line numbers 2210*a9fa9459Szrjbeing assigned to memory addresses. 2211*a9fa9459Szrj@item 256 - Static call graph 2212*a9fa9459SzrjTrace operation of @samp{-c} option 2213*a9fa9459Szrj@item 512 - Symbol table and arc table lookups 2214*a9fa9459SzrjDetail operation of lookup routines 2215*a9fa9459Szrj@item 1024 - Call graph propagation 2216*a9fa9459SzrjShows how function times are propagated along the call graph 2217*a9fa9459Szrj@item 2048 - Basic-blocks 2218*a9fa9459SzrjShows basic-block records as they are read from profile data 2219*a9fa9459Szrj(only meaningful with @samp{-l} option) 2220*a9fa9459Szrj@item 4096 - Symspecs 2221*a9fa9459SzrjShows symspec-to-symbol pattern matching operation 2222*a9fa9459Szrj@item 8192 - Annotate source 2223*a9fa9459SzrjTracks operation of @samp{-A} option 2224*a9fa9459Szrj@end table 2225*a9fa9459Szrj 2226*a9fa9459Szrj@node GNU Free Documentation License 2227*a9fa9459Szrj@appendix GNU Free Documentation License 2228*a9fa9459Szrj@include fdl.texi 2229*a9fa9459Szrj 2230*a9fa9459Szrj@bye 2231*a9fa9459Szrj 2232*a9fa9459SzrjNEEDS AN INDEX 2233*a9fa9459Szrj 2234*a9fa9459Szrj-T - "traditional BSD style": How is it different? Should the 2235*a9fa9459Szrjdifferences be documented? 2236*a9fa9459Szrj 2237*a9fa9459Szrjexample flat file adds up to 100.01%... 2238*a9fa9459Szrj 2239*a9fa9459Szrjnote: time estimates now only go out to one decimal place (0.0), where 2240*a9fa9459Szrjthey used to extend two (78.67). 2241