1.TL 2Limbo profilers in Inferno 3.AU 4J R Firth 5.AI 6Vita Nuova 7.br 813 June 2002 9.SP 4 10.NH 1 11Introduction 12.LP 13Currently there are three application level profiling tools in the 14Inferno package. 15.I Prof 16is a time profiler which, by sampling, can provide statistics 17on the percentage of time spent on each line of limbo source. 18.I Cprof 19is a 20coverage profiler which provides the execution profile for limbo source 21code. It can accumulate results over a series of runs to allow full 22coverage testing. Finally, 23.I mprof 24is a memory profiler which provides 25statistics on the amount of memory used by each line 26of limbo source. 27.LP 28Two gui versions of these tools currently exist. 29.I Wm/cprof 30shows the 31coverage per module and highlights those lines which have not been 32executed or have been only partially executed. 33.I Wm/mprof 34shows the memory 35usage per module and highlights lines with high memory allocation in 36darker shades of red. 37.I Prof 38itself does not have a gui equivalent as it 39was originally written to determine why acme was so slow when using it's 40global editing command. A gui for it was not a requirement at that stage. 41.LP 42All these tools use a common library module 43.CW /module/profile.m 44and 45.CW /appl/lib/profile.b 46that acts as the direct interface with the kernel profiling device. 47.LP 48Note that none of these tools give kernel profile statistics. For that, the devmem driver should be used. 49.LP 50Although the use of these tools is very similar, there are a few differences 51when it comes to interactive testing as the profilers were written to 52answer different questions. Thus 53.I prof 54tries to determine 'why is blah so 55slow ?', 56.I cprof 57tries to accumulate coverage records over time and 58.I mprof 59tries to 60give a series of memory statistics at intervals during the execution of 61a program or series of programs. 62.NH 1 63Prof 64.LP 65The time profiler works by sampling. A kernel procedure sleeps for the 66given sample time and then notes the particular dis instruction currently 67being executed before repeating the process. After many such samples, an accurate profile can be obtained. 68.LP 69At it's simplest we can profile a particular command by giving the command 70to execute followed by any arguments to the command eg 71.sp 72.RS 73 prof wm/polyhedra 74.RE 75.sp 76profiles the polyhedra displayer. After letting the latter run for a reasonable 77amount of time, we exit and then get the following statistics. 78.br 79.DS 80Module: Wmlib(/dis/lib/wmlib.dis) 81 8234 0.06 str = load String String->PATH; 83 84**** module sampling points 1 **** 85 86.DE 87.DS 88Module: Bufio(/dis/lib/bufio.dis) 89 90340 0.06 n := 0; 91341 0.12 while(b.index < b.size){ 92342 0.99 (ch, i, nil) = sys->byte2char(b.buffer[0:b.size], b.index); 93 94**** module sampling points 19 **** 95 96Module: Polyfill(/dis/math/polyfill.dis) 97 9837 10.80 for(i := 0; i < n; i++) 9938 19.86 b0[i] = b1[i] = ∞; 10057 0.06 p.y += y; 10158 1.18 dst.line((left, y), (right, y), Endsquare, Endsquare, 0, src, p); 10263 0.12 prevx := ∞; 10364 9.93 for(x := left; x <= right; x++){ 10465 20.61 if(z+e < zbuf0[k] || (z-e <= zbuf1[k] && x != right && prevx != ∞)){ 10566 6.46 zbuf0[k] = z-e; 10667 5.71 zbuf1[k] = z+e; 10768 0.74 if(prevx == ∞) 10869 0.74 prevx = x; 10971 0.12 else if(prevx != ∞){ 11072 0.25 fillline(dst, prevx, x-1, y, src, p); 11173 2.61 prevx = ∞; 11275 4.35 z += dx; 11376 3.17 k++; 11478 0.06 if(prevx != ∞) 11579 0.87 fillline(dst, prevx, right, y, src, p); 11680 0.06 } 117152 0.06 return (vx/z, mod); 118186 0.06 sp.dzrem = mod(sp.num, sp.den) << fixshift; 119187 0.06 sp.dz += sdiv(sp.dzrem, sp.den); 120217 0.62 for(q = p = 0; p < ep; p++) { 121218 0.37 sp = seg[p]; 122220 0.12 continue; 123221 0.12 sp.z += sp.dz; 124222 0.19 sp.zerr += sp.dzrem; 125223 0.12 if(sp.zerr >= sp.den) { 126224 0.19 sp.z++; 127225 0.25 sp.zerr -= sp.den; 128226 0.25 if(sp.zerr < 0 || sp.zerr >= sp.den) 129227 0.25 sys->print("bad ratzerr1: %d den %d dzrem %d\n", sp.zerr, sp.den, sp.dzrem); 130229 0.31 seg[q] = sp; 131230 0.31 q++; 132233 0.25 for(p = next; seg[p] != nil; p++) { 133234 0.06 sp = seg[p]; 134247 0.12 ep = q; 135248 0.06 next = p; 136257 0.06 continue; 137260 0.62 zsort(seg, ep); 138262 0.25 for(p = 0; p < ep; p++) { 139263 0.19 sp = seg[p]; 140264 0.06 cnt = 0; 141265 0.06 x = sp.z; 142266 0.25 ix = (x + onehalf) >> fixshift; 143267 0.06 if(ix >= maxx) 144271 0.06 cnt += sp.d; 145272 0.12 p++; 146273 0.25 sp = seg[p]; 147275 0.06 if(p == ep) { 148277 0.06 return; 149279 0.06 cnt += sp.d; 150280 0.12 if((cnt&wind) == 0) 151283 0.19 sp = seg[p]; 152286 0.25 ix2 = (x2 + onehalf) >> fixshift; 153291 1.92 filllinez(dst, ix, ix2, iy, zv+ix*dx, er, dx, k+ix-zr.min.x, zbuf0, zbuf1, src, spt); 154293 0.06 y += (1<<fixshift); 155294 0.31 iy++; 156295 0.06 k += xlen; 157296 0.06 zv += dy; 158298 0.06 } 159310 0.06 done = 1; 160311 0.12 q--; 161312 0.25 for(p = 0; p < q; p++) { 162313 0.87 if(seg[p].z > seg[p+1].z) { 163367 0.06 t = a[0]; a[0] = a[i]; a[i] = t; 164373 0.06 while(i < n && ycompare(a[i], a[0]) < 0); 165379 0.12 t = a[i]; a[i] = a[j]; a[j] = t; 166384 0.06 qsortycompare(a, j); 167 168**** module sampling points 1584 **** 169 170Module: Polyhedra(/dis/wm/polyhedra.dis) 171 172327 0.12 return (int (geo.sx*v.x)+geo.tx, int (geo.sy*v.y)+geo.ty); 173471 0.06 if(allf || dot(geo.view, newn[j]) < 0.0) 174472 0.06 polyfilla(fv[j], new, newn[j], dot(geo.light, newn[j]), geo, concave, inc); 175496 0.06 ap[j] = map(vtx, geo); 176512 0.06 if(a <= -LIMIT || a >= LIMIT) 177531 0.06 fillpoly(RDisp, ap, ~0, face, (0, 0), geo.zstate, dc, dx, dy); 178 179**** module sampling points 7 **** 180 181 182**** total sampling points 1611 **** 183.DE 184.br 185The output lists all lines in all modules with a sampling point. Each line 186shows the line number in the corresponding source file, the percentage of 187time spent on that line and the source code. We can see that about 60% of 188the sampling points occur on lines 37, 38, 64 and 65 of the Polyfill module. 189With this information we might then try to speed up this part of the code 190by altering the algorithm or making the limbo code more efficient (for 191instance by moving constant calculations or addressing out of loops). 192.LP 193The number of sampling points is also shown. The sampling rate can be 194increased with the -s option to give better granularity. 195This will cause a decrease in apparent performance but increases the 196accuracy of the results. The above example showed the results for all 197modules sampled. We might have restricted attention to the two main 198polyhedra modules instead by executing 199.sp 200.RS 201 prof -m Polyhedra -m Polyfill wm/polyhedra 202.RE 203.sp 204See the manual page for other options to 205.I prof 206and further examples. 207.NH1 208Cprof 209.LP 210Coverage of instructions is achieved by running a special dis instruction execute routine in place of the usual one (just as the debugger does). 211This routine notes down each 212instruction as it is executed. The profile device then passes this information 213to 214.I cprof 215via the io system. 216.LP 217The coverage profiler is used in a similar way to the time profiler. 218.sp 219.RS 220 cprof -m Zeros zeros 1024 2880 221.RE 222.sp 223gives 224.br 225.DS 226Module: Zeros(zeros.dis) 56% coverage 227 2281 implement Zeros; 2292 2303 include "sys.m"; 2314 sys: Sys; 2325 include "arg.m"; 2336 arg: Arg; 2347 include "string.m"; 2358 str: String; 2369 include "keyring.m"; 23710 include "security.m"; 23811 random: Random; 23912 24013 include "draw.m"; 24114 24215 Zeros: module 24316 { 24417 init: fn(nil: ref Draw->Context, argv: list of string); 24518 }; 24619 24720 init(nil: ref Draw->Context, argv: list of string) 24821 { 24922 z: array of byte; 25023 i: int; 25124 + sys = load Sys Sys->PATH; 25225 + arg = load Arg Arg->PATH; 25326 + str = load String String->PATH; 25427 25528 + if(sys == nil || arg == nil) 25629 - return; 25730 25831 + bs := 0; 25932 + n := 0; 26033 + val := 0; 26134 + rflag := 0; 26235 + arg->init(argv); 26336 + while ((c := arg->opt()) != 0) 26437 - case c { 26538 - 'r' => rflag = 1; 26639 - 'v' => (val, nil) = str->toint(arg->arg(), 16); 26740 - * => sys->raise(sys->sprint("fail: unknown option (%c)\n", c)); 26841 } 269.DE 270.DS 27142 + argv = arg->argv(); 27243 + if(len argv >= 1) 27344 + bs = int hd argv; 27445 else 27546 - bs = 1; 27647 + if (len argv >= 2) 27748 + n = int hd tl argv; 27849 else 27950 - n = 1; 28051 + if(bs == 0 || n == 0) { 28152 - sys->fprint(sys->fildes(2), "usage: zeros [-r] [-v value] blocksize [number]\n"); 28253 - sys->raise("fail: usage"); 28354 } 28455 + if (rflag) { 28556 - random = load Random Random->PATH; 28657 - if (random == nil) 28758 - sys->raise("fail: no security module\n"); 28859 - z = random->randombuf(random->NotQuiteRandom, bs); 28960 } 29061 else { 29162 + z = array[bs] of byte; 29263 + for(i=0;i<bs;i++) 29364 + z[i] = byte val; 29465 } 29566 + for(i=0;i<n;i++) 29667 + sys->write(sys->fildes(1), z, bs); 29768 + } 298 299**** module dis instructions 39725 **** 300.DE 301.br 302Here the -m option has restricted attention to the Zeros module itself. 303The output shows the source line number, an indicator of coverage and 304the source. The indicator is + if the line has been executed, - if 305it hasn't and ? if only part of it has (for instance a loop statement that has 306never had it's iteration part executed). Lines with no indicator have no 307corresponding dis instructions associated with them. Another option (-f) 308shows coverage frequencies instead. 309.LP 310An alternative to 311.I cprof 312is 313.I wm/cprof 314which shows the statistics graphically. 315It's options are pretty much the same as those to 316.I cprof . 317Unexecuted and 318partially executed lines of code are shown in colour. See the man page 319for exact details of the colouring convention 320.LP 321Results may be accumulated with the -r option so that multiple runs of 322code can be made. The resulting statistics go into a file <xxx>.prf when 323<xxx>.dis is the corresponding dis file. See the manual page for further 324details on how to use this option and then review the accumulated 325results. 326.NH 1 327Mprof 328.LP 329When memory profiling, the kernel profile device associates each heap allocation with a line of limbo source and each heap deallocation with the line of 330limbo source that allocated it. In this way, current memory usage and 331high-water usage can be determined on a line by line basis. 332.LP 333Here it seems that memory usage at a particular point in the execution of 334a program is more appropriate than the post-mortem approach of 335.I prof 336and 337.I cprof 338, so an interactive example is described (though 339.I mprof 340can be 341used non-interactively and 342.I prof 343interactively if so desired). See the manual 344pages for complete details and further examples. 345.LP 346To do this we execute 347.sp 348.RS 349 mprof -b -m Polyhedra 350.RE 351.sp 352which kicks off profiling and restricts attention to the Polyhedra module 353whenever it runs. The -b simply says begin profiling. Note that no command 354to execute is given to 355.I mprof 356at this stage. Then run the command 357.sp 358.RS 359 wm/polyhedra & 360.RE 361.sp 362and interact with it. Now show memory statistics 363.sp 364.RS 365 mprof 366.RE 367.sp 368This gives 369.br 370.DS 371Module: Polyhedra(/dis/wm/polyhedra.dis) 372 37344 100 100 sys = load Sys Sys->PATH; 37445 132 132 draw = load Draw Draw->PATH; 37546 68 68 tk = load Tk Tk->PATH; 37647 1788 1788 wmlib = load Wmlib Wmlib->PATH; 37748 232 232 bufio = load Bufio Bufio->PATH; 37849 68 68 math = load Math Math->PATH; 37950 204 204 rand = load Rand Rand->PATH; 38051 0 3504 daytime = load Daytime Daytime->PATH; 38152 544 544 polyfill = load Polyfill Polyfill->PATH; 38253 1824 1824 smenu = load Smenu Smenu->PATH; 38386 36 36 cmdch := chan of string; 38495 36 36 sync := chan of int; 38596 36 36 chanθ := chan of real; 386103 68 68 shade = array[NSHADES] of ref Image; 387116 36 36 yieldc := chan of int; 388120 36 36 sm := array[2] of ref Scrollmenu; 389378 68 176 s += " (" + string p.indx + ")"; 390403 36 36 vec := array[2] of array of Vector; 391404 740 740 vec[0] = array[V] of Vector; 392405 740 740 vec[1] = array[V] of Vector; 393407 36 36 norm = array[2] of array of Vector; 394408 612 612 norm[0] = array[F] of Vector; 395409 612 612 norm[1] = array[F] of Vector; 396492 68 68 ap := array[n+1] of Point; 397609 164 164 geo := ref Geom; 398610 36 36 TM := array[4] of array of real; 399612 272 272 TM[i] = array[4] of real; 400663 8000 8000 p := ref Polyhedron; 401707 740 740 p.v = array[p.V] of Vector; 402710 612 612 p.f = array[p.F] of Vector; 403713 132 132 p.fv = array[p.F] of array of int; 404716 164 164 p.vf = array[p.V] of array of int; 405729 9504 9640 return s[0: len s - 1]; 406750 3672 3672 a := array[n+2] of int; 407768 0 136 return (n, s[i+1:]); 408779 0 104 return (r, s[i+1:]); 409802 0 68 s = s[1:]; 410806 0 72 s = s[0: ln-1]; 411.DE 412.DS 413866 0 200 cmd(mainwin, ".f1.txt configure -text {" + s + "}"); 414874 0 356 labs := array[n] of string; 415881 0 5128 labs[i++] = string q.indx + " " + name; 416884 0 68 cmd(top, mname + " configure -borderwidth 3"); 417920 0 104 cmd(win, ". configure -height " + string (scrsize.y - bd * 2)); 418934 0 244 cmd(win, ". configure -x " + string actr.min.x + " -y " + string actr.min.y); 419 420Module totals 31416 33984 421.DE 422.br 423We get the source line number, the amount of memory in bytes 424currently allocated on that line, the high-water mark in bytes and then the source. Thus 425loading the Sys module on line 44 used 100 bytes and this memory is 426still allocated. Loading Daytime on line 51 used 3504 bytes but this is now 427released (because the module pointer is set to nil in the source and the 428memory has been reclaimed). The string concatenation on line 378 currently uses 68 bytes 429but at it's worst it was 176 bytes. 430.LP 431Further interaction with wm/polyhedra can now be done and memory 432statistics reviewed before the 433profiling session is ended, throwing away the accumulated memory 434statistics inside the kernel with 435.sp 436.RS 437 mprof -c 438.RE 439.sp 440The -c option simply says cease profiling. 441.LP 442An alternative to 443.I mprof 444is 445.I wm/mprof 446which shows the statistics graphically. 447It's options are pretty much the same as those to 448.I mprof . 449Lines of code 450which have allocated more of the memory are shown in darker shades of red. 451.NH 1 452Manual pages 453.LP 454Further information and other examples are given in the following 455manual pages :- 456.sp 457.RS 458.I cprof(1) 459.br 460.I mprof(1) 461.br 462.I prof(1) 463.br 464.I wm-cprof(1) 465.br 466.I wm-mprof(1) 467.RE 468.sp 469For the lower level library module interface to these profilers 470.sp 471.RS 472.I prof(2) 473.RE 474.sp 475For the kernel profile device which gathers timing, coverage and memory 476statistics 477.sp 478.RS 479.I prof(3) 480.RE 481.br 482.NH 1 483Sources 484.LP 485The relevant sources are 486.sp 487.RS 488.CW /module/profile.m 489.br 490.CW /appl/lib/profile.b 491.br 492.CW /appl/cmd/cprof.b 493.br 494.CW /appl/cmd/mprof.b 495.br 496.CW /appl/cmd/prof.b 497.br 498.CW /appl/wm/cprof.b 499.br 500.CW /appl/wm/mprof.b 501.br 502.CW /emu/devprof.c 503.br 504.CW /os/port/devprof.c 505.RE 506.NH 1 507Addendum 508.LP 509A gui version of 510.I prof 511has been added for completeness. See the manual page 512.I wm-prof(1) 513and the source 514.CW /appl/wm/prof.b . 515