xref: /inferno-os/doc/lprof.ms (revision 46439007cf417cbd9ac8049bb4122c890097a0fa)
1.TL
2Limbo profilers in Inferno
3.AU
4J R Firth
5.AI
6Vita Nuova
7.br
813 June 2002
9.SP 4
10.NH 1
11Introduction
12.LP
13Currently there are three application level profiling tools in the
14Inferno package.
15.I Prof
16is a time profiler which, by sampling, can provide statistics
17on the percentage of time spent on each line of limbo source.
18.I Cprof
19is a
20coverage profiler which provides the execution profile for limbo source
21code. It can accumulate results over a series of runs to allow full
22coverage testing. Finally,
23.I mprof
24is a memory profiler which provides
25statistics on the amount of memory used by each line
26of limbo source.
27.LP
28Two gui versions of these tools currently exist.
29.I Wm/cprof
30shows the
31coverage per module and highlights those lines which have not been
32executed or have been only partially executed.
33.I Wm/mprof
34shows the memory
35usage per module and highlights lines with high memory allocation in
36darker shades of red.
37.I Prof
38itself does not have a gui equivalent as it
39was originally written to determine why acme was so slow when using it's
40global editing command. A gui for it was not a requirement at that stage.
41.LP
42All these tools use a common library module
43.CW /module/profile.m
44and
45.CW /appl/lib/profile.b
46that acts as the direct interface with the kernel profiling device.
47.LP
48Note that none of these tools give kernel profile statistics. For that, the devmem driver should be used.
49.LP
50Although the use of these tools is very similar, there are a few differences
51when it comes to interactive testing as the profilers were written to
52answer different questions. Thus
53.I prof
54tries to determine 'why is blah so
55slow ?',
56.I cprof
57tries to accumulate coverage records over time and
58.I mprof
59tries to
60give a series of memory statistics at intervals during the execution of
61a program or series of programs.
62.NH 1
63Prof
64.LP
65The time profiler works by sampling. A kernel procedure sleeps for the
66given sample time and then notes the particular dis instruction currently
67being executed before repeating the process. After many such samples, an accurate profile can be obtained.
68.LP
69At it's simplest we can profile a particular command by giving the command
70to execute followed by any arguments to the command eg
71.sp
72.RS
73	prof wm/polyhedra
74.RE
75.sp
76profiles the polyhedra displayer. After letting the latter run for a reasonable
77amount of time, we exit and then get the following statistics.
78.br
79.DS
80Module: Wmlib(/dis/lib/wmlib.dis)
81
8234	0.06		str = load String String->PATH;
83
84**** module sampling points 1 ****
85
86.DE
87.DS
88Module: Bufio(/dis/lib/bufio.dis)
89
90340	0.06			n := 0;
91341	0.12			while(b.index < b.size){
92342	0.99				(ch, i, nil) = sys->byte2char(b.buffer[0:b.size], b.index);
93
94**** module sampling points 19 ****
95
96Module: Polyfill(/dis/math/polyfill.dis)
97
9837	10.80		for(i := 0; i < n; i++)
9938	19.86			b0[i] = b1[i] = ∞;
10057	0.06		p.y += y;
10158	1.18		dst.line((left, y), (right, y), Endsquare, Endsquare, 0, src, p);
10263	0.12		prevx := ∞;
10364	9.93		for(x := left; x <= right; x++){
10465	20.61			if(z+e < zbuf0[k] || (z-e <= zbuf1[k] && x != right && prevx != ∞)){
10566	6.46				zbuf0[k] = z-e;
10667	5.71				zbuf1[k] = z+e;
10768	0.74				if(prevx == ∞)
10869	0.74					prevx = x;
10971	0.12			else if(prevx != ∞){
11072	0.25				fillline(dst, prevx, x-1, y, src, p);
11173	2.61				prevx = ∞;
11275	4.35			z += dx;
11376	3.17			k++;
11478	0.06		if(prevx != ∞)
11579	0.87			fillline(dst, prevx, right, y, src, p);
11680	0.06	}
117152	0.06			return (vx/z, mod);
118186	0.06			sp.dzrem = mod(sp.num, sp.den) << fixshift;
119187	0.06			sp.dz += sdiv(sp.dzrem, sp.den);
120217	0.62			for(q = p = 0; p < ep; p++) {
121218	0.37				sp = seg[p];
122220	0.12					continue;
123221	0.12				sp.z += sp.dz;
124222	0.19				sp.zerr += sp.dzrem;
125223	0.12				if(sp.zerr >= sp.den) {
126224	0.19					sp.z++;
127225	0.25					sp.zerr -= sp.den;
128226	0.25					if(sp.zerr < 0 || sp.zerr >= sp.den)
129227	0.25						sys->print("bad ratzerr1: %d den %d dzrem %d\n", sp.zerr, sp.den, sp.dzrem);
130229	0.31				seg[q] = sp;
131230	0.31				q++;
132233	0.25			for(p = next; seg[p] != nil; p++) {
133234	0.06				sp = seg[p];
134247	0.12			ep = q;
135248	0.06			next = p;
136257	0.06				continue;
137260	0.62			zsort(seg, ep);
138262	0.25			for(p = 0; p < ep; p++) {
139263	0.19				sp = seg[p];
140264	0.06				cnt = 0;
141265	0.06				x = sp.z;
142266	0.25				ix = (x + onehalf) >> fixshift;
143267	0.06				if(ix >= maxx)
144271	0.06				cnt += sp.d;
145272	0.12				p++;
146273	0.25				sp = seg[p];
147275	0.06					if(p == ep) {
148277	0.06						return;
149279	0.06					cnt += sp.d;
150280	0.12					if((cnt&wind) == 0)
151283	0.19					sp = seg[p];
152286	0.25				ix2 = (x2 + onehalf) >> fixshift;
153291	1.92				filllinez(dst, ix, ix2, iy, zv+ix*dx, er, dx, k+ix-zr.min.x, zbuf0, zbuf1, src, spt);
154293	0.06			y += (1<<fixshift);
155294	0.31			iy++;
156295	0.06			k += xlen;
157296	0.06			zv += dy;
158298	0.06	}
159310	0.06				done = 1;
160311	0.12				q--;
161312	0.25				for(p = 0; p < q; p++) {
162313	0.87					if(seg[p].z > seg[p+1].z) {
163367	0.06			t = a[0]; a[0] = a[i]; a[i] = t;
164373	0.06				while(i < n && ycompare(a[i], a[0]) < 0);
165379	0.12				t = a[i]; a[i] = a[j]; a[j] = t;
166384	0.06				qsortycompare(a, j);
167
168**** module sampling points 1584 ****
169
170Module: Polyhedra(/dis/wm/polyhedra.dis)
171
172327	0.12		return (int (geo.sx*v.x)+geo.tx, int (geo.sy*v.y)+geo.ty);
173471	0.06					if(allf || dot(geo.view, newn[j]) < 0.0)
174472	0.06						polyfilla(fv[j], new, newn[j], dot(geo.light, newn[j]), geo, concave, inc);
175496	0.06			ap[j] = map(vtx, geo);
176512	0.06			if(a <= -LIMIT || a >= LIMIT)
177531	0.06			fillpoly(RDisp, ap, ~0, face, (0, 0), geo.zstate, dc, dx, dy);
178
179**** module sampling points 7 ****
180
181
182**** total sampling points 1611 ****
183.DE
184.br
185The output lists all lines in all modules with a sampling point. Each line
186shows the line number in the corresponding source file, the percentage of
187time spent on that line and the source code. We can see that about 60% of
188the sampling points occur on lines 37, 38, 64 and 65 of the Polyfill module.
189With this information we might then try to speed up this part of the code
190by altering the algorithm or making the limbo code more efficient (for
191instance by moving constant calculations or addressing out of loops).
192.LP
193The number of sampling points is also shown. The sampling rate can be
194increased with the -s option to give better granularity.
195This will cause a decrease in apparent performance but increases the
196accuracy of the results. The above example showed the results for all
197modules sampled. We might have restricted attention to the two main
198polyhedra modules instead by executing
199.sp
200.RS
201	prof -m Polyhedra -m Polyfill wm/polyhedra
202.RE
203.sp
204See the manual page for other options to
205.I prof
206and further examples.
207.NH1
208Cprof
209.LP
210Coverage of instructions is achieved by running a special dis instruction execute routine in place of the usual one (just as the debugger does).
211This routine notes down each
212instruction as it is executed. The profile device then passes this information
213to
214.I cprof
215via the io system.
216.LP
217The coverage profiler is used in a similar way to the time profiler.
218.sp
219.RS
220	cprof -m Zeros zeros 1024 2880
221.RE
222.sp
223gives
224.br
225.DS
226Module: Zeros(zeros.dis)	56% coverage
227
2281	 	implement Zeros;
2292
2303	 	include "sys.m";
2314	 		sys: Sys;
2325	 	include "arg.m";
2336	 		arg: Arg;
2347	 	include "string.m";
2358	 		str: String;
2369	 	include "keyring.m";
23710	 	include "security.m";
23811	 		random: Random;
23912
24013	 	include "draw.m";
24114
24215	 	Zeros: module
24316	 	{
24417	 		init: fn(nil: ref Draw->Context, argv: list of string);
24518	 	};
24619
24720	 	init(nil: ref Draw->Context, argv: list of string)
24821	 	{
24922	 		z: array of byte;
25023	 		i: int;
25124	+		sys = load Sys Sys->PATH;
25225	+		arg = load Arg Arg->PATH;
25326	+		str = load String String->PATH;
25427
25528	+		if(sys == nil || arg == nil)
25629	-			return;
25730
25831	+		bs := 0;
25932	+		n := 0;
26033	+		val := 0;
26134	+		rflag := 0;
26235	+		arg->init(argv);
26336	+		while ((c := arg->opt()) != 0)
26437	-			case c {
26538	-			'r' => rflag = 1;
26639	-			'v' => (val, nil) = str->toint(arg->arg(), 16);
26740	-			* => sys->raise(sys->sprint("fail: unknown option (%c)\n", c));
26841	 			}
269.DE
270.DS
27142	+		argv = arg->argv();
27243	+		if(len argv >= 1)
27344	+			bs = int hd argv;
27445	 		else
27546	-			bs = 1;
27647	+		if (len argv >= 2)
27748	+			n = int hd tl argv;
27849	 		else
27950	-			n = 1;
28051	+		if(bs == 0 || n == 0) {
28152	-			sys->fprint(sys->fildes(2), "usage: zeros [-r] [-v value] blocksize [number]\n");
28253	-			sys->raise("fail: usage");
28354	 		}
28455	+		if (rflag) {
28556	-			random = load Random Random->PATH;
28657	-			if (random == nil)
28758	-				sys->raise("fail: no security module\n");
28859	-			z = random->randombuf(random->NotQuiteRandom, bs);
28960	 		}
29061	 		else {
29162	+			z = array[bs] of byte;
29263	+			for(i=0;i<bs;i++)
29364	+				z[i] = byte val;
29465	 		}
29566	+		for(i=0;i<n;i++)
29667	+			sys->write(sys->fildes(1), z, bs);
29768	+	}
298
299**** module dis instructions 39725 ****
300.DE
301.br
302Here the -m option has restricted attention to the Zeros module itself.
303The output shows the source line number, an indicator of coverage and
304the source. The indicator is + if the line has been executed, - if
305it hasn't and ? if only part of it has (for instance a loop statement that has
306never had it's iteration part executed). Lines with no indicator have no
307corresponding dis instructions associated with them. Another option (-f)
308shows coverage frequencies instead.
309.LP
310An alternative to
311.I cprof
312is
313.I wm/cprof
314which shows the statistics graphically.
315It's options are pretty much the same as those to
316.I cprof .
317Unexecuted and
318partially executed lines of code are shown in colour. See the man page
319for exact details of the colouring convention
320.LP
321Results may be accumulated with the -r option so that multiple runs of
322code can be made. The resulting statistics go into a file <xxx>.prf when
323<xxx>.dis is the corresponding dis file. See the manual page for further
324details on how to use this option and then review the accumulated
325results.
326.NH 1
327Mprof
328.LP
329When memory profiling, the kernel profile device associates each heap allocation with a line of limbo source and each heap deallocation with the line of
330limbo source that allocated it. In this way, current memory usage and
331high-water usage can be determined on a line by line basis.
332.LP
333Here it seems that memory usage at a particular point in the execution of
334a program is more appropriate than the post-mortem approach of
335.I prof
336and
337.I cprof
338, so an interactive example is described (though
339.I mprof
340can be
341used non-interactively and
342.I prof
343interactively if so desired). See the manual
344pages for complete details and further examples.
345.LP
346To do this we execute
347.sp
348.RS
349	mprof -b -m Polyhedra
350.RE
351.sp
352which kicks off profiling and restricts attention to the Polyhedra module
353whenever it runs. The -b simply says begin profiling. Note that no command
354to execute is given to
355.I mprof
356at this stage. Then run the command
357.sp
358.RS
359	wm/polyhedra &
360.RE
361.sp
362and interact with it. Now show memory statistics
363.sp
364.RS
365	mprof
366.RE
367.sp
368This gives
369.br
370.DS
371Module: Polyhedra(/dis/wm/polyhedra.dis)
372
37344	100	100		sys = load Sys Sys->PATH;
37445	132	132		draw = load Draw Draw->PATH;
37546	68	68		tk = load Tk Tk->PATH;
37647	1788	1788		wmlib = load Wmlib Wmlib->PATH;
37748	232	232		bufio = load Bufio Bufio->PATH;
37849	68	68		math = load Math Math->PATH;
37950	204	204		rand = load Rand Rand->PATH;
38051	0	3504		daytime = load Daytime Daytime->PATH;
38152	544	544		polyfill = load Polyfill Polyfill->PATH;
38253	1824	1824		smenu = load Smenu Smenu->PATH;
38386	36	36		cmdch := chan of string;
38495	36	36		sync := chan of int;
38596	36	36		chanθ := chan of real;
386103	68	68		shade = array[NSHADES] of ref Image;
387116	36	36		yieldc := chan of int;
388120	36	36		sm := array[2] of ref Scrollmenu;
389378	68	176			s += " (" + string p.indx + ")";
390403	36	36		vec := array[2] of array of Vector;
391404	740	740		vec[0] = array[V] of Vector;
392405	740	740		vec[1] = array[V] of Vector;
393407	36	36			norm = array[2] of array of Vector;
394408	612	612			norm[0] = array[F] of Vector;
395409	612	612			norm[1] = array[F] of Vector;
396492	68	68		ap := array[n+1] of Point;
397609	164	164		geo := ref Geom;
398610	36	36		TM := array[4] of array of real;
399612	272	272			TM[i] = array[4] of real;
400663	8000	8000			p := ref Polyhedron;
401707	740	740		p.v = array[p.V] of Vector;
402710	612	612		p.f = array[p.F] of Vector;
403713	132	132		p.fv = array[p.F] of array of int;
404716	164	164		p.vf = array[p.V] of array of int;
405729	9504	9640			return s[0: len s - 1];
406750	3672	3672		a := array[n+2] of int;
407768	0	136				return (n, s[i+1:]);
408779	0	104				return (r, s[i+1:]);
409802	0	68			s = s[1:];
410806	0	72			s = s[0: ln-1];
411.DE
412.DS
413866	0	200		cmd(mainwin, ".f1.txt configure -text {" + s + "}");
414874	0	356		labs := array[n] of string;
415881	0	5128			labs[i++] = string q.indx + " " + name;
416884	0	68		cmd(top, mname + " configure -borderwidth 3");
417920	0	104			cmd(win, ". configure -height " + string (scrsize.y - bd * 2));
418934	0	244		cmd(win, ". configure -x " + string actr.min.x + " -y " + string actr.min.y);
419
420Module totals	31416	33984
421.DE
422.br
423We get the source line number, the amount of memory in bytes
424currently allocated on that line, the high-water mark in bytes and then the source. Thus
425loading the Sys module on line 44 used 100 bytes and this memory is
426still allocated. Loading Daytime on line 51 used 3504 bytes but this is now
427released (because the module pointer is set to nil in the source and the
428memory has been reclaimed). The string concatenation on line 378 currently uses 68 bytes
429but at it's worst it was 176 bytes.
430.LP
431Further interaction with wm/polyhedra can now be done and memory
432statistics reviewed before the
433profiling session is ended, throwing away the accumulated memory
434statistics inside the kernel with
435.sp
436.RS
437	mprof -c
438.RE
439.sp
440The -c option simply says cease profiling.
441.LP
442An alternative to
443.I mprof
444is
445.I wm/mprof
446which shows the statistics graphically.
447It's options are pretty much the same as those to
448.I mprof .
449Lines of code
450which have allocated more of the memory are shown in darker shades of red.
451.NH 1
452Manual pages
453.LP
454Further information and other examples are given in the following
455manual pages :-
456.sp
457.RS
458.I cprof(1)
459.br
460.I mprof(1)
461.br
462.I prof(1)
463.br
464.I wm-cprof(1)
465.br
466.I wm-mprof(1)
467.RE
468.sp
469For the lower level library module interface to these profilers
470.sp
471.RS
472.I prof(2)
473.RE
474.sp
475For the kernel profile device which gathers timing, coverage and memory
476statistics
477.sp
478.RS
479.I prof(3)
480.RE
481.br
482.NH 1
483Sources
484.LP
485The relevant sources are
486.sp
487.RS
488.CW /module/profile.m
489.br
490.CW /appl/lib/profile.b
491.br
492.CW /appl/cmd/cprof.b
493.br
494.CW /appl/cmd/mprof.b
495.br
496.CW /appl/cmd/prof.b
497.br
498.CW /appl/wm/cprof.b
499.br
500.CW /appl/wm/mprof.b
501.br
502.CW /emu/devprof.c
503.br
504.CW /os/port/devprof.c
505.RE
506.NH 1
507Addendum
508.LP
509A gui version of
510.I prof
511has been added for completeness. See the manual page
512.I wm-prof(1)
513and the source
514.CW /appl/wm/prof.b .
515