xref: /netbsd-src/external/bsd/flex/dist/examples/fastwc/README (revision 3c3a7b7603b4ed4cb76dd5c5a3e781ddca2349bb)
1*3c3a7b76SchristosThis directory contains some examples illustrating techniques for extracting
2*3c3a7b76Schristoshigh-performance from flex scanners.  Each program implements a simplified
3*3c3a7b76Schristosversion of the Unix "wc" tool: read text from stdin and print the number of
4*3c3a7b76Schristoscharacters, words, and lines present in the text.  All programs were compiled
5*3c3a7b76Schristosusing gcc (version unavailable, sorry) with the -O flag, and run on a
6*3c3a7b76SchristosSPARCstation 1+.  The input used was a PostScript file, mainly containing
7*3c3a7b76Schristosfigures, with the following "wc" counts:
8*3c3a7b76Schristos
9*3c3a7b76Schristos	lines  words  characters
10*3c3a7b76Schristos	214217 635954 2592172
11*3c3a7b76Schristos
12*3c3a7b76Schristos
13*3c3a7b76SchristosThe basic principles illustrated by these programs are:
14*3c3a7b76Schristos
15*3c3a7b76Schristos	- match as much text with each rule as possible
16*3c3a7b76Schristos	- adding rules does not slow you down!
17*3c3a7b76Schristos	- avoid backing up
18*3c3a7b76Schristos
19*3c3a7b76Schristosand the big caveat that comes with them is:
20*3c3a7b76Schristos
21*3c3a7b76Schristos	- you buy performance with decreased maintainability; make
22*3c3a7b76Schristos	  sure you really need it before applying the above techniques.
23*3c3a7b76Schristos
24*3c3a7b76SchristosSee the "Performance Considerations" section of flexdoc for more
25*3c3a7b76Schristosdetails regarding these principles.
26*3c3a7b76Schristos
27*3c3a7b76Schristos
28*3c3a7b76SchristosThe different versions of "wc":
29*3c3a7b76Schristos
30*3c3a7b76Schristos	mywc.c
31*3c3a7b76Schristos		a simple but fairly efficient C version
32*3c3a7b76Schristos
33*3c3a7b76Schristos	wc1.l	a naive flex "wc" implementation
34*3c3a7b76Schristos
35*3c3a7b76Schristos	wc2.l	somewhat faster; adds rules to match multiple tokens at once
36*3c3a7b76Schristos
37*3c3a7b76Schristos	wc3.l	faster still; adds more rules to match longer runs of tokens
38*3c3a7b76Schristos
39*3c3a7b76Schristos	wc4.l	fastest; still more rules added; hard to do much better
40*3c3a7b76Schristos		using flex (or, I suspect, hand-coding)
41*3c3a7b76Schristos
42*3c3a7b76Schristos	wc5.l	identical to wc3.l except one rule has been slightly
43*3c3a7b76Schristos		shortened, introducing backing-up
44*3c3a7b76Schristos
45*3c3a7b76SchristosTiming results (all times in user CPU seconds):
46*3c3a7b76Schristos
47*3c3a7b76Schristos	program	  time 	 notes
48*3c3a7b76Schristos	-------   ----   -----
49*3c3a7b76Schristos	wc1       16.4   default flex table compression (= -Cem)
50*3c3a7b76Schristos	wc1        6.7   -Cf compression option
51*3c3a7b76Schristos	/bin/wc	   5.8	 Sun's standard "wc" tool
52*3c3a7b76Schristos	mywc	   4.6   simple but better C implementation!
53*3c3a7b76Schristos	wc2	   4.6   as good as C implementation; built using -Cf
54*3c3a7b76Schristos	wc3	   3.8   -Cf
55*3c3a7b76Schristos	wc4	   3.3   -Cf
56*3c3a7b76Schristos	wc5	   5.7   -Cf; ouch, backing up is expensive
57