This directory contains some examples illustrating techniques for extracting
high performance from flex scanners.  Each program implements a simplified
version of the Unix "wc" tool: read text from stdin and print the number of
characters, words, and lines present in the text.  All programs were compiled
using gcc (version unavailable, sorry) with the -O flag, and run on a
SPARCstation 1+.  The input used was a PostScript file, mainly containing
figures, with the following "wc" counts:

        lines  words  characters
        214217 635954 2592172


The basic principles illustrated by these programs are:

        - match as much text with each rule as possible
        - adding rules does not slow you down!
        - avoid backing up

and the big caveat that comes with them is:

        - you buy performance with decreased maintainability; make
          sure you really need it before applying the above techniques.

See the "Performance Considerations" section of flexdoc for more
details regarding these principles.


The different versions of "wc":

        mywc.c
                a simple but fairly efficient C version

        wc1.l   a naive flex "wc" implementation

        wc2.l   somewhat faster; adds rules to match multiple tokens at once

        wc3.l   faster still; adds more rules to match longer runs of tokens

        wc4.l   fastest; still more rules added; hard to do much better
                using flex (or, I suspect, hand-coding)

        wc5.l   identical to wc3.l except one rule has been slightly
                shortened, introducing backing-up

Timing results (all times in user CPU seconds):

        program   time  notes
        -------   ----  -----
        wc1       16.4  default flex table compression (= -Cem)
        wc1        6.7  -Cf compression option
        /bin/wc    5.8  Sun's standard "wc" tool
        mywc       4.6  simple but better C implementation!
        wc2        4.6  as good as C implementation; built using -Cf
        wc3        3.8  -Cf
        wc4        3.3  -Cf
        wc5        5.7  -Cf; ouch, backing up is expensive
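
For readers who want a concrete picture of the task that every program
above performs, here is a minimal C sketch of the counting loop.  It is not
the mywc.c in this directory; the names wc_counts, wc_update, and wc_stream
are made up for illustration, and it assumes that a "word" is a maximal run
of characters other than space, tab, and newline.  Reading in large blocks
instead of one character at a time is one plausible way to keep a plain C
version reasonably fast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for illustration; not taken from mywc.c. */
struct wc_counts {
    long lines;
    long words;
    long chars;
};

/*
 * Add the counts for buf[0..len) to *c.
 *
 * Assumption: a word is a maximal run of characters other than
 * space, tab, and newline.  *in_word carries the "currently inside
 * a word" state across calls, so a word split between two chunks is
 * counted only once.
 */
void wc_update(struct wc_counts *c, int *in_word,
               const char *buf, size_t len)
{
    size_t i;

    assert(c != NULL && in_word != NULL);

    for (i = 0; i < len; ++i) {
        char ch = buf[i];

        ++c->chars;
        if (ch == '\n')
            ++c->lines;

        if (ch == ' ' || ch == '\t' || ch == '\n') {
            *in_word = 0;           /* whitespace ends any word */
        } else if (!*in_word) {
            *in_word = 1;           /* first character of a new word */
            ++c->words;
        }
    }
}

/* Read a whole stream in large blocks and return its counts. */
struct wc_counts wc_stream(FILE *fp)
{
    struct wc_counts c = {0, 0, 0};
    char buf[8192];
    size_t n;
    int in_word = 0;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        wc_update(&c, &in_word, buf, n);
    return c;
}
```

To turn this into a stand-alone tool, a main() could call wc_stream(stdin)
and print the three fields of the result.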