.\" $OpenBSD: flex.1,v 1.46 2024/11/09 18:06:00 op Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Vern Paxson.
.\"
.\" The United States Government has rights in this work pursuant
.\" to contract no. DE-AC03-76SF00098 between the United States
.\" Department of Energy and the University of California.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE.
.\"
.Dd $Mdocdate: November 9 2024 $
.Dt FLEX 1
.Os
.Sh NAME
.Nm flex ,
.Nm flex++ ,
.Nm lex
.Nd fast lexical analyzer generator
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl 78BbdFfhIiLlnpsTtVvw+?
.Op Fl C Ns Op Cm aeFfmr
.Op Fl Fl help
.Op Fl Fl version
.Op Fl o Ns Ar output
.Op Fl P Ns Ar prefix
.Op Fl S Ns Ar skeleton
.Op Ar
.Ek
.Sh DESCRIPTION
.Nm
is a tool for generating
.Em scanners :
programs which recognize lexical patterns in text.
.Nm
reads the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.
The description is in the form of pairs of regular expressions and C code,
called
.Em rules .
.Nm
generates as output a C source file,
.Pa lex.yy.c ,
which defines a routine
.Fn yylex .
This file is compiled and linked with the
.Fl lfl
library to produce an executable.
When the executable is run, it analyzes its input for occurrences
of the regular expressions.
Whenever it finds one, it executes the corresponding C code.
.Pp
.Nm lex
is a synonym for
.Nm flex .
.Nm flex++
is a synonym for
.Nm
.Fl + .
.Pp
The manual includes both tutorial and reference sections:
.Bl -ohang
.It Sy Some Simple Examples
.It Sy Format of the Input File
.It Sy Patterns
The extended regular expressions used by
.Nm .
.It Sy How the Input is Matched
The rules for determining what has been matched.
.It Sy Actions
How to specify what to do when a pattern is matched.
.It Sy The Generated Scanner
Details regarding the scanner that
.Nm
produces;
how to control the input source.
.It Sy Start Conditions
Introducing context into scanners, and managing
.Qq mini-scanners .
.It Sy Multiple Input Buffers
How to manipulate multiple input sources;
how to scan from strings instead of files.
.It Sy End-of-File Rules
Special rules for matching the end of the input.
.It Sy Miscellaneous Macros
A summary of macros available to the actions.
.It Sy Values Available to the User
A summary of values available to the actions.
.It Sy Interfacing with Yacc
Connecting flex scanners together with
.Xr yacc 1
parsers.
.It Sy Options
.Nm
command-line options, and the
.Dq %option
directive.
.It Sy Performance Considerations
How to make scanners go as fast as possible.
.It Sy Generating C++ Scanners
The
.Pq experimental
facility for generating C++ scanner classes.
.It Sy Incompatibilities with Lex and POSIX
How
.Nm
differs from
.At
.Nm lex
and the
.Tn POSIX
.Nm lex
standard.
.It Sy Files
Files used by
.Nm .
.It Sy Diagnostics
Those error messages produced by
.Nm
.Pq or scanners it generates
whose meanings might not be apparent.
.It Sy See Also
Other documentation, related tools.
.It Sy Authors
Includes contact information.
.It Sy Bugs
Known problems with
.Nm .
.El
.Sh SOME SIMPLE EXAMPLES
First some simple examples to get the flavor of how one uses
.Nm .
The following
.Nm
input specifies a scanner which whenever it encounters the string
.Qq username
will replace it with the user's login name:
.Bd -literal -offset indent
%%
username    printf("%s", getlogin());
.Ed
.Pp
By default, any text not matched by a
.Nm
scanner is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence of
.Qq username
expanded.
In this input, there is just one rule.
.Qq username
is the
.Em pattern
and the
.Qq printf
is the
.Em action .
The
.Qq %%
marks the beginning of the rules.
.Pp
Here's another simple example:
.Bd -literal -offset indent
%{
int num_lines = 0, num_chars = 0;
%}

%%
\en      ++num_lines; ++num_chars;
\&.       ++num_chars;

%%
int
main(void)
{
	yylex();
	printf("# of lines = %d, # of chars = %d\en",
	    num_lines, num_chars);
}
.Ed
.Pp
This scanner counts the number of characters and the number
of lines in its input
(it produces no output other than the final report on the counts).
The first line declares two globals,
.Qq num_lines
and
.Qq num_chars ,
which are accessible both inside
.Fn yylex
and in the
.Fn main
routine declared after the second
.Qq %% .
There are two rules, one which matches a newline
.Pq \&"\en\&"
and increments both the line count and the character count,
and one which matches any character other than a newline
(indicated by the
.Qq \&.
regular expression).
.Pp
A somewhat more complicated example:
.Bd -literal -offset indent
/* scanner for a toy Pascal-like language */

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+ {
	printf("An integer: %s\en", yytext);
}

{DIGIT}+"."{DIGIT}* {
	printf("A float: %s\en", yytext);
}

if|then|begin|end|procedure|function {
	printf("A keyword: %s\en", yytext);
}

{ID}    printf("An identifier: %s\en", yytext);

"+"|"-"|"*"|"/"    printf("An operator: %s\en", yytext);

"{"[^}\en]*"}"    /* eat up one-line comments */

[ \et\en]+    /* eat up whitespace */

\&.    printf("Unrecognized character: %s\en", yytext);

%%

int
main(int argc, char *argv[])
{
	++argv; --argc;  /* skip over program name */
	if (argc > 0)
		yyin = fopen(argv[0], "r");
	else
		yyin = stdin;

	yylex();
}
.Ed
.Pp
This is the beginning of a simple scanner for a language like Pascal.
It identifies different types of
.Em tokens
and reports on what it has seen.
.Pp
The details of this example will be explained in the following sections.
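.Pp
As one more sketch in the same spirit, the line counter above can be
extended to count words as well.
This variant is an illustration rather than one of the original examples:
the
.Qq num_words
counter is a name invented here, and the rules rely on
.Fa yyleng
and on the matching rules described in
.Sx HOW THE INPUT IS MATCHED .
.Bd -literal -offset indent
%{
#include <stdio.h>

/* num_words is a hypothetical addition to the earlier counters */
int num_lines = 0, num_words = 0, num_chars = 0;
%}

%%
\en              { ++num_lines; ++num_chars; }
[^ \et\en]+      { ++num_words; num_chars += yyleng; }
\&.               { ++num_chars; }

%%
int
main(void)
{
	yylex();
	printf("%d %d %d\en", num_lines, num_words, num_chars);
	return 0;
}
.Ed
.Pp
Because the word rule is listed before the
.Qq \&.
rule, a one-character word is claimed by the word rule, so the
.Qq \&.
rule only ever sees blanks and tabs.
Like the earlier examples, this one is assumed to be linked with
.Fl lfl .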
.Sh FORMAT OF THE INPUT FILE
The
.Nm
input file consists of three sections, separated by a line with just
.Qq %%
in it:
.Bd -unfilled -offset indent
definitions
%%
rules
%%
user code
.Ed
.Pp
The
.Em definitions
section contains declarations of simple
.Em name
definitions to simplify the scanner specification, and declarations of
.Em start conditions ,
which are explained in a later section.
.Pp
Name definitions have the form:
.Pp
.D1 name definition
.Pp
The
.Qq name
is a word beginning with a letter or an underscore
.Pq Sq _
followed by zero or more letters, digits,
.Sq _ ,
or
.Sq -
.Pq dash .
The definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line.
The definition can subsequently be referred to using
.Qq {name} ,
which will expand to
.Qq (definition) .
For example:
.Bd -literal -offset indent
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
.Ed
.Pp
This defines
.Qq DIGIT
to be a regular expression which matches a single digit, and
.Qq ID
to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.Pp
.Dl {DIGIT}+"."{DIGIT}*
.Pp
is identical to
.Pp
.Dl ([0-9])+"."([0-9])*
.Pp
and matches one-or-more digits followed by a
.Sq .\&
followed by zero-or-more digits.
.Pp
The
.Em rules
section of the
.Nm
input contains a series of rules of the form:
.Pp
.Dl pattern action
.Pp
The pattern must be unindented and the action must begin
on the same line.
.Pp
See below for a further description of patterns and actions.
.Pp
Finally, the user code section is simply copied to
.Pa lex.yy.c
verbatim.
It is used for companion routines which call or are called by the scanner.
The presence of this section is optional;
if it is missing, the second
.Qq %%
in the input file may be skipped too.
.Pp
In the definitions and rules sections, any indented text or text enclosed in
.Sq %{
and
.Sq %}
is copied verbatim to the output
.Pq with the %{}'s removed .
The %{}'s must appear unindented on lines by themselves.
.Pp
In the rules section,
any indented or %{} text appearing before the first rule may be used to
declare variables which are local to the scanning routine and
.Pq after the declarations
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.Tn POSIX
compliance; see below for other such features).
.Pp
In the definitions section
.Pq but not in the rules section ,
an unindented comment
(i.e., a line beginning with
.Qq /* )
is also copied verbatim to the output up to the next
.Qq */ .
.Sh PATTERNS
The patterns in the input are written using an extended set of regular
expressions.
These are:
.Bl -tag -width "XXXXXXXX"
.It x
Match the character
.Sq x .
.It .\&
Any character
.Pq byte
except newline.
.It [xyz]
A
.Qq character class ;
in this case, the pattern matches either an
.Sq x ,
a
.Sq y ,
or a
.Sq z .
.It [abj-oZ]
A
.Qq character class
with a range in it; matches an
.Sq a ,
a
.Sq b ,
any letter from
.Sq j
through
.Sq o ,
or a
.Sq Z .
.It [^A-Z]
A
.Qq negated character class ,
i.e., any character but those in the class.
In this case, any character EXCEPT an uppercase letter.
.It [^A-Z\en]
Any character EXCEPT an uppercase letter or a newline.
.It r*
Zero or more r's, where
.Sq r
is any regular expression.
.It r+
One or more r's.
.It r?
Zero or one r's (that is,
.Qq an optional r ) .
.It r{2,5}
Anywhere from two to five r's.
.It r{2,}
Two or more r's.
.It r{4}
Exactly 4 r's.
.It {name}
The expansion of the
.Qq name
definition
.Pq see above .
.It \&"[xyz]\e\&"foo\&"
The literal string: [xyz]"foo.
.It \eX
If
.Sq X
is an
.Sq a ,
.Sq b ,
.Sq f ,
.Sq n ,
.Sq r ,
.Sq t ,
or
.Sq v ,
then the ANSI-C interpretation of
.Sq \eX .
Otherwise, a literal
.Sq X
(used to escape operators such as
.Sq * ) .
.It \e0
A NUL character
.Pq ASCII code 0 .
.It \e123
The character with octal value 123.
.It \ex2a
The character with hexadecimal value 2a.
.It (r)
Match an
.Sq r ;
parentheses are used to override precedence
.Pq see below .
.It rs
The regular expression
.Sq r
followed by the regular expression
.Sq s ;
called
.Qq concatenation .
.It r|s
Either an
.Sq r
or an
.Sq s .
.It r/s
An
.Sq r ,
but only if it is followed by an
.Sq s .
The text matched by
.Sq s
is included when determining whether this rule is the
.Qq longest match ,
but is then returned to the input before the action is executed.
So the action only sees the text matched by
.Sq r .
This type of pattern is called
.Qq trailing context .
(There are some combinations of r/s that
.Nm
cannot match correctly; see notes in the
.Sx BUGS
section below regarding
.Qq dangerous trailing context . )
.It ^r
An
.Sq r ,
but only at the beginning of a line
(i.e., just starting to scan, or right after a newline has been scanned).
.It r$
An
.Sq r ,
but only at the end of a line
.Pq i.e., just before a newline .
Equivalent to
.Qq r/\en .
.Pp
Note that
.Nm flex Ns 's
notion of
.Qq newline
is exactly whatever the C compiler used to compile
.Nm
interprets
.Sq \en
as.
.\" In particular, on some DOS systems you must either filter out \er's in the
.\" input yourself, or explicitly use r/\er\en for
.\" .Qq r$ .
.It <s>r
An
.Sq r ,
but only in start condition
.Sq s
.Pq see below for discussion of start conditions .
.It <s1,s2,s3>r
The same, but in any of start conditions s1, s2, or s3.
.It <*>r
An
.Sq r
in any start condition, even an exclusive one.
.It <<EOF>>
An end-of-file.
.It <s1,s2><<EOF>>
An end-of-file when in start condition s1 or s2.
.El
.Pp
Note that inside of a character class, all regular expression operators
lose their special meaning except escape
.Pq Sq \e
and the character class operators,
.Sq - ,
.Sq ]\& ,
and, at the beginning of the class,
.Sq ^ .
.Pp
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
For example,
.Pp
.D1 foo|bar*
.Pp
is the same as
.Pp
.D1 (foo)|(ba(r*))
.Pp
since the
.Sq *
operator has higher precedence than concatenation,
and concatenation higher than alternation
.Pq Sq |\& .
This pattern therefore matches
.Em either
the string
.Qq foo
.Em or
the string
.Qq ba
followed by zero-or-more r's.
To match
.Qq foo
or zero-or-more "bar"'s,
use:
.Pp
.D1 foo|(bar)*
.Pp
and to match zero-or-more "foo"'s-or-"bar"'s:
.Pp
.D1 (foo|bar)*
.Pp
In addition to characters and ranges of characters, character classes
can also contain character class
.Em expressions .
These are expressions enclosed inside
.Sq [:
and
.Sq :]
delimiters (which themselves must appear between the
.Sq \&[
and
.Sq ]\&
of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
.Bd -unfilled -offset indent
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
.Ed
.Pp
These expressions all designate a set of characters equivalent to
the corresponding standard C
.Fn isXXX
function.
For example, [:alnum:] designates those characters for which
.Xr isalnum 3
returns true \- i.e., any alphabetic or numeric.
Some systems don't provide
.Xr isblank 3 ,
so
.Nm
defines [:blank:] as a blank or a tab.
.Pp
For example, the following character classes are all equivalent:
.Bd -unfilled -offset indent
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
.Ed
.Pp
If the scanner is case-insensitive (the
.Fl i
flag), then [:upper:] and [:lower:] are equivalent to [:alpha:].
.Pp
Some notes on patterns:
.Bl -dash
.It
A negated character class such as the example
.Qq [^A-Z]
above will match a newline unless "\en"
.Pq or an equivalent escape sequence
is one of the characters explicitly present in the negated character class
(e.g.,
.Qq [^A-Z\en] ) .
This is unlike how many other regular expression tools treat negated character
classes, but unfortunately the inconsistency is historically entrenched.
Matching newlines means that a pattern like
.Qq [^"]*
can match the entire input unless there's another quote in the input.
.It
A rule can have at most one instance of trailing context
(the
.Sq /
operator or the
.Sq $
operator).
The start condition,
.Sq ^ ,
and
.Qq <<EOF>>
patterns can only occur at the beginning of a pattern and, as well as with
.Sq /
and
.Sq $ ,
cannot be grouped inside parentheses.
A
.Sq ^
which does not occur at the beginning of a rule or a
.Sq $
which does not occur at the end of a rule loses its special properties
and is treated as a normal character.
.It
The following are illegal:
.Bd -unfilled -offset indent
foo/bar$
<sc1>foo<sc2>bar
.Ed
.Pp
Note that the first of these can be written
.Qq foo/bar\en .
.It
The following will result in
.Sq $
or
.Sq ^
being treated as a normal character:
.Bd -unfilled -offset indent
foo|(bar$)
foo|^bar
.Ed
.Pp
If what's wanted is a
.Qq foo
or a bar-followed-by-a-newline, the following could be used
(the special
.Sq |\&
action is explained below):
.Bd -unfilled -offset indent
foo      |
bar$     /* action goes here */
.Ed
.Pp
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
.El
.Sh HOW THE INPUT IS MATCHED
When the generated scanner is run,
it analyzes its input looking for strings which match any of its patterns.
If it finds more than one match,
it takes the one matching the most text
(for trailing context rules, this includes the length of the trailing part,
even though it will then be returned to the input).
If it finds two or more matches of the same length,
the rule listed first in the
.Nm
input file is chosen.
.Pp
Once the match is determined, the text corresponding to the match
(called the
.Em token )
is made available in the global character pointer
.Fa yytext ,
and its length in the global integer
.Fa yyleng .
The
.Em action
corresponding to the matched pattern is then executed
.Pq a more detailed description of actions follows ,
and then the remaining input is scanned for another match.
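.Pp
The longest-match and first-rule preferences can be seen in a small sketch
(an illustration only; the rules and messages are made up for this purpose):
.Bd -literal -offset indent
%%
if          printf("keyword\en");
[a-z]+      {
	/* cast in case yyleng is not a plain int in this build */
	printf("identifier: %s (%d chars)\en", yytext, (int)yyleng);
}
[ \et\en]+    /* skip whitespace */
.Ed
.Pp
Given the input
.Qq if iffy ,
both rules match
.Qq if
with the same length, so the rule listed first is chosen and
.Qq keyword
is printed.
For
.Qq iffy ,
the second rule matches four characters and the first only two,
so the longer match wins and the scanner prints
.Qq identifier: iffy (4 chars) .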
.Pp
If no match is found, then the default rule is executed:
the next character in the input is considered matched and
copied to the standard output.
Thus, the simplest legal
.Nm
input is:
.Pp
.D1 %%
.Pp
which generates a scanner that simply copies its input
.Pq one character at a time
to its output.
.Pp
Note that
.Fa yytext
can be defined in two different ways:
either as a character pointer or as a character array.
Which definition
.Nm
uses can be controlled by including one of the special directives
.Dq %pointer
or
.Dq %array
in the first
.Pq definitions
section of flex input.
The default is
.Dq %pointer ,
unless the
.Fl l
.Nm lex
compatibility option is used, in which case
.Fa yytext
will be an array.
The advantage of using
.Dq %pointer
is substantially faster scanning and no buffer overflow when matching
very large tokens
.Pq unless not enough dynamic memory is available .
The disadvantage is that actions are restricted in how they can modify
.Fa yytext
.Pq see the next section ,
and calls to the
.Fn unput
function destroy the present contents of
.Fa yytext ,
which can be a considerable porting headache when moving between different
.Nm lex
versions.
.Pp
The advantage of
.Dq %array
is that
.Fa yytext
can be modified as much as wanted, and calls to
.Fn unput
do not destroy
.Fa yytext
.Pq see below .
Furthermore, existing
.Nm lex
programs sometimes access
.Fa yytext
externally using declarations of the form:
.Pp
.D1 extern char yytext[];
.Pp
This definition is erroneous when used with
.Dq %pointer ,
but correct for
.Dq %array .
.Pp
.Dq %array
defines
.Fa yytext
to be an array of
.Dv YYLMAX
characters, which defaults to a fairly large value.
The size can be changed by simply #define'ing
.Dv YYLMAX
to a different value in the first section of
.Nm
input.
As mentioned above, with
.Dq %pointer
yytext grows dynamically to accommodate large tokens.
While this means a
.Dq %pointer
scanner can accommodate very large tokens
.Pq such as matching entire blocks of comments ,
bear in mind that each time the scanner must resize
.Fa yytext
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
.Fa yytext
presently does not dynamically grow if a call to
.Fn unput
results in too much text being pushed back; instead, a run-time error results.
.Pp
Also note that
.Dq %array
cannot be used with C++ scanner classes
.Pq the c++ option; see below .
.Sh ACTIONS
Each pattern in a rule has a corresponding action,
which can be any arbitrary C statement.
The pattern ends at the first non-escaped whitespace character;
the remainder of the line is its action.
If the action is empty,
then when the pattern is matched the input token is simply discarded.
For example, here is the specification for a program
which deletes all occurrences of
.Qq zap me
from its input:
.Bd -literal -offset indent
%%
"zap me"
.Ed
.Pp
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.Pp
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.Bd -literal -offset indent
%%
[ \et]+     putchar(' ');
[ \et]+$    /* ignore this token */
.Ed
.Pp
If the action contains a
.Sq { ,
then the action spans till the balancing
.Sq }
is found, and the action may cross multiple lines.
.Nm
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.Sq %{
and will consider the action to be all the text up to the next
.Sq %}
.Pq regardless of ordinary braces inside the action .
.Pp
An action consisting solely of a vertical bar
.Pq Sq |\&
means
.Qq same as the action for the next rule .
See below for an illustration.
.Pp
Actions can include arbitrary C code,
including return statements to return a value to whatever routine called
.Fn yylex .
Each time
.Fn yylex
is called, it continues processing tokens from where it last left off
until it either reaches the end of the file or executes a return.
.Pp
Actions are free to modify
.Fa yytext
except for lengthening it
(adding characters to its end \- these will overwrite later characters in the
input stream).
This, however, does not apply when using
.Dq %array
.Pq see above ;
in that case,
.Fa yytext
may be freely modified in any way.
.Pp
Actions are free to modify
.Fa yyleng
except they should not do so if the action also includes use of
.Fn yymore
.Pq see below .
.Pp
There are a number of special directives which can be included within
an action:
.Bl -tag -width Ds
.It ECHO
Copies
.Fa yytext
to the scanner's output.
.It BEGIN
Followed by the name of a start condition, places the scanner in the
corresponding start condition
.Pq see below .
.It REJECT
Directs the scanner to proceed on to the
.Qq second best
rule which matched the input
.Pq or a prefix of the input .
The rule is chosen as described above in
.Sx HOW THE INPUT IS MATCHED ,
and
.Fa yytext
and
.Fa yyleng
set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.Nm
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine
.Fn special
whenever
.Qq frob
is seen:
.Bd -literal -offset indent
	int word_count = 0;
%%

frob        special(); REJECT;
[^ \et\en]+   ++word_count;
.Ed
.Pp
Without the
.Em REJECT ,
any "frob"'s in the input would not be counted as words,
since the scanner normally executes only one action per token.
Multiple
.Em REJECT Ns 's
are allowed,
each one finding the next best choice to the currently active rule.
For example, when the following scanner scans the token
.Qq abcd ,
it will write
.Qq abcdabcaba
to the output:
.Bd -literal -offset indent
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
\&.|\en     /* eat up any unmatched character */
.Ed
.Pp
(The first three rules share the fourth's action since they use
the special
.Sq |\&
action.)
.Em REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in any of the scanner's actions it will slow down
all of the scanner's matching.
Furthermore,
.Em REJECT
cannot be used with the
.Fl Cf
or
.Fl CF
options
.Pq see below .
.Pp
Note also that unlike the other special actions,
.Em REJECT
is a
.Em branch ;
code immediately following it in the action will not be executed.
.It yymore()
Tells the scanner that the next time it matches a rule, the corresponding
token should be appended onto the current value of
.Fa yytext
rather than replacing it.
For example, given the input
.Qq mega-kludge
the following will write
.Qq mega-mega-kludge
to the output:
.Bd -literal -offset indent
%%
mega-    ECHO; yymore();
kludge   ECHO;
.Ed
.Pp
First
.Qq mega-
is matched and echoed to the output.
Then
.Qq kludge
is matched, but the previous
.Qq mega-
is still hanging around at the beginning of
.Fa yytext
so the
.Em ECHO
for the
.Qq kludge
rule will actually write
.Qq mega-kludge .
.Pp
Two notes regarding use of
.Fn yymore :
First,
.Fn yymore
depends on the value of
.Fa yyleng
correctly reflecting the size of the current token, so
.Fa yyleng
must not be modified when using
.Fn yymore .
Second, the presence of
.Fn yymore
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.It yyless(n)
Returns all but the first
.Ar n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.Fa yytext
and
.Fa yyleng
are adjusted appropriately (e.g.,
.Fa yyleng
will now be equal to
.Ar n ) .
For example, on the input
.Qq foobar
the following will write out
.Qq foobarbar :
.Bd -literal -offset indent
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;
.Ed
.Pp
An argument of 0 to
.Fa yyless
will cause the entire current input string to be scanned again.
Unless how the scanner will subsequently process its input has been changed
(using
.Em BEGIN ,
for example),
this will result in an endless loop.
.Pp
Note that
.Fa yyless
is a macro and can only be used in the
.Nm
input file, not from other source files.
.It unput(c)
Puts the character
.Ar c
back into the input stream.
It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.Bd -literal -offset indent
{
	int i;
	char *yycopy;

	/* Copy yytext because unput() trashes yytext */
	if ((yycopy = strdup(yytext)) == NULL)
		err(1, NULL);
	unput(')');
	for (i = yyleng - 1; i >= 0; --i)
		unput(yycopy[i]);
	unput('(');
	free(yycopy);
}
.Ed
.Pp
Note that since each
.Fn unput
puts the given character back at the beginning of the input stream,
pushing back strings must be done back-to-front.
.Pp
An important potential problem when using
.Fn unput
is that if using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
starting with its rightmost character and devouring one character to
the left with each call.
If the value of
.Fa yytext
should be preserved after a call to
.Fn unput
.Pq as in the above example ,
it must either first be copied elsewhere, or the scanner must be built using
.Dq %array
instead (see
.Sx HOW THE INPUT IS MATCHED ) .
.Pp
Finally, note that EOF cannot be put back
to attempt to mark the input stream with an end-of-file.
.It input()
Reads the next character from the input stream.
For example, the following is one way to eat up C comments:
.Bd -literal -offset indent
%%
"/*" {
	int c;

	for (;;) {
		while ((c = input()) != '*' && c != EOF)
			; /* eat up text of comment */

		if (c == '*') {
			while ((c = input()) == '*')
				;
			if (c == '/')
				break; /* found the end */
		}

		if (c == EOF) {
			errx(1, "EOF in comment");
			break;
		}
	}
}
.Ed
.Pp
(Note that if the scanner is compiled using C++, then
.Fn input
is instead referred to as
.Fn yyinput ,
in order to avoid a name clash with the C++ stream by the name of input.)
.It YY_FLUSH_BUFFER
Flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token,
it will first refill the buffer using
.Dv YY_INPUT
(see
.Sx THE GENERATED SCANNER ,
below).
This action is a special case of the more general
.Fn yy_flush_buffer
function, described below in the section
.Sx MULTIPLE INPUT BUFFERS .
.It yyterminate()
Can be used in lieu of a return statement in an action.
It terminates the scanner and returns a 0 to the scanner's caller, indicating
.Qq all done .
By default,
.Fn yyterminate
is also called when an end-of-file is encountered.
It is a macro and may be redefined.
.El
.Sh THE GENERATED SCANNER
The output of
.Nm
is the file
.Pa lex.yy.c ,
which contains the scanning routine
.Fn yylex ,
a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros.
By default,
.Fn yylex
is declared as follows:
.Bd -unfilled -offset indent
int yylex()
{
    ... various definitions and the actions in here ...
}
.Ed
.Pp
(If the environment supports function prototypes, then it will
be "int yylex(void)".)
This definition may be changed by defining the
.Dv YY_DECL
macro.
For example:
.Bd -literal -offset indent
#define YY_DECL float lexscan(a, b) float a, b;
.Ed
.Pp
would give the scanning routine the name
.Em lexscan ,
returning a float, and taking two floats as arguments.
Note that if arguments are given to the scanning routine using a
K&R-style/non-prototyped function declaration,
the definition must be terminated with a semi-colon
.Pq Sq ;\& .
.Pp
Whenever
.Fn yylex
is called, it scans tokens from the global input file
.Pa yyin
.Pq which defaults to stdin .
It continues until it either reaches an end-of-file
.Pq at which point it returns the value 0
or one of its actions executes a
.Em return
statement.
.Pp
If the scanner reaches an end-of-file, subsequent calls are undefined
unless either
.Em yyin
is pointed at a new input file
.Pq in which case scanning continues from that file ,
or
.Fn yyrestart
is called.
.Fn yyrestart
takes one argument, a
.Fa FILE *
pointer (which can be
.Dv NULL ,
if
.Dv YY_INPUT
has been set up to scan from a source other than
.Em yyin ) ,
and initializes
.Em yyin
for scanning from that file.
Essentially there is no difference between just assigning
.Em yyin
to a new input file or using
.Fn yyrestart
to do so; the latter is available for compatibility with previous versions of
.Nm ,
and because it can be used to switch input files in the middle of scanning.
It can also be used to throw away the current input buffer,
by calling it with an argument of
.Em yyin ;
but better is to use
.Dv YY_FLUSH_BUFFER
.Pq see above .
Note that
.Fn yyrestart
does not reset the start condition to
.Em INITIAL
(see
.Sx START CONDITIONS ,
below).
.Pp
If
.Fn yylex
stops scanning due to executing a
.Em return
statement in one of the actions, the scanner may then be called again and it
will resume scanning where it left off.
.Pp
By default
.Pq and for purposes of efficiency ,
the scanner uses block-reads rather than simple
.Xr getc 3
calls to read characters from
.Em yyin .
The nature of how it gets its input can be controlled by defining the
.Dv YY_INPUT
macro.
.Dv YY_INPUT Ns 's
calling sequence is
.Qq YY_INPUT(buf,result,max_size) .
Its action is to place up to
.Dv max_size
characters in the character array
.Em buf
and return in the integer variable
.Em result
either the number of characters read or the constant
.Dv YY_NULL
(0 on
.Ux
systems)
to indicate
.Dv EOF .
The default
.Dv YY_INPUT
reads from the global file-pointer
.Qq yyin .
.Pp
A sample definition of
.Dv YY_INPUT
.Pq in the definitions section of the input file :
.Bd -unfilled -offset indent
%{
#define YY_INPUT(buf,result,max_size) \e
{ \e
	int c = getchar(); \e
	result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \e
}
%}
.Ed
.Pp
This definition will change the input processing to occur
one character at a time.
.Pp
When the scanner receives an end-of-file indication from
.Dv YY_INPUT ,
it then checks the
.Fn yywrap
function.
If
.Fn yywrap
returns false
.Pq zero ,
then it is assumed that the function has gone ahead and set up
.Em yyin
to point to another input file, and scanning continues.
If it returns true
.Pq non-zero ,
then the scanner terminates, returning 0 to its caller.
Note that in either case, the start condition remains unchanged;
it does not revert to
.Em INITIAL .
.Pp
If you do not supply your own version of
.Fn yywrap ,
then you must either use
.Dq %option noyywrap
(in which case the scanner behaves as though
.Fn yywrap
returned 1), or you must link with
.Fl lfl
to obtain the default version of the routine, which always returns 1.
.Pp
Three routines are available for scanning from in-memory buffers rather
than files:
.Fn yy_scan_string ,
.Fn yy_scan_bytes ,
and
.Fn yy_scan_buffer .
See the discussion of them below in the section
.Sx MULTIPLE INPUT BUFFERS .
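.Pp
The
.Fn yywrap
contract described above can be sketched as follows.
This is only an illustration under stated assumptions:
.Va filelist
is a hypothetical NULL-terminated array of file names that the
program is assumed to fill in before calling
.Fn yylex ;
it is not something
.Nm
provides.
.Bd -literal -offset indent
%{
#include <stdio.h>

/* hypothetical: remaining input file names, NULL-terminated */
char **filelist;
%}
%%
\&...rules...
%%
int
yywrap(void)
{
	/* try each remaining name until one can be opened */
	while (filelist != NULL && *filelist != NULL) {
		FILE *fp = fopen(*filelist++, "r");

		if (fp != NULL) {
			/* the previous yyin is not closed, for brevity */
			yyin = fp;
			return 0;	/* more input: keep scanning */
		}
	}
	return 1;		/* no more input: terminate */
}
.Ed
.Pp
The function returns 0 only after
.Em yyin
has been pointed at a new file, and 1 once the list is exhausted.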
.Pp
The scanner writes its
.Em ECHO
output to the
.Em yyout
global
.Pq default, stdout ,
which may be redefined by the user simply by assigning it to some other
.Va FILE
pointer.
.Sh START CONDITIONS
.Nm
provides a mechanism for conditionally activating rules.
Any rule whose pattern is prefixed with
.Qq Aq sc
will only be active when the scanner is in the start condition named
.Qq sc .
For example,
.Bd -literal -offset indent
<STRING>[^"]* { /* eat up the string body ... */
	...
}
.Ed
.Pp
will be active only when the scanner is in the
.Qq STRING
start condition, and
.Bd -literal -offset indent
<INITIAL,STRING,QUOTE>\e. { /* handle an escape ... */
	...
}
.Ed
.Pp
will be active only when the current start condition is either
.Qq INITIAL ,
.Qq STRING ,
or
.Qq QUOTE .
.Pp
Start conditions are declared in the definitions
.Pq first
section of the input using unindented lines beginning with either
.Sq %s
or
.Sq %x
followed by a list of names.
The former declares
.Em inclusive
start conditions, the latter
.Em exclusive
start conditions.
A start condition is activated using the
.Em BEGIN
action.
Until the next
.Em BEGIN
action is executed, rules with the given start condition will be active and
rules with other start conditions will be inactive.
If the start condition is inclusive,
then rules with no start conditions at all will also be active.
If it is exclusive,
then only rules qualified with the start condition will be active.
A set of rules contingent on the same exclusive start condition
describes a scanner which is independent of any of the other rules in the
.Nm
input.
Because of this, exclusive start conditions make it easy to specify
.Qq mini-scanners
which scan portions of the input that are syntactically different
from the rest
.Pq e.g., comments .
.Pp
If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two.
The set of rules:
.Bd -literal -offset indent
%s example
%%

<example>foo	do_something();

bar		something_else();
.Ed
.Pp
is equivalent to
.Bd -literal -offset indent
%x example
%%

<example>foo	do_something();

<INITIAL,example>bar	something_else();
.Ed
.Pp
Without the
.Aq INITIAL,example
qualifier, the
.Dq bar
pattern in the second example wouldn't be active
.Pq i.e., couldn't match
when in start condition
.Dq example .
If we just used
.Aq example
to qualify
.Dq bar ,
though, then it would only be active in
.Dq example
and not in
.Em INITIAL ,
while in the first example it's active in both,
because in the first example the
.Dq example
start condition is an inclusive
.Pq Sq %s
start condition.
.Pp
Also note that the special start-condition specifier
.Sq Aq *
matches every start condition.
Thus, the above example could also have been written:
.Bd -literal -offset indent
%x example
%%

<example>foo	do_something();

<*>bar		something_else();
.Ed
.Pp
The default rule (to
.Em ECHO
any unmatched character) remains active in start conditions.
It is equivalent to:
.Bd -literal -offset indent
<*>.|\en	ECHO;
.Ed
.Pp
.Dq BEGIN(0)
returns to the original state where only the rules with
no start conditions are active.
This state can also be referred to as the start-condition
.Em INITIAL ,
so
.Dq BEGIN(INITIAL)
is equivalent to
.Dq BEGIN(0) .
(The parentheses around the start condition name are not required but
are considered good style.)
.Pp
.Em BEGIN
actions can also be given as indented code at the beginning
of the rules section.
For example, the following will cause the scanner to enter the
.Qq SPECIAL
start condition whenever
.Fn yylex
is called and the global variable
.Fa enter_special
is true:
.Bd -literal -offset indent
	int enter_special;

%x SPECIAL
%%
	if (enter_special)
		BEGIN(SPECIAL);

<SPECIAL>blahblahblah
\&...more rules follow...
.Ed
.Pp
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like
.Qq 123.456 .
By default it will treat it as three tokens: the integer
.Qq 123 ,
a dot
.Pq Sq .\& ,
and the integer
.Qq 456 .
But if the string is preceded earlier in the line by the string
.Qq expect-floats
it will treat it as a single token, the floating-point number 123.456:
.Bd -literal -offset indent
%{
#include <stdio.h>
%}
%s expect

%%
expect-floats	BEGIN(expect);

<expect>[0-9]+"."[0-9]+ {
	printf("found a float, = %s\en", yytext);
}
<expect>\en {
	/*
	 * That's the end of the line, so
	 * we need another "expect-floats"
	 * before we'll recognize any more
	 * numbers.
	 */
	BEGIN(INITIAL);
}

[0-9]+ {
	printf("found an integer, = %s\en", yytext);
}

"."	printf("found a dot\en");
.Ed
.Pp
Here is a scanner which recognizes
.Pq and discards
C comments while maintaining a count of the current input line:
.Bd -literal -offset indent
%x comment
%%
	int line_num = 1;

"/*"			BEGIN(comment);

<comment>[^*\en]*	/* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*	/* eat up '*'s not followed by '/'s */
<comment>\en		++line_num;
<comment>"*"+"/"	BEGIN(INITIAL);
.Ed
.Pp
This scanner goes to a bit of trouble to match as much
text as possible with each rule.
In general, when attempting to write a high-speed scanner
try to match as much as possible in each rule, as it's a big win.
.Pp
Note that start-condition names are really integer values and
can be stored as such.
Thus, the above could be extended in the following fashion:
.Bd -literal -offset indent
%x comment foo
%%
	int line_num = 1;
	int comment_caller;

"/*" {
	comment_caller = INITIAL;
	BEGIN(comment);
}

\&...

<foo>"/*" {
	comment_caller = foo;
	BEGIN(comment);
}

<comment>[^*\en]*	/* eat anything that's not a '*' */
<comment>"*"+[^*/\en]*	/* eat up '*'s not followed by '/'s */
<comment>\en		++line_num;
<comment>"*"+"/"	BEGIN(comment_caller);
.Ed
.Pp
Furthermore, the current start condition can be accessed by using
the integer-valued
.Dv YY_START
macro.
For example, the above assignments to
.Em comment_caller
could instead be written
.Pp
.Dl comment_caller = YY_START;
.Pp
Flex provides
.Dv YYSTATE
as an alias for
.Dv YY_START
(since that is what's used by
.At
.Nm lex ) .
.Pp
Note that start conditions do not have their own name-space;
%s's and %x's declare names in the same fashion as #define's.
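.Pp
As a sketch of how
.Dv YY_START
can replace the per-rule assignments shown above,
the two rules that enter the
.Qq comment
start condition may be folded into one.
The names are those of the earlier example, and the remaining
.Aq comment
rules are assumed to be unchanged:
.Bd -literal -offset indent
%x comment foo
%%
	int comment_caller;

<INITIAL,foo>"/*"	{
	/* remember whichever start condition was active */
	comment_caller = YY_START;
	BEGIN(comment);
}

\&...

<comment>"*"+"/"	BEGIN(comment_caller);
.Ed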
.Pp
Finally, here's an example of how to match C-style quoted strings using
exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):
.Bd -literal -offset indent
%x str

%%
	#define MAX_STR_CONST 1024
	char string_buf[MAX_STR_CONST];
	char *string_buf_ptr;

\e"	string_buf_ptr = string_buf; BEGIN(str);

<str>\e" { /* saw closing quote - all done */
	BEGIN(INITIAL);
	*string_buf_ptr = '\e0';
	/*
	 * return string constant token type and
	 * value to parser
	 */
}

<str>\en {
	/* error - unterminated string constant */
	/* generate error message */
}

<str>\e\e[0-7]{1,3} {
	/* octal escape sequence */
	unsigned int result;	/* %o expects an unsigned int */

	(void) sscanf(yytext + 1, "%o", &result);

	if (result > 0xff) {
		/* error, constant is out-of-bounds */
	} else
		*string_buf_ptr++ = result;
}

<str>\e\e[0-9]+ {
	/*
	 * generate error - bad escape sequence; something
	 * like '\e48' or '\e0777777'
	 */
}

<str>\e\en	*string_buf_ptr++ = '\en';
<str>\e\et	*string_buf_ptr++ = '\et';
<str>\e\er	*string_buf_ptr++ = '\er';
<str>\e\eb	*string_buf_ptr++ = '\eb';
<str>\e\ef	*string_buf_ptr++ = '\ef';

<str>\e\e(.|\en)	*string_buf_ptr++ = yytext[1];

<str>[^\e\e\en\e"]+ {
	char *yptr = yytext;

	while (*yptr)
		*string_buf_ptr++ = *yptr++;
}
.Ed
.Pp
Often, such as in some of the examples above,
a whole bunch of rules are all preceded by the same start condition(s).
.Nm
makes this a little easier and cleaner by introducing a notion of
start condition
.Em scope .
A start condition scope is begun with:
.Pp
.Dl <SCs>{
.Pp
where
.Dq SCs
is a list of one or more start conditions.
Inside the start condition scope, every rule automatically has the prefix
.Aq SCs
applied to it, until a
.Sq }
which matches the initial
.Sq { .
So, for example,
.Bd -literal -offset indent
<ESC>{
	"\e\en"	return '\en';
	"\e\er"	return '\er';
	"\e\ef"	return '\ef';
	"\e\e0"	return '\e0';
}
.Ed
.Pp
is equivalent to:
.Bd -literal -offset indent
<ESC>"\e\en"	return '\en';
<ESC>"\e\er"	return '\er';
<ESC>"\e\ef"	return '\ef';
<ESC>"\e\e0"	return '\e0';
.Ed
.Pp
Start condition scopes may be nested.
.Pp
Three routines are available for manipulating stacks of start conditions:
.Bl -tag -width Ds
.It void yy_push_state(int new_state)
Pushes the current start condition onto the top of the start condition
stack and switches to
.Fa new_state
as though
.Dq BEGIN new_state
had been used
.Pq recall that start condition names are also integers .
.It void yy_pop_state()
Pops the top of the stack and switches to it via
.Em BEGIN .
.It int yy_top_state()
Returns the top of the stack without altering the stack's contents.
.El
.Pp
The start condition stack grows dynamically and so has no built-in
size limitation.
If memory is exhausted, program execution aborts.
.Pp
To use start condition stacks, scanners must include a
.Dq %option stack
directive (see
.Sx OPTIONS
below).
.Sh MULTIPLE INPUT BUFFERS
Some scanners
(such as those which support
.Qq include
files)
require reading from several input streams.
As
.Nm
scanners do a large amount of buffering, one cannot control
where the next input will be read from by simply writing a
.Dv YY_INPUT
which is sensitive to the scanning context.
.Dv YY_INPUT
is only called when the scanner reaches the end of its buffer, which
may be a long time after scanning a statement such as an
.Qq include
which requires switching the input source.
.Pp
To negotiate these sorts of problems,
.Nm
provides a mechanism for creating and switching between multiple
input buffers.
An input buffer is created by using:
.Pp
.D1 YY_BUFFER_STATE yy_create_buffer(FILE *file, int size)
.Pp
which takes a
.Fa FILE
pointer and a
.Fa size
and creates a buffer associated with the given file and large enough to hold
.Fa size
characters (when in doubt, use
.Dv YY_BUF_SIZE
for the size).
It returns a
.Dv YY_BUFFER_STATE
handle, which may then be passed to other routines
.Pq see below .
The
.Dv YY_BUFFER_STATE
type is a pointer to an opaque
.Dq struct yy_buffer_state
structure, so
.Dv YY_BUFFER_STATE
variables may be safely initialized to
.Dq ((YY_BUFFER_STATE) 0)
if desired, and the opaque structure can also be referred to in order to
correctly declare input buffers in source files other than that of scanners.
Note that the
.Fa FILE
pointer in the call to
.Fn yy_create_buffer
is only used as the value of
.Fa yyin
seen by
.Dv YY_INPUT ;
if
.Dv YY_INPUT
is redefined so that it no longer uses
.Fa yyin ,
then a nil
.Fa FILE
pointer can safely be passed to
.Fn yy_create_buffer .
To select a particular buffer to scan:
.Pp
.D1 void yy_switch_to_buffer(YY_BUFFER_STATE new_buffer)
.Pp
It switches the scanner's input buffer so subsequent tokens will
come from
.Fa new_buffer .
Note that
.Fn yy_switch_to_buffer
may be used by
.Fn yywrap
to set things up for continued scanning,
instead of opening a new file and pointing
.Fa yyin
at it.
Note also that switching input sources via either
.Fn yy_switch_to_buffer
or
.Fn yywrap
does not change the start condition.
.Pp
.D1 void yy_delete_buffer(YY_BUFFER_STATE buffer)
.Pp
is used to reclaim the storage associated with a buffer.
.Pf ( Fa buffer
can be nil, in which case the routine does nothing.)
To clear the current contents of a buffer:
.Pp
.D1 void yy_flush_buffer(YY_BUFFER_STATE buffer)
.Pp
This function discards the buffer's contents,
so the next time the scanner attempts to match a token from the buffer,
it will first fill the buffer anew using
.Dv YY_INPUT .
.Pp
.Fn yy_new_buffer
is an alias for
.Fn yy_create_buffer ,
provided for compatibility with the C++ use of
.Em new
and
.Em delete
for creating and destroying dynamic objects.
.Pp
Finally, the
.Dv YY_CURRENT_BUFFER
macro returns a
.Dv YY_BUFFER_STATE
handle to the current buffer.
.Pp
Here is an example of using these features for writing a scanner
which expands include files (the
.Aq Aq EOF
feature is discussed below):
.Bd -literal -offset indent
/*
 * the "incl" state is used for picking up the name
 * of an include file
 */
%x incl

%{
#include <err.h>	/* err(), errx() */

#define MAX_INCLUDE_DEPTH 10
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
%}

%%
include		BEGIN(incl);

[a-z]+		ECHO;
[^a-z\en]*\en?	ECHO;

<incl>[ \et]*	/* eat the whitespace */
<incl>[^ \et\en]+ { /* got the include file name */
	if (include_stack_ptr >= MAX_INCLUDE_DEPTH)
		errx(1, "Includes nested too deeply");

	include_stack[include_stack_ptr++] =
	    YY_CURRENT_BUFFER;

	yyin = fopen(yytext, "r");

	if (yyin == NULL)
		err(1, NULL);

	yy_switch_to_buffer(
	    yy_create_buffer(yyin, YY_BUF_SIZE));

	BEGIN(INITIAL);
}

<<EOF>> {
	if (--include_stack_ptr < 0)
		yyterminate();
	else {
		yy_delete_buffer(YY_CURRENT_BUFFER);
		yy_switch_to_buffer(
		    include_stack[include_stack_ptr]);
	}
}
.Ed
.Pp
Three routines are available for setting up input buffers for
scanning in-memory strings instead of files.
All of them create a new input buffer for scanning the string,
and return a corresponding
.Dv YY_BUFFER_STATE
handle (which should be deleted afterwards using
.Fn yy_delete_buffer ) .
They also switch to the new buffer using
.Fn yy_switch_to_buffer ,
so the next call to
.Fn yylex
will start scanning the string.
.Bl -tag -width Ds
.It yy_scan_string(const char *str)
Scans a NUL-terminated string.
.It yy_scan_bytes(const char *bytes, int len)
Scans
.Fa len
bytes
.Pq including possibly NUL's
starting at location
.Fa bytes .
.El
.Pp
Note that both of these functions create and scan a copy
of the string or bytes.
(This may be desirable, since
.Fn yylex
modifies the contents of the buffer it is scanning.)
The copy can be avoided by using:
.Bl -tag -width Ds
.It yy_scan_buffer(char *base, yy_size_t size)
Which scans the buffer starting at
.Fa base ,
consisting of
.Fa size
bytes, the last two bytes of which must be
.Dv YY_END_OF_BUFFER_CHAR
.Pq ASCII NUL .
These last two bytes are not scanned; thus, scanning consists of
base[0] through base[size-3], inclusive.
.Pp
If
.Fa base
is not set up in this manner
(i.e., the final two
.Dv YY_END_OF_BUFFER_CHAR
bytes are missing), then
.Fn yy_scan_buffer
returns a nil pointer instead of creating a new input buffer.
.Pp
The type
.Fa yy_size_t
is an integral type to which an integer expression
reflecting the size of the buffer can be cast.
.El
.Sh END-OF-FILE RULES
The special rule
.Qq Aq Aq EOF
indicates actions which are to be taken when an end-of-file is encountered and
.Fn yywrap
returns non-zero
.Pq i.e., indicates no further files to process .
The action must finish by doing one of four things:
.Bl -dash
.It
Assigning
.Em yyin
to a new input file
(in previous versions of
.Nm ,
after doing the assignment, it was necessary to call the special action
.Dv YY_NEW_FILE ;
this is no longer necessary).
.It
Executing a
.Em return
statement.
.It
Executing the special
.Fn yyterminate
action.
.It
Switching to a new buffer using
.Fn yy_switch_to_buffer
as shown in the example above.
.El
.Pp
.Aq Aq EOF
rules may not be used with other patterns;
they may only be qualified with a list of start conditions.
If an unqualified
.Aq Aq EOF
rule is given, it applies to all start conditions which do not already have
.Aq Aq EOF
actions.
To specify an
.Aq Aq EOF
rule for only the initial start condition, use
.Pp
.Dl <INITIAL><<EOF>>
.Pp
These rules are useful for catching things like unclosed comments.
An example:
.Bd -literal -offset indent
%x quote
%%

\&...other rules for dealing with quotes...

<quote><<EOF>> {
	error("unterminated quote");
	yyterminate();
}
<<EOF>> {
	if (*++filelist)
		yyin = fopen(*filelist, "r");
	else
		yyterminate();
}
.Ed
.Sh MISCELLANEOUS MACROS
The macro
.Dv YY_USER_ACTION
can be defined to provide an action
which is always executed prior to the matched rule's action.
For example,
it could be #define'd to call a routine to convert yytext to lower-case.
When
.Dv YY_USER_ACTION
is invoked, the variable
.Fa yy_act
gives the number of the matched rule
.Pq rules are numbered starting with 1 .
For example, to profile how often each rule is matched,
the following would do the trick:
.Pp
.Dl #define YY_USER_ACTION ++ctr[yy_act]
.Pp
where
.Fa ctr
is an array to hold the counts for the different rules.
Note that the macro
.Dv YY_NUM_RULES
gives the total number of rules
(including the default rule, even if
.Fl s
is used),
so a correct declaration for
.Fa ctr
is:
.Pp
.Dl int ctr[YY_NUM_RULES];
.Pp
The macro
.Dv YY_USER_INIT
may be defined to provide an action which is always executed before
the first scan
.Pq and before the scanner's internal initializations are done .
For example, it could be used to call a routine to read
in a data table or open a logging file.
.Pp
The macro
.Dv yy_set_interactive(is_interactive)
can be used to control whether the current buffer is considered
.Em interactive .
An interactive buffer is processed more slowly,
but must be used when the scanner's input source is indeed
interactive to avoid problems due to waiting to fill buffers
(see the discussion of the
.Fl I
flag below).
A non-zero value in the macro invocation marks the buffer as interactive,
a zero value as non-interactive.
Note that use of this macro overrides
.Dq %option always-interactive
or
.Dq %option never-interactive
(see
.Sx OPTIONS
below).
.Fn yy_set_interactive
must be invoked prior to beginning to scan the buffer that is
.Pq or is not
to be considered interactive.
.Pp
The macro
.Dv yy_set_bol(at_bol)
can be used to control whether the current buffer's scanning
context for the next token match is done as though at the
beginning of a line.
A non-zero macro argument makes rules anchored with
.Sq ^
active, while a zero argument makes
.Sq ^
rules inactive.
.Pp
The macro
.Dv YY_AT_BOL
returns true if the next token scanned from the current buffer will have
.Sq ^
rules active, false otherwise.
.Pp
In the generated scanner, the actions are all gathered in one large
switch statement and separated using
.Dv YY_BREAK ,
which may be redefined.
By default, it is simply a
.Qq break ,
to separate each rule's action from the following rules.
Redefining
.Dv YY_BREAK
allows, for example, C++ users to
.Dq #define YY_BREAK
to do nothing
(while being very careful that every rule ends with a
.Qq break
or a
.Qq return ! )
to avoid the unreachable statement warnings that are issued when
a rule's action ends with
.Dq return
and the
.Dv YY_BREAK
that follows it can therefore never be reached.
.Sh VALUES AVAILABLE TO THE USER
This section summarizes the various values available to the user
in the rule actions.
.Bl -tag -width Ds
.It char *yytext
Holds the text of the current token.
It may be modified but not lengthened
.Pq characters cannot be appended to the end .
.Pp
If the special directive
.Dq %array
appears in the first section of the scanner description, then
.Fa yytext
is instead declared
.Dq char yytext[YYLMAX] ,
where
.Dv YYLMAX
is a macro definition that can be redefined in the first section
to change the default value
.Pq generally 8KB .
Using
.Dq %array
results in somewhat slower scanners, but the value of
.Fa yytext
becomes immune to calls to
.Fn input
and
.Fn unput ,
which potentially destroy its value when
.Fa yytext
is a character pointer.
The opposite of
.Dq %array
is
.Dq %pointer ,
which is the default.
.Pp
.Dq %array
cannot be used when generating C++ scanner classes
(the
.Fl +
flag).
.It int yyleng
Holds the length of the current token.
.It FILE *yyin
Is the file which by default
.Nm
reads from.
It may be redefined, but doing so only makes sense before
scanning begins or after an
.Dv EOF
has been encountered.
Changing it in the midst of scanning will have unexpected results since
.Nm
buffers its input; use
.Fn yyrestart
instead.
Once scanning terminates because an end-of-file
has been seen,
.Fa yyin
can be assigned as the new input file
and the scanner can be called again to continue scanning.
.It void yyrestart(FILE *new_file)
May be called to point
.Fa yyin
at the new input file.
The switch-over to the new file is immediate
.Pq any previously buffered-up input is lost .
Note that calling
.Fn yyrestart
with
.Fa yyin
as an argument thus throws away the current input buffer and continues
scanning the same input file.
.It FILE *yyout
Is the file to which
.Em ECHO
actions are done.
It can be reassigned by the user.
.It YY_CURRENT_BUFFER
Returns a
.Dv YY_BUFFER_STATE
handle to the current buffer.
.It YY_START
Returns an integer value corresponding to the current start condition.
This value can subsequently be used with
.Em BEGIN
to return to that start condition.
.El
.Sh INTERFACING WITH YACC
One of the main uses of
.Nm
is as a companion to the
.Xr yacc 1
parser-generator.
yacc parsers expect to call a routine named
.Fn yylex
to find the next input token.
The routine is supposed to return the type of the next token
as well as putting any associated value in the global
.Fa yylval ,
which is defined externally,
and can be a union or any other complex data structure.
To use
.Nm
with yacc, one specifies the
.Fl d
option to yacc to instruct it to generate the file
.Pa y.tab.h
containing definitions of all the
.Dq %tokens
appearing in the yacc input.
This file is then included in the
.Nm
scanner.
For example, part of the scanner might look like:
.Bd -literal -offset indent
%{
#include "y.tab.h"
%}

%%

if	return TOK_IF;
then	return TOK_THEN;
begin	return TOK_BEGIN;
end	return TOK_END;
.Ed
.Sh OPTIONS
.Nm
has the following options:
.Bl -tag -width Ds
.It Fl 7
Instructs
.Nm
to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
characters in its input.
The advantage of using
.Fl 7
is that the scanner's tables can be up to half the size of those generated
using the
.Fl 8
option
.Pq see below .
The disadvantage is that such scanners often hang
or crash if their input contains an 8-bit character.
.Pp
Note, however, that unless generating a scanner using the
.Fl Cf
or
.Fl CF
table compression options, use of
.Fl 7
will save only a small amount of table space,
and make the scanner considerably less portable.
.Nm flex Ns 's
default behavior is to generate an 8-bit scanner unless
.Fl Cf
or
.Fl CF
is specified, in which case
.Nm
defaults to generating 7-bit scanners unless it was
configured to generate 8-bit scanners
(as will often be the case with non-USA sites).
It is possible to tell whether
.Nm
generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the
.Fl v
output as described below.
.Pp
Note that if
.Fl Cfe
or
.Fl CFe
is used
(the table compression options, but also using equivalence classes as
discussed below),
.Nm
still defaults to generating an 8-bit scanner,
since usually with these compression options full 8-bit tables
are not much more expensive than 7-bit tables.
.It Fl 8
Instructs
.Nm
to generate an 8-bit scanner, i.e., one which can recognize 8-bit
characters.
This flag is only needed for scanners generated using
.Fl Cf
or
.Fl CF ,
as otherwise
.Nm
defaults to generating an 8-bit scanner anyway.
.Pp
See the discussion of
.Fl 7
above for
.Nm flex Ns 's
default behavior and the tradeoffs between 7-bit and 8-bit scanners.
.It Fl B
Instructs
.Nm
to generate a
.Em batch
scanner, the opposite of
.Em interactive
scanners generated by
.Fl I
.Pq see below .
In general,
.Fl B
is used when the scanner will never be used interactively,
and you want to squeeze a little more performance out of it.
If the aim is instead to squeeze out a lot more performance,
use the
.Fl Cf
or
.Fl CF
options
.Pq discussed below ,
which turn on
.Fl B
automatically anyway.
.It Fl b
Generate backing-up information to
.Pa lex.backup .
This is a list of scanner states which require backing up
and the input characters on which they do so.
By adding rules one can remove backing-up states.
If all backing-up states are eliminated and
.Fl Cf
or
.Fl CF
is used, the generated scanner will run faster (see the
.Fl p
flag).
Only users who wish to squeeze every last cycle out of their
scanners need worry about this option.
(See the section on
.Sx PERFORMANCE CONSIDERATIONS
below.)
.It Fl C Ns Op Cm aeFfmr
Controls the degree of table compression and, more generally, trade-offs
between small scanners and fast scanners.
.Bl -tag -width Ds
.It Fl Ca
Instructs
.Nm
to trade off larger tables in the generated scanner for faster performance
because the elements of the tables are better aligned for memory access
and computation.
On some
.Tn RISC
architectures, fetching and manipulating longwords is more efficient
than with smaller-sized units such as shortwords.
This option can double the size of the tables used by the scanner.
.It Fl Ce
Directs
.Nm
to construct
.Em equivalence classes ,
i.e., sets of characters which have identical lexical properties
(for example, if the only appearance of digits in the
.Nm
input is in the character class
.Qq [0-9]
then the digits
.Sq 0 ,
.Sq 1 ,
.Sq ... ,
.Sq 9
will all be put in the same equivalence class).
Equivalence classes usually give dramatic reductions in the final
table/object file sizes
.Pq typically a factor of 2\-5
and are pretty cheap performance-wise
.Pq one array look-up per character scanned .
.It Fl CF
Specifies that the alternate fast scanner representation
(described below under the
.Fl F
option)
should be used.
This option cannot be used with
.Fl + .
.It Fl Cf
Specifies that the
.Em full
scanner tables should be generated \-
.Nm
should not compress the tables by taking advantage of
similar transition functions for different states.
.It Fl \&Cm
Directs
.Nm
to construct
.Em meta-equivalence classes ,
which are sets of equivalence classes
(or characters, if equivalence classes are not being used)
that are commonly used together.
Meta-equivalence classes are often a big win when using compressed tables,
but they have a moderate performance impact
(one or two
.Qq if
tests and one array look-up per character scanned).
.It Fl Cr
Causes the generated scanner to
.Em bypass
use of the standard I/O library
.Pq stdio
for input.
Instead of calling
.Xr fread 3
or
.Xr getc 3 ,
the scanner will use the
.Xr read 2
system call,
resulting in a performance gain which varies from system to system,
but in general is probably negligible unless
.Fl Cf
or
.Fl CF
is being used.
Using
.Fl Cr
can cause strange behavior if, for example,
.Fa yyin
is read using stdio prior to calling the scanner
(because the scanner will miss whatever text previous reads left
in the stdio input buffer).
.Pp
.Fl Cr
has no effect if
.Dv YY_INPUT
is defined
(see
.Sx THE GENERATED SCANNER
above).
.El
.Pp
A lone
.Fl C
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
.Pp
The options
.Fl Cf
or
.Fl CF
and
.Fl \&Cm
do not make sense together \- there is no opportunity for meta-equivalence
classes if the table is not being compressed.
Otherwise the options may be freely mixed, and are cumulative.
2525.Pp 2526The default setting is 2527.Fl Cem 2528which specifies that 2529.Nm 2530should generate equivalence classes and meta-equivalence classes. 2531This setting provides the highest degree of table compression. 2532It is possible to trade off faster-executing scanners at the cost of 2533larger tables with the following generally being true: 2534.Bd -unfilled -offset indent 2535slowest & smallest 2536 -Cem 2537 -Cm 2538 -Ce 2539 -C 2540 -C{f,F}e 2541 -C{f,F} 2542 -C{f,F}a 2543fastest & largest 2544.Ed 2545.Pp 2546Note that scanners with the smallest tables are usually generated and 2547compiled the quickest, 2548so during development the default is usually best, 2549maximal compression. 2550.Pp 2551.Fl Cfe 2552is often a good compromise between speed and size for production scanners. 2553.It Fl d 2554Makes the generated scanner run in debug mode. 2555Whenever a pattern is recognized and the global 2556.Fa yy_flex_debug 2557is non-zero 2558.Pq which is the default , 2559the scanner will write to stderr a line of the form: 2560.Pp 2561.D1 --accepting rule at line 53 ("the matched text") 2562.Pp 2563The line number refers to the location of the rule in the file 2564defining the scanner 2565(i.e., the file that was fed to 2566.Nm ) . 2567Messages are also generated when the scanner backs up, 2568accepts the default rule, 2569reaches the end of its input buffer 2570(or encounters a NUL; 2571at this point, the two look the same as far as the scanner's concerned), 2572or reaches an end-of-file. 2573.It Fl F 2574Specifies that the fast scanner table representation should be used 2575.Pq and stdio bypassed . 2576This representation is about as fast as the full table representation 2577.Pq Fl f , 2578and for some sets of patterns will be considerably smaller 2579.Pq and for others, larger . 
2580In general, if the pattern set contains both 2581.Qq keywords 2582and a catch-all, 2583.Qq identifier 2584rule, such as in the set: 2585.Bd -unfilled -offset indent 2586"case" return TOK_CASE; 2587"switch" return TOK_SWITCH; 2588\&... 2589"default" return TOK_DEFAULT; 2590[a-z]+ return TOK_ID; 2591.Ed 2592.Pp 2593then it's better to use the full table representation. 2594If only the 2595.Qq identifier 2596rule is present and a hash table or some such is used to detect the keywords, 2597it's better to use 2598.Fl F . 2599.Pp 2600This option is equivalent to 2601.Fl CFr 2602.Pq see above . 2603It cannot be used with 2604.Fl + . 2605.It Fl f 2606Specifies 2607.Em fast scanner . 2608No table compression is done and stdio is bypassed. 2609The result is large but fast. 2610This option is equivalent to 2611.Fl Cfr 2612.Pq see above . 2613.It Fl h 2614Generates a help summary of 2615.Nm flex Ns 's 2616options to stdout and then exits. 2617.Fl ?\& 2618and 2619.Fl Fl help 2620are synonyms for 2621.Fl h . 2622.It Fl I 2623Instructs 2624.Nm 2625to generate an 2626.Em interactive 2627scanner. 2628An interactive scanner is one that only looks ahead to decide 2629what token has been matched if it absolutely must. 2630It turns out that always looking one extra character ahead, 2631even if the scanner has already seen enough text 2632to disambiguate the current token, is a bit faster than 2633only looking ahead when necessary. 2634But scanners that always look ahead give dreadful interactive performance; 2635for example, when a user types a newline, 2636it is not recognized as a newline token until they enter 2637.Em another 2638token, which often means typing in another whole line. 2639.Pp 2640.Nm 2641scanners default to 2642.Em interactive 2643unless 2644.Fl Cf 2645or 2646.Fl CF 2647table-compression options are specified 2648.Pq see above . 
2649That's because if high-performance is most important, 2650one of these options should be used, 2651so if they weren't, 2652.Nm 2653assumes it is preferable to trade off a bit of run-time performance for 2654intuitive interactive behavior. 2655Note also that 2656.Fl I 2657cannot be used in conjunction with 2658.Fl Cf 2659or 2660.Fl CF . 2661Thus, this option is not really needed; it is on by default for all those 2662cases in which it is allowed. 2663.Pp 2664A scanner can be forced to not be interactive by using 2665.Fl B 2666.Pq see above . 2667.It Fl i 2668Instructs 2669.Nm 2670to generate a case-insensitive scanner. 2671The case of letters given in the 2672.Nm 2673input patterns will be ignored, 2674and tokens in the input will be matched regardless of case. 2675The matched text given in 2676.Fa yytext 2677will have the preserved case 2678.Pq i.e., it will not be folded . 2679.It Fl L 2680Instructs 2681.Nm 2682not to generate 2683.Dq #line 2684directives. 2685Without this option, 2686.Nm 2687peppers the generated scanner with #line directives so error messages 2688in the actions will be correctly located with respect to either the original 2689.Nm 2690input file 2691(if the errors are due to code in the input file), 2692or 2693.Pa lex.yy.c 2694(if the errors are 2695.Nm flex Ns 's 2696fault \- these sorts of errors should be reported to the email address 2697given below). 2698.It Fl l 2699Turns on maximum compatibility with the original 2700.At 2701.Nm lex 2702implementation. 2703Note that this does not mean full compatibility. 2704Use of this option costs a considerable amount of performance, 2705and it cannot be used with the 2706.Fl + , f , F , Cf , 2707or 2708.Fl CF 2709options. 2710For details on the compatibilities it provides, see the section 2711.Sx INCOMPATIBILITIES WITH LEX AND POSIX 2712below. 2713This option also results in the name 2714.Dv YY_FLEX_LEX_COMPAT 2715being #define'd in the generated scanner. 
2716.It Fl n 2717Another do-nothing, deprecated option included only for 2718.Tn POSIX 2719compliance. 2720.It Fl o Ns Ar output 2721Directs 2722.Nm 2723to write the scanner to the file 2724.Ar output 2725instead of 2726.Pa lex.yy.c . 2727If 2728.Fl o 2729is combined with the 2730.Fl t 2731option, then the scanner is written to stdout but its 2732.Dq #line 2733directives 2734(see the 2735.Fl L 2736option above) 2737refer to the file 2738.Ar output . 2739.It Fl P Ns Ar prefix 2740Changes the default 2741.Qq yy 2742prefix used by 2743.Nm 2744for all globally visible variable and function names to instead be 2745.Ar prefix . 2746For example, 2747.Fl P Ns Ar foo 2748changes the name of 2749.Fa yytext 2750to 2751.Fa footext . 2752It also changes the name of the default output file from 2753.Pa lex.yy.c 2754to 2755.Pa lex.foo.c . 2756Here are all of the names affected: 2757.Bd -unfilled -offset indent 2758yy_create_buffer 2759yy_delete_buffer 2760yy_flex_debug 2761yy_init_buffer 2762yy_flush_buffer 2763yy_load_buffer_state 2764yy_switch_to_buffer 2765yyin 2766yyleng 2767yylex 2768yylineno 2769yyout 2770yyrestart 2771yytext 2772yywrap 2773.Ed 2774.Pp 2775(If using a C++ scanner, then only 2776.Fa yywrap 2777and 2778.Fa yyFlexLexer 2779are affected.) 2780Within the scanner itself, it is still possible to refer to the global variables 2781and functions using either version of their name; but externally, they 2782have the modified name. 2783.Pp 2784This option allows multiple 2785.Nm 2786programs to be easily linked together into the same executable. 2787Note, though, that using this option also renames 2788.Fn yywrap , 2789so now either an 2790.Pq appropriately named 2791version of the routine for the scanner must be supplied, or 2792.Dq %option noyywrap 2793must be used, as linking with 2794.Fl lfl 2795no longer provides one by default. 2796.It Fl p 2797Generates a performance report to stderr. 
2798The report consists of comments regarding features of the 2799.Nm 2800input file which will cause a serious loss of performance in the resulting 2801scanner. 2802If the flag is specified twice, 2803comments regarding features that lead to minor performance losses 2804will also be reported. 2805.Pp 2806Note that the use of 2807.Em REJECT , 2808.Dq %option yylineno , 2809and variable trailing context 2810(see the 2811.Sx BUGS 2812section below) 2813entails a substantial performance penalty; use of 2814.Fn yymore , 2815the 2816.Sq ^ 2817operator, and the 2818.Fl I 2819flag entail minor performance penalties. 2820.It Fl S Ns Ar skeleton 2821Overrides the default skeleton file from which 2822.Nm 2823constructs its scanners. 2824This option is needed only for 2825.Nm 2826maintenance or development. 2827.It Fl s 2828Causes the default rule 2829.Pq that unmatched scanner input is echoed to stdout 2830to be suppressed. 2831If the scanner encounters input that does not 2832match any of its rules, it aborts with an error. 2833This option is useful for finding holes in a scanner's rule set. 2834.It Fl T 2835Makes 2836.Nm 2837run in 2838.Em trace 2839mode. 2840It will generate a lot of messages to stderr concerning 2841the form of the input and the resultant non-deterministic and deterministic 2842finite automata. 2843This option is mostly for use in maintaining 2844.Nm . 2845.It Fl t 2846Instructs 2847.Nm 2848to write the scanner it generates to standard output instead of 2849.Pa lex.yy.c . 2850.It Fl V 2851Prints the version number to stdout and exits. 2852.Fl Fl version 2853is a synonym for 2854.Fl V . 2855.It Fl v 2856Specifies that 2857.Nm 2858should write to stderr 2859a summary of statistics regarding the scanner it generates.
2860Most of the statistics are meaningless to the casual 2861.Nm 2862user, but the first line identifies the version of 2863.Nm 2864(same as reported by 2865.Fl V ) , 2866and the next line the flags used when generating the scanner, 2867including those that are on by default. 2868.It Fl w 2869Suppresses warning messages. 2870.It Fl + 2871Specifies that 2872.Nm 2873should generate a C++ scanner class. 2874See the section on 2875.Sx GENERATING C++ SCANNERS 2876below for details. 2877.El 2878.Pp 2879.Nm 2880also provides a mechanism for controlling options within the 2881scanner specification itself, rather than from the 2882.Nm 2883command line. 2884This is done by including 2885.Dq %option 2886directives in the first section of the scanner specification. 2887Multiple options can be specified with a single 2888.Dq %option 2889directive, and multiple directives in the first section of the 2890.Nm 2891input file. 2892.Pp 2893Most options are given simply as names, optionally preceded by the word 2894.Qq no 2895.Pq with no intervening whitespace 2896to negate their meaning. 
2897A number are equivalent to 2898.Nm 2899flags or their negation: 2900.Bd -unfilled -offset indent 29017bit -7 option 29028bit -8 option 2903align -Ca option 2904backup -b option 2905batch -B option 2906c++ -+ option 2907 2908caseful or 2909case-sensitive opposite of -i (default) 2910 2911case-insensitive or 2912caseless -i option 2913 2914debug -d option 2915default opposite of -s option 2916ecs -Ce option 2917fast -F option 2918full -f option 2919interactive -I option 2920lex-compat -l option 2921meta-ecs -Cm option 2922perf-report -p option 2923read -Cr option 2924stdout -t option 2925verbose -v option 2926warn opposite of -w option 2927 (use "%option nowarn" for -w) 2928 2929array equivalent to "%array" 2930pointer equivalent to "%pointer" (default) 2931.Ed 2932.Pp 2933Some %option's provide features otherwise not available: 2934.Bl -tag -width Ds 2935.It always-interactive 2936Instructs 2937.Nm 2938to generate a scanner which always considers its input 2939.Qq interactive . 2940Normally, on each new input file the scanner calls 2941.Fn isatty 2942in an attempt to determine whether the scanner's input source is interactive 2943and thus should be read a character at a time. 2944When this option is used, however, no such call is made. 2945.It main 2946Directs 2947.Nm 2948to provide a default 2949.Fn main 2950program for the scanner, which simply calls 2951.Fn yylex . 2952This option implies 2953.Dq noyywrap 2954.Pq see below . 2955.It never-interactive 2956Instructs 2957.Nm 2958to generate a scanner which never considers its input 2959.Qq interactive 2960(again, no call made to 2961.Fn isatty ) . 2962This is the opposite of 2963.Dq always-interactive . 2964.It stack 2965Enables the use of start condition stacks 2966(see 2967.Sx START CONDITIONS 2968above). 2969.It stdinit 2970If set (i.e., 2971.Dq %option stdinit ) , 2972initializes 2973.Fa yyin 2974and 2975.Fa yyout 2976to stdin and stdout, instead of the default of 2977.Dq nil . 
2978Some existing 2979.Nm lex 2980programs depend on this behavior, even though it is not compliant with ANSI C, 2981which does not require stdin and stdout to be compile-time constant. 2982.It yylineno 2983Directs 2984.Nm 2985to generate a scanner that maintains the number of the current line 2986read from its input in the global variable 2987.Fa yylineno . 2988This option is implied by 2989.Dq %option lex-compat . 2990.It yywrap 2991If unset (i.e., 2992.Dq %option noyywrap ) , 2993makes the scanner not call 2994.Fn yywrap 2995upon an end-of-file, but simply assume that there are no more files to scan 2996(until the user points 2997.Fa yyin 2998at a new file and calls 2999.Fn yylex 3000again). 3001.El 3002.Pp 3003.Nm 3004scans rule actions to determine whether the 3005.Em REJECT 3006or 3007.Fn yymore 3008features are being used. 3009The 3010.Dq reject 3011and 3012.Dq yymore 3013options are available to override its decision as to whether to use the 3014options, either by setting them (e.g., 3015.Dq %option reject ) 3016to indicate the feature is indeed used, 3017or unsetting them to indicate it actually is not used 3018(e.g., 3019.Dq %option noyymore ) . 3020.Pp 3021Three options take string-delimited values, offset with 3022.Sq = : 3023.Pp 3024.D1 %option outfile="ABC" 3025.Pp 3026is equivalent to 3027.Fl o Ns Ar ABC , 3028and 3029.Pp 3030.D1 %option prefix="XYZ" 3031.Pp 3032is equivalent to 3033.Fl P Ns Ar XYZ . 3034Finally, 3035.Pp 3036.D1 %option yyclass="foo" 3037.Pp 3038only applies when generating a C++ scanner 3039.Pf ( Fl + 3040option). 3041It informs 3042.Nm 3043that 3044.Dq foo 3045has been derived as a subclass of yyFlexLexer, so 3046.Nm 3047will place actions in the member function 3048.Dq foo::yylex() 3049instead of 3050.Dq yyFlexLexer::yylex() . 3051It also generates a 3052.Dq yyFlexLexer::yylex() 3053member function that emits a run-time error (by invoking 3054.Dq yyFlexLexer::LexerError() ) 3055if called. 
3056See 3057.Sx GENERATING C++ SCANNERS , 3058below, for additional information. 3059.Pp 3060A number of options are available for 3061lint 3062purists who want to suppress the appearance of unneeded routines 3063in the generated scanner. 3064Each of the following, if unset 3065(e.g., 3066.Dq %option nounput ) , 3067results in the corresponding routine not appearing in the generated scanner: 3068.Bd -unfilled -offset indent 3069input, unput 3070yy_push_state, yy_pop_state, yy_top_state 3071yy_scan_buffer, yy_scan_bytes, yy_scan_string 3072.Ed 3073.Pp 3074(though 3075.Fn yy_push_state 3076and friends won't appear anyway unless 3077.Dq %option stack 3078is being used). 3079.Sh PERFORMANCE CONSIDERATIONS 3080The main design goal of 3081.Nm 3082is that it generate high-performance scanners. 3083It has been optimized for dealing well with large sets of rules. 3084Aside from the effects on scanner speed of the table compression 3085.Fl C 3086options outlined above, 3087there are a number of options/actions which degrade performance. 3088These are, from most expensive to least: 3089.Bd -unfilled -offset indent 3090REJECT 3091%option yylineno 3092arbitrary trailing context 3093 3094pattern sets that require backing up 3095%array 3096%option interactive 3097%option always-interactive 3098 3099\&'^' beginning-of-line operator 3100yymore() 3101.Ed 3102.Pp 3103with the first three all being quite expensive 3104and the last two being quite cheap. 3105Note also that 3106.Fn unput 3107is implemented as a routine call that potentially does quite a bit of work, 3108while 3109.Fn yyless 3110is a quite-cheap macro; so if just putting back some excess text, 3111use 3112.Fn yyless . 3113.Pp 3114.Em REJECT 3115should be avoided at all costs when performance is important. 3116It is a particularly expensive option. 3117.Pp 3118Getting rid of backing up is messy and often may be an enormous 3119amount of work for a complicated scanner. 
3120In principle, one begins by using the 3121.Fl b 3122flag to generate a 3123.Pa lex.backup 3124file. 3125For example, on the input 3126.Bd -literal -offset indent 3127%% 3128foo return TOK_KEYWORD; 3129foobar return TOK_KEYWORD; 3130.Ed 3131.Pp 3132the file looks like: 3133.Bd -literal -offset indent 3134State #6 is non-accepting - 3135 associated rule line numbers: 3136 2 3 3137 out-transitions: [ o ] 3138 jam-transitions: EOF [ \e001-n p-\e177 ] 3139 3140State #8 is non-accepting - 3141 associated rule line numbers: 3142 3 3143 out-transitions: [ a ] 3144 jam-transitions: EOF [ \e001-` b-\e177 ] 3145 3146State #9 is non-accepting - 3147 associated rule line numbers: 3148 3 3149 out-transitions: [ r ] 3150 jam-transitions: EOF [ \e001-q s-\e177 ] 3151 3152Compressed tables always back up. 3153.Ed 3154.Pp 3155The first few lines tell us that there's a scanner state in 3156which it can make a transition on an 3157.Sq o 3158but not on any other character, 3159and that in that state the currently scanned text does not match any rule. 3160The state occurs when trying to match the rules found 3161at lines 2 and 3 in the input file. 3162If the scanner is in that state and then reads something other than an 3163.Sq o , 3164it will have to back up to find a rule which is matched. 3165With a bit of headscratching one can see that this must be the 3166state it's in when it has seen 3167.Sq fo . 3168When this has happened, if anything other than another 3169.Sq o 3170is seen, the scanner will have to back up to simply match the 3171.Sq f 3172.Pq by the default rule . 3173.Pp 3174The comment regarding State #8 indicates there's a problem when 3175.Qq foob 3176has been scanned. 3177Indeed, on any character other than an 3178.Sq a , 3179the scanner will have to back up to accept 3180.Qq foo . 3181Similarly, the comment for State #9 concerns when 3182.Qq fooba 3183has been scanned and an 3184.Sq r 3185does not follow.
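.Pp
The backing up reported for States #6, #8 and #9 can be pictured with a
small hand-written recognizer.
The following C function is only a sketch under stated assumptions:
.Fn match_keyword
is a hypothetical helper, not code that
.Nm
generates, and it hard-codes the two rules
.Qq foo
and
.Qq foobar
from the example input.
It remembers the last accepting position and reports how many scanned
characters have to be given back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not flex output: a hand-written longest-match
 * recognizer for the two rules "foo" and "foobar".
 * Returns the length of the matched keyword (0 if neither rule matched)
 * and stores in *backed_up the number of characters that were scanned
 * past the last accepting position and must be given back.
 */
static size_t
match_keyword(const char *s, size_t *backed_up)
{
	static const char pattern[] = "foobar";
	size_t pos = 0;		/* characters scanned so far */
	size_t last_accept = 0;	/* end of the longest rule matched so far */

	assert(s != NULL && backed_up != NULL);

	/* Stops at the first mismatch, which includes the terminating NUL. */
	while (pos < 6 && s[pos] == pattern[pos]) {
		pos++;
		/* "foo" and "foobar" are the only accepting positions. */
		if (pos == 3 || pos == 6)
			last_accept = pos;
	}
	*backed_up = pos - last_accept;
	return last_accept;
}
```

.Pp
For input beginning with
.Qq fooba
followed by any character other than
.Sq r ,
the function returns 3 and sets
.Fa *backed_up
to 2, which corresponds to State #9.
A return value of 0 stands in for falling back to the default rule.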
3186.Pp 3187The final comment reminds us that there's no point going to 3188all the trouble of removing backing up from the rules unless we're using 3189.Fl Cf 3190or 3191.Fl CF , 3192since there's no performance gain doing so with compressed scanners. 3193.Pp 3194The way to remove the backing up is to add 3195.Qq error 3196rules: 3197.Bd -literal -offset indent 3198%% 3199foo return TOK_KEYWORD; 3200foobar return TOK_KEYWORD; 3201 3202fooba | 3203foob | 3204fo { 3205 /* false alarm, not really a keyword */ 3206 return TOK_ID; 3207} 3208.Ed 3209.Pp 3210Eliminating backing up among a list of keywords can also be done using a 3211.Qq catch-all 3212rule: 3213.Bd -literal -offset indent 3214%% 3215foo return TOK_KEYWORD; 3216foobar return TOK_KEYWORD; 3217 3218[a-z]+ return TOK_ID; 3219.Ed 3220.Pp 3221This is usually the best solution when appropriate. 3222.Pp 3223Backing up messages tend to cascade. 3224With a complicated set of rules it's not uncommon to get hundreds of messages. 3225If one can decipher them, though, 3226it often only takes a dozen or so rules to eliminate the backing up 3227(though it's easy to make a mistake and have an error rule accidentally match 3228a valid token; a possible future 3229.Nm 3230feature will be to automatically add rules to eliminate backing up). 3231.Pp 3232It's important to keep in mind that the benefits of eliminating 3233backing up are gained only if 3234.Em every 3235instance of backing up is eliminated. 3236Leaving just one gains nothing. 3237.Pp 3238.Em Variable 3239trailing context 3240(where both the leading and trailing parts do not have a fixed length) 3241entails almost the same performance loss as 3242.Em REJECT 3243.Pq i.e., substantial . 
3244So when possible a rule like: 3245.Bd -literal -offset indent 3246%% 3247mouse|rat/(cat|dog) run(); 3248.Ed 3249.Pp 3250is better written: 3251.Bd -literal -offset indent 3252%% 3253mouse/cat|dog run(); 3254rat/cat|dog run(); 3255.Ed 3256.Pp 3257or as 3258.Bd -literal -offset indent 3259%% 3260mouse|rat/cat run(); 3261mouse|rat/dog run(); 3262.Ed 3263.Pp 3264Note that here the special 3265.Sq |\& 3266action does not provide any savings, and can even make things worse (see 3267.Sx BUGS 3268below). 3269.Pp 3270Another area where the user can increase a scanner's performance 3271.Pq and one that's easier to implement 3272arises from the fact that the longer the tokens matched, 3273the faster the scanner will run. 3274This is because with long tokens the processing of most input 3275characters takes place in the 3276.Pq short 3277inner scanning loop, and does not often have to go through the additional work 3278of setting up the scanning environment (e.g., 3279.Fa yytext ) 3280for the action. 3281Recall the scanner for C comments: 3282.Bd -literal -offset indent 3283%x comment 3284%% 3285int line_num = 1; 3286 3287"/*" BEGIN(comment); 3288 3289<comment>[^*\en]* 3290<comment>"*"+[^*/\en]* 3291<comment>\en ++line_num; 3292<comment>"*"+"/" BEGIN(INITIAL); 3293.Ed 3294.Pp 3295This could be sped up by writing it as: 3296.Bd -literal -offset indent 3297%x comment 3298%% 3299int line_num = 1; 3300 3301"/*" BEGIN(comment); 3302 3303<comment>[^*\en]* 3304<comment>[^*\en]*\en ++line_num; 3305<comment>"*"+[^*/\en]* 3306<comment>"*"+[^*/\en]*\en ++line_num; 3307<comment>"*"+"/" BEGIN(INITIAL); 3308.Ed 3309.Pp 3310Now instead of each newline requiring the processing of another action, 3311recognizing the newlines is 3312.Qq distributed 3313over the other rules to keep the matched text as long as possible. 3314Note that adding rules does 3315.Em not 3316slow down the scanner! 
3317The speed of the scanner is independent of the number of rules or 3318(modulo the considerations given at the beginning of this section) 3319how complicated the rules are with regard to operators such as 3320.Sq * 3321and 3322.Sq |\& . 3323.Pp 3324A final example in speeding up a scanner: 3325scan through a file containing identifiers and keywords, one per line 3326and with no other extraneous characters, and recognize all the keywords. 3327A natural first approach is: 3328.Bd -literal -offset indent 3329%% 3330asm | 3331auto | 3332break | 3333\&... etc ... 3334volatile | 3335while /* it's a keyword */ 3336 3337\&.|\en /* it's not a keyword */ 3338.Ed 3339.Pp 3340To eliminate the back-tracking, introduce a catch-all rule: 3341.Bd -literal -offset indent 3342%% 3343asm | 3344auto | 3345break | 3346\&... etc ... 3347volatile | 3348while /* it's a keyword */ 3349 3350[a-z]+ | 3351\&.|\en /* it's not a keyword */ 3352.Ed 3353.Pp 3354Now, if it's guaranteed that there's exactly one word per line, 3355then we can reduce the total number of matches by a half by 3356merging in the recognition of newlines with that of the other tokens: 3357.Bd -literal -offset indent 3358%% 3359asm\en | 3360auto\en | 3361break\en | 3362\&... etc ... 3363volatile\en | 3364while\en /* it's a keyword */ 3365 3366[a-z]+\en | 3367\&.|\en /* it's not a keyword */ 3368.Ed 3369.Pp 3370One has to be careful here, 3371as we have now reintroduced backing up into the scanner. 3372In particular, while we know that there will never be any characters 3373in the input stream other than letters or newlines, 3374.Nm 3375can't figure this out, and it will plan for possibly needing to back up 3376when it has scanned a token like 3377.Qq auto 3378and then the next character is something other than a newline or a letter. 3379Previously it would then just match the 3380.Qq auto 3381rule and be done, but now it has no 3382.Qq auto 3383rule, only an 3384.Qq auto\en 3385rule. 
3386To eliminate the possibility of backing up, 3387we could either duplicate all rules but without final newlines or, 3388since we never expect to encounter such an input and therefore don't 3389care how it's classified, we can introduce one more catch-all rule, 3390this one which doesn't include a newline: 3391.Bd -literal -offset indent 3392%% 3393asm\en | 3394auto\en | 3395break\en | 3396\&... etc ... 3397volatile\en | 3398while\en /* it's a keyword */ 3399 3400[a-z]+\en | 3401[a-z]+ | 3402\&.|\en /* it's not a keyword */ 3403.Ed 3404.Pp 3405Compiled with 3406.Fl Cf , 3407this is about as fast as one can get a 3408.Nm 3409scanner to go for this particular problem. 3410.Pp 3411A final note: 3412.Nm 3413is slow when matching NUL's, 3414particularly when a token contains multiple NUL's. 3415It's best to write rules which match short 3416amounts of text if it's anticipated that the text will often include NUL's. 3417.Pp 3418Another final note regarding performance: as mentioned above in the section 3419.Sx HOW THE INPUT IS MATCHED , 3420dynamically resizing 3421.Fa yytext 3422to accommodate huge tokens is a slow process because it presently requires that 3423the 3424.Pq huge 3425token be rescanned from the beginning. 3426Thus if performance is vital, it is better to attempt to match 3427.Qq large 3428quantities of text but not 3429.Qq huge 3430quantities, where the cutoff between the two is at about 8K characters/token. 3431.Sh GENERATING C++ SCANNERS 3432.Nm 3433provides two different ways to generate scanners for use with C++. 3434The first way is to simply compile a scanner generated by 3435.Nm 3436using a C++ compiler instead of a C compiler. 3437This should not generate any compilation errors 3438(please report any found to the email address given in the 3439.Sx AUTHORS 3440section below). 3441C++ code can then be used in rule actions instead of C code.
3442Note that the default input source for scanners remains 3443.Fa yyin , 3444and default echoing is still done to 3445.Fa yyout . 3446Both of these remain 3447.Fa FILE * 3448variables and not C++ streams. 3449.Pp 3450.Nm 3451can also be used to generate a C++ scanner class, using the 3452.Fl + 3453option (or, equivalently, 3454.Dq %option c++ ) , 3455which is automatically specified if the name of the flex executable ends in a 3456.Sq + , 3457such as 3458.Nm flex++ . 3459When using this option, 3460.Nm 3461defaults to generating the scanner to the file 3462.Pa lex.yy.cc 3463instead of 3464.Pa lex.yy.c . 3465The generated scanner includes the header file 3466.In g++/FlexLexer.h , 3467which defines the interface to two C++ classes. 3468.Pp 3469The first class, 3470.Em FlexLexer , 3471provides an abstract base class defining the general scanner class interface. 3472It provides the following member functions: 3473.Bl -tag -width Ds 3474.It const char* YYText() 3475Returns the text of the most recently matched token, the equivalent of 3476.Fa yytext . 3477.It int YYLeng() 3478Returns the length of the most recently matched token, the equivalent of 3479.Fa yyleng . 3480.It int lineno() const 3481Returns the current input line number 3482(see 3483.Dq %option yylineno ) , 3484or 1 if 3485.Dq %option yylineno 3486was not used. 3487.It void set_debug(int flag) 3488Sets the debugging flag for the scanner, equivalent to assigning to 3489.Fa yy_flex_debug 3490(see the 3491.Sx OPTIONS 3492section above). 3493Note that the scanner must be built using 3494.Dq %option debug 3495to include debugging information in it. 3496.It int debug() const 3497Returns the current setting of the debugging flag. 
3498.El 3499.Pp 3500Also provided are member functions equivalent to 3501.Fn yy_switch_to_buffer , 3502.Fn yy_create_buffer 3503(though the first argument is an 3504.Fa std::istream* 3505object pointer and not a 3506.Fa FILE* ) , 3507.Fn yy_flush_buffer , 3508.Fn yy_delete_buffer , 3509and 3510.Fn yyrestart 3511(again, the first argument is an 3512.Fa std::istream* 3513object pointer). 3514.Pp 3515The second class defined in 3516.In g++/FlexLexer.h 3517is 3518.Fa yyFlexLexer , 3519which is derived from 3520.Fa FlexLexer . 3521It defines the following additional member functions: 3522.Bl -tag -width Ds 3523.It "yyFlexLexer(std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0)" 3524Constructs a 3525.Fa yyFlexLexer 3526object using the given streams for input and output. 3527If not specified, the streams default to 3528.Fa cin 3529and 3530.Fa cout , 3531respectively. 3532.It virtual int yylex() 3533Performs the same role as 3534.Fn yylex 3535does for ordinary flex scanners: it scans the input stream, consuming 3536tokens, until a rule's action returns a value. 3537If subclass 3538.Sq S 3539is derived from 3540.Fa yyFlexLexer , 3541in order to access the member functions and variables of 3542.Sq S 3543inside 3544.Fn yylex , 3545use 3546.Dq %option yyclass="S" 3547to inform 3548.Nm 3549that the 3550.Sq S 3551subclass will be used instead of 3552.Fa yyFlexLexer . 3553In this case, rather than generating 3554.Dq yyFlexLexer::yylex() , 3555.Nm 3556generates 3557.Dq S::yylex() 3558(and also generates a dummy 3559.Dq yyFlexLexer::yylex() 3560that calls 3561.Dq yyFlexLexer::LexerError() 3562if called). 3563.It "virtual void switch_streams(std::istream* new_in = 0, std::ostream* new_out = 0)" 3564Reassigns 3565.Fa yyin 3566to 3567.Fa new_in 3568.Pq if non-nil 3569and 3570.Fa yyout 3571to 3572.Fa new_out 3573.Pq ditto , 3574deleting the previous input buffer if 3575.Fa yyin 3576is reassigned. 
3577.It int yylex(std::istream* new_in, std::ostream* new_out = 0) 3578First switches the input streams via 3579.Dq switch_streams(new_in, new_out) 3580and then returns the value of 3581.Fn yylex . 3582.El 3583.Pp 3584In addition, 3585.Fa yyFlexLexer 3586defines the following protected virtual functions which can be redefined 3587in derived classes to tailor the scanner: 3588.Bl -tag -width Ds 3589.It virtual int LexerInput(char* buf, int max_size) 3590Reads up to 3591.Fa max_size 3592characters into 3593.Fa buf 3594and returns the number of characters read. 3595To indicate end-of-input, return 0 characters. 3596Note that 3597.Qq interactive 3598scanners (see the 3599.Fl B 3600and 3601.Fl I 3602flags) define the macro 3603.Dv YY_INTERACTIVE . 3604If 3605.Fn LexerInput 3606has been redefined, and it's necessary to take different actions depending on 3607whether or not the scanner might be scanning an interactive input source, 3608it's possible to test for the presence of this name via 3609.Dq #ifdef . 3610.It virtual void LexerOutput(const char* buf, int size) 3611Writes out 3612.Fa size 3613characters from the buffer 3614.Fa buf , 3615which, while NUL-terminated, may also contain 3616.Qq internal 3617NUL's if the scanner's rules can match text with NUL's in them. 3618.It virtual void LexerError(const char* msg) 3619Reports a fatal error message. 3620The default version of this function writes the message to the stream 3621.Fa cerr 3622and exits. 3623.El 3624.Pp 3625Note that a 3626.Fa yyFlexLexer 3627object contains its entire scanning state. 3628Thus such objects can be used to create reentrant scanners. 3629Multiple instances of the same 3630.Fa yyFlexLexer 3631class can be instantiated, and multiple C++ scanner classes can be combined 3632in the same program using the 3633.Fl P 3634option discussed above. 3635.Pp 3636Finally, note that the 3637.Dq %array 3638feature is not available to C++ scanner classes; 3639.Dq %pointer 3640must be used 3641.Pq the default . 
3642.Pp 3643Here is an example of a simple C++ scanner: 3644.Bd -literal -offset indent 3645// An example of using the flex C++ scanner class. 3646 3647%{ 3648#include <errno.h> 3649int mylineno = 0; 3650%} 3651 3652string \e"[^\en"]+\e" 3653 3654ws [ \et]+ 3655 3656alpha [A-Za-z] 3657dig [0-9] 3658name ({alpha}|{dig}|\e$)({alpha}|{dig}|[_.\e-/$])* 3659num1 [-+]?{dig}+\e.?([eE][-+]?{dig}+)? 3660num2 [-+]?{dig}*\e.{dig}+([eE][-+]?{dig}+)? 3661number {num1}|{num2} 3662 3663%% 3664 3665{ws} /* skip blanks and tabs */ 3666 3667"/*" { 3668 int c; 3669 3670 while ((c = yyinput()) != 0) { 3671 if(c == '\en') 3672 ++mylineno; 3673 else if(c == '*') { 3674 if ((c = yyinput()) == '/') 3675 break; 3676 else 3677 unput(c); 3678 } 3679 } 3680} 3681 3682{number} std::cout << "number " << YYText() << '\en'; 3683 3684\en mylineno++; 3685 3686{name} std::cout << "name " << YYText() << '\en'; 3687 3688{string} std::cout << "string " << YYText() << '\en'; 3689 3690%% 3691 3692int main(int /* argc */, char** /* argv */) 3693{ 3694 FlexLexer* lexer = new yyFlexLexer; 3695 while(lexer->yylex() != 0) 3696 ; 3697 return 0; 3698} 3699.Ed 3700.Pp 3701To create multiple 3702.Pq different 3703lexer classes, use the 3704.Fl P 3705flag 3706(or the 3707.Dq prefix= 3708option) 3709to rename each 3710.Fa yyFlexLexer 3711to some other 3712.Fa xxFlexLexer . 3713.In g++/FlexLexer.h 3714can then be included in other sources once per lexer class, first renaming 3715.Fa yyFlexLexer 3716as follows: 3717.Bd -literal -offset indent 3718#undef yyFlexLexer 3719#define yyFlexLexer xxFlexLexer 3720#include <g++/FlexLexer.h> 3721 3722#undef yyFlexLexer 3723#define yyFlexLexer zzFlexLexer 3724#include <g++/FlexLexer.h> 3725.Ed 3726.Pp 3727This assumes, for example, that 3728.Dq %option prefix="xx" 3729is used for one scanner and 3730.Dq %option prefix="zz" 3731is used for the other. 3732.Pp 3733.Sy IMPORTANT : 3734the present form of the scanning class is experimental 3735and may change considerably between major releases.
.Sh INCOMPATIBILITIES WITH LEX AND POSIX
.Nm
is a rewrite of the
.At
.Nm lex
tool
(the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which are of concern
to those who wish to write scanners acceptable to either implementation.
.Nm
is fully compliant with the
.Tn POSIX
.Nm lex
specification, except that when using
.Dq %pointer
.Pq the default ,
a call to
.Fn unput
destroys the contents of
.Fa yytext ,
which is counter to the
.Tn POSIX
specification.
.Pp
In this section we discuss all of the known areas of incompatibility between
.Nm ,
.At
.Nm lex ,
and the
.Tn POSIX
specification.
.Pp
.Nm flex Ns 's
.Fl l
option turns on maximum compatibility with the original
.At
.Nm lex
implementation, at the cost of a major loss in the generated scanner's
performance.
We note below which incompatibilities can be overcome using the
.Fl l
option.
.Pp
.Nm
is fully compatible with
.Nm lex
with the following exceptions:
.Bl -dash
.It
The undocumented
.Nm lex
scanner internal variable
.Fa yylineno
is not supported unless
.Fl l
or
.Dq %option yylineno
is used.
.Pp
.Fa yylineno
should be maintained on a per-buffer basis, rather than a per-scanner
.Pq single global variable
basis.
.Pp
.Fa yylineno
is not part of the
.Tn POSIX
specification.
.It
The
.Fn input
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.
If
.Fn input
encounters an end-of-file, the normal
.Fn yywrap
processing is done.
A
.Dq real
end-of-file is returned by
.Fn input
as
.Dv EOF .
.Pp
Input is instead controlled by defining the
.Dv YY_INPUT
macro.
.Pp
The
.Nm
restriction that
.Fn input
cannot be redefined is in accordance with the
.Tn POSIX
specification, which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
.Fa yyin .
.It
The
.Fn unput
routine is not redefinable.
This restriction is in accordance with
.Tn POSIX .
.It
.Nm
scanners are not as reentrant as
.Nm lex
scanners.
In particular, if a scanner is interactive and
an interrupt handler long-jumps out of the scanner,
and the scanner is subsequently called again,
the following error message may be displayed:
.Pp
.D1 fatal flex scanner internal error--end of buffer missed
.Pp
To reenter the scanner, first use
.Pp
.Dl yyrestart(yyin);
.Pp
Note that this call will throw away any buffered input;
usually this isn't a problem with an interactive scanner.
.Pp
Also note that flex C++ scanner classes are reentrant,
so if using C++ is an option, they should be used instead.
See
.Sx GENERATING C++ SCANNERS
above for details.
.It
.Fn output
is not supported.
Output from the
.Em ECHO
macro is done to the file-pointer
.Fa yyout
.Pq default stdout .
.Pp
.Fn output
is not part of the
.Tn POSIX
specification.
.It
.Nm lex
does not support exclusive start conditions
.Pq %x ,
though they are in the
.Tn POSIX
specification.
.It
When definitions are expanded,
.Nm
encloses them in parentheses.
With
.Nm lex ,
the following:
.Bd -literal -offset indent
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf("Found it\en");
%%
.Ed
.Pp
will not match the string
.Qq foo
because when the macro is expanded the rule is equivalent to
.Qq foo[A-Z][A-Z0-9]*?
and the precedence is such that the
.Sq ?\&
is associated with
.Qq [A-Z0-9]* .
With
.Nm ,
the rule will be expanded to
.Qq foo([A-Z][A-Z0-9]*)?
and so the string
.Qq foo
will match.
.Pp
Note that if the definition begins with
.Sq ^
or ends with
.Sq $
then it is not expanded with parentheses, to allow these operators to appear in
definitions without losing their special meanings.
But the
.Sq Aq s ,
.Sq / ,
and
.Aq Aq EOF
operators cannot be used in a
.Nm
definition.
.Pp
Using
.Fl l
results in the
.Nm lex
behavior of no parentheses around the definition.
.Pp
The
.Tn POSIX
specification is that the definition be enclosed in parentheses.
.It
Some implementations of
.Nm lex
allow a rule's action to begin on a separate line,
if the rule's pattern has trailing whitespace:
.Bd -literal -offset indent
%%
foo|bar<space here>
  { foobar_action(); }
.Ed
.Pp
.Nm
does not support this feature.
.It
The
.Nm lex
.Sq %r
.Pq generate a Ratfor scanner
option is not supported.
It is not part of the
.Tn POSIX
specification.
.It
After a call to
.Fn unput ,
.Fa yytext
is undefined until the next token is matched,
unless the scanner was built using
.Dq %array .
This is not the case with
.Nm lex
or the
.Tn POSIX
specification.
The
.Fl l
option does away with this incompatibility.
.It
The precedence of the
.Sq {}
.Pq numeric range
operator is different.
.Nm lex
interprets
.Qq abc{1,3}
as match one, two, or three occurrences of
.Sq abc ,
whereas
.Nm
interprets it as match
.Sq ab
followed by one, two, or three occurrences of
.Sq c .
The latter is in agreement with the
.Tn POSIX
specification.
.It
The precedence of the
.Sq ^
operator is different.
.Nm lex
interprets
.Qq ^foo|bar
as match either
.Sq foo
at the beginning of a line, or
.Sq bar
anywhere, whereas
.Nm
interprets it as match either
.Sq foo
or
.Sq bar
if they come at the beginning of a line.
The latter is in agreement with the
.Tn POSIX
specification.
.It
The special table-size declarations such as
.Sq %a
supported by
.Nm lex
are not required by
.Nm
scanners;
.Nm
ignores them.
.It
The name
.Dv FLEX_SCANNER
is #define'd so scanners may be written for use with either
.Nm
or
.Nm lex .
Scanners also include
.Dv YY_FLEX_MAJOR_VERSION
and
.Dv YY_FLEX_MINOR_VERSION
indicating which version of
.Nm
generated the scanner
(for example, for the 2.5 release, these defines would be 2 and 5,
respectively).
.El
.Pp
The following
.Nm
features are not included in
.Nm lex
or the
.Tn POSIX
specification:
.Bd -unfilled -offset indent
C++ scanners
%option
start condition scopes
start condition stacks
interactive/non-interactive scanners
yy_scan_string() and friends
yyterminate()
yy_set_interactive()
yy_set_bol()
YY_AT_BOL()
<<EOF>>
<*>
YY_DECL
YY_START
YY_USER_ACTION
YY_USER_INIT
#line directives
%{}'s around actions
multiple actions on a line
.Ed
.Pp
plus almost all of the
.Nm
flags.
The last feature in the list refers to the fact that with
.Nm
multiple actions can be placed on the same line,
separated with semi-colons, while with
.Nm lex ,
the following
.Pp
.Dl foo handle_foo(); ++num_foos_seen;
.Pp
is
.Pq rather surprisingly
truncated to
.Pp
.Dl foo handle_foo();
.Pp
.Nm
does not truncate the action.
Actions that are not enclosed in braces
are simply terminated at the end of the line.
.Sh FILES
.Bl -tag -width "<g++/FlexLexer.h>"
.It Pa flex.skl
Skeleton scanner.
This file is only used when building flex, not when
.Nm
executes.
.It Pa lex.backup
Backing-up information for the
.Fl b
flag (called
.Pa lex.bck
on some systems).
.It Pa lex.yy.c
Generated scanner
(called
.Pa lexyy.c
on some systems).
.It Pa lex.yy.cc
Generated C++ scanner class, when using
.Fl + .
.It In g++/FlexLexer.h
Header file defining the C++ scanner base class,
.Fa FlexLexer ,
and its derived class,
.Fa yyFlexLexer .
.It Pa /usr/lib/libl.*
.Nm
libraries.
The
.Pa /usr/lib/libfl.*\&
libraries are links to these.
Scanners must be linked using either
.Fl \&ll
or
.Fl lfl .
.El
.Sh EXIT STATUS
.Ex -std flex
.Sh DIAGNOSTICS
.Bl -diag
.It warning, rule cannot be matched
Indicates that the given rule cannot be matched because it follows other rules
that will always match the same text as it.
For example, in the following
.Dq foo
cannot be matched because it comes after an identifier
.Qq catch-all
rule:
.Bd -literal -offset indent
[a-z]+    got_identifier();
foo       got_foo();
.Ed
.Pp
Using
.Em REJECT
in a scanner suppresses this warning.
.It "warning, \-s option given but default rule can be matched"
Means that it is possible
.Pq perhaps only in a particular start condition
that the default rule
.Pq match any single character
is the only one that will match a particular input.
Since
.Fl s
was given, presumably this is not intended.
.It reject_used_but_not_detected undefined
.It yymore_used_but_not_detected undefined
These errors can occur at compile time.
They indicate that the scanner uses
.Em REJECT
or
.Fn yymore
but that
.Nm
failed to notice the fact, meaning that
.Nm
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow they snuck in
.Pq via an #include file, for example .
Use
.Dq %option reject
or
.Dq %option yymore
to indicate to
.Nm
that these features are really needed.
.It flex scanner jammed
A scanner compiled with
.Fl s
has encountered an input string which wasn't matched by any of its rules.
This error can also occur due to internal problems.
.It token too large, exceeds YYLMAX
The scanner uses
.Dq %array
and one of its rules matched a string longer than the
.Dv YYLMAX
constant
.Pq 8K bytes by default .
The value can be increased by #define'ing
.Dv YYLMAX
in the definitions section of
.Nm
input.
.It "scanner requires \-8 flag to use the character 'x'"
The scanner specification includes recognizing the 8-bit character
.Sq x
and the
.Fl 8
flag was not specified, and defaulted to 7-bit because the
.Fl Cf
or
.Fl CF
table compression options were used.
See the discussion of the
.Fl 7
flag for details.
.It flex scanner push-back overflow
.Fn unput
was used to push back so much text that the scanner's buffer
could not hold both the pushed-back text and the current token in
.Fa yytext .
Ideally the scanner should dynamically resize the buffer in this case,
but at present it does not.
.It "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
The scanner was working on matching an extremely large token and needed
to expand the input buffer.
This doesn't work with scanners that use
.Em REJECT .
.It "fatal flex scanner internal error--end of buffer missed"
This can occur in a scanner which is reentered after a long-jump
has jumped out
.Pq or over
the scanner's activation frame.
Before reentering the scanner, use:
.Pp
.Dl yyrestart(yyin);
.Pp
or, as noted above, switch to using the C++ scanner class.
.It "too many start conditions in <> construct!"
More start conditions than exist were listed in a <> construct
(so at least one of them must have been listed twice).
.El
.Sh SEE ALSO
.Xr awk 1 ,
.Xr sed 1 ,
.Xr yacc 1
.Rs
.\" 4.4BSD PSD:16
.%A M. E. Lesk
.%T Lex \(em Lexical Analyzer Generator
.%I AT&T Bell Laboratories
.%R Computing Science Technical Report
.%N 39
.%D October 1975
.Re
.Rs
.%A John Levine
.%A Tony Mason
.%A Doug Brown
.%B Lex & Yacc
.%I O'Reilly and Associates
.%N 2nd edition
.Re
.Rs
.%A Alfred Aho
.%A Ravi Sethi
.%A Jeffrey Ullman
.%B Compilers: Principles, Techniques and Tools
.%I Addison-Wesley
.%D 1986
.%O "Describes the pattern-matching techniques used by flex (deterministic finite automata)"
.Re
.Sh STANDARDS
The
.Nm lex
utility is compliant with the
.St -p1003.1-2008
specification,
though its presence is optional.
.Pp
The flags
.Op Fl 78BbCdFfhIiLloPpSsTVw+? ,
.Op Fl -help ,
and
.Op Fl -version
are extensions to that specification.
.Pp
See also the
.Sx INCOMPATIBILITIES WITH LEX AND POSIX
section, above.
.Sh AUTHORS
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson.
Original version by Jef Poskanzer.
The fast table representation is a partial implementation of a design done by
Van Jacobson.
The implementation was done by Kevin Gong and Vern Paxson.
.Pp
Thanks to the many
.Nm
beta-testers, feedbackers, and contributors, especially Francois Pinard,
Casey Leedom,
Robert Abramovitz,
Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
Neal Becker, Nelson H.F. Beebe,
.Mt benson@odi.com ,
Karl Berry, Peter A. Bigot, Simon Blanchard,
Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
Brian Clapper, J.T. Conklin,
Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
Daniels, Chris G. Demetriou, Theo de Raadt,
Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
Jan Hajic, Charles Hemphill, NORO Hideo,
Jarkko Hietaniemi, Scott Hofmann,
Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
Amir Katz,
.Mt ken@ken.hilco.com ,
Kevin B. Kenny,
Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
David Loffredo, Mike Long,
Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
Bengt Martensson, Chris Metcalf,
Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
G.T.
Nicol, Landon Noll, James Nordby, Marc Nozell,
Richard Ohnemus, Karsten Pahnke,
Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre,
Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
Frederic Raimbault, Pat Rankin, Rick Richardson,
Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
Andreas Scherer, Darrell Schiebel, Raf Schietekat,
Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Str\(:omquist,
Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
Chris Thewalt, Richard M. Timoney, Jodi Tsai,
Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams,
Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn,
and those whose names have slipped my marginal mail-archiving skills
but whose contributions are appreciated all the
same.
.Pp
Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
distribution headaches.
.Pp
Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
to Benson Margulies and Fred Burke for C++ support;
to Kent Williams and Tom Epperly for C++ class support;
to Ove Ewerlid for support of NUL's;
and to Eric Hughes for support of multiple buffers.
.Pp
This work was primarily done when I was with the Real Time Systems Group
at the Lawrence Berkeley Laboratory in Berkeley, CA.
Many thanks to all there for the support I received.
.Pp
Send comments to
.Aq Mt vern@ee.lbl.gov .
.Sh BUGS
Some trailing context patterns cannot be properly matched and generate
warning messages
.Pq "dangerous trailing context" .
These are patterns where the ending of the first part of the rule
matches the beginning of the second part, such as
.Qq zx*/xy* ,
where the
.Sq x*
matches the
.Sq x
at the beginning of the trailing context.
(Note that the POSIX draft states that the text matched by such patterns
is undefined.)
.Pp
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the above-mentioned performance loss.
In particular, parts using
.Sq |\&
or
.Sq {n}
(such as
.Qq foo{3} )
are always considered variable-length.
.Pp
Combining trailing context with the special
.Sq |\&
action can result in fixed trailing context being turned into
the more expensive variable trailing context.
The following is an example of such a combination:
.Bd -literal -offset indent
%%
abc      |
xyz/def
.Ed
.Pp
Use of
.Fn unput
invalidates
.Fa yytext
and
.Fa yyleng ,
unless the
.Dq %array
directive
or the
.Fl l
option has been used.
.Pp
Pattern-matching of NUL's is substantially slower than matching other
characters.
.Pp
Dynamic resizing of the input buffer is slow, as it entails rescanning
all the text matched so far by the current
.Pq generally huge
token.
.Pp
Due to both buffering of input and read-ahead,
it is not possible to intermix calls to
.In stdio.h
routines, such as, for example,
.Fn getchar ,
with
.Nm
rules and expect it to work.
Call
.Fn input
instead.
.Pp
The total number of table entries listed by the
.Fl v
flag excludes the table entries needed to determine
what rule has been matched.
The number of these excluded entries is equal to the number of DFA states
if the scanner does not use
.Em REJECT ,
and somewhat greater than the number of states if it does.
.Pp
.Em REJECT
cannot be used with the
.Fl f
or
.Fl F
options.
.Pp
The
.Nm
internal algorithms need documentation.