This is flex.info, produced by makeinfo version 4.8 from flex.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex).      Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY

   The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.


File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)

flex
****

This manual describes `flex', a tool for generating programs that
perform pattern-matching on text.  The manual includes both tutorial and
reference sections.

   This edition of `The flex Manual' documents `flex' version 2.5.35.
It was last updated on 10 September 2007.

   This manual was written by Vern Paxson, Will Estes and John Millaway.

* Menu:

* Copyright::
* Reporting Bugs::
* Introduction::
* Simple Examples::
* Format::
* Patterns::
* Matching::
* Actions::
* Generated Scanner::
* Start Conditions::
* Multiple Input Buffers::
* EOF::
* Misc Macros::
* User Values::
* Yacc::
* Scanner Options::
* Performance::
* Cxx::
* Reentrant::
* Lex and Posix::
* Memory Management::
* Serialized Tables::
* Diagnostics::
* Limitations::
* Bibliography::
* FAQ::
* Appendices::
* Indices::

 --- The Detailed Node Listing ---

Format of the Input File

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::

Scanner Options

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

Reentrant C Scanners

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::

The Reentrant API in Detail

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::

Memory Management

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::

Serialized Tables

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::

FAQ

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::

Appendices

* Makefiles and Flex::
* Bison Bridge::
* M4 Dependency::
* Common Patterns::

Indices

* Concept Index::
* Index of Functions and Macros::
* Index of Variables::
* Index of Data Types::
* Index of Hooks::
* Index of Scanner Options::


File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top

1 Copyright
***********

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.


File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top

2 Reporting Bugs
****************

If you find a bug in `flex', please report it using the SourceForge Bug
Tracking facilities which can be found on flex's SourceForge Page
(http://sourceforge.net/projects/flex).


File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top

3 Introduction
**************

`flex' is a tool for generating "scanners".  A scanner is a program
which recognizes lexical patterns in text.  The `flex' program reads
the given input files, or its standard input if no file names are
given, for a description of a scanner to generate.  The description is
in the form of pairs of regular expressions and C code, called "rules".
`flex' generates as output a C source file, `lex.yy.c' by default,
which defines a routine `yylex()'.  This file can be compiled and
linked with the flex runtime library to produce an executable.  When
the executable is run, it analyzes its input for occurrences of the
regular expressions.  Whenever it finds one, it executes the
corresponding C code.


File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top

4 Some Simple Examples
**********************

First some simple examples to get the flavor of how one uses `flex'.

   The following `flex' input specifies a scanner which, when it
encounters the string `username', will replace it with the user's login
name:


     %%
     username    printf( "%s", getlogin() );

   By default, any text not matched by a `flex' scanner is copied to
the output, so the net effect of this scanner is to copy its input file
to its output with each occurrence of `username' expanded.  In this
input, there is just one rule.  `username' is the "pattern" and the
`printf' is the "action".  The `%%' symbol marks the beginning of the
rules.

   Here's another simple example:


             int num_lines = 0, num_chars = 0;

     %%
     \n      ++num_lines; ++num_chars;
     .       ++num_chars;

     %%
     int main( void )
             {
             yylex();
             printf( "# of lines = %d, # of chars = %d\n",
                     num_lines, num_chars );
             return 0;
             }

   This scanner counts the number of characters and the number of lines
in its input.  It produces no output other than the final report on the
character and line counts.  The first line declares two globals,
`num_lines' and `num_chars', which are accessible both inside `yylex()'
and in the `main()' routine declared after the second `%%'.  There are
two rules, one which matches a newline (`\n') and increments both the
line count and the character count, and one which matches any character
other than a newline (indicated by the `.' regular expression).

   A somewhat more complicated example:


     /* scanner for a toy Pascal-like language */

     %{
     /* need this for the calls to atoi() and atof() below */
     #include <stdlib.h>
     %}

     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*

     %%

     {DIGIT}+    {
                 printf( "An integer: %s (%d)\n", yytext,
                         atoi( yytext ) );
                 }

     {DIGIT}+"."{DIGIT}*        {
                 printf( "A float: %s (%g)\n", yytext,
                         atof( yytext ) );
                 }

     if|then|begin|end|procedure|function        {
                 printf( "A keyword: %s\n", yytext );
                 }

     {ID}        printf( "An identifier: %s\n", yytext );

     "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

     "{"[^{}\n]*"}"     /* eat up one-line comments */

     [ \t\n]+          /* eat up whitespace */

     .           printf( "Unrecognized character: %s\n", yytext );

     %%

     int main( int argc, char **argv )
         {
         ++argv, --argc;  /* skip over program name */
         if ( argc > 0 )
             {
             yyin = fopen( argv[0], "r" );
             if ( ! yyin )
                 {
                 perror( argv[0] );
                 return 1;
                 }
             }
         else
             yyin = stdin;

         yylex();
         return 0;
         }

   This is the beginning of a simple scanner for a language like
Pascal.  It identifies different types of "tokens" and reports on what
it has seen.

   The details of this example will be explained in the following
sections.


File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top

5 Format of the Input File
**************************

The `flex' input file consists of three sections, separated by a line
containing only `%%'.


     definitions
     %%
     rules
     %%
     user code

* Menu:

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::


File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format

5.1 Format of the Definitions Section
=====================================

The "definitions section" contains declarations of simple "name"
definitions to simplify the scanner specification, and declarations of
"start conditions", which are explained in a later section.

   Name definitions have the form:


     name definition

   The `name' is a word beginning with a letter or an underscore (`_')
followed by zero or more letters, digits, `_', or `-' (dash).  The
definition is taken to begin at the first non-whitespace character
following the name and to continue to the end of the line.  The
definition can subsequently be referred to using `{name}', which will
expand to `(definition)'.  For example,


     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*

   This defines `DIGIT' to be a regular expression which matches a
single digit, and `ID' to be a regular expression which matches a
letter followed by zero-or-more letters-or-digits.  A subsequent
reference to


     {DIGIT}+"."{DIGIT}*

   is identical to


     ([0-9])+"."([0-9])*

   and matches one-or-more digits followed by a `.' followed by
zero-or-more digits.

   An unindented comment (i.e., a line beginning with `/*') is copied
verbatim to the output up to the next `*/'.

   Any _indented_ text or text enclosed in `%{' and `%}' is also copied
verbatim to the output (with the `%{' and `%}' symbols removed).  The
`%{' and `%}' symbols must appear unindented on lines by themselves.

   A `%top' block is similar to a `%{' ... `%}' block, except that the
code in a `%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1).  The `%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code.  The single characters `{' and `}'
are used to delimit the `%top' block, as shown in the example below:


     %top{
         /* This code goes at the "top" of the generated file. */
         #include <stdint.h>
         #include <inttypes.h>
     }

   Multiple `%top' blocks are allowed, and their order is preserved.

   ---------- Footnotes ----------

   (1) Actually, `yyIN_HEADER' is defined before the `%top' block.


File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format

5.2 Format of the Rules Section
===============================

The "rules" section of the `flex' input contains a series of rules of
the form:


     pattern   action

   where the pattern must be unindented and the action must begin on
the same line.  *Note Patterns::, for a further description of patterns
and actions.

   In the rules section, any indented or `%{' `%}' enclosed text
appearing before the first rule may be used to declare variables which
are local to the scanning routine and (after the declarations) code
which is to be executed whenever the scanning routine is entered.
Other indented or `%{' `%}' text in the rule section is still copied to
the output, but its meaning is not well-defined and it may well cause
compile-time errors (this feature is present for POSIX compliance;
*note Lex and Posix::, for other such features).

   Any _indented_ text or text enclosed in `%{' and `%}' is copied
verbatim to the output (with the `%{' and `%}' symbols removed).  The
`%{' and `%}' symbols must appear unindented on lines by themselves.


File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format

5.3 Format of the User Code Section
===================================

The user code section is simply copied to `lex.yy.c' verbatim.  It is
used for companion routines which call or are called by the scanner.
The presence of this section is optional; if it is missing, the second
`%%' in the input file may be skipped, too.


File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format

5.4 Comments in the Input
=========================

Flex supports C-style comments, that is, anything between `/*' and `*/'
is considered a comment.  Whenever flex encounters a comment, it copies
the entire comment verbatim to the generated source code.  Comments may
appear just about anywhere, but with the following exceptions:

   * Comments may not appear in the Rules Section wherever flex is
     expecting a regular expression.  This means comments may not appear
     at the beginning of a line, or immediately following a list of
     scanner states.

   * Comments may not appear on an `%option' line in the Definitions
     Section.

   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
`/*'.  This rule will work anywhere in the input file.

   All the comments in the following example are valid:


     %{
     /* code block */
     %}

     /* Definitions Section */
     %x STATE_X

     %%
         /* Rules Section */
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     <STATE_X>{
     ruleC   ECHO;
     ruleD   ECHO;
     %{
     /* code block */
     %}
     }
     %%
     /* User Code Section */


File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top

6 Patterns
**********

The patterns in the input (*note Rules Section::) are written using an
extended set of regular expressions.  These are:

`x'
     match the character 'x'

`.'
     any character (byte) except newline

`[xyz]'
     a "character class"; in this case, the pattern matches either an
     'x', a 'y', or a 'z'

`[abj-oZ]'
     a "character class" with a range in it; matches an 'a', a 'b', any
     letter from 'j' through 'o', or a 'Z'

`[^A-Z]'
     a "negated character class", i.e., any character but those in the
     class.  In this case, any character EXCEPT an uppercase letter.

`[^A-Z\n]'
     any character EXCEPT an uppercase letter or a newline

`[a-z]{-}[aeiou]'
     the lowercase consonants

`r*'
     zero or more r's, where r is any regular expression

`r+'
     one or more r's

`r?'
     zero or one r's (that is, "an optional r")

`r{2,5}'
     anywhere from two to five r's

`r{2,}'
     two or more r's

`r{4}'
     exactly 4 r's

`{name}'
     the expansion of the `name' definition (*note Format::).

`"[xyz]\"foo"'
     the literal string: `[xyz]"foo'

`\X'
     if X is `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
     interpretation of `\X'.  Otherwise, a literal `X' (used to escape
     operators such as `*')

`\0'
     a NUL character (ASCII code 0)

`\123'
     the character with octal value 123

`\x2a'
     the character with hexadecimal value 2a

`(r)'
     match an `r'; parentheses are used to override precedence (see
     below)

`(?r-s:pattern)'
     apply option `r' and omit option `s' while interpreting pattern.
     Options may be zero or more of the characters `i', `s', or `x'.

     `i' means case-insensitive.  `-i' means case-sensitive.

     `s' alters the meaning of the `.' syntax to match any single byte
     whatsoever.  `-s' alters the meaning of `.' to match any byte
     except `\n'.

     `x' ignores comments and whitespace in patterns.  Whitespace is
     ignored unless it is backslash-escaped, contained within `""'s, or
     appears inside a character class.

     The following are all valid:


          (?:foo)         same as  (foo)
          (?i:ab7)        same as  ([aA][bB]7)
          (?-i:ab)        same as  (ab)
          (?s:.)          same as  [\x00-\xFF]
          (?-s:.)         same as  [^\n]
          (?ix-s: a . b)  same as  ([Aa][^\n][bB])
          (?x:a b)        same as  ("ab")
          (?x:a\ b)       same as  ("a b")
          (?x:a" "b)      same as  ("a b")
          (?x:a[ ]b)      same as  ("a b")
          (?x:a
              /* comment */
              b
              c)          same as  (abc)

`(?# comment )'
     omit everything within `()'.  The first `)' character encountered
     ends the comment.  It is not possible for the comment to contain a
     `)' character.  The comment may span lines.

`rs'
     the regular expression `r' followed by the regular expression `s';
     called "concatenation"

`r|s'
     either an `r' or an `s'

`r/s'
     an `r' but only if it is followed by an `s'.  The text matched by
     `s' is included when determining whether this rule is the longest
     match, but is then returned to the input before the action is
     executed.  So the action only sees the text matched by `r'.  This
     type of pattern is called "trailing context".  (There are some
     combinations of `r/s' that flex cannot match correctly.  *Note
     Limitations::, regarding dangerous trailing context.)

`^r'
     an `r', but only at the beginning of a line (i.e., when just
     starting to scan, or right after a newline has been scanned).

`r$'
     an `r', but only at the end of a line (i.e., just before a
     newline).  Equivalent to `r/\n'.

     Note that `flex''s notion of "newline" is exactly whatever the C
     compiler used to compile `flex' interprets `\n' as; in particular,
     on some DOS systems you must either filter out `\r's in the input
     yourself, or explicitly use `r/\r\n' for `r$'.

`<s>r'
     an `r', but only in start condition `s' (*note Start Conditions::,
     for discussion of start conditions).

`<s1,s2,s3>r'
     same, but in any of start conditions `s1', `s2', or `s3'.

`<*>r'
     an `r' in any start condition, even an exclusive one.

`<<EOF>>'
     an end-of-file.

`<s1,s2><<EOF>>'
     an end-of-file when in start condition `s1' or `s2'

   Note that inside of a character class, all regular expression
operators lose their special meaning except escape (`\') and the
character class operators, `-', `]', and, at the beginning of the
class, `^'.

   The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence (see special note on the
precedence of the repeat operator, `{}', under the documentation for
the `--posix' POSIX compliance option).  For example,


     foo|bar*

   is the same as


     (foo)|(ba(r*))

   since the `*' operator has higher precedence than concatenation, and
concatenation higher than alternation (`|').  This pattern therefore
matches _either_ the string `foo' _or_ the string `ba' followed by
zero-or-more `r''s.  To match `foo' or zero-or-more repetitions of the
string `bar', use:


     foo|(bar)*

   And to match a sequence of zero or more repetitions of `foo' and
`bar':


     (foo|bar)*

   In addition to characters and ranges of characters, character classes
can also contain "character class expressions".  These are expressions
enclosed inside `[:' and `:]' delimiters (which themselves must appear
between the `[' and `]' of the character class; other elements may
occur inside the character class, too).  The valid expressions are:


     [:alnum:] [:alpha:] [:blank:]
     [:cntrl:] [:digit:] [:graph:]
     [:lower:] [:print:] [:punct:]
     [:space:] [:upper:] [:xdigit:]

   These expressions all designate a set of characters equivalent to the
corresponding standard C `isXXX' function.  For example, `[:alnum:]'
designates those characters for which `isalnum()' returns true, i.e.,
any alphabetic or numeric character.  Some systems don't provide
`isblank()', so flex defines `[:blank:]' as a blank or a tab.

   For example, the following character classes are all equivalent:


     [[:alnum:]]
     [[:alpha:][:digit:]]
     [[:alpha:][0-9]]
     [a-zA-Z0-9]

   A word of caution.  Character classes are expanded immediately when
seen in the `flex' input.  This means the character classes are
sensitive to the locale in which `flex' is executed, and the resulting
scanner will not be sensitive to the runtime locale.  This may or may
not be desirable.

   * If your scanner is case-insensitive (the `-i' flag), then
     `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.

   * Character classes with ranges, such as `[A-z]', should be used with
     caution in a case-insensitive scanner if the range spans both
     uppercase and lowercase characters.  Flex does not know if you want
     to fold all upper and lowercase characters together, or if you want
     the literal numeric range specified (with no case folding).  When
     in doubt, flex will assume that you meant the literal numeric
     range, and will issue a warning.  The exception to this rule is a
     character range such as `[a-z]' or `[S-W]' where it is obvious
     that you want case-folding to occur.  Here are some examples with
     the `-i' flag enabled:

     Range       Result      Literal Range            Alternate Range
     `[a-t]'     ok          `[a-tA-T]'
     `[A-T]'     ok          `[a-tA-T]'
     `[A-t]'     ambiguous   `[A-Z\[\\\]^_`a-t]'      `[a-tA-T]'
     `[_-{]'     ambiguous   `[_`a-z{]'               `[_`a-zA-Z{]'
     `[@-C]'     ambiguous   `[@ABC]'                 `[@A-Z\[\\\]^_`abc]'

   * A negated character class such as the example `[^A-Z]' above
     _will_ match a newline unless `\n' (or an equivalent escape
     sequence) is one of the characters explicitly present in the
     negated character class (e.g., `[^A-Z\n]').  This is unlike how
     many other regular expression tools treat negated character
     classes, but unfortunately the inconsistency is historically
     entrenched.  Matching newlines means that a pattern like `[^"]*'
     can match the entire input unless there's another quote in the
     input.

     Flex allows negation of character class expressions by prepending
     `^' to the POSIX character class name.


          [:^alnum:] [:^alpha:] [:^blank:]
          [:^cntrl:] [:^digit:] [:^graph:]
          [:^lower:] [:^print:] [:^punct:]
          [:^space:] [:^upper:] [:^xdigit:]

     Flex will issue a warning if the expressions `[:^upper:]' and
     `[:^lower:]' appear in a case-insensitive scanner, since their
     meaning is unclear.  The current behavior is to skip them entirely,
     but this may change without notice in future revisions of flex.

   * The `{-}' operator computes the difference of two character
     classes.  For example, `[a-c]{-}[b-z]' represents all the
     characters in the class `[a-c]' that are not in the class `[b-z]'
     (which in this case is just the single character `a').  The `{-}'
     operator is left associative, so `[abc]{-}[b]{-}[c]' is the same
     as `[a]'.  Be careful not to accidentally create an empty set,
     which will never match.

   * The `{+}' operator computes the union of two character classes.
     For example, `[a-z]{+}[0-9]' is the same as `[a-z0-9]'.  This
     operator is useful when preceded by the result of a difference
     operation, as in `[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
     equivalent to `[A-Zq]' in the "C" locale.

   * A rule can have at most one instance of trailing context (the `/'
     operator or the `$' operator).  The start condition, `^', and
     `<<EOF>>' patterns can only occur at the beginning of a pattern,
     and, like `/' and `$', cannot be grouped inside parentheses.  A
     `^' which does not occur at the beginning of a rule or a `$' which
     does not occur at the end of a rule loses its special properties
     and is treated as a normal character.

   * The following are invalid:


          foo/bar$
          <sc1>foo<sc2>bar

     Note that the first of these can be written `foo/bar\n'.

   * The following will result in `$' or `^' being treated as a normal
     character:


          foo|(bar$)
          foo|^bar

     If the desired meaning is a `foo' or a
     `bar'-followed-by-a-newline, the following could be used (the
     special `|' action is explained below, *note Actions::):


          foo      |
          bar$     /* action goes here */

     A similar trick will work for matching a `foo' or a
     `bar'-at-the-beginning-of-a-line.
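   The `{-}' and `{+}' operators described above behave like plain set
operations on the 256 possible byte values.  The following C sketch
models that behavior; it is not flex's internal implementation, and the
`ccl' type and the `ccl_*' function names are hypothetical, chosen only
to illustrate what `[a-c]{-}[b-z]', `[abc]{-}[b]{-}[c]', and
`[a-z]{+}[0-9]' denote.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit byte set modeling a flex character class:
   bit N is set when byte value N is a member of the class. */
typedef struct { uint8_t bits[32]; } ccl;

/* Make the class empty. */
void ccl_clear(ccl *c) { memset(c->bits, 0, sizeof c->bits); }

/* Add a single byte value to the class. */
void ccl_add(ccl *c, unsigned char ch)
{
    c->bits[ch >> 3] |= (uint8_t)(1u << (ch & 7));
}

/* Add every byte from lo through hi inclusive, as in `[lo-hi]'. */
void ccl_add_range(ccl *c, unsigned char lo, unsigned char hi)
{
    for (unsigned v = lo; v <= hi; v++)
        ccl_add(c, (unsigned char)v);
}

/* Test whether a byte value is a member of the class. */
bool ccl_has(const ccl *c, unsigned char ch)
{
    return (c->bits[ch >> 3] >> (ch & 7)) & 1u;
}

/* Count the members of the class. */
int ccl_count(const ccl *c)
{
    int n = 0;
    for (int v = 0; v < 256; v++)
        n += ccl_has(c, (unsigned char)v);
    return n;
}

/* Models `a{-}b': members of a that are not in b. */
ccl ccl_diff(ccl a, ccl b)
{
    for (int i = 0; i < 32; i++)
        a.bits[i] &= (uint8_t)~b.bits[i];
    return a;
}

/* Models `a{+}b': members of a, of b, or of both. */
ccl ccl_union(ccl a, ccl b)
{
    for (int i = 0; i < 32; i++)
        a.bits[i] |= b.bits[i];
    return a;
}
```

   Because the operators are left associative, `[abc]{-}[b]{-}[c]'
corresponds to the nested call `ccl_diff(ccl_diff(abc, b), c)', which
leaves only `a'.  An empty result (a `ccl_count' of 0) is the "empty
set, which will never match" warned about above.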
932 933 934File: flex.info, Node: Matching, Next: Actions, Prev: Patterns, Up: Top 935 9367 How the Input Is Matched 937************************** 938 939When the generated scanner is run, it analyzes its input looking for 940strings which match any of its patterns. If it finds more than one 941match, it takes the one matching the most text (for trailing context 942rules, this includes the length of the trailing part, even though it 943will then be returned to the input). If it finds two or more matches of 944the same length, the rule listed first in the `flex' input file is 945chosen. 946 947 Once the match is determined, the text corresponding to the match 948(called the "token") is made available in the global character pointer 949`yytext', and its length in the global integer `yyleng'. The "action" 950corresponding to the matched pattern is then executed (*note 951Actions::), and then the remaining input is scanned for another match. 952 953 If no match is found, then the "default rule" is executed: the next 954character in the input is considered matched and copied to the standard 955output. Thus, the simplest valid `flex' input is: 956 957 958 %% 959 960 which generates a scanner that simply copies its input (one 961character at a time) to its output. 962 963 Note that `yytext' can be defined in two different ways: either as a 964character _pointer_ or as a character _array_. You can control which 965definition `flex' uses by including one of the special directives 966`%pointer' or `%array' in the first (definitions) section of your flex 967input. The default is `%pointer', unless you use the `-l' lex 968compatibility option, in which case `yytext' will be an array. The 969advantage of using `%pointer' is substantially faster scanning and no 970buffer overflow when matching very large tokens (unless you run out of 971dynamic memory). 
The disadvantage is that you are restricted in how
your actions can modify `yytext' (*note Actions::), and calls to the
`unput()' function destroy the present contents of `yytext', which can
be a considerable porting headache when moving between different `lex'
versions.

   The advantage of `%array' is that you can then modify `yytext' to
your heart's content, and calls to `unput()' do not destroy `yytext'
(*note Actions::). Furthermore, existing `lex' programs sometimes
access `yytext' externally using declarations of the form:

     extern char yytext[];

   This definition is erroneous when used with `%pointer', but correct
for `%array'.

   The `%array' declaration defines `yytext' to be an array of `YYLMAX'
characters, which defaults to a fairly large value. You can change the
size by simply #define'ing `YYLMAX' to a different value in the first
section of your `flex' input. As mentioned above, with `%pointer'
`yytext' grows dynamically to accommodate large tokens. While this
means your `%pointer' scanner can accommodate very large tokens (such
as matching entire blocks of comments), bear in mind that each time the
scanner must resize `yytext' it also must rescan the entire token from
the beginning, so matching such tokens can prove slow. `yytext'
presently does _not_ dynamically grow if a call to `unput()' results in
too much text being pushed back; instead, a run-time error results.

   Also note that you cannot use `%array' with C++ scanner classes
(*note Cxx::).


File: flex.info, Node: Actions, Next: Generated Scanner, Prev: Matching, Up: Top

8 Actions
*********

Each pattern in a rule has a corresponding "action", which can be any
arbitrary C statement. The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action.
If the action is empty, then when the pattern is matched the input
token is simply discarded. For example, here is the specification for
a program which deletes all occurrences of `zap me' from its input:

     %%
     "zap me"

   This example will copy all other characters in the input to the
output since they will be matched by the default rule.

   Here is a program which compresses multiple blanks and tabs down to a
single blank, and throws away whitespace found at the end of a line:

     %%
     [ \t]+        putchar( ' ' );
     [ \t]+$       /* ignore this token */

   If the action contains a `{', then the action spans until the
balancing `}' is found, and the action may cross multiple lines.
`flex' knows about C strings and comments and won't be fooled by braces
found within them, but also allows actions to begin with `%{' and will
consider the action to be all the text up to the next `%}' (regardless
of ordinary braces inside the action).

   An action consisting solely of a vertical bar (`|') means "same as
the action for the next rule". See below for an illustration.

   Actions can include arbitrary C code, including `return' statements
to return a value to whatever routine called `yylex()'. Each time
`yylex()' is called it continues processing tokens from where it last
left off until it either reaches the end of the file or executes a
return.

   Actions are free to modify `yytext' except for lengthening it
(adding characters to its end; these will overwrite later characters in
the input stream). This however does not apply when using `%array'
(*note Matching::). In that case, `yytext' may be freely modified in
any way.

   Actions are free to modify `yyleng' except that they should not do
so if the action also includes use of `yymore()' (see below).
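
   The blank-compressing rules shown earlier in this node can be
approximated by a hand-written C function, which may help show how the
longest-match rule (*note Matching::) decides between them. This is an
illustration, not code that `flex' generates; the name
`compress_blanks' is hypothetical, and a blank run at the very end of
input with no newline after it is treated as an ordinary run, since
`$' only matches before a newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written equivalent of the two blank rules.
 * A run of blanks/tabs becomes one space, unless a newline follows
 * it: then the `$' rule wins, because its match also counts the
 * trailing newline and is therefore longer.  Other characters are
 * copied unchanged, as the default rule would.
 * `out' must have room for strlen(in) + 1 bytes. */
static void compress_blanks(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in) {
        if (*in == ' ' || *in == '\t') {
            const char *run = in;
            while (*run == ' ' || *run == '\t')
                run++;                  /* consume the whole run */
            if (*run != '\n')
                *out++ = ' ';           /* like putchar( ' ' ) */
            /* else: end-of-line blanks are discarded */
            in = run;
        } else {
            *out++ = *in++;             /* default rule: echo */
        }
    }
    *out = '\0';
}
```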

   There are a number of special directives which can be included
within an action:

`ECHO'
     copies `yytext' to the scanner's output.

`BEGIN'
     followed by the name of a start condition places the scanner in the
     corresponding start condition (see below).

`REJECT'
     directs the scanner to proceed on to the "second best" rule which
     matched the input (or a prefix of the input). The rule is chosen
     as described above (*note Matching::), and `yytext' and `yyleng'
     are set up appropriately. It may either be one which matched as
     much text as the originally chosen rule but came later in the
     `flex' input file, or one which matched less text. For example,
     the following will both count the words in the input and call the
     routine `special()' whenever `frob' is seen:

                  int word_count = 0;
          %%

          frob        special(); REJECT;
          [^ \t\n]+   ++word_count;

     Without the `REJECT', any occurrences of `frob' in the input would
     not be counted as words, since the scanner normally executes only
     one action per token. Multiple uses of `REJECT' are allowed, each
     one finding the next best choice to the currently active rule. For
     example, when the following scanner scans the token `abcd', it will
     write `abcdabcaba' to the output:

          %%
          a        |
          ab       |
          abc      |
          abcd     ECHO; REJECT;
          .|\n     /* eat up any unmatched character */

     The first three rules share the fourth's action since they use the
     special `|' action.

     `REJECT' is a particularly expensive feature in terms of scanner
     performance; if it is used in _any_ of the scanner's actions it
     will slow down _all_ of the scanner's matching. Furthermore,
     `REJECT' cannot be used with the `-Cf' or `-CF' options (*note
     Scanner Options::).

     Note also that unlike the other special actions, `REJECT' is a
     _branch_.
     Code immediately following it in the action will _not_ be
     executed.

`yymore()'
     tells the scanner that the next time it matches a rule, the
     corresponding token should be _appended_ onto the current value of
     `yytext' rather than replacing it. For example, given the input
     `mega-kludge' the following will write `mega-mega-kludge' to the
     output:

          %%
          mega-    ECHO; yymore();
          kludge   ECHO;

     First `mega-' is matched and echoed to the output. Then `kludge'
     is matched, but the previous `mega-' is still hanging around at the
     beginning of `yytext' so the `ECHO' for the `kludge' rule will
     actually write `mega-kludge'.

   Two notes regarding use of `yymore()'. First, `yymore()' depends on
the value of `yyleng' correctly reflecting the size of the current
token, so you must not modify `yyleng' if you are using `yymore()'.
Second, the presence of `yymore()' in the scanner's action entails a
minor performance penalty in the scanner's matching speed.

   `yyless(n)' returns all but the first `n' characters of the current
token back to the input stream, where they will be rescanned when the
scanner looks for the next match. `yytext' and `yyleng' are adjusted
appropriately (e.g., `yyleng' will now be equal to `n'). For example,
on the input `foobar' the following will write out `foobarbar':

     %%
     foobar    ECHO; yyless(3);
     [a-z]+    ECHO;

   An argument of 0 to `yyless()' will cause the entire current input
string to be scanned again. Unless you've changed how the scanner will
subsequently process its input (using `BEGIN', for example), this will
result in an endless loop.

   Note that `yyless()' is a macro and can only be used in the flex
input file, not from other source files.

   `unput(c)' puts the character `c' back onto the input stream. It
will be the next character scanned.
The following action will take the
current token and cause it to be rescanned enclosed in parentheses.

     {
     int i;
     /* Copy yytext because unput() trashes yytext */
     char *yycopy = strdup( yytext );
     unput( ')' );
     for ( i = yyleng - 1; i >= 0; --i )
         unput( yycopy[i] );
     unput( '(' );
     free( yycopy );
     }

   Note that since each `unput()' puts the given character back at the
_beginning_ of the input stream, pushing back strings must be done
back-to-front.

   An important potential problem when using `unput()' is that if you
are using `%pointer' (the default), a call to `unput()' _destroys_ the
contents of `yytext', starting with its rightmost character and
devouring one character to the left with each call. If you need the
value of `yytext' preserved after a call to `unput()' (as in the above
example), you must either first copy it elsewhere, or build your
scanner using `%array' instead (*note Matching::).

   Finally, note that you cannot put back `EOF' to attempt to mark the
input stream with an end-of-file.

   `input()' reads the next character from the input stream. For
example, the following is one way to eat up C comments:

     %%
     "/*"        {
                 register int c;

                 for ( ; ; )
                     {
                     while ( (c = input()) != '*' &&
                             c != EOF )
                         ;    /* eat up text of comment */

                     if ( c == '*' )
                         {
                         while ( (c = input()) == '*' )
                             ;
                         if ( c == '/' )
                             break;    /* found the end */
                         }

                     if ( c == EOF )
                         {
                         error( "EOF in comment" );
                         break;
                         }
                     }
                 }

   (Note that if the scanner is compiled using `C++', then `input()' is
instead referred to as `yyinput()', in order to avoid a name clash with
the `C++' stream by the name of `input'.)
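
   The back-to-front rule for `unput()' is easy to check with a toy
pushback stack. In the following sketch, `toy_stream', `toy_unput',
`toy_input' and `toy_rescan_in_parens' are hypothetical stand-ins, not
`flex' routines; `toy_input' returns -1 at end of input, and the fixed
pushback size is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PUSHBACK_MAX 64           /* arbitrary size for the sketch */

typedef struct {
    const char *rest;                 /* unread part of the input */
    char pushed[TOY_PUSHBACK_MAX];    /* pushed-back characters */
    size_t npushed;                   /* top of the pushback stack */
} toy_stream;

/* Like unput(c): c becomes the very next character read. */
static void toy_unput(toy_stream *s, char c)
{
    assert(s->npushed < TOY_PUSHBACK_MAX);
    s->pushed[s->npushed++] = c;
}

/* Like input(): the most recently pushed character comes first,
 * then the remaining input; -1 at end of input. */
static int toy_input(toy_stream *s)
{
    if (s->npushed > 0)
        return (unsigned char) s->pushed[--s->npushed];
    if (*s->rest == '\0')
        return -1;
    return (unsigned char) *s->rest++;
}

/* Push `tok' back wrapped in parentheses.  Each push lands in front
 * of everything pushed before it, so the text goes back-to-front:
 * ')' first, then tok in reverse, then '(' last. */
static void toy_rescan_in_parens(toy_stream *s, const char *tok)
{
    toy_unput(s, ')');
    for (size_t i = strlen(tok); i > 0; --i)
        toy_unput(s, tok[i - 1]);
    toy_unput(s, '(');
}
```

Reading the stream after `toy_rescan_in_parens(&s, "abc")' yields
`(abc)' followed by the rest of the input. The toy keeps pushed-back
characters apart from the token text, so it has none of the `%pointer'
clobbering problem described above; the real action still needs its
`strdup()' copy.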

   `YY_FLUSH_BUFFER()' flushes the scanner's internal buffer so that
the next time the scanner attempts to match a token, it will first
refill the buffer using `YY_INPUT()' (*note Generated Scanner::). This
action is a special case of the more general `yy_flush_buffer()'
function, described below (*note Multiple Input Buffers::).

   `yyterminate()' can be used in lieu of a return statement in an
action. It terminates the scanner and returns a 0 to the scanner's
caller, indicating "all done". By default, `yyterminate()' is also
called when an end-of-file is encountered. It is a macro and may be
redefined.


File: flex.info, Node: Generated Scanner, Next: Start Conditions, Prev: Actions, Up: Top

9 The Generated Scanner
***********************

The output of `flex' is the file `lex.yy.c', which contains the
scanning routine `yylex()', a number of tables used by it for matching
tokens, and a number of auxiliary routines and macros. By default,
`yylex()' is declared as follows:

     int yylex()
         {
         ... various definitions and the actions in here ...
         }

   (If your environment supports function prototypes, then it will be
`int yylex( void )'.) This definition may be changed by defining the
`YY_DECL' macro. For example, you could use:

     #define YY_DECL float lexscan( a, b ) float a, b;

   to give the scanning routine the name `lexscan', returning a float,
and taking two floats as arguments. Note that if you give arguments to
the scanning routine using a K&R-style/non-prototyped function
declaration, you must terminate the definition with a semi-colon (;).

   `flex' generates `C99' function definitions by default. However flex
does have the ability to generate obsolete, er, `traditional', function
definitions. This is to support bootstrapping gcc on old systems.
Unfortunately, traditional definitions prevent us from using any
standard data types smaller than int (such as short, char, or bool) as
function arguments. For this reason, future versions of `flex' may
generate standard C99 code only, leaving K&R-style functions to the
historians. Currently, if you do *not* want `C99' definitions, then
you must use `%option noansi-definitions'.

   Whenever `yylex()' is called, it scans tokens from the global input
file `yyin' (which defaults to stdin). It continues until it either
reaches an end-of-file (at which point it returns the value 0) or one
of its actions executes a `return' statement.

   If the scanner reaches an end-of-file, subsequent calls are undefined
unless either `yyin' is pointed at a new input file (in which case
scanning continues from that file), or `yyrestart()' is called.
`yyrestart()' takes one argument, a `FILE *' pointer (which can be
NULL, if you've set up `YY_INPUT' to scan from a source other than
`yyin'), and initializes `yyin' for scanning from that file.
Essentially there is no difference between just assigning `yyin' to a
new input file or using `yyrestart()' to do so; the latter is available
for compatibility with previous versions of `flex', and because it can
be used to switch input files in the middle of scanning. It can also
be used to throw away the current input buffer, by calling it with an
argument of `yyin'; but it would be better to use `YY_FLUSH_BUFFER'
(*note Actions::). Note that `yyrestart()' does _not_ reset the start
condition to `INITIAL' (*note Start Conditions::).

   If `yylex()' stops scanning due to executing a `return' statement in
one of the actions, the scanner may then be called again and it will
resume scanning where it left off.
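
   The way `yylex()' resumes after an action executes `return' can be
pictured with a hand-written tokenizer that keeps its position in
static storage between calls. The sketch below is a simplified
stand-in: `start_scanning', `next_word', `token_text' and the token
codes are invented names, and a generated scanner keeps its position in
its input buffer rather than in a single pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { END_OF_INPUT = 0, WORD = 1 };   /* 0 means "all done" */

static const char *scan_pos;   /* where the previous call stopped */
static char token_text[64];    /* plays the role of yytext */

/* Point the tokenizer at new input (loosely like setting yyin). */
static void start_scanning(const char *text)
{
    scan_pos = text;
}

/* Skip blanks, copy one word into token_text and return WORD;
 * the next call resumes from scan_pos.  Returns 0 at end of input,
 * and keeps returning 0 on later calls. */
static int next_word(void)
{
    size_t n = 0;

    assert(scan_pos != NULL);
    while (isspace((unsigned char) *scan_pos))
        scan_pos++;
    if (*scan_pos == '\0')
        return END_OF_INPUT;
    while (*scan_pos != '\0' && !isspace((unsigned char) *scan_pos)) {
        assert(n + 1 < sizeof token_text);
        token_text[n++] = *scan_pos++;
    }
    token_text[n] = '\0';
    return WORD;
}
```

A caller would loop with `while ( (tok = next_word()) != 0 )', the same
shape a parser's loop around `yylex()' takes.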

   By default (and for purposes of efficiency), the scanner uses
block-reads rather than simple `getc()' calls to read characters from
`yyin'. The nature of how it gets its input can be controlled by
defining the `YY_INPUT' macro. The calling sequence for `YY_INPUT()'
is `YY_INPUT(buf,result,max_size)'. Its action is to place up to
`max_size' characters in the character array `buf' and return in the
integer variable `result' either the number of characters read or the
constant `YY_NULL' (0 on Unix systems) to indicate `EOF'. The default
`YY_INPUT' reads from the global file-pointer `yyin'.

   Here is a sample definition of `YY_INPUT' (in the definitions
section of the input file):

     %{
     #define YY_INPUT(buf,result,max_size) \
         { \
         int c = getchar(); \
         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
         }
     %}

   This definition will change the input processing to occur one
character at a time.

   When the scanner receives an end-of-file indication from `YY_INPUT',
it then checks the `yywrap()' function. If `yywrap()' returns false
(zero), then it is assumed that the function has gone ahead and set up
`yyin' to point to another input file, and scanning continues. If it
returns true (non-zero), then the scanner terminates, returning 0 to
its caller. Note that in either case, the start condition remains
unchanged; it does _not_ revert to `INITIAL'.

   If you do not supply your own version of `yywrap()', then you must
either use `%option noyywrap' (in which case the scanner behaves as
though `yywrap()' returned 1), or you must link with `-lfl' to obtain
the default version of the routine, which always returns 1.

   For scanning from in-memory buffers (e.g., scanning strings), see
*note Scanning Strings:: and *note Multiple Input Buffers::.
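
   As a sketch of the `YY_INPUT' contract described above, here is a
plain C reader that serves characters from an in-memory string: it
copies at most `max_size' characters into `buf' and reports 0 once the
string is exhausted. The names `string_source' and `string_read' are
hypothetical; for scanning strings, the `yy_scan_string()' family
(*note Multiple Input Buffers::) is usually the simpler route.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *string_source;   /* unread part of the input */

/* Same contract as YY_INPUT: copy up to max_size characters into
 * buf and return how many were copied, or 0 (the value YY_NULL has
 * on Unix systems) when nothing is left. */
static size_t string_read(char *buf, size_t max_size)
{
    size_t n = 0;

    assert(buf != NULL && string_source != NULL);
    while (n < max_size && string_source[n] != '\0')
        n++;                         /* count what fits */
    memcpy(buf, string_source, n);
    string_source += n;
    return n;
}
```

Hooking it into a scanner would presumably take a definition along
these lines in the definitions section, with `string_source' set before
the first call to `yylex()'; treat this wiring as a sketch:

     %{
     #define YY_INPUT(buf,result,max_size) \
         { result = string_read( buf, max_size ); }
     %}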

   The scanner writes its `ECHO' output to the `yyout' global (default,
`stdout'), which may be redefined by the user simply by assigning it to
some other `FILE' pointer.


File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top

10 Start Conditions
*******************

`flex' provides a mechanism for conditionally activating rules. Any
rule whose pattern is prefixed with `<sc>' will only be active when the
scanner is in the "start condition" named `sc'. For example,

     <STRING>[^"]*        { /* eat up the string body ... */
                 ...
                 }

   will be active only when the scanner is in the `STRING' start
condition, and

     <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
                 ...
                 }

   will be active only when the current start condition is either
`INITIAL', `STRING', or `QUOTE'.

   Start conditions are declared in the definitions (first) section of
the input using unindented lines beginning with either `%s' or `%x'
followed by a list of names. The former declares "inclusive" start
conditions, the latter "exclusive" start conditions. A start condition
is activated using the `BEGIN' action. Until the next `BEGIN' action
is executed, rules with the given start condition will be active and
rules with other start conditions will be inactive. If the start
condition is inclusive, then rules with no start conditions at all will
also be active. If it is exclusive, then _only_ rules qualified with
the start condition will be active. A set of rules contingent on the
same exclusive start condition describes a scanner which is independent
of any of the other rules in the `flex' input. Because of this,
exclusive start conditions make it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different from
the rest (e.g., comments).
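
   The activation rule just described can be written down as a small
predicate. In this sketch start conditions are bit positions in a mask,
`INITIAL' is treated as inclusive, and a rule without a `<...>' prefix
is represented by an empty mask; `sc_mask', `SC_BIT', `SC_EXAMPLE' and
`rule_active' are made-up names, not anything a generated scanner
defines.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned sc_mask;             /* one bit per start condition */
#define SC_BIT(n) (1u << (n))
enum { SC_INITIAL = 0, SC_EXAMPLE = 1 };

/* Is a rule active while the scanner is in start condition
 * `current'?  `rule_scs' holds the conditions named in the rule's
 * <...> prefix (0 if it has none); `inclusive' holds the conditions
 * declared with %s.  INITIAL always behaves as inclusive. */
static bool rule_active(sc_mask rule_scs, int current, sc_mask inclusive)
{
    assert(current >= 0 && current < 16);   /* unsigned has >= 16 bits */
    if (rule_scs == 0)                      /* unprefixed rule */
        return current == SC_INITIAL
            || (inclusive & SC_BIT(current)) != 0;
    return (rule_scs & SC_BIT(current)) != 0;
}
```

The `%s example' versus `%x example' equivalence discussed next can be
checked against it: under `%x', an unprefixed `bar' is inactive in
`example' unless it is written `<INITIAL,example>bar'.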

   If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two. The set of rules:

     %s example
     %%

     <example>foo   do_something();

     bar            something_else();

   is equivalent to

     %x example
     %%

     <example>foo   do_something();

     <INITIAL,example>bar    something_else();

   Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
second example wouldn't be active (i.e., couldn't match) when in start
condition `example'. If we just used `<example>' to qualify `bar',
though, then it would only be active in `example' and not in `INITIAL',
while in the first example it's active in both, because in the first
example the `example' start condition is an inclusive (`%s') start
condition.

   Also note that the special start-condition specifier `<*>' matches
every start condition. Thus, the above example could also have been
written:

     %x example
     %%

     <example>foo   do_something();

     <*>bar    something_else();

   The default rule (to `ECHO' any unmatched character) remains active
in start conditions. It is equivalent to:

     <*>.|\n     ECHO;

   `BEGIN(0)' returns to the original state where only the rules with
no start conditions are active. This state can also be referred to as
the start condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
`BEGIN(0)'. (The parentheses around the start condition name are not
required but are considered good style.)

   `BEGIN' actions can also be given as indented code at the beginning
of the rules section.
For example, the following will cause the scanner
to enter the `SPECIAL' start condition whenever `yylex()' is called and
the global variable `enter_special' is true:

             int enter_special;

     %x SPECIAL
     %%
             if ( enter_special )
                 BEGIN(SPECIAL);

     <SPECIAL>blahblahblah
     ...more rules follow...

   To illustrate the uses of start conditions, here is a scanner which
provides two different interpretations of a string like `123.456'. By
default it will treat it as three tokens, the integer `123', a dot
(`.'), and the integer `456'. But if the string is preceded earlier in
the line by the string `expect-floats' it will treat it as a single
token, the floating-point number `123.456':

     %{
     #include <stdlib.h>
     %}
     %s expect

     %%
     expect-floats        BEGIN(expect);

     <expect>[0-9]+"."[0-9]+      {
                 printf( "found a float, = %f\n",
                         atof( yytext ) );
                 }
     <expect>\n           {
                 /* that's the end of the line, so
                  * we need another "expect-floats"
                  * before we'll recognize any more
                  * numbers
                  */
                 BEGIN(INITIAL);
                 }

     [0-9]+      {
                 printf( "found an integer, = %d\n",
                         atoi( yytext ) );
                 }

     "."         printf( "found a dot\n" );

   Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This scanner goes to a bit of trouble to match as much text as
possible with each rule. In general, when attempting to write a
high-speed scanner, try to match as much as possible in each rule, as
it's a big win.
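
   For comparison, here is roughly what the `comment' scanner above
does, written by hand as a two-state loop in C. It is a sketch, not
generated code: `strip_comments' and the state names are hypothetical,
text outside comments is echoed as the default rule would, and, unlike
the rules above (which count only newlines inside comments), it counts
every newline so that the returned value is the line number reached at
the end of input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum scan_state { ST_INITIAL, ST_COMMENT };

/* Copy `in' to `out' with C comments removed and return the line
 * number reached at end of input (1 + number of newlines).
 * `out' needs room for strlen(in) + 1 bytes. */
static int strip_comments(const char *in, char *out)
{
    enum scan_state state = ST_INITIAL;
    int line_num = 1;

    assert(in != NULL && out != NULL);
    while (*in) {
        if (*in == '\n')
            ++line_num;                  /* counted in both states */
        if (state == ST_INITIAL) {
            if (in[0] == '/' && in[1] == '*') {
                state = ST_COMMENT;      /* like BEGIN(comment) */
                in += 2;
            } else {
                *out++ = *in++;          /* default rule: echo */
            }
        } else if (in[0] == '*' && in[1] == '/') {
            state = ST_INITIAL;          /* like BEGIN(INITIAL) */
            in += 2;
        } else {
            in++;                        /* eat comment text */
        }
    }
    *out = '\0';
    return line_num;
}
```

As with the `"*"+"/"' rule, a run such as `**/' still closes the
comment, because the loop only leaves the comment state on a `*'
immediately followed by `/'.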

   Note that start condition names are really integer values and can
be stored as such. Thus, the above could be extended in the following
fashion:

     %x comment foo
     %%
             int line_num = 1;
             int comment_caller;

     "/*"         {
                  comment_caller = INITIAL;
                  BEGIN(comment);
                  }

     ...

     <foo>"/*"    {
                  comment_caller = foo;
                  BEGIN(comment);
                  }

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(comment_caller);

   Furthermore, you can access the current start condition using the
integer-valued `YY_START' macro. For example, the above assignments to
`comment_caller' could instead be written

     comment_caller = YY_START;

   Flex provides `YYSTATE' as an alias for `YY_START' (since that is
what's used by AT&T `lex').

   For historical reasons, start conditions do not have their own
name-space within the generated scanner. The start condition names are
unmodified in the generated scanner and generated header (*note
option-header::, *note option-prefix::).

   Finally, here's an example of how to match C-style quoted strings
using exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):

     %x str

     %%
             char string_buf[MAX_STR_CONST];
             char *string_buf_ptr;


     \"      string_buf_ptr = string_buf; BEGIN(str);

     <str>\"        { /* saw closing quote - all done */
             BEGIN(INITIAL);
             *string_buf_ptr = '\0';
             /* return string constant token type and
              * value to parser
              */
             }

     <str>\n        {
             /* error - unterminated string constant */
             /* generate error message */
             }

     <str>\\[0-7]{1,3} {
             /* octal escape sequence */
             unsigned int result;

             (void) sscanf( yytext + 1, "%o", &result );

             if ( result > 0xff )
                 {
                 /* error, constant is out-of-bounds */
                 }
             else
                 *string_buf_ptr++ = result;
             }

     <str>\\[0-9]+ {
             /* generate error - bad escape sequence; something
              * like '\48' or '\0777777'
              */
             }

     <str>\\n  *string_buf_ptr++ = '\n';
     <str>\\t  *string_buf_ptr++ = '\t';
     <str>\\r  *string_buf_ptr++ = '\r';
     <str>\\b  *string_buf_ptr++ = '\b';
     <str>\\f  *string_buf_ptr++ = '\f';

     <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

     <str>[^\\\n\"]+        {
             char *yptr = yytext;

             while ( *yptr )
                     *string_buf_ptr++ = *yptr++;
             }

   Often, such as in some of the examples above, you wind up writing a
whole bunch of rules all preceded by the same start condition(s). Flex
makes this a little easier and cleaner by introducing a notion of start
condition "scope". A start condition scope is begun with:

     <SCs>{

   where `SCs' is a list of one or more start conditions. Inside the
start condition scope, every rule automatically has the prefix `<SCs>'
applied to it, until a `}' which matches the initial `{'.
So, for example,

     <ESC>{
         "\\n"   return '\n';
         "\\r"   return '\r';
         "\\f"   return '\f';
         "\\0"   return '\0';
     }

   is equivalent to:

     <ESC>"\\n"  return '\n';
     <ESC>"\\r"  return '\r';
     <ESC>"\\f"  return '\f';
     <ESC>"\\0"  return '\0';

   Start condition scopes may be nested.

   The following routines are available for manipulating stacks of
start conditions:

 -- Function: void yy_push_state ( int `new_state' )
     pushes the current start condition onto the top of the start
     condition stack and switches to `new_state' as though you had used
     `BEGIN new_state' (recall that start condition names are also
     integers).

 -- Function: void yy_pop_state ()
     pops the top of the stack and switches to it via `BEGIN'.

 -- Function: int yy_top_state ()
     returns the top of the stack without altering the stack's contents.

   The start condition stack grows dynamically and so has no built-in
size limitation. If memory is exhausted, program execution aborts.

   To use start condition stacks, your scanner must include a `%option
stack' directive (*note Scanner Options::).


File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top

11 Multiple Input Buffers
*************************

Some scanners (such as those which support "include" files) require
reading from several input streams. As `flex' scanners do a large
amount of buffering, one cannot control where the next input will be
read from by simply writing a `YY_INPUT()' which is sensitive to the
scanning context. `YY_INPUT()' is only called when the scanner reaches
the end of its buffer, which may be a long time after scanning a
statement such as an `include' statement which requires switching the
input source.
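
   One way to picture the stacking that include files need,
independent of `flex', is a stack of in-memory sources in which reading
always comes from the top entry and an exhausted entry is popped, so
that reading resumes exactly where the enclosing source stopped. The
sketch below assumes its input is already in strings; `src_push',
`src_getc' and the fixed depth limit are hypothetical, mirroring the
`MAX_INCLUDE_DEPTH' idea used in the second include-file example in
this node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INCLUDE_DEPTH 10

static const char *src_stack[MAX_INCLUDE_DEPTH];  /* read positions */
static int src_top = 0;                           /* live sources */

/* Start reading `text' next, as if it were an included file.
 * Returns 0 on success, -1 if nesting is too deep. */
static int src_push(const char *text)
{
    assert(text != NULL);
    if (src_top >= MAX_INCLUDE_DEPTH)
        return -1;
    src_stack[src_top++] = text;
    return 0;
}

/* Next character from the innermost source; an exhausted source is
 * popped and reading continues in the one below.  -1 when all
 * sources are used up. */
static int src_getc(void)
{
    while (src_top > 0) {
        const char **p = &src_stack[src_top - 1];
        if (**p != '\0')
            return (unsigned char) *(*p)++;
        --src_top;                    /* end of this source: pop */
    }
    return -1;
}
```

Pushing `"IN"' after reading two characters of `"outer"' yields
`ouINter': the included text is read in full, then the outer source
picks up where it left off.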

   To negotiate these sorts of problems, `flex' provides a mechanism
for creating and switching between multiple input buffers. An input
buffer is created by using:

 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )

   which takes a `FILE' pointer and a size and creates a buffer
associated with the given file and large enough to hold `size'
characters (when in doubt, use `YY_BUF_SIZE' for the size). It returns
a `YY_BUFFER_STATE' handle, which may then be passed to other routines
(see below). The `YY_BUFFER_STATE' type is a pointer to an opaque
`struct yy_buffer_state' structure, so you may safely initialize
`YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
also refer to the opaque structure in order to correctly declare input
buffers in source files other than that of your scanner. Note that the
`FILE' pointer in the call to `yy_create_buffer' is only used as the
value of `yyin' seen by `YY_INPUT'. If you redefine `YY_INPUT()' so it
no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
`yy_create_buffer'. You select a particular buffer to scan from using:

 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )

   The above function switches the scanner's input buffer so subsequent
tokens will come from `new_buffer'. Note that `yy_switch_to_buffer()'
may be used by `yywrap()' to set things up for continued scanning,
instead of opening a new file and pointing `yyin' at it. If you are
looking for a stack of input buffers, then you want to use
`yypush_buffer_state()' instead of this function. Note also that
switching input sources via either `yy_switch_to_buffer()' or
`yywrap()' does _not_ change the start condition.

 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )

   is used to reclaim the storage associated with a buffer.
(`buffer'
can be NULL, in which case the routine does nothing.)

 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )

   This function pushes the new buffer state onto an internal stack.
The pushed state becomes the new current state. The stack is maintained
by flex and will grow as required. This function is intended to be used
instead of `yy_switch_to_buffer', when you want to change states, but
preserve the current state for later use.

 -- Function: void yypop_buffer_state ( )

   This function removes the current state from the top of the stack,
and deletes it by calling `yy_delete_buffer'. The next state on the
stack, if any, becomes the new current state.

   You can also clear the current contents of a buffer using:

 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )

   This function discards the buffer's contents, so the next time the
scanner attempts to match a token from the buffer, it will first fill
the buffer anew using `YY_INPUT()'.

 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )

   is an alias for `yy_create_buffer()', provided for compatibility
with the C++ use of `new' and `delete' for creating and destroying
dynamic objects.

   The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to
the current buffer. It should not be used as an lvalue.

   Here are two examples of using these features for writing a scanner
which expands include files (the `<<EOF>>' feature is discussed below).

   This first example uses `yypush_buffer_state' and
`yypop_buffer_state'. Flex maintains the stack internally.

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl
     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));

             BEGIN(INITIAL);
             }

     <<EOF>> {
             yypop_buffer_state();

             if ( !YY_CURRENT_BUFFER )
                 {
                 yyterminate();
                 }
             }

   The second example, below, does the same thing as the previous
example did, but manages its own input buffer stack manually (instead
of letting flex do it).

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl

     %{
     #define MAX_INCLUDE_DEPTH 10
     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
     int include_stack_ptr = 0;
     %}

     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                 {
                 fprintf( stderr, "Includes nested too deeply\n" );
                 exit( 1 );
                 }

             include_stack[include_stack_ptr++] =
                 YY_CURRENT_BUFFER;

             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yy_switch_to_buffer(
                 yy_create_buffer( yyin, YY_BUF_SIZE ) );

             BEGIN(INITIAL);
             }

     <<EOF>> {
             if ( --include_stack_ptr < 0 )
                 {
                 yyterminate();
                 }

             else
                 {
                 yy_delete_buffer( YY_CURRENT_BUFFER );
                 yy_switch_to_buffer(
                      include_stack[include_stack_ptr] );
                 }
             }

   The following routines are available for setting up input buffers for
scanning in-memory strings instead of files. All of them create a new
input buffer for scanning the string, and return a corresponding
`YY_BUFFER_STATE' handle (which you should delete with
`yy_delete_buffer()' when done with it).
They also switch to the new buffer using `yy_switch_to_buffer()', so the
next call to `yylex()' will start scanning the string.

 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
     scans a NUL-terminated string.

 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int
          len )
     scans `len' bytes (including possibly `NUL's) starting at location
     `bytes'.

   Note that both of these functions create and scan a _copy_ of the
string or bytes.  (This may be desirable, since `yylex()' modifies the
contents of the buffer it is scanning.)  You can avoid the copy by
using:

 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
          size)
     scans in place the buffer starting at `base', consisting of
     `size' bytes, the last two bytes of which _must_ be
     `YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
     scanned; thus, scanning consists of `base[0]' through
     `base[size-3]', inclusive.

   If you fail to set up `base' in this manner (i.e., forget the final
two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
NULL pointer instead of creating a new input buffer.

 -- Data type: yy_size_t
     is an integral type to which you can cast an integer expression
     reflecting the size of the buffer.


File: flex.info,  Node: EOF,  Next: Misc Macros,  Prev: Multiple Input Buffers,  Up: Top

12 End-of-File Rules
********************

The special rule `<<EOF>>' indicates actions which are to be taken when
an end-of-file is encountered and `yywrap()' returns non-zero (i.e.,
indicates no further files to process).  The action must finish by
doing one of the following things:

   * assigning `yyin' to a new input file (in previous versions of
     `flex', after doing the assignment you had to call the special
     action `YY_NEW_FILE'.  This is no longer necessary.);

   * executing a `return' statement;

   * executing the special `yyterminate()' action;

   * or switching to a new buffer using `yy_switch_to_buffer()', as
     shown in the example above.

   `<<EOF>>' rules may not be used with other patterns; they may only be
qualified with a list of start conditions.  If an unqualified `<<EOF>>'
rule is given, it applies to _all_ start conditions which do not
already have `<<EOF>>' actions.  To specify an `<<EOF>>' rule for only
the initial start condition, use:


     <INITIAL><<EOF>>

   These rules are useful for catching things like unclosed comments.
An example:


     %x quote
     %%

     ...other rules for dealing with quotes...

     <quote><<EOF>>   {
              fprintf( stderr, "unterminated quote\n" );
              yyterminate();
              }
     <<EOF>>  {
              if ( *++filelist )
                  yyin = fopen( *filelist, "r" );
              else
                  yyterminate();
              }


File: flex.info,  Node: Misc Macros,  Next: User Values,  Prev: EOF,  Up: Top

13 Miscellaneous Macros
***********************

The macro `YY_USER_ACTION' can be defined to provide an action which is
always executed prior to the matched rule's action.  For example, it
could be #define'd to call a routine to convert `yytext' to lower-case.
When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
number of the matched rule (rules are numbered starting with 1).
Suppose you want to profile how often each of your rules is matched.
The following would do the trick:


     #define YY_USER_ACTION ++ctr[yy_act]

   where `ctr' is an array to hold the counts for the different rules.
Note that the macro `YY_NUM_RULES' gives the total number of rules
(including the default rule), even if you use `-s'.  Since rules are
numbered starting with 1, the highest value `yy_act' can take here is
`YY_NUM_RULES', so a safe declaration for `ctr' is:


     int ctr[YY_NUM_RULES + 1];

   The macro `YY_USER_INIT' may be defined to provide an action which
is always executed before the first scan (and before the scanner's
internal initializations are done).  For example, it could be used to
call a routine to read in a data table or open a logging file.

   The macro `yy_set_interactive(is_interactive)' can be used to
control whether the current buffer is considered "interactive".  An
interactive buffer is processed more slowly, but must be used when the
scanner's input source is indeed interactive to avoid problems due to
waiting to fill buffers (see the discussion of the `-I' flag in *note
Scanner Options::).  A non-zero value in the macro invocation marks the
buffer as interactive, a zero value as non-interactive.  Note that use
of this macro overrides `%option always-interactive' or `%option
never-interactive' (*note Scanner Options::).  `yy_set_interactive()'
must be invoked prior to beginning to scan the buffer that is (or is
not) to be considered interactive.

   The macro `yy_set_bol(at_bol)' can be used to control whether the
current buffer's scanning context for the next token match is done as
though at the beginning of a line.  A non-zero macro argument makes
rules anchored with `^' active, while a zero argument makes `^' rules
inactive.

   The macro `YY_AT_BOL()' returns true if the next token scanned from
the current buffer will have `^' rules active, false otherwise.

   In the generated scanner, the actions are all gathered in one large
switch statement and separated using `YY_BREAK', which may be
redefined.  By default, it is simply a `break', to separate each rule's
action from the following rule's.
Redefining `YY_BREAK' allows, for example, C++ users to #define
YY_BREAK to do nothing (while being very careful that every rule ends
with a `break' or a `return'!).  This avoids unreachable-statement
warnings: when a rule's action ends with `return', the `YY_BREAK' that
follows it can never be reached.


File: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top

14 Values Available To the User
*******************************

This chapter summarizes the various values available to the user in the
rule actions.

`char *yytext'
     holds the text of the current token.  It may be modified but not
     lengthened (you cannot append characters to the end).

     If the special directive `%array' appears in the first section of
     the scanner description, then `yytext' is instead declared `char
     yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
     redefine in the first section if you don't like the default value
     (generally 8KB).  Using `%array' results in somewhat slower
     scanners, but the value of `yytext' becomes immune to calls to
     `unput()', which potentially destroy its value when `yytext' is a
     character pointer.  The opposite of `%array' is `%pointer', which
     is the default.

     You cannot use `%array' when generating C++ scanner classes (the
     `-+' flag).

`int yyleng'
     holds the length of the current token.

`FILE *yyin'
     is the file which by default `flex' reads from.  It may be
     redefined, but doing so only makes sense before scanning begins or
     after an EOF has been encountered.  Changing it in the midst of
     scanning will have unexpected results since `flex' buffers its
     input; use `yyrestart()' instead.  Once scanning terminates
     because an end-of-file has been seen, you can point `yyin' at the
     new input file and then call the scanner again to continue
     scanning.

`void yyrestart( FILE *new_file )'
     may be called to point `yyin' at the new input file.  The
     switch-over to the new file is immediate (any previously
     buffered-up input is lost).  Note that calling `yyrestart()' with
     `yyin' as an argument thus throws away the current input buffer
     and continues scanning the same input file.

`FILE *yyout'
     is the file to which `ECHO' actions are done.  It can be reassigned
     by the user.

`YY_CURRENT_BUFFER'
     returns a `YY_BUFFER_STATE' handle to the current buffer.

`YY_START'
     returns an integer value corresponding to the current start
     condition.  You can subsequently use this value with `BEGIN' to
     return to that start condition.


File: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top

15 Interfacing with Yacc
************************

One of the main uses of `flex' is as a companion to the `yacc'
parser-generator.  `yacc' parsers expect to call a routine named
`yylex()' to find the next input token.  The routine is supposed to
return the type of the next token as well as putting any associated
value in the global `yylval'.  To use `flex' with `yacc', one specifies
the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
containing definitions of all the tokens declared with `%token' in the
`yacc' input.  This file is then included in the `flex' scanner.  For
example, if one of the tokens is `TOK_NUMBER', part of the scanner
might look like:


     %{
     #include "y.tab.h"
     %}

     %%

     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;


File: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top

16 Scanner Options
******************

The various `flex' options are categorized by function in the following
menu.
If you want to look up a particular option by name, see *note Index
of Scanner Options::.

* Menu:

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

   Even though there are many scanner options, a typical scanner might
only specify the following options:


     %option   8bit reentrant bison-bridge
     %option   warn nodefault
     %option   yylineno
     %option   outfile="scanner.c" header-file="scanner.h"

   The first line specifies the general type of scanner we want.  The
second line turns on warnings and suppresses the default rule, so that
input not matched by any rule is reported instead of silently echoed.
The third line asks flex to track line numbers.  The last line tells
flex what to name the files.  (The options can be specified in any
order; we just grouped them by purpose.)

   `flex' also provides a mechanism for controlling options within the
scanner specification itself, rather than from the flex command line.
This is done by including `%option' directives in the first section of
the scanner specification.  You can specify multiple options with a
single `%option' directive, and multiple directives in the first
section of your flex input file.

   Most options are given simply as names, optionally preceded by the
word `no' (with no intervening whitespace) to negate their meaning.
The names are the same as their long-option equivalents (but without
the leading `--').

   `flex' scans your rule actions to determine whether you use the
`REJECT' or `yymore()' features.  The `reject' and `yymore' options are
available to override its decision as to whether you use these
features, either by setting them (e.g., `%option reject') to indicate
the feature is indeed used, or unsetting them to indicate it actually
is not used (e.g., `%option noyymore').
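
   The `no'-prefix naming rule described above can be illustrated with
a small, self-contained C sketch.  It is only a toy and is not flex's
own option parser.  The names `struct option_word', `parse_option_word()'
and `parse_option_line()' are hypothetical.  The sketch handles only bare
option names (not forms such as `outfile="scanner.c"'), and it assumes
that no real option name itself begins with `no'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy (not flex's own parser): one option word, split into
 * its base name plus a flag telling whether it is set or unset. */
struct option_word {
    char name[64];
    int  enabled;               /* 1 = set, 0 = negated with "no" */
};

/* A leading "no" (with no intervening whitespace) negates the option.
 * Assumption: no genuine option name starts with "no". */
static void parse_option_word(const char *word, struct option_word *out)
{
    assert(word != NULL && out != NULL);

    out->enabled = 1;
    if (strncmp(word, "no", 2) == 0) {
        out->enabled = 0;
        word += 2;              /* skip the "no" prefix */
    }
    snprintf(out->name, sizeof out->name, "%s", word);
}

/* Splits one "%option" directive into whitespace-separated words.
 * Returns how many words were stored in opts (at most max), or 0 if
 * the line is not an %option directive.  name="value" forms are not
 * handled by this sketch. */
static int parse_option_line(const char *line, struct option_word *opts,
                             int max)
{
    char buf[256];
    char *tok;
    int n = 0;

    snprintf(buf, sizeof buf, "%s", line);   /* strtok modifies its input */

    tok = strtok(buf, " \t\n");
    if (tok == NULL || strcmp(tok, "%option") != 0)
        return 0;                            /* not an %option directive */

    while (n < max && (tok = strtok(NULL, " \t\n")) != NULL)
        parse_option_word(tok, &opts[n++]);

    return n;
}
```

   For example, passing the line `%option noyywrap reject nounput' to
`parse_option_line()' yields three entries: `yywrap' unset, `reject'
set, and `unput' unset.  This mirrors the rule that several options may
share one directive and that `no' negates the name it is attached to.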

   A number of options are available for lint purists who want to
suppress the appearance of unneeded routines in the generated scanner.
Each of the following, if unset (e.g., `%option nounput'), results in
the corresponding routine not appearing in the generated scanner:


     input, unput
     yy_push_state, yy_pop_state, yy_top_state
     yy_scan_buffer, yy_scan_bytes, yy_scan_string

     yyget_extra, yyset_extra, yyget_leng, yyget_text,
     yyget_lineno, yyset_lineno, yyget_in, yyset_in,
     yyget_out, yyset_out, yyget_lval, yyset_lval,
     yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

   (though `yy_push_state()' and friends won't appear anyway unless you
use `%option stack').


File: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options

16.1 Options for Specifying Filenames
=====================================

`--header-file=FILE, `%option header-file="FILE"''
     instructs flex to write a C header to `FILE'.  This file contains
     function prototypes, extern variables, and types used by the
     scanner.  Only the external API is exported by the header file.
     Many macros that are usable from within scanner actions are not
     exported to the header file.  This is due to namespace problems and
     the goal of a clean external API.

     While in the header, the macro `yyIN_HEADER' is defined, where `yy'
     is substituted with the appropriate prefix.

     The `--header-file' option is not compatible with the `--c++'
     option, since the C++ scanner provides its own header in
     `yyFlexLexer.h'.

`-oFILE, --outfile=FILE, `%option outfile="FILE"''
     directs flex to write the scanner to the file `FILE' instead of
     `lex.yy.c'.
     If you combine `--outfile' with the `--stdout' option, then the
     scanner is written to `stdout' but its `#line' directives (see the
     `-L' option in *note Code-Level And API Options::) refer to the
     file `FILE'.

`-t, --stdout, `%option stdout''
     instructs `flex' to write the scanner it generates to standard
     output instead of `lex.yy.c'.

`-SFILE, --skel=FILE'
     overrides the default skeleton file from which `flex' constructs
     its scanners.  You'll never need this option unless you are doing
     `flex' maintenance or development.

`--tables-file=FILE'
     writes serialized scanner DFA tables to `FILE'.  The generated
     scanner will not contain the tables, and requires them to be loaded
     at runtime.  *Note serialization::.

`--tables-verify'
     This option is for flex development.  We document it here in case
     you stumble upon it by accident or in case you suspect some
     inconsistency in the serialized tables.  Flex will serialize the
     scanner DFA tables but will also generate the in-code tables as it
     normally does.  At runtime, the scanner will verify that the
     serialized tables match the in-code tables, instead of loading
     them.



File: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options

16.2 Options Affecting Scanner Behavior
=======================================

`-i, --case-insensitive, `%option case-insensitive''
     instructs `flex' to generate a "case-insensitive" scanner.  The
     case of letters given in the `flex' input patterns will be ignored,
     and tokens in the input will be matched regardless of case.  The
     matched text given in `yytext' keeps its original case (i.e., it
     is not folded).  For tricky behavior, see *note case and character
     ranges::.

`-l, --lex-compat, `%option lex-compat''
     turns on maximum compatibility with the original AT&T `lex'
     implementation.  Note that this does not mean _full_ compatibility.
     Use of this option costs a considerable amount of performance, and
     it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
     `-CF' options.  For details on the compatibilities it provides, see
     *note Lex and Posix::.  This option also results in the name
     `YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.

`-B, --batch, `%option batch''
     instructs `flex' to generate a "batch" scanner, the opposite of
     _interactive_ scanners generated by `--interactive' (see below).
     In general, you use `-B' when you are _certain_ that your scanner
     will never be used interactively, and you want to squeeze a
     _little_ more performance out of it.  If your goal is instead to
     squeeze out a _lot_ more performance, you should be using the
     `-Cf' or `-CF' options, which turn on `--batch' automatically
     anyway.

`-I, --interactive, `%option interactive''
     instructs `flex' to generate an interactive scanner.  An
     interactive scanner is one that only looks ahead to decide what
     token has been matched if it absolutely must.  It turns out that
     always looking one extra character ahead, even if the scanner has
     already seen enough text to disambiguate the current token, is a
     bit faster than only looking ahead when necessary.  But scanners
     that always look ahead give dreadful interactive performance; for
     example, when a user types a newline, it is not recognized as a
     newline token until they enter _another_ token, which often means
     typing in another whole line.

     `flex' scanners default to `interactive' unless you use the `-Cf'
     or `-CF' table-compression options (*note Performance::).
     That's because if you're looking for high performance you should be
     using one of these options, so if you didn't, `flex' assumes you'd
     rather trade off a bit of run-time performance for intuitive
     interactive behavior.  Note also that you _cannot_ use
     `--interactive' in conjunction with `-Cf' or `-CF'.  Thus, this
     option is not really needed; it is on by default for all those
     cases in which it is allowed.

     You can force a scanner to _not_ be interactive by using `--batch'.

`-7, --7bit, `%option 7bit''
     instructs `flex' to generate a 7-bit scanner, i.e., one which can
     only recognize 7-bit characters in its input.  The advantage of
     using `--7bit' is that the scanner's tables can be up to half the
     size of those generated using `--8bit'.  The disadvantage is that
     such scanners often hang or crash if their input contains an 8-bit
     character.

     Note, however, that unless you generate your scanner using the
     `-Cf' or `-CF' table compression options, use of `--7bit' will
     save only a small amount of table space, and make your scanner
     considerably less portable.  `Flex''s default behavior is to
     generate an 8-bit scanner unless you use the `-Cf' or `-CF'
     options, in which case `flex' defaults to generating 7-bit
     scanners unless your site was configured to generate 8-bit
     scanners (as will often be the case with non-USA sites).  You can
     tell whether flex generated a 7-bit or an 8-bit scanner by
     inspecting the flag summary in the `--verbose' output.

     Note that if you use `-Cfe' or `-CFe', `flex' still defaults to
     generating an 8-bit scanner, since usually with these compression
     options full 8-bit tables are not much more expensive than 7-bit
     tables.

`-8, --8bit, `%option 8bit''
     instructs `flex' to generate an 8-bit scanner, i.e., one which can
     recognize 8-bit characters.
     This flag is only needed for scanners generated using `-Cf' or
     `-CF', as otherwise flex defaults to generating an 8-bit scanner
     anyway.

     See the discussion of `--7bit' above for `flex''s default behavior
     and the tradeoffs between 7-bit and 8-bit scanners.

`--default, `%option default''
     generates the default rule.

`--always-interactive, `%option always-interactive''
     instructs flex to generate a scanner which always considers its
     input _interactive_.  Normally, on each new input file the scanner
     calls `isatty()' in an attempt to determine whether the scanner's
     input source is interactive and thus should be read a character at
     a time.  When this option is used, however, no such call is made.

`--never-interactive, `%option never-interactive''
     instructs flex to generate a scanner which never considers its
     input interactive.  This is the opposite of `always-interactive'.

`-X, --posix, `%option posix''
     turns on maximum compatibility with the POSIX 1003.2-1992
     definition of `lex'.  Since `flex' was originally designed to
     implement the POSIX definition of `lex', this generally involves
     very few changes in behavior.  At the time of writing, the known
     differences between `flex' and the POSIX standard are:

        * In POSIX and AT&T `lex', the repeat operator, `{}', has lower
          precedence than concatenation (thus `ab{3}' matches `ababab').
          Most POSIX utilities use Extended Regular Expression (ERE)
          precedence, in which the repeat operator binds more tightly
          than concatenation (so `ab{3}' matches `abbb').  By default,
          `flex' gives the repeat operator higher precedence than
          concatenation, matching the ERE processing of other POSIX
          utilities.
          When either `--posix' or `-l' is specified, `flex' uses the
          traditional AT&T and POSIX-compliant precedence for the
          repeat operator, where concatenation has higher precedence
          than the repeat operator.

`--stack, `%option stack''
     enables the use of start condition stacks (*note Start
     Conditions::).

`--stdinit, `%option stdinit''
     if set (i.e., `%option stdinit'), initializes `yyin' and `yyout' to
     `stdin' and `stdout', instead of the default of `NULL'.  Some
     existing `lex' programs depend on this behavior, even though it is
     not compliant with ANSI C, which does not require `stdin' and
     `stdout' to be compile-time constants.  In a reentrant scanner,
     however, this is not a problem since initialization is performed
     in `yylex_init' at runtime.

`--yylineno, `%option yylineno''
     directs `flex' to generate a scanner that maintains the number of
     the current line read from its input in the global variable
     `yylineno'.  This option is implied by `%option lex-compat'.  In a
     reentrant C scanner, the macro `yylineno' is accessible regardless
     of the value of `%option yylineno'; however, its value is not
     modified by `flex' unless `%option yylineno' is enabled.

`--yywrap, `%option yywrap''
     if unset (i.e., `--noyywrap'), makes the scanner not call
     `yywrap()' upon an end-of-file, but simply assume that there are no
     more files to scan (until the user points `yyin' at a new file and
     calls `yylex()' again).



File: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options

16.3 Code-Level And API Options
===============================

`--ansi-definitions, `%option ansi-definitions''
     instructs flex to generate ANSI C99 definitions for functions.
     This option is enabled by default.
     If `%option noansi-definitions' is specified, then the obsolete
     style is generated.

`--ansi-prototypes, `%option ansi-prototypes''
     instructs flex to generate ANSI C99 prototypes for functions.
     This option is enabled by default.  If `noansi-prototypes' is
     specified, then prototypes will have empty parameter lists.

`--bison-bridge, `%option bison-bridge''
     instructs flex to generate a C scanner that is meant to be called
     by a `GNU bison' parser.  The scanner has minor API changes for
     `bison' compatibility.  In particular, the declaration of `yylex'
     is modified to take an additional parameter, `yylval'.  *Note
     Bison Bridge::.

`--bison-locations, `%option bison-locations''
     instructs flex that `GNU bison' `%locations' are being used.  This
     means `yylex' will be passed an additional parameter, `yylloc'.
     This option implies `%option bison-bridge'.  *Note Bison Bridge::.

`-L, --noline, `%option noline''
     instructs `flex' not to generate `#line' directives.  Without this
     option, `flex' peppers the generated scanner with `#line'
     directives so error messages in the actions will be correctly
     located with respect to either the original `flex' input file (if
     the errors are due to code in the input file), or `lex.yy.c' (if
     the errors are `flex''s fault; you should report these sorts of
     errors to the email address given in *note Reporting Bugs::).

`-R, --reentrant, `%option reentrant''
     instructs flex to generate a reentrant C scanner.  The generated
     scanner may safely be used in a multi-threaded environment.  The
     API for a reentrant scanner is different from that of a
     non-reentrant scanner (*note Reentrant::).  Because of the API
     difference between reentrant and non-reentrant `flex' scanners,
     non-reentrant flex code must be modified before it is suitable for
     use with this option.
     This option is not compatible with the `--c++' option.

     The option `--reentrant' does not affect the performance of the
     scanner.

`-+, --c++, `%option c++''
     specifies that you want flex to generate a C++ scanner class.
     *Note Cxx::, for details.

`--array, `%option array''
     specifies that you want `yytext' to be an array instead of a
     `char *'.

`--pointer, `%option pointer''
     specifies that `yytext' should be a `char *', not an array.  This
     is the default.

`-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
     changes the default `yy' prefix used by `flex' for all
     globally-visible variable and function names to instead be
     `PREFIX'.  For example, `--prefix=foo' changes the name of
     `yytext' to `footext'.  It also changes the name of the default
     output file from `lex.yy.c' to `lex.foo.c'.  Here is a partial
     list of the names affected:


              yy_create_buffer
              yy_delete_buffer
              yy_flex_debug
              yy_init_buffer
              yy_flush_buffer
              yy_load_buffer_state
              yy_switch_to_buffer
              yyin
              yyleng
              yylex
              yylineno
              yyout
              yyrestart
              yytext
              yywrap
              yyalloc
              yyrealloc
              yyfree

     (If you are using a C++ scanner, then only `yywrap' and
     `yyFlexLexer' are affected.)  Within your scanner itself, you can
     still refer to the global variables and functions using either
     version of their name; but externally, they have the modified name.

     This option lets you easily link together multiple `flex' programs
     into the same executable.  Note, though, that using this option
     also renames `yywrap()', so you now _must_ either provide your own
     (appropriately-named) version of the routine for your scanner, or
     use `%option noyywrap', as linking with `-lfl' no longer provides
     one for you by default.

`--main, `%option main''
     directs flex to provide a default `main()' program for the
     scanner, which simply calls `yylex()'.  This option implies
     `noyywrap' (see `--yywrap' in *note Options Affecting Scanner
     Behavior::).

`--nounistd, `%option nounistd''
     suppresses inclusion of the non-ANSI header file `unistd.h'.  This
     option is meant to target environments in which `unistd.h' does
     not exist.  Be aware that certain options may cause flex to
     generate code that relies on functions normally found in
     `unistd.h' (e.g., `isatty()', `read()').  If you wish to use these
     functions, you will have to inform your compiler where to find
     them.  *Note option-always-interactive::.  *Note option-read::.

`--yyclass=NAME, `%option yyclass="NAME"''
     only applies when generating a C++ scanner (the `--c++' option).
     It informs `flex' that you have derived `NAME' as a subclass of
     `yyFlexLexer', so `flex' will place your actions in the member
     function `NAME::yylex()' instead of `yyFlexLexer::yylex()'.  It
     also generates a `yyFlexLexer::yylex()' member function that emits
     a run-time error (by invoking `yyFlexLexer::LexerError()') if
     called.  *Note Cxx::.



File: flex.info,  Node: Options for Scanner Speed and Size,  Next: Debugging Options,  Prev: Code-Level And API Options,  Up: Scanner Options

16.4 Options for Scanner Speed and Size
=======================================

`-C[aefFmr]'
     controls the degree of table compression and, more generally,
     trade-offs between small scanners and fast scanners.

     `-C'
          A lone `-C' specifies that the scanner tables should be
          compressed but neither equivalence classes nor
          meta-equivalence classes should be used.

     `-Ca, --align, `%option align''
          ("align") instructs flex to trade off larger tables in the
          generated scanner for faster performance because the elements
          of the tables are better aligned for memory access and
          computation.  On some RISC architectures, fetching and
          manipulating longwords is more efficient than with
          smaller-sized units such as shortwords.  This option can
          quadruple the size of the tables used by your scanner.

     `-Ce, --ecs, `%option ecs''
          directs `flex' to construct "equivalence classes", i.e., sets
          of characters which have identical lexical properties (for
          example, if the only appearance of digits in the `flex' input
          is in the character class "[0-9]" then the digits '0', '1',
          ..., '9' will all be put in the same equivalence class).
          Equivalence classes usually give dramatic reductions in the
          final table/object file sizes (typically a factor of 2-5) and
          are pretty cheap performance-wise (one array look-up per
          character scanned).

     `-Cf'
          specifies that the "full" scanner tables should be generated;
          `flex' should not compress the tables by taking advantage of
          similar transition functions for different states.

     `-CF'
          specifies that the alternate fast scanner representation
          (described below under the `--fast' flag) should be used.
          This option cannot be used with `--c++'.

     `-Cm, --meta-ecs, `%option meta-ecs''
          directs `flex' to construct "meta-equivalence classes", which
          are sets of equivalence classes (or characters, if equivalence
          classes are not being used) that are commonly used together.
          Meta-equivalence classes are often a big win when using
          compressed tables, but they have a moderate performance
          impact (one or two `if' tests and one array look-up per
          character scanned).

     `-Cr, --read, `%option read''
          causes the generated scanner to _bypass_ use of the standard
          I/O library (`stdio') for input.  Instead of calling
          `fread()' or `getc()', the scanner will use the `read()'
          system call, resulting in a performance gain which varies
          from system to system, but in general is probably negligible
          unless you are also using `-Cf' or `-CF'.  Using `-Cr' can
          cause strange behavior if, for example, you read from `yyin'
          using `stdio' prior to calling the scanner (because the
          scanner will miss whatever text your previous reads left in
          the `stdio' input buffer).  `-Cr' has no effect if you define
          `YY_INPUT()' (*note Generated Scanner::).

     The options `-Cf' or `-CF' and `-Cm' do not make sense together;
     there is no opportunity for meta-equivalence classes if the table
     is not being compressed.  Otherwise the options may be freely
     mixed, and are cumulative.

     The default setting is `-Cem', which specifies that `flex' should
     generate equivalence classes and meta-equivalence classes.  This
     setting provides the highest degree of table compression.  You can
     get faster-executing scanners at the cost of larger tables; the
     following ordering generally holds:


          slowest & smallest
                -Cem
                -Cm
                -Ce
                -C
                -C{f,F}e
                -C{f,F}
                -C{f,F}a
          fastest & largest

     Note that scanners with the smallest tables are usually generated
     and compiled the quickest, so during development you will usually
     want to use the default, maximal compression.

     `-Cfe' is often a good compromise between speed and size for
     production scanners.

`-f, --full, `%option full''
     specifies a "full table" scanner.  No table compression is done
     and `stdio' is bypassed.  The result is large but fast.
     This option is equivalent to `-Cfr'.

`-F, --fast, `%option fast''
     specifies that the _fast_ scanner table representation should be
     used (and `stdio' bypassed). This representation is about as fast
     as the full table representation `--full', and for some sets of
     patterns will be considerably smaller (and for others, larger). In
     general, if the pattern set contains both _keywords_ and a
     catch-all, _identifier_ rule, such as in the set:


          "case"    return TOK_CASE;
          "switch"  return TOK_SWITCH;
          ...
          "default" return TOK_DEFAULT;
          [a-z]+    return TOK_ID;

     then you're better off using the full table representation. If
     only the _identifier_ rule is present and you then use a hash
     table or some such to detect the keywords, you're better off using
     `--fast'.

     This option is equivalent to `-CFr'. It cannot be used with
     `--c++'.



File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options

16.5 Debugging Options
======================

`-b, --backup, `%option backup''
     Generate backing-up information to `lex.backup'. This is a list of
     scanner states which require backing up and the input characters on
     which they do so. By adding rules one can remove backing-up
     states. If _all_ backing-up states are eliminated and `-Cf' or
     `-CF' is used, the generated scanner will run faster (see the
     `--perf-report' flag). Only users who wish to squeeze every last
     cycle out of their scanners need worry about this option (*note
     Performance::).

`-d, --debug, `%option debug''
     makes the generated scanner run in "debug" mode.
     Whenever a pattern is recognized and the global variable
     `yy_flex_debug' is non-zero (which is the default), the scanner
     will write to `stderr' a line of the form:


          -accepting rule at line 53 ("the matched text")

     The line number refers to the location of the rule in the file
     defining the scanner (i.e., the file that was fed to flex).
     Messages are also generated when the scanner backs up, accepts the
     default rule, reaches the end of its input buffer (or encounters a
     NUL; at this point, the two look the same as far as the scanner is
     concerned), or reaches an end-of-file.

`-p, --perf-report, `%option perf-report''
     generates a performance report to `stderr'. The report consists of
     comments regarding features of the `flex' input file which will
     cause a serious loss of performance in the resulting scanner. If
     you give the flag twice, you will also get comments regarding
     features that lead to minor performance losses.

     Note that the use of `REJECT' and variable trailing context
     (*note Limitations::) entails a substantial performance penalty;
     use of `yymore()', the `^' operator, and the `--interactive' flag
     entails minor performance penalties.

`-s, --nodefault, `%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to `stdout') to be suppressed. If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.

`-T, --trace, `%option trace''
     makes `flex' run in "trace" mode. It will generate a lot of
     messages to `stderr' concerning the form of the input and the
     resultant non-deterministic and deterministic finite automata.
     This option is mostly for use in maintaining `flex'.

`-w, --nowarn, `%option nowarn''
     suppresses warning messages.

`-v, --verbose, `%option verbose''
     specifies that `flex' should write to `stderr' a summary of
     statistics regarding the scanner it generates. Most of the
     statistics are meaningless to the casual `flex' user, but the
     first line identifies the version of `flex' (same as reported by
     `--version'), and the next line the flags used when generating the
     scanner, including those that are on by default.

`--warn, `%option warn''
     warns about certain things. In particular, if the default rule can
     be matched but no default rule has been given, then `flex' will
     warn you. We recommend always using this option.



File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options

16.6 Miscellaneous Options
==========================

`-c'
     A do-nothing option included for POSIX compliance.

`-h, -?, --help'
     generates a "help" summary of `flex''s options to `stdout' and
     then exits.

`-n'
     Another do-nothing option included for POSIX compliance.

`-V, --version'
     prints the version number to `stdout' and exits.



File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top

17 Performance Considerations
*****************************

The main design goal of `flex' is that it generate high-performance
scanners. It has been optimized for dealing well with large sets of
rules. Aside from the effects on scanner speed of the table compression
`-C' options outlined above, there are a number of options/actions
which degrade performance.
These are, from most expensive to least:


     REJECT
     arbitrary trailing context

     pattern sets that require backing up
     %option yylineno
     %array

     %option interactive
     %option always-interactive

     `^' beginning-of-line operator
     yymore()

   with the first two being quite expensive and the last two being
quite cheap. Note also that `unput()' is implemented as a routine call
that potentially does quite a bit of work, while `yyless()' is a
quite-cheap macro. So if you are just putting back some excess text you
scanned, use `yyless()'.

   `REJECT' should be avoided at all costs when performance is
important. It is a particularly expensive option.

   There is one case when `%option yylineno' can be expensive. That is
when your patterns match long tokens that could _possibly_ contain a
newline character. There is no performance penalty for rules that can
not possibly match newlines, since flex does not need to check them for
newlines. In general, you should avoid rules such as `[^f]+', which
match very long tokens, including newlines, and may possibly match your
entire file! A better approach is to separate `[^f]+' into two rules:


     %option yylineno
     %%
         [^f\n]+
         \n+

   The above scanner does not incur a performance penalty.

   Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner. In principle, one begins by
using the `-b' flag to generate a `lex.backup' file.
For example, on the input:


     %%
     foo        return TOK_KEYWORD;
     foobar     return TOK_KEYWORD;

   the file looks like:


     State #6 is non-accepting -
      associated rule line numbers:
            2 3
      out-transitions: [ o ]
      jam-transitions: EOF [ \001-n p-\177 ]

     State #8 is non-accepting -
      associated rule line numbers:
            3
      out-transitions: [ a ]
      jam-transitions: EOF [ \001-` b-\177 ]

     State #9 is non-accepting -
      associated rule line numbers:
            3
      out-transitions: [ r ]
      jam-transitions: EOF [ \001-q s-\177 ]

     Compressed tables always back up.

   The first few lines tell us that there's a scanner state in which it
can make a transition on an 'o' but not on any other character, and
that in that state the currently scanned text does not match any rule.
The state occurs when trying to match the rules found at lines 2 and 3
in the input file. If the scanner is in that state and then reads
something other than an 'o', it will have to back up to find a rule
which is matched. With a bit of head-scratching one can see that this
must be the state it's in when it has seen `fo'. When this has
happened, if anything other than another `o' is seen, the scanner will
have to back up to simply match the `f' (by the default rule).

   The comment regarding State #8 indicates there's a problem when
`foob' has been scanned. Indeed, on any character other than an `a',
the scanner will have to back up to accept "foo". Similarly, the
comment for State #9 concerns when `fooba' has been scanned and an `r'
does not follow.

   The final comment reminds us that there's no point going to all the
trouble of removing backing up from the rules unless we're using `-Cf'
or `-CF', since there's no performance gain doing so with compressed
scanners.
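   To make "backing up" concrete, here is a small C model of what a
longest-match scanner does with the two rules above. It is a sketch,
not `flex''s generated code: the names `scan_keyword', `can_continue'
and `accepts' are hypothetical, and the DFA is simulated by comparing
prefixes against the keyword list. The scanner keeps reading while the
text so far can still grow into a keyword, remembers the last position
at which some rule (including the default rule) matched, and when it
jams it must give back every character read past that position.

```c
#include <stddef.h>
#include <string.h>

/* The two rules from the example input above. */
static const char *const keywords[] = { "foo", "foobar" };
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

/* Can s[0..len) still grow into some keyword? (The DFA has a transition.) */
static int can_continue(const char *s, size_t len) {
    for (size_t k = 0; k < NKEYWORDS; k++)
        if (strlen(keywords[k]) >= len && strncmp(keywords[k], s, len) == 0)
            return 1;
    return 0;
}

/* Is s[0..len) exactly a keyword? (An accepting state.) */
static int accepts(const char *s, size_t len) {
    for (size_t k = 0; k < NKEYWORDS; k++)
        if (strlen(keywords[k]) == len && strncmp(keywords[k], s, len) == 0)
            return 1;
    return 0;
}

/* Longest-match scan of s. Returns the length of the matched text and
   stores in *backed_up how many characters were read past the last
   accepting position before the scan jammed. */
size_t scan_keyword(const char *s, size_t *backed_up) {
    size_t n = strlen(s);
    if (n == 0) {
        *backed_up = 0;
        return 0;
    }
    /* The default rule matches any single character, so after one
       character the scanner is always in an accepting state. */
    size_t scanned = 1, accepted = 1;
    while (scanned < n && can_continue(s, scanned + 1)) {
        scanned++;
        if (accepts(s, scanned))
            accepted = scanned;   /* remember the last accepting position */
    }
    *backed_up = scanned - accepted;
    return accepted;
}
```

   In this model `foob' followed by `x' gives back one character to
accept "foo" (the State #8 case above), and `fo' followed by `x' gives
back one character so that the default rule can match the `f'. The
error rules and the catch-all rule discussed next are meant to make
those intermediate states accepting, so that nothing has to be given
back.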

   The way to remove the backing up is to add "error" rules:


     %%
     foo         return TOK_KEYWORD;
     foobar      return TOK_KEYWORD;

     fooba       |
     foob        |
     fo          {
                 /* false alarm, not really a keyword */
                 return TOK_ID;
                 }

   Eliminating backing up among a list of keywords can also be done
using a "catch-all" rule:


     %%
     foo         return TOK_KEYWORD;
     foobar      return TOK_KEYWORD;

     [a-z]+      return TOK_ID;

   This is usually the best solution when appropriate.

   Backing up messages tend to cascade. With a complicated set of rules
it's not uncommon to get hundreds of messages. If one can decipher
them, though, it often only takes a dozen or so rules to eliminate the
backing up (though it's easy to make a mistake and have an error rule
accidentally match a valid token). A possible future `flex' feature
will be to automatically add rules to eliminate backing up.

   It's important to keep in mind that you gain the benefits of
eliminating backing up only if you eliminate _every_ instance of
backing up. Leaving just one means you gain nothing.

   _Variable_ trailing context (where both the leading and trailing
parts do not have a fixed length) entails almost the same performance
loss as `REJECT' (i.e., substantial). So when possible a rule like:


     %%
     mouse|rat/(cat|dog)   run();

   is better written:


     %%
     mouse/cat|dog         run();
     rat/cat|dog           run();

   or as


     %%
     mouse|rat/cat         run();
     mouse|rat/dog         run();

   Note that here the special '|' action does _not_ provide any
savings, and can even make things worse (*note Limitations::).

   Another area where the user can increase a scanner's performance (and
one that's easier to implement) arises from the fact that the longer the
tokens matched, the faster the scanner will run.
This is because with long tokens the processing of most input
characters takes place in the (short) inner scanning loop, and does not
often have to go through the additional work of setting up the scanning
environment (e.g., `yytext') for the action. Recall the scanner for C
comments:


     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*
     <comment>"*"+[^*/\n]*
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This could be sped up by writing it as:


     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*
     <comment>[^*\n]*\n      ++line_num;
     <comment>"*"+[^*/\n]*
     <comment>"*"+[^*/\n]*\n ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   Now instead of each newline requiring the processing of another
action, recognizing the newlines is distributed over the other rules to
keep the matched text as long as possible. Note that _adding_ rules
does _not_ slow down the scanner! The speed of the scanner is
independent of the number of rules or (modulo the considerations given
at the beginning of this section) how complicated the rules are with
regard to operators such as `*' and `|'.

   A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line and
with no other extraneous characters, and recognize all the keywords. A
natural first approach is:


     %%
     asm      |
     auto     |
     break    |
     ... etc ...
     volatile |
     while    /* it's a keyword */

     .|\n     /* it's not a keyword */

   To eliminate the back-tracking, introduce a catch-all rule:


     %%
     asm      |
     auto     |
     break    |
     ... etc ...
     volatile |
     while    /* it's a keyword */

     [a-z]+   |
     .|\n     /* it's not a keyword */

   Now, if it's guaranteed that there's exactly one word per line, then
we can reduce the total number of matches by half by merging in the
recognition of newlines with that of the other tokens:


     %%
     asm\n      |
     auto\n     |
     break\n    |
     ... etc ...
     volatile\n |
     while\n    /* it's a keyword */

     [a-z]+\n   |
     .|\n       /* it's not a keyword */

   One has to be careful here, as we have now reintroduced backing up
into the scanner. In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
`flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like `auto' and then the next
character is something other than a newline or a letter. Previously it
would then just match the `auto' rule and be done, but now it has no
`auto' rule, only an `auto\n' rule. To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:


     %%
     asm\n      |
     auto\n     |
     break\n    |
     ... etc ...
     volatile\n |
     while\n    /* it's a keyword */

     [a-z]+\n   |
     [a-z]+     |
     .|\n       /* it's not a keyword */

   Compiled with `-Cf', this is about as fast as one can get a `flex'
scanner to go for this particular problem.

   A final note: `flex' is slow when matching `NUL's, particularly when
a token contains multiple `NUL's. It's best to write rules which match
_short_ amounts of text if it's anticipated that the text will often
include `NUL's.
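   The keyword scanners above build every keyword into the automaton.
The `--fast' entry in *note Options for Scanner Speed and Size::
mentions the other common layout: a single identifier rule whose action
looks the matched text up in a table. The C sketch below shows one way
to write that lookup. It uses a sorted array and the standard
`bsearch()' function rather than a hash table, and the names
`classify_identifier' and `keyword_table', as well as the token values,
are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Token codes; the names follow the `--fast' example, the values are
   arbitrary choices for this sketch. */
enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name (strcmp order) for bsearch() to work. */
static const struct keyword keyword_table[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* bsearch() comparator: key is the matched text, elem is a table entry. */
static int compare_keyword(const void *key, const void *elem) {
    const struct keyword *kw = elem;
    return strcmp((const char *) key, kw->name);
}

/* Map the text matched by a catch-all identifier rule to a token code. */
int classify_identifier(const char *text) {
    const struct keyword *kw =
        bsearch(text, keyword_table,
                sizeof keyword_table / sizeof keyword_table[0],
                sizeof keyword_table[0], compare_keyword);
    return kw != NULL ? kw->token : TOK_ID;
}
```

   A rule such as `[a-z]+ return classify_identifier(yytext);' would
then replace the per-keyword rules, keeping the automaton small at the
cost of one table lookup per identifier.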

   Another final note regarding performance: as mentioned in *note
Matching::, dynamically resizing `yytext' to accommodate huge tokens is
a slow process because it presently requires that the (huge) token be
rescanned from the beginning. Thus if performance is vital, you should
attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters per token.


File: flex.info, Node: Cxx, Next: Reentrant, Prev: Performance, Up: Top

18 Generating C++ Scanners
**************************

*IMPORTANT*: the present form of the scanning class is _experimental_
and may change considerably between major releases.

   `flex' provides two different ways to generate scanners for use with
C++. The first way is to simply compile a scanner generated by `flex'
using a C++ compiler instead of a C compiler. You should not encounter
any compilation errors; if you do, please report them (*note Reporting
Bugs::). You can then use C++ code in your rule actions instead of C
code. Note that the default input source for your scanner remains
`yyin', and default echoing is still done to `yyout'. Both of these
remain `FILE *' variables and not C++ _streams_.

   You can also use `flex' to generate a C++ scanner class, using the
`-+' option (or, equivalently, `%option c++'), which is automatically
specified if the name of the `flex' executable ends in a '+', such as
`flex++'. When using this option, `flex' defaults to generating the
scanner to the file `lex.yy.cc' instead of `lex.yy.c'. The generated
scanner includes the header file `FlexLexer.h', which defines the
interface to two C++ classes.

   The first class, `FlexLexer', provides an abstract base class
defining the general scanner class interface.
It provides the following member functions:

`const char* YYText()'
     returns the text of the most recently matched token, the
     equivalent of `yytext'.

`int YYLeng()'
     returns the length of the most recently matched token, the
     equivalent of `yyleng'.

`int lineno() const'
     returns the current input line number (see `%option yylineno'), or
     `1' if `%option yylineno' was not used.

`void set_debug( int flag )'
     sets the debugging flag for the scanner, equivalent to assigning to
     `yy_flex_debug' (*note Scanner Options::). Note that you must
     build the scanner using `%option debug' to include debugging
     information in it.

`int debug() const'
     returns the current setting of the debugging flag.

   Also provided are member functions equivalent to
`yy_switch_to_buffer()', `yy_create_buffer()' (though the first
argument is an `istream*' object pointer and not a `FILE*'),
`yy_flush_buffer()', `yy_delete_buffer()', and `yyrestart()' (again,
the first argument is an `istream*' object pointer).

   The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
derived from `FlexLexer'. It defines the following additional member
functions:

`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
     constructs a `yyFlexLexer' object using the given streams for input
     and output. If not specified, the streams default to `cin' and
     `cout', respectively.

`virtual int yylex()'
     performs the same role as `yylex()' does for ordinary `flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value. If you derive a subclass `S' from
     `yyFlexLexer' and want to access the member functions and variables
     of `S' inside `yylex()', then you need to use `%option
     yyclass="S"' to inform `flex' that you will be using that subclass
     instead of `yyFlexLexer'.
     In this case, rather than generating `yyFlexLexer::yylex()',
     `flex' generates `S::yylex()' (and also generates a dummy
     `yyFlexLexer::yylex()' that calls `yyFlexLexer::LexerError()' if
     called).

`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
     reassigns `yyin' to `new_in' (if non-null) and `yyout' to
     `new_out' (if non-null), deleting the previous input buffer if
     `yyin' is reassigned.

`int yylex( istream* new_in, ostream* new_out = 0 )'
     first switches the input streams via `switch_streams( new_in,
     new_out )' and then returns the value of `yylex()'.

   In addition, `yyFlexLexer' defines the following protected virtual
functions which you can redefine in derived classes to tailor the
scanner:

`virtual int LexerInput( char* buf, int max_size )'
     reads up to `max_size' characters into `buf' and returns the
     number of characters read. To indicate end-of-input, return 0
     characters. Note that `interactive' scanners (see the `-B' and
     `-I' flags in *note Scanner Options::) define the macro
     `YY_INTERACTIVE'. If you redefine `LexerInput()' and need to take
     different actions depending on whether or not the scanner might be
     scanning an interactive input source, you can test for the
     presence of this name via `#ifdef' statements.

`virtual void LexerOutput( const char* buf, int size )'
     writes out `size' characters from the buffer `buf', which, while
     `NUL'-terminated, may also contain internal `NUL's if the
     scanner's rules can match text with `NUL's in them.

`virtual void LexerError( const char* msg )'
     reports a fatal error message. The default version of this
     function writes the message to the stream `cerr' and exits.

   Note that a `yyFlexLexer' object contains its _entire_ scanning
state. Thus you can use such objects to create reentrant scanners, but
see also *note Reentrant::.
You can instantiate multiple instances of the same `yyFlexLexer'
class, and you can also combine multiple C++ scanner classes together
in the same program using the `-P' option discussed above.

   Finally, note that the `%array' feature is not available to C++
scanner classes; you must use `%pointer' (the default).

   Here is an example of a simple C++ scanner:


     /* An example of using the flex C++ scanner class. */

     %{
     #include <iostream>
     using namespace std;
     int mylineno = 0;
     %}

     %option noyywrap c++

     string  \"[^\n"]+\"

     ws      [ \t]+

     alpha   [A-Za-z]
     dig     [0-9]
     name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
     num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
     num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
     number  {num1}|{num2}

     %%

     {ws}    /* skip blanks and tabs */

     "/*"    {
             int c;

             /* Stop at end of input, whether yyinput() reports it
                as 0 or as EOF (-1). */
             while((c = yyinput()) > 0)
                 {
                 if(c == '\n')
                     ++mylineno;

                 else if(c == '*')
                     {
                     if((c = yyinput()) == '/')
                         break;
                     else
                         unput(c);
                     }
                 }
             }

     {number}  cout << "number " << YYText() << '\n';

     \n        mylineno++;

     {name}    cout << "name " << YYText() << '\n';

     {string}  cout << "string " << YYText() << '\n';

     %%

     int main( int /* argc */, char** /* argv */ )
     {
         FlexLexer* lexer = new yyFlexLexer;
         while(lexer->yylex() != 0)
             ;
         delete lexer;
         return 0;
     }

   If you want to create multiple (different) lexer classes, you use the
`-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
some other `xxFlexLexer'.
You then can include `<FlexLexer.h>' in your other sources once per
lexer class, first renaming `yyFlexLexer' as follows:


     #undef yyFlexLexer
     #define yyFlexLexer xxFlexLexer
     #include <FlexLexer.h>

     #undef yyFlexLexer
     #define yyFlexLexer zzFlexLexer
     #include <FlexLexer.h>

   if, for example, you used `%option prefix="xx"' for one of your
scanners and `%option prefix="zz"' for the other.


File: flex.info, Node: Reentrant, Next: Lex and Posix, Prev: Cxx, Up: Top

19 Reentrant C Scanners
***********************

`flex' has the ability to generate a reentrant C scanner. This is
accomplished by specifying `%option reentrant' (`-R'). The generated
scanner is both portable and safe to use in one or more separate
threads of control. The most common use for reentrant scanners is from
within multi-threaded applications. Any thread may create and execute
a reentrant `flex' scanner without the need for synchronization with
other threads.

* Menu:

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::


File: flex.info, Node: Reentrant Uses, Next: Reentrant Overview, Prev: Reentrant, Up: Reentrant

19.1 Uses for Reentrant Scanners
================================

Besides multi-threaded applications, there are other uses for a
reentrant scanner. For example, you could scan two or more files
simultaneously to implement a `diff' at the token level (i.e., instead
of at the character level):


     /* Example of maintaining more than one active scanner. */

     int tok1, tok2;

     do {
         tok1 = yylex( scanner_1 );
         tok2 = yylex( scanner_2 );

         if( tok1 != tok2 ) {
             printf("Files are different.\n");
             break;
         }

     } while ( tok1 && tok2 );

   Another use for a reentrant scanner is recursion.
(Note that a recursive scanner can also be created using a
non-reentrant scanner and buffer states. *Note Multiple Input
Buffers::.)

   The following crude scanner supports the `eval' command by invoking
another instance of itself.


     /* Example of recursive invocation. */

     %option reentrant

     %%
     "eval(".+")"  {
                   yyscan_t scanner;
                   YY_BUFFER_STATE buf;

                   yylex_init( &scanner );
                   yytext[yyleng-1] = ' ';

                   buf = yy_scan_string( yytext + 5, scanner );
                   yylex( scanner );

                   yy_delete_buffer(buf,scanner);
                   yylex_destroy( scanner );
                  }
     ...
     %%


File: flex.info, Node: Reentrant Overview, Next: Reentrant Example, Prev: Reentrant Uses, Up: Reentrant

19.2 An Overview of the Reentrant API
=====================================

The API for reentrant scanners is different from that for non-reentrant
scanners. Here is a quick overview of the API:

   * `%option reentrant' must be specified.

   * All functions take one additional argument: `yyscanner'.

   * All global variables are replaced by their macro equivalents. (We
     tell you this because it may be important to you during debugging.)

   * `yylex_init' and `yylex_destroy' must be called before and after
     `yylex', respectively.

   * Accessor methods (get/set functions) provide access to common
     `flex' variables.

   * User-specific data can be stored in `yyextra'.


File: flex.info, Node: Reentrant Example, Next: Reentrant Detail, Prev: Reentrant Overview, Up: Reentrant

19.3 Reentrant Example
======================

First, an example of a reentrant scanner:

     /* This scanner prints "//" comments.
        */

     %option reentrant stack noyywrap
     %x COMMENT

     %%

     "//"                 yy_push_state( COMMENT, yyscanner);
     .|\n

     <COMMENT>\n          yy_pop_state( yyscanner );
     <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);

     %%

     int main ( int argc, char * argv[] )
     {
         yyscan_t scanner;

         yylex_init ( &scanner );
         yylex ( scanner );
         yylex_destroy ( scanner );
         return 0;
     }


File: flex.info, Node: Reentrant Detail, Next: Reentrant Functions, Prev: Reentrant Example, Up: Reentrant

19.4 The Reentrant API in Detail
================================

Here are the things you need to do or know to use the reentrant C API of
`flex'.

* Menu:

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::


File: flex.info, Node: Specify Reentrant, Next: Extra Reentrant Argument, Prev: Reentrant Detail, Up: Reentrant Detail

19.4.1 Declaring a Scanner As Reentrant
---------------------------------------

`%option reentrant' (`--reentrant') must be specified.

   Notice that `%option reentrant' is specified in the above example
(*note Reentrant Example::). Had this option not been specified, `flex'
would have happily generated a non-reentrant scanner without
complaining. You may explicitly specify `%option noreentrant' if you
do _not_ want a reentrant scanner, although it is not necessary. The
default is to generate a non-reentrant scanner.


File: flex.info, Node: Extra Reentrant Argument, Next: Global Replacement, Prev: Specify Reentrant, Up: Reentrant Detail

19.4.2 The Extra Argument
-------------------------

All functions take one additional argument: `yyscanner'.
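   The extra argument exists so that all scanner state travels with it
and two scanners never share anything. The sketch below imitates that
design in plain C, without `flex': `struct toy_scanner' plays the role
of the state behind `yyscan_t', and `toy_lex' plays the role of
`yylex(yyscanner)'. Both names and the token codes are hypothetical.
`same_token_stream' then runs two scanners side by side, in the style
of the token-level `diff' example in *note Reentrant Uses::.

```c
#include <ctype.h>

/* Hypothetical token codes; 0 means end of input, as with yylex(). */
enum { TOK_EOF = 0, TOK_WORD = 1, TOK_NUMBER = 2 };

/* All scanning state lives here, never in globals. */
struct toy_scanner {
    const char *cursor;   /* next unread character */
};

/* Returns the next token code; every call names the scanner it uses. */
int toy_lex(struct toy_scanner *sc) {
    while (*sc->cursor && isspace((unsigned char) *sc->cursor))
        sc->cursor++;
    if (*sc->cursor == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char) *sc->cursor)) {
        while (isdigit((unsigned char) *sc->cursor))
            sc->cursor++;
        return TOK_NUMBER;
    }
    while (*sc->cursor && !isspace((unsigned char) *sc->cursor))
        sc->cursor++;
    return TOK_WORD;
}

/* Two scanners are active at once and advance independently.
   Returns 1 if both inputs produce the same sequence of token codes. */
int same_token_stream(const char *a, const char *b) {
    struct toy_scanner s1 = { a }, s2 = { b };
    int t1, t2;
    do {
        t1 = toy_lex(&s1);
        t2 = toy_lex(&s2);
        if (t1 != t2)
            return 0;
    } while (t1 && t2);
    return 1;
}
```

   Because `toy_lex' touches nothing but the structure it is handed,
interleaving calls on `s1' and `s2' cannot corrupt either scan. A
`flex' reentrant scanner gives the same guarantee by routing every
access through `yyscanner'.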

   Notice that the calls to `yy_push_state' and `yy_pop_state' both
have an argument, `yyscanner', that is not present in a non-reentrant
scanner. Here are the declarations of `yy_push_state' and
`yy_pop_state' in the reentrant scanner:


     static void yy_push_state ( int new_state , yyscan_t yyscanner ) ;
     static void yy_pop_state  ( yyscan_t yyscanner  ) ;

   Notice that the argument `yyscanner' appears in the declaration of
both functions. In fact, all `flex' functions in a reentrant scanner
have this additional argument. It is always the last argument in the
argument list, it is always of type `yyscan_t' (which is typedef'd to
`void *') and it is always named `yyscanner'. As you may have guessed,
`yyscanner' is a pointer to an opaque data structure encapsulating the
current state of the scanner. For a list of function declarations, see
*note Reentrant Functions::. Note that preprocessor macros, such as
`BEGIN', `ECHO', and `REJECT', do not take this additional argument.


File: flex.info, Node: Global Replacement, Next: Init and Destroy Functions, Prev: Extra Reentrant Argument, Up: Reentrant Detail

19.4.3 Global Variables Replaced By Macros
------------------------------------------

All global variables in traditional flex have been replaced by macro
equivalents.

   Note that in the above example, `yyout' and `yytext' are not plain
variables. These are macros that will expand to their equivalent lvalue.
All of the familiar `flex' globals have been replaced by their macro
equivalents. In particular, `yytext', `yyleng', `yylineno', `yyin',
`yyout', `yyextra', `yylval', and `yylloc' are macros. You may safely
use these macros in actions as if they were plain variables. We only
tell you this so you don't expect to link to these variables
externally.
Currently, each macro expands to a member of an internal struct,
e.g.,


     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

   One important thing to remember about `yytext' and friends is that
`yytext' is not a global variable in a reentrant scanner; you cannot
access it directly from outside an action or from other functions. You
must use an accessor method, e.g., `yyget_text', to accomplish this
(see below).


File: flex.info, Node: Init and Destroy Functions, Next: Accessor Methods, Prev: Global Replacement, Up: Reentrant Detail

19.4.4 Init and Destroy Functions
---------------------------------

`yylex_init' and `yylex_destroy' must be called before and after
`yylex', respectively.


     int yylex_init ( yyscan_t * ptr_yy_globals ) ;
     int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
     int yylex ( yyscan_t yyscanner ) ;
     int yylex_destroy ( yyscan_t yyscanner ) ;

   The function `yylex_init' must be called before calling any other
function. The argument to `yylex_init' is the address of an
uninitialized pointer to be filled in by `yylex_init', overwriting any
previous contents. The function `yylex_init_extra' may be used instead,
taking as its first argument a variable of type `YY_EXTRA_TYPE'. See
the section on `yyextra', below, for more details.

   The value that `yylex_init' stores through `ptr_yy_globals' should
thereafter be passed to `yylex' and `yylex_destroy'. Flex does not save
the argument passed to `yylex_init', so it is safe to pass the address
of a local pointer to `yylex_init' so long as it remains in scope for
the duration of all calls to the scanner, up to and including the call
to `yylex_destroy'.

   The function `yylex' should be familiar to you by now. The reentrant
version takes one argument, which is the value returned (via an
argument) by `yylex_init'.
Otherwise, it behaves the same as the non-reentrant version of `yylex'.

   Both `yylex_init' and `yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case `errno' is set to one of the
following values:

   * ENOMEM Memory allocation error. *Note memory-management::.

   * EINVAL Invalid argument.

   The function `yylex_destroy' should be called to free resources used
by the scanner. After `yylex_destroy' is called, the contents of
`yyscanner' should not be used. Of course, there is no need to destroy
a scanner if you plan to reuse it. A `flex' scanner (both reentrant
and non-reentrant) may be restarted by calling `yyrestart'.

   Below is an example of a program that creates a scanner, uses it,
then destroys it when done:


     int main ()
     {
         yyscan_t scanner;
         int tok;

         if (yylex_init(&scanner) != 0) {
             perror("yylex_init");
             return 1;
         }

         while ((tok = yylex(scanner)) > 0)
             printf("tok=%d yytext=%s\n", tok, yyget_text(scanner));

         yylex_destroy(scanner);
         return 0;
     }


File: flex.info, Node: Accessor Methods, Next: Extra Data, Prev: Init and Destroy Functions, Up: Reentrant Detail

19.4.5 Accessing Variables with Reentrant Scanners
--------------------------------------------------

Accessor methods (get/set functions) provide access to common `flex'
variables.

   Many scanners that you build will be part of a larger project.
Portions of your project will need access to `flex' values, such as
`yytext'. In a non-reentrant scanner, these values are global, so
there is no problem accessing them. However, in a reentrant scanner,
there are no global `flex' values. You can not access them directly.
Instead, you must access `flex' values using accessor methods (get/set
functions). Each accessor method is named `yyget_NAME' or `yyset_NAME',
where `NAME' is the name of the `flex' variable you want.
For example:


     /* Set the last character of yytext to NUL. */
     void chop ( yyscan_t scanner )
     {
         int len = yyget_leng( scanner );
         yyget_text( scanner )[len - 1] = '\0';
     }

   The above code may be called from within an action like this:


     %%
     .+\n    { chop( yyscanner );}

   You may find that `%option header-file' is particularly useful for
generating prototypes of all the accessor functions. *Note
option-header::.


File: flex.info, Node: Extra Data, Next: About yyscan_t, Prev: Accessor Methods, Up: Reentrant Detail

19.4.6 Extra Data
-----------------

User-specific data can be stored in `yyextra'.

   In a reentrant scanner, it is unwise to use global variables to
communicate with or maintain state between different pieces of your
program. However, you may need access to external data or invoke
external functions from within the scanner actions. Likewise, you may
need to pass information to your scanner (e.g., open file descriptors,
or database connections). In a non-reentrant scanner, the only way to
do this would be through the use of global variables. `Flex' allows
you to store arbitrary, "extra" data in a scanner. This data is
accessible through the accessor methods `yyget_extra' and `yyset_extra'
from outside the scanner, and through the shortcut macro `yyextra' from
within the scanner itself. They are defined as follows:


     #define YY_EXTRA_TYPE  void*
     YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
     void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);

   In addition, an extra form of `yylex_init' is provided,
`yylex_init_extra'. This function is provided so that the yyextra value
can be accessed from within the very first yyalloc, used to allocate
the scanner itself.

   By default, `YY_EXTRA_TYPE' is defined as type `void *'.
You may redefine this type using `%option extra-type="your_type"' in
the scanner:


     /* An example of overriding YY_EXTRA_TYPE. */
     %{
     #include <sys/stat.h>
     #include <unistd.h>
     %}
     %option reentrant noyywrap
     %option extra-type="struct stat *"
     %%

     __filesize__     printf( "%ld", (long) yyextra->st_size  );
     __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
     %%
     void scan_file( char* filename )
     {
         yyscan_t scanner;
         struct stat buf;
         FILE *in;

         in = fopen( filename, "r" );
         stat( filename, &buf );

         /* The extra type is a pointer, so pass the address of buf. */
         yylex_init_extra( &buf, &scanner );
         yyset_in( in, scanner );
         yylex( scanner );
         yylex_destroy( scanner );

         fclose( in );
     }


File: flex.info, Node: About yyscan_t, Prev: Extra Data, Up: Reentrant Detail

19.4.7 About yyscan_t
---------------------

`yyscan_t' is defined as:


     typedef void* yyscan_t;

   It is initialized by `yylex_init()' to point to an internal
structure. You should never access this value directly. In particular,
you should never attempt to free it (use `yylex_destroy()' instead).
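
   The `yyscan_t' handle follows a common C idiom: an opaque pointer
that only the library's own init, accessor, and destroy functions ever
dereference. The following is a minimal sketch of that idiom for a
small, hypothetical counter type; the names `counter_t',
`counter_init', `counter_get', `counter_bump', and `counter_destroy'
are invented for illustration and are not part of `flex'. It mirrors
the documented `yylex_init' contract of returning non-zero and setting
errno to EINVAL or ENOMEM on failure. In a real library the private
struct would be defined only in the implementation file, which is why
callers cannot free or inspect the handle themselves.

```c
#include <errno.h>
#include <stdlib.h>

/* Public, opaque handle: callers see only a void pointer, just as
   yyscan_t is a void pointer. (Hypothetical example type.) */
typedef void *counter_t;

/* Private state; only the functions below dereference the handle. */
struct counter_guts {
    int count;
};

/* Like yylex_init(): fills in *PTR and returns 0 on success, or
   returns non-zero with errno set to EINVAL (bad argument) or
   ENOMEM (allocation failure). */
int counter_init(counter_t *ptr)
{
    if (ptr == NULL) {
        errno = EINVAL;
        return 1;
    }
    *ptr = calloc(1, sizeof(struct counter_guts));
    if (*ptr == NULL) {
        errno = ENOMEM;
        return 1;
    }
    return 0;
}

/* Accessors, in the spirit of yyget_NAME() and yyset_NAME(). */
int counter_get(counter_t c)
{
    return ((struct counter_guts *)c)->count;
}

void counter_bump(counter_t c)
{
    ((struct counter_guts *)c)->count++;
}

/* Like yylex_destroy(): the owner of the structure releases it, so
   callers never call free() on the handle directly. */
int counter_destroy(counter_t c)
{
    free(c);
    return 0;
}
```

   A caller uses it exactly as it would a scanner: `counter_init(&c)'
once, then only the accessors, then `counter_destroy(c)' instead of
`free(c)'.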


File: flex.info, Node: Reentrant Functions, Prev: Reentrant Detail, Up: Reentrant

19.5 Functions and Macros Available in Reentrant C Scanners
===========================================================

The following functions are available in a reentrant scanner:


     char *yyget_text ( yyscan_t scanner );
     int yyget_leng ( yyscan_t scanner );
     FILE *yyget_in ( yyscan_t scanner );
     FILE *yyget_out ( yyscan_t scanner );
     int yyget_lineno ( yyscan_t scanner );
     YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
     int  yyget_debug ( yyscan_t scanner );

     void yyset_debug ( int flag, yyscan_t scanner );
     void yyset_in  ( FILE * in_str , yyscan_t scanner );
     void yyset_out  ( FILE * out_str , yyscan_t scanner );
     void yyset_lineno ( int line_number , yyscan_t scanner );
     void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );

   There are no "set" functions for yytext and yyleng. This is
intentional.

   The following macro shortcuts are available in actions in a reentrant
scanner:


     yytext
     yyleng
     yyin
     yyout
     yylineno
     yyextra
     yy_flex_debug

   In a reentrant C scanner, support for yylineno is always present
(i.e., you may access yylineno), but the value is never modified by
`flex' unless `%option yylineno' is enabled. This is to allow the user
to maintain the line count independently of `flex'.
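
   When `%option yylineno' is not enabled, keeping the line count is up
to you. One way to do it, sketched below under the assumption that
the action has `yytext' and `yyleng' available, is to count the
newlines in each match and add them with the documented accessors
`yyget_lineno' and `yyset_lineno'. The helper name `count_newlines' is
hypothetical.

```c
#include <stddef.h>

/* Count the newline characters in the LEN bytes at TEXT.  Matched
   text may contain NUL bytes, so the explicit length is used rather
   than strlen(). */
int count_newlines(const char *text, size_t len)
{
    int lines = 0;
    for (size_t i = 0; i < len; i++)
        if (text[i] == '\n')
            lines++;
    return lines;
}

/* Hypothetical use inside an action of a reentrant scanner built
   WITHOUT %option yylineno:

       yyset_lineno(yyget_lineno(yyscanner)
                    + count_newlines(yytext, yyleng), yyscanner);
*/
```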

   The following functions and macros are made available when `%option
bison-bridge' (`--bison-bridge') is specified:


     YYSTYPE * yyget_lval ( yyscan_t scanner );
     void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
     yylval

   The following functions and macros are made available when `%option
bison-locations' (`--bison-locations') is specified:


     YYLTYPE *yyget_lloc ( yyscan_t scanner );
     void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
     yylloc

   Support for yylval assumes that `YYSTYPE' is a valid type. Support
for yylloc assumes that `YYLTYPE' is a valid type. Typically, these
types are generated by `bison', and are included in section 1 of the
`flex' input.


File: flex.info, Node: Lex and Posix, Next: Memory Management, Prev: Reentrant, Up: Top

20 Incompatibilities with Lex and Posix
***************************************

`flex' is a rewrite of the AT&T Unix _lex_ tool (the two
implementations do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to both implementations. `flex' is fully
compliant with the POSIX `lex' specification, except that when using
`%pointer' (the default), a call to `unput()' destroys the contents of
`yytext', which is counter to the POSIX specification. In this section
we discuss all of the known areas of incompatibility between `flex',
AT&T `lex', and the POSIX specification. `flex''s `-l' option turns on
maximum compatibility with the original AT&T `lex' implementation, at
the cost of a major loss in the generated scanner's performance. We
note below which incompatibilities can be overcome using the `-l'
option.
`flex' is fully compatible with `lex' with the following
exceptions:

   * The undocumented `lex' scanner internal variable `yylineno' is not
     supported unless `-l' or `%option yylineno' is used.

   * `yylineno' should be maintained on a per-buffer basis, rather than
     a per-scanner (single global variable) basis.

   * `yylineno' is not part of the POSIX specification.

   * The `input()' routine is not redefinable, though it may be called
     to read characters following whatever has been matched by a rule.
     If `input()' encounters an end-of-file the normal `yywrap()'
     processing is done. A "real" end-of-file is returned by `input()'
     as `EOF'.

   * Input is instead controlled by defining the `YY_INPUT()' macro.

   * The `flex' restriction that `input()' cannot be redefined is in
     accordance with the POSIX specification, which simply does not
     specify any way of controlling the scanner's input other than by
     making an initial assignment to `yyin'.

   * The `unput()' routine is not redefinable. This restriction is in
     accordance with POSIX.

   * `flex' scanners are not as reentrant as `lex' scanners. In
     particular, if you have an interactive scanner and an interrupt
     handler which long-jumps out of the scanner, and the scanner is
     subsequently called again, you may get the following message:


          fatal flex scanner internal error--end of buffer missed

     To reenter the scanner, first use:


          yyrestart( yyin );

     Note that this call will throw away any buffered input; usually
     this isn't a problem with an interactive scanner. *Note
     Reentrant::, for `flex''s reentrant API.

   * Also note that `flex' C++ scanner classes _are_ reentrant, so if
     using C++ is an option for you, you should use them instead.
     *Note Cxx::, and *Note Reentrant:: for details.

   * `output()' is not supported.
     Output from the `ECHO' macro is done to the file-pointer `yyout'
     (default `stdout').

   * `output()' is not part of the POSIX specification.

   * `lex' does not support exclusive start conditions (%x), though they
     are in the POSIX specification.

   * When definitions are expanded, `flex' encloses them in parentheses.
     With `lex', the following:


          NAME    [A-Z][A-Z0-9]*
          %%
          foo{NAME}?      printf( "Found it\n" );
          %%

     will not match the string `foo' because when the macro is expanded
     the rule is equivalent to `foo[A-Z][A-Z0-9]*?' and the precedence
     is such that the `?' is associated with `[A-Z0-9]*'. With `flex',
     the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
     string `foo' will match.

   * Note that if the definition begins with `^' or ends with `$' then
     it is _not_ expanded with parentheses, to allow these operators to
     appear in definitions without losing their special meanings. But
     the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
     definition.

   * Using `-l' results in the `lex' behavior of no parentheses around
     the definition.

   * The POSIX specification is that the definition be enclosed in
     parentheses.

   * Some implementations of `lex' allow a rule's action to begin on a
     separate line, if the rule's pattern has trailing whitespace:


          %%
          foo|bar<space here>
            { foobar_action();}

     `flex' does not support this feature.

   * The `lex' `%r' (generate a Ratfor scanner) option is not
     supported. It is not part of the POSIX specification.

   * After a call to `unput()', _yytext_ is undefined until the next
     token is matched, unless the scanner was built using `%array'.
     This is not the case with `lex' or the POSIX specification. The
     `-l' option does away with this incompatibility.

   * The precedence of the `{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of `lex' interpret `abc{1,3}'
     as "match one, two, or three occurrences of `abc'", whereas `flex'
     interprets it as "match `ab' followed by one, two, or three
     occurrences of `c'". The `-l' and `--posix' options do away with
     this incompatibility.

   * The precedence of the `^' operator is different. `lex' interprets
     `^foo|bar' as "match either 'foo' at the beginning of a line, or
     'bar' anywhere", whereas `flex' interprets it as "match either
     `foo' or `bar' if they come at the beginning of a line". The
     latter is in agreement with the POSIX specification.

   * The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners. `flex' ignores them.

   * The name `FLEX_SCANNER' is `#define''d so scanners may be written
     for use with either `flex' or `lex'. Scanners also include
     `YY_FLEX_MAJOR_VERSION', `YY_FLEX_MINOR_VERSION' and
     `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
     generated the scanner. For example, for the 2.5.22 release, these
     defines would be 2, 5 and 22 respectively. If the version of
     `flex' being used is a beta version, then the symbol `FLEX_BETA'
     is defined.

   * The symbols `[[' and `]]' in the code sections of the input may
     conflict with the m4 delimiters. *Note M4 Dependency::.
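
   The two precedence differences above can be checked concretely. The
sketch below uses the POSIX `<regex.h>' API (`regcomp' and `regexec'),
not `flex' itself, and writes each competing reading of `abc{1,3}' and
`^foo|bar' with explicit parentheses so the outcome does not depend on
any one tool's default precedence. The helper name `ere_matches' is
illustrative only.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the POSIX extended regular expression PATTERN matches
   somewhere in TEXT; false on no match or if PATTERN fails to
   compile. */
bool ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* The competing readings, spelled out with explicit grouping:
 *
 *   abc{1,3}  AT&T/POSIX lex: "^(abc){1,3}$"  matches "abcabc", not "abccc"
 *             flex:           "^ab(c){1,3}$"  matches "abccc", not "abcabc"
 *
 *   ^foo|bar  lex:            "(^foo)|bar"    matches "xbar"
 *             flex/POSIX:     "^(foo|bar)"    does not match "xbar"
 */
```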


   The following `flex' features are not included in `lex' or the POSIX
specification:

   * C++ scanners

   * %option

   * start condition scopes

   * start condition stacks

   * interactive/non-interactive scanners

   * yy_scan_string() and friends

   * yyterminate()

   * yy_set_interactive()

   * yy_set_bol()

   * YY_AT_BOL()

   * <<EOF>>

   * <*>

   * YY_DECL

   * YY_START

   * YY_USER_ACTION

   * YY_USER_INIT

   * #line directives

   * %{}'s around actions

   * reentrant C API

   * multiple actions on a line

   * almost all of the `flex' command-line options

   The feature "multiple actions on a line" refers to the fact that
with `flex' you can put multiple actions on the same line, separated
with semicolons, while with `lex', the following:


     foo    handle_foo(); ++num_foos_seen;

   is (rather surprisingly) truncated to


     foo    handle_foo();

   `flex' does not truncate the action. Actions that are not enclosed
in braces are simply terminated at the end of the line.


File: flex.info, Node: Memory Management, Next: Serialized Tables, Prev: Lex and Posix, Up: Top

21 Memory Management
********************

This chapter describes how flex handles dynamic memory, and how you can
override the default behavior.

* Menu:

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::


File: flex.info, Node: The Default Memory Management, Next: Overriding The Default Memory Management, Prev: Memory Management, Up: Memory Management

21.1 The Default Memory Management
==================================

Flex allocates dynamic memory during initialization, and once in a
while from within a call to yylex().
Initialization takes place during
the first call to yylex(). Thereafter, flex may reallocate more memory
if it needs to enlarge a buffer. As of version 2.5.9 Flex will clean up
all memory when you call `yylex_destroy'. *Note faq-memory-leak::.

   Flex allocates dynamic memory for five purposes, listed below (1):

16kB for the input buffer.
     Flex allocates memory for the character buffer used to perform
     pattern matching. Flex must read ahead from the input stream and
     store it in a large character buffer. This buffer is typically
     the largest chunk of dynamic memory flex consumes. This buffer
     will grow if necessary, doubling the size each time. Flex frees
     this memory when you call yylex_destroy(). The default size of
     this buffer (16384 bytes) is almost always too large. The ideal
     size for this buffer is the length of the longest token expected,
     in bytes, plus a little more. Flex will allocate a few extra
     bytes for housekeeping. Currently, to override the size of the
     input buffer you must `#define YY_BUF_SIZE' to whatever number of
     bytes you want. We don't plan to change this in the near future,
     but we reserve the right to do so if we ever add a more robust
     memory management API.

64kB for the REJECT state. This will only be allocated if you use REJECT.
     The size is large enough to hold the same number of states as
     characters in the input buffer. If you override the size of the
     input buffer (via `YY_BUF_SIZE'), then you automatically override
     the size of this buffer as well.

100 bytes for the start condition stack.
     Flex allocates memory for the start condition stack. This is the
     stack used for pushing start states, i.e., with yy_push_state().
     It will grow if necessary. Since the states are simply integers,
     this stack doesn't consume much memory. This stack is not present
     if `%option stack' is not specified.
     You will rarely need to tune
     this stack. The ideal size for this stack is the maximum depth
     expected. The memory for this stack is automatically destroyed
     when you call yylex_destroy(). *Note option-stack::.

40 bytes for each YY_BUFFER_STATE.
     Flex allocates memory for each YY_BUFFER_STATE. The buffer state
     itself is about 40 bytes, plus an additional large character
     buffer (described above). The initial buffer state is created
     during initialization, and a new one is created with each call to
     yy_create_buffer(). You can't tune the size of this, but you can
     tune the character buffer as described above. Any buffer state
     that you explicitly create by calling yy_create_buffer() is _NOT_
     destroyed automatically. You must call yy_delete_buffer() to free
     the memory. The exception to this rule is that flex will delete
     the current buffer automatically when you call yylex_destroy().
     If you delete the current buffer, be sure to set it to NULL. That
     way, flex will not try to delete the buffer a second time
     (possibly crashing your program!). At the time of this writing,
     flex does not provide a growable stack for the buffer states. You
     have to manage that yourself. *Note Multiple Input Buffers::.

84 bytes for the reentrant scanner guts
     Flex allocates about 84 bytes for the reentrant scanner structure
     when you call yylex_init(). It is destroyed when the user calls
     yylex_destroy().


   ---------- Footnotes ----------

   (1) The quantities given here are approximate, and may vary due to
host architecture, compiler configuration, or due to future
enhancements to flex.
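
   Because buffer states have to be stacked by hand (for example, when
handling nested include files), here is a minimal sketch of one way to
keep a growable stack of them. It stores `void *' handles so that it
compiles on its own; in a scanner the elements would be
`YY_BUFFER_STATE' values. The `buffer_stack' type and the `bs_push',
`bs_pop', and `bs_free' functions are hypothetical, not part of `flex'.

```c
#include <stddef.h>
#include <stdlib.h>

/* A growable LIFO of buffer handles.  In a scanner each element would
   be a YY_BUFFER_STATE; void * keeps this sketch self-contained. */
struct buffer_stack {
    void **items;   /* heap array of handles */
    size_t len;     /* number of handles currently stored */
    size_t cap;     /* number of allocated slots in items */
};

/* Push BUF, doubling the capacity when the array is full.  Returns 0
   on success, or -1 if realloc fails (the stack is left unchanged). */
int bs_push(struct buffer_stack *s, void *buf)
{
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        void **p = realloc(s->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->items = p;
        s->cap = new_cap;
    }
    s->items[s->len++] = buf;
    return 0;
}

/* Pop the most recently pushed handle, or return NULL when empty. */
void *bs_pop(struct buffer_stack *s)
{
    return s->len > 0 ? s->items[--s->len] : NULL;
}

/* Free the stack's own array.  The buffers it referred to are not
   touched; each must still be released with yy_delete_buffer(). */
void bs_free(struct buffer_stack *s)
{
    free(s->items);
    s->items = NULL;
    s->len = s->cap = 0;
}
```

   Start with `struct buffer_stack s = { NULL, 0, 0 };'. In a
non-reentrant scanner, an include rule might push `YY_CURRENT_BUFFER'
and then call `yy_switch_to_buffer(yy_create_buffer(f, YY_BUF_SIZE))';
at `<<EOF>>', when the stack is not empty, it would call
`yy_delete_buffer(YY_CURRENT_BUFFER)' and then `yy_switch_to_buffer'
on the handle returned by `bs_pop'.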


File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management

21.2 Overriding The Default Memory Management
=============================================

Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when it
needs to allocate or free memory. By default, these functions are
wrappers around the standard C functions, `malloc', `realloc', and
`free', respectively. You can override the default implementations by
telling flex that you will provide your own implementations.

   To override the default implementations, you must do two things:

  1. Suppress the default implementations by specifying one or more of
     the following options:

        * `%option noyyalloc'

        * `%option noyyrealloc'

        * `%option noyyfree'.

  2. Provide your own implementation of the following functions: (1)


          // For a non-reentrant scanner
          void * yyalloc (size_t bytes);
          void * yyrealloc (void * ptr, size_t bytes);
          void   yyfree (void * ptr);

          // For a reentrant scanner
          void * yyalloc (size_t bytes, void * yyscanner);
          void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
          void   yyfree (void * ptr, void * yyscanner);


   In the following example, we will override all three memory
routines. We assume that there is a custom allocator with garbage
collection. In order to make this example interesting, we will use a
reentrant scanner, passing a pointer to the custom allocator through
`yyextra'. Because the scanner structure itself is allocated by the
very first call to `yyalloc', the allocator must be supplied with
`yylex_init_extra' rather than `yylex_init'.


     %{
     #include "some_allocator.h"
     %}

     /* Suppress the default implementations. */
     %option noyyalloc noyyrealloc noyyfree
     %option reentrant

     /* Pass the allocator in through yyextra, e.g.,
        yylex_init_extra (allocator_create (), &scanner); */
     %option extra-type="struct allocator *"

     %%
     .|\n   ;
     %%

     /* Provide our own implementations.  Outside of an action, use
        yyget_extra() rather than the yyextra macro. */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyget_extra (yyscanner), bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         return allocator_realloc (yyget_extra (yyscanner), ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }

   ---------- Footnotes ----------

   (1) It is not necessary to override all (or any) of the memory
management routines. You may, for example, override `yyrealloc', but
not `yyfree' or `yyalloc'.


File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management

21.3 A Note About yytext And Memory
===================================

When flex finds a match, `yytext' points to the first character of the
match in the input buffer. The string itself is part of the input
buffer, and is _NOT_ allocated separately. The value of yytext will be
overwritten the next time yylex() is called. In short, the value of
yytext is only valid from within the matched rule's action.

   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead. In order to preserve yytext,
you will have to copy it with strdup() or a similar function. But this
introduces a headache: your parser is now responsible for freeing the
copy of yytext. If you use a yacc or bison parser (commonly used with
flex), you will discover that the error recovery mechanisms can cause
memory to be leaked.

   To prevent memory leaks from strdup'd yytext, you will have to track
the memory somehow.
Our experience has shown that a garbage collection
mechanism or a pooled memory mechanism will save you a lot of grief
when writing parsers.


File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top

22 Serialized Tables
********************

A `flex' scanner has the ability to save the DFA tables to a file, and
load them at runtime when needed. The motivation for this feature is
to reduce the runtime memory footprint. Traditionally, these tables
have been compiled into the scanner as C arrays, and are sometimes
quite large. Since the tables are compiled into the scanner, the
memory used by the tables can never be freed. This is a waste of
memory, especially if an application uses several scanners, but none of
them at the same time.

   The serialization feature allows the tables to be loaded at runtime,
before scanning begins. The tables may be discarded when scanning is
finished.

* Menu:

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::


File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables

22.1 Creating Serialized Tables
===============================

You may create a scanner with serialized tables by specifying:


     %option tables-file=FILE
     or
     --tables-file=FILE

   These options instruct flex to save the DFA tables to the file FILE.
The tables will _not_ be embedded in the generated scanner, so the
scanner will not function on its own: it depends on the serialized
tables, which you must load from this file at runtime before you can
scan anything.

   If you do not specify a filename to `--tables-file', the tables will
be saved to `lex.yy.tables', where `yy' is the appropriate prefix.

   If your project uses several different scanners, you can concatenate
the serialized tables into one file, and flex will find the correct set
of tables, using the scanner prefix as part of the lookup key. An
example follows:


     $ flex --tables-file --prefix=cpp cpp.l
     $ flex --tables-file --prefix=c   c.l
     $ cat lex.cpp.tables lex.c.tables  >  all.tables

   The above example created two scanners, `cpp', and `c'. Since we did
not specify a filename, the tables were serialized to `lex.cpp.tables'
and `lex.c.tables', respectively. Then, we concatenated the two files
together into `all.tables', which we will distribute with our project.
At runtime, we will open the file and tell flex to load the tables from
it. Flex will find the correct tables automatically. (See the next
section.)


File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables

22.2 Loading and Unloading Serialized Tables
============================================

If you've built your scanner with `%option tables-file', then you must
load the scanner tables at runtime. This can be accomplished with the
following function:

 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
     Locates scanner tables in the stream pointed to by FP and loads
     them. Memory for the tables is allocated via `yyalloc'. You must
     call this function before the first call to `yylex'. The argument
     SCANNER only appears in the reentrant scanner. This function
     returns `0' (zero) on success, or non-zero on error.

   The loaded tables are *not* automatically destroyed (unloaded) when
you call `yylex_destroy'.
The reason is that you may create several
scanners of the same type (in a reentrant scanner), each of which needs
access to these tables. To avoid a nasty memory leak, you must call
the following function:

 -- Function: int yytables_destroy ([yyscan_t SCANNER])
     Unloads the scanner tables. The tables must be loaded again before
     you can scan any more data. The argument SCANNER only appears in
     the reentrant scanner. This function returns `0' (zero) on
     success, or non-zero on error.

   *The functions `yytables_fload' and `yytables_destroy' are not
thread-safe.* You must ensure that these functions are called exactly
once (for each scanner type) in a threaded program, before any thread
calls `yylex'. After the tables are loaded, they are never written to,
and no thread protection is required thereafter, until you destroy
them.


File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables

22.3 Tables File Format
=======================

This section defines the file format of serialized `flex' tables.

   The tables format allows for one or more sets of tables to be
specified, where each set corresponds to a given scanner. Scanners are
indexed by name, as described below. The file format is as follows:


                      TABLE SET 1
                     +-------------------------------+
             Header  | uint32          th_magic;     |
                     | uint32          th_hsize;     |
                     | uint32          th_ssize;     |
                     | uint16          th_flags;     |
                     | char            th_version[]; |
                     | char            th_name[];    |
                     | uint8           th_pad64[];   |
                     +-------------------------------+
             Table 1 | uint16          td_id;        |
                     | uint16          td_flags;     |
                     | uint32          td_lolen;     |
                     | uint32          td_hilen;     |
                     | void            td_data[];    |
                     | uint8           td_pad64[];   |
                     +-------------------------------+
             Table 2 |                               |
                     .                               .
                     .                               .
                     .                               .
                     .                               .
             Table n |                               |
                     +-------------------------------+
                      TABLE SET 2
                           .
                           .
                           .
                      TABLE SET N

   The above diagram shows that a complete set of tables consists of a
header followed by multiple individual tables. Furthermore, multiple
complete sets may be present in the same file, each set with its own
header and tables. The sets are contiguous in the file. The only way to
know if another set follows is to check the next four bytes for the
magic number (or check for EOF). The header and tables sections are
padded to 64-bit boundaries. Below we describe each field in detail.
This format does not specify how the scanner will expand the given
data, i.e., data may be serialized as int8, but expanded to an int32
array at runtime. This is to reduce the size of the serialized data
where possible. Remember, _all integer values are in network byte
order_.

Fields of a table header:

`th_magic'
     Magic number, always 0xF13C57B1.

`th_hsize'
     Size of this entire header, in bytes, including all fields plus
     any padding.

`th_ssize'
     Size of this entire set, in bytes, including the header, all
     tables, plus any padding.

`th_flags'
     Bit flags for this table set. Currently unused.

`th_version[]'
     Flex version in NUL-terminated string format, e.g., `2.5.13a'.
     This is the version of flex that was used to create the serialized
     tables.

`th_name[]'
     Contains the name of this table set. The default is `yytables',
     and is prefixed accordingly, e.g., `footables'. Must be
     NUL-terminated.

`th_pad64[]'
     Zero or more NUL bytes, padding the entire header to the next
     64-bit boundary as calculated from the beginning of the header.

Fields of a table:

`td_id'
     Specifies the table identifier.
     Possible values are:
     `YYTD_ID_ACCEPT (0x01)'
          `yy_accept'

     `YYTD_ID_BASE (0x02)'
          `yy_base'

     `YYTD_ID_CHK (0x03)'
          `yy_chk'

     `YYTD_ID_DEF (0x04)'
          `yy_def'

     `YYTD_ID_EC (0x05)'
          `yy_ec'

     `YYTD_ID_META (0x06)'
          `yy_meta'

     `YYTD_ID_NUL_TRANS (0x07)'
          `yy_NUL_trans'

     `YYTD_ID_NXT (0x08)'
          `yy_nxt'. This array may be two dimensional. See the
          `td_hilen' field below.

     `YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
          `yy_rule_can_match_eol'

     `YYTD_ID_START_STATE_LIST (0x0A)'
          `yy_start_state_list'. This array is handled specially
          because it is an array of pointers to structs. See the
          `td_flags' field below.

     `YYTD_ID_TRANSITION (0x0B)'
          `yy_transition'. This array is handled specially because it
          is an array of structs. See the `td_lolen' field below.

     `YYTD_ID_ACCLIST (0x0C)'
          `yy_acclist'

`td_flags'
     Bit flags describing how to interpret the data in `td_data'. The
     data arrays are one-dimensional by default, but may be two
     dimensional as specified in the `td_hilen' field.

     `YYTD_DATA8 (0x01)'
          The data is serialized as an array of type int8.

     `YYTD_DATA16 (0x02)'
          The data is serialized as an array of type int16.

     `YYTD_DATA32 (0x04)'
          The data is serialized as an array of type int32.

     `YYTD_PTRANS (0x08)'
          The data is a list of indexes of entries in the expanded
          `yy_transition' array. Each index should be expanded to a
          pointer to the corresponding entry in the `yy_transition'
          array. We count on the fact that the `yy_transition' array
          has already been seen.

     `YYTD_STRUCT (0x10)'
          The data is a list of yy_trans_info structs, each of which
          consists of two integers. There is no padding between struct
          elements or between structs. The type of each member is
          determined by the `YYTD_DATA*' bits.

`td_lolen'
     Specifies the number of elements in the lowest dimension array. If
     this is a one-dimensional array, then it is simply the number of
     elements in this array. The element size is determined by the
     `td_flags' field.

`td_hilen'
     If `td_hilen' is non-zero, then the data is a two-dimensional
     array. Otherwise, the data is a one-dimensional array. `td_hilen'
     contains the number of elements in the higher dimensional array,
     and `td_lolen' contains the number of elements in the lowest
     dimension.

     Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
     `sometype td_data[td_hilen][td_lolen]', where `sometype' is
     specified by the `td_flags' field. It is possible for both
     `td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
     zero length array, and no data is loaded, i.e., this table is
     simply skipped. Flex does not currently generate tables of zero
     length.

`td_data[]'
     The table data. This array may be a one- or two-dimensional array,
     of type `int8', `int16', `int32', `struct yy_trans_info', or
     `struct yy_trans_info*', depending upon the values in the
     `td_flags', `td_lolen', and `td_hilen' fields.

`td_pad64[]'
     Zero or more NUL bytes, padding the entire table to the next
     64-bit boundary as calculated from the beginning of this table.


File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top

23 Diagnostics
**************

The following is a list of `flex' diagnostic messages:

   * `warning, rule cannot be matched' indicates that the given rule
     cannot be matched because it follows other rules that will always
     match the same text as it.
     For example, in the following, `foo' cannot be matched because it
     comes after an identifier "catch-all" rule:


          [a-z]+    got_identifier();
          foo       got_foo();

     Using `REJECT' in a scanner suppresses this warning.

   * `warning, -s option given but default rule can be matched' means
     that it is possible (perhaps only in a particular start condition)
     that the default rule (match any single character) is the only one
     that will match a particular input. Since `-s' was given,
     presumably this is not intended.

   * `reject_used_but_not_detected undefined' or
     `yymore_used_but_not_detected undefined'. These errors can occur
     at compile time. They indicate that the scanner uses `REJECT' or
     `yymore()' but that `flex' failed to notice the fact, meaning that
     `flex' scanned the first two sections looking for occurrences of
     these actions and failed to find any, but somehow you snuck some in
     (via a #include file, for example). Use `%option reject' or
     `%option yymore' to indicate to `flex' that you really do use
     these features.

   * `flex scanner jammed'. A scanner compiled with `-s' has
     encountered an input string which wasn't matched by any of its
     rules. This error can also occur due to internal problems.

   * `token too large, exceeds YYLMAX'. Your scanner uses `%array' and
     one of its rules matched a string longer than the `YYLMAX'
     constant (8K bytes by default). You can increase the value by
     #define'ing `YYLMAX' in the definitions section of your `flex'
     input.

   * `scanner requires -8 flag to use the character 'x''. Your scanner
     specification includes recognizing the 8-bit character `'x'', but
     you did not specify the -8 flag, and your scanner defaulted to
     7-bit because you used the `-Cf' or `-CF' table compression
     options. See the discussion of the `-7' flag, *Note Scanner
     Options::, for details.

   * `flex scanner push-back overflow'. You used `unput()' to push back
     so much text that the scanner's buffer could not hold both the
     pushed-back text and the current token in `yytext'. Ideally the
     scanner should dynamically resize the buffer in this case, but at
     present it does not.

   * `input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'. The scanner was working on matching an extremely large
     token and needed to expand the input buffer. This doesn't work
     with scanners that use `REJECT'.

   * `fatal flex scanner internal error--end of buffer missed'. This can
     occur in a scanner which is reentered after a long-jump has jumped
     out (or over) the scanner's activation frame. Before reentering
     the scanner, use:

          yyrestart( yyin );
     or, as noted above, switch to using the C++ scanner class.

   * `too many start conditions in <> construct!' You listed more start
     conditions in a <> construct than exist (so you must have listed at
     least one of them twice).


File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top

24 Limitations
**************

Some trailing context patterns cannot be properly matched and generate
warning messages (`dangerous trailing context'). These are patterns
where the ending of the first part of the rule matches the beginning of
the second part, such as `zx*/xy*', where the `x*' matches the `x' at
the beginning of the trailing context. (Note that the POSIX draft
states that the text matched by such patterns is undefined.) For some
trailing context rules, parts which are actually fixed-length are not
recognized as such, leading to the performance loss associated with
variable trailing context (*note Performance::). In particular, parts
using `|' or `{n}' (such as `foo{3}') are always considered
variable-length.
Combining trailing context with the special `|' action can result in
_fixed_ trailing context being turned into the more expensive
_variable_ trailing context. For example, in the following rules the
fixed trailing context of `xyz/def' can end up treated as variable:


     %%
     abc      |
     xyz/def

   Use of `unput()' invalidates `yytext' and `yyleng', unless the
`%array' directive or the `-l' option has been used.

   Pattern-matching of `NUL's is substantially slower than matching
other characters.

   Dynamic resizing of the input buffer is slow, as it entails
rescanning all the text matched so far by the current (generally huge)
token.

   Due to both buffering of input and read-ahead, you cannot intermix
calls to `<stdio.h>' routines, such as `getchar()', with `flex' rules
and expect it to work. Call `input()' instead.

   The total table entries listed by the `-v' flag exclude the number
of table entries needed to determine what rule has been matched. The
number of entries is equal to the number of DFA states if the scanner
does not use `REJECT', and somewhat greater than the number of states
if it does.

   `REJECT' cannot be used with the `-f' or `-F' options.

   The `flex' internal algorithms need documentation.


File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top

25 Additional Reading
*********************

You may wish to read more about the following programs:
   * lex

   * yacc

   * sed

   * awk

   The following books may contain material of interest:

   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
Associates. Be sure to get the 2nd edition.

   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_

   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
Techniques and Tools_, Addison-Wesley (1986). Describes the
pattern-matching techniques used by `flex' (deterministic finite
automata).


File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top

FAQ
***

From time to time, the `flex' maintainer receives certain questions.
Rather than repeat answers to well-understood problems, we publish them
here.

* Menu:

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::


File: flex.info, Node: When was flex born?, Next: How do I expand backslash-escape sequences in C-style quoted strings?, Up: FAQ

When was flex born?
===================

Vern Paxson took over the `Software Tools' lex project from Jef
Poskanzer in 1982. At that point it was written in Ratfor. Around
1987 or so, Paxson translated it into C, and a legend was born :-).


File: flex.info, Node: How do I expand backslash-escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ

How do I expand backslash-escape sequences in C-style quoted strings?
=====================================================================

A key point when scanning quoted strings is that you cannot (easily)
write a single rule that will precisely match the string if you allow
things like embedded escape sequences and newlines. If you try to
match strings with a single rule then you'll wind up having to rescan
the string anyway to find any escape sequences.

   Instead you can use exclusive start conditions and a set of rules,
one for matching non-escaped text, one for matching a single escape,
one for matching an embedded newline, and one for recognizing the end
of the string. Each of these rules is then faced with the question of
where to put its intermediate results. The best solution is for the
rules to append their local value of `yytext' to the end of a "string
literal" buffer. A rule like the escape-matcher will append to the
buffer the meaning of the escape sequence rather than the literal text
in `yytext'. In this way, `yytext' does not need to be modified at all.


File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand backslash-escape sequences in C-style quoted strings?, Up: FAQ

Why do flex scanners call fileno if it is not ANSI compatible?
==============================================================

Flex scanners call `fileno()' in order to get the file descriptor
corresponding to `yyin'. The file descriptor may be passed to
`isatty()' or `read()', depending upon which `%options' you specified.
If your system does not have `fileno()' support, to get rid of the
`read()' call, do not specify `%option read'. To get rid of the
`isatty()' call, you must specify one of `%option always-interactive' or
`%option never-interactive'.


File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ

Does flex support recursive pattern definitions?
================================================

e.g.,


     %%
     block   "{"({block}|{statement})*"}"

   No. You cannot have recursive definitions.
The pattern-matching power of regular expressions in general (and
therefore flex scanners, too) is limited. In particular, regular
expressions cannot "balance" parentheses to an arbitrary degree. For
example, it's impossible to write a regular expression that matches all
strings containing the same number of '{'s as '}'s. For more powerful
pattern matching, you need a parser, such as `GNU bison'.


File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ

How do I skip huge chunks of input (tens of megabytes) while using flex?
========================================================================

Use `fseek()' (or `lseek()') to position yyin, then call `yyrestart()'.


File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ

Flex is not matching my patterns in the same order that I defined them.
=======================================================================

`flex' picks the rule that matches the most text (i.e., the longest
possible input string). This is because `flex' uses an entirely
different matching technique ("deterministic finite automata") that
actually does all of the matching simultaneously, in parallel. (Seems
impossible, but it's actually a fairly simple technique once you
understand the principles.)

   A side-effect of this parallel matching is that when the input
matches more than one rule, `flex' scanners pick the rule that matched
the _most_ text. This is explained further in the manual, in the
section *Note Matching::.
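   The selection policy itself can be restated in a few lines of C.
This is only a model of which rule wins, not of how the DFA finds the
matches: the longest match is chosen, and among matches of equal
length the rule listed first in the input file wins. The function
`pick_rule' and its inputs are hypothetical.

```c
#include <stddef.h>

/* match_len[i] is how many characters rule i could match at the
 * current input position, or 0 if it cannot match there.  Returns the
 * index of the rule a flex scanner would pick, or -1 if none match. */
int pick_rule(const size_t *match_len, int nrules)
{
    int best = -1;

    for (int i = 0; i < nrules; i++) {
        /* Strict '>' keeps the earlier rule when two lengths tie. */
        if (match_len[i] > 0 && (best < 0 || match_len[i] > match_len[best]))
            best = i;
    }
    return best;
}
```

   For instance, on the input `data_block', a hypothetical identifier
rule that matches all 10 characters beats a `data_' rule that matches
5, whichever of the two is listed first.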

   If you want `flex' to choose a shorter match, then you can work
around this behavior by expanding your short rule to match more text,
then putting back the extra:


     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;

   Another fix would be to make the second rule active only during the
`<BLOCKIDSTATE>' start condition, and make that start condition
exclusive by declaring it with `%x' instead of `%s'.

   A final fix is to change the input language so that the ambiguity for
`data_' is removed, by adding characters to it that don't match the
identifier rule, or by removing characters (such as `_') from the
identifier rule so it no longer matches `data_'. (Of course, you might
also not have the option of changing the input language.)


File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ

My actions are executing out of order or sometimes not at all.
==============================================================

Most likely, you have (in error) placed the opening `{' of the action
block on a different line than the rule, e.g.,


     ^(foo|bar)
     {  <<<--- WRONG!

     }

   `flex' requires that the opening `{' of an action associated with a
rule begin on the same line as does the rule. You need instead to
write your rules as follows:


     ^(foo|bar)   {  // CORRECT!

     }


File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ

How can I have multiple input sources feed into the same scanner at the same time?
==================================================================================

If ...
   * your scanner is free of backtracking (verified using `flex''s `-b'
     flag),

   * AND you run your scanner interactively (`-I' option; default
     unless using special table compression options),

   * AND you feed it one character at a time by redefining `YY_INPUT'
     to do so,

   then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking). This means you
can safely use `select()' at that point and only call `yylex()' for
another token if `select()' indicates there's data available.

   That is, move the `select()' out from the input function to a point
where it determines whether `yylex()' gets called for the next token.

   With this approach, you will still have problems if your input can
arrive piecemeal; `select()' could inform you that the beginning of a
token is available, you call `yylex()' to get it, but it winds up
blocking waiting for the later characters in the token.

   Here's another way: Move your input multiplexing inside of
`YY_INPUT'. That is, whenever `YY_INPUT' is called, it `select()''s to
see where input is available. If input is available for the scanner,
it reads and returns the next byte. If input is available from another
source, it calls whatever function is responsible for reading from that
source. (If no input is available, it blocks until some input is
available.) I've used this technique in an interpreter I wrote that
both reads keyboard input using a `flex' scanner and IPC traffic from
sockets, and it works fine.
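   A minimal POSIX sketch of this second approach follows. It assumes
that both the scanner's input and the other source are plain file
descriptors (for example `fileno(yyin)' and a socket).
`multiplexed_read_byte' and the `handle_other' callback are
hypothetical names, and retrying after `EINTR' is omitted for brevity.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical callback that drains input from the non-scanner source. */
typedef void (*other_source_fn)(int fd);

/* Block until scanner_fd or other_fd is readable.  Input on other_fd
 * is passed to handle_other and waiting resumes; once scanner_fd is
 * readable, one byte is read into buf.  Returns 1 if a byte was read,
 * 0 at end of input, and -1 on error. */
int multiplexed_read_byte(int scanner_fd, int other_fd,
                          other_source_fn handle_other, char *buf)
{
    for (;;) {
        fd_set readable;
        int maxfd = scanner_fd > other_fd ? scanner_fd : other_fd;

        FD_ZERO(&readable);
        FD_SET(scanner_fd, &readable);
        FD_SET(other_fd, &readable);

        /* NULL timeout: sleep until at least one descriptor is ready. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        if (FD_ISSET(other_fd, &readable))
            handle_other(other_fd);

        if (FD_ISSET(scanner_fd, &readable)) {
            ssize_t n = read(scanner_fd, buf, 1);
            return n < 0 ? -1 : (int) n;
        }
    }
}
```

   Inside the scanner you might then write something like
`#define YY_INPUT(buf,result,max_size) { int n = multiplexed_read_byte(fileno(yyin), other_fd, handle_other, buf); result = (n > 0) ? n : YY_NULL; }',
where `other_fd' and `handle_other' are your own definitions. Note that
this treats a read error as end of input, which you may want to handle
differently.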


File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ

Can I build nested parsers that work with the same input file?
==============================================================

This is not going to work without some additional effort. The reason is
that `flex' block-buffers the input it reads from `yyin'. This means
that the "outermost" `yylex()', when called, will automatically slurp
up the first 8K of input available on yyin, and subsequent calls to
other `yylex()''s won't see that input. You might be tempted to work
around this problem by redefining `YY_INPUT' to only return a small
amount of text, but it turns out that that approach is quite difficult.
Instead, the best solution is to combine all of your scanners into one
large scanner, using a different exclusive start condition for each.


File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ

How can I match text only at the end of a file?
===============================================

There is no way to write a rule which is "match this text, but only if
it comes at the end of the file". You can fake it, though, if you
happen to have a character lying around that you don't allow in your
input. Then you redefine `YY_INPUT' to call your own routine which, if
it sees an `EOF', returns the magic character first (and remembers to
return a real `EOF' next time it's called).
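   The routine described above might look something like the following
C sketch. `EOF_CHAR', `struct eof_marker_state' and
`read_with_eof_marker' are hypothetical names, and the marker byte
`\x01' is an assumption: pick a byte your input never contains. Like
the default non-interactive `YY_INPUT', it reads with `fread()'.

```c
#include <stdio.h>

/* Hypothetical marker byte; it must never occur in real input. */
#define EOF_CHAR '\x01'

/* Remembers whether the marker has already been handed out for the
 * current input file; zero-initialize it for each new file. */
struct eof_marker_state {
    int marker_sent;
};

/* Fill buf with up to max_size bytes from in, as a YY_INPUT routine
 * would.  At the first end-of-file, return the single byte EOF_CHAR;
 * on the following call return 0 so the scanner sees the real EOF.
 * Read errors are treated like end-of-file to keep the sketch short. */
size_t read_with_eof_marker(FILE *in, char *buf, size_t max_size,
                            struct eof_marker_state *st)
{
    size_t n;

    if (max_size == 0)
        return 0;

    n = fread(buf, 1, max_size, in);
    if (n > 0)
        return n;

    if (!st->marker_sent) {
        st->marker_sent = 1;
        buf[0] = EOF_CHAR;
        return 1;
    }
    return 0;
}
```

   You could hook it in with
`#define YY_INPUT(buf,result,max_size) result = read_with_eof_marker(yyin, buf, max_size, &eof_state)',
where `eof_state' is a zero-initialized `struct eof_marker_state' that
you reset whenever you switch input files, and give the same byte the
`flex' name definition `EOF_CHAR' so that rules can refer to
`{EOF_CHAR}'.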
Then you could write:


     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */


File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ

How can I make REJECT cascade across start condition boundaries?
================================================================

You can do this as follows. Suppose you have a start condition `A', and
after exhausting all of the possible matches in `<A>', you want to try
matches in `<INITIAL>'. Then you could use the following:


     %x A
     %%
     <A>rule_that_is_long    ...; REJECT;
     <A>rule                 ...; REJECT; /* shorter rule */
     <A>etc.
     ...
     <A>.|\n  {
         /* Shortest and last rule in <A>, so
          * cascaded REJECTs will eventually
          * wind up matching this rule.  We want
          * to now switch to the initial state
          * and try matching from there instead.
          */
         yyless(0);    /* put back matched text */
         BEGIN(INITIAL);
     }


File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ

Why can't I use fast or full tables with interactive mode?
==========================================================

One of the assumptions flex makes is that interactive applications are
inherently slow (they're waiting on a human after all). The reason has
to do with how the scanner detects that it must be finished scanning a
token. For interactive scanners, after scanning each character the
current state is looked up in a table (essentially) to see whether
there's a chance of another input character possibly extending the
length of the match. If not, the scanner halts.
For non-interactive scanners, the end-of-token test is much simpler,
basically a comparison with 0, so it needs no memory bus cycles. Since
the test occurs in the innermost scanning loop, one would like to make
it go as fast as possible.

   Still, it seems reasonable to allow the user to choose to trade off
a bit of performance in this area to gain the corresponding
flexibility. There might be another reason, though, why fast scanners
don't support the interactive option.


File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ

How much faster is -F or -f than -C?
====================================

Much faster (a factor of 2-3).


File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesn't yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ

If I have a simple grammar can't I just parse it with flex?
===========================================================

Is your grammar recursive? That's almost always a sign that you're
better off using a parser/scanner rather than just trying to use a
scanner alone.


File: flex.info, Node: Why doesn't yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ

Why doesn't yyrestart() set the start state back to INITIAL?
============================================================

There are two reasons. The first is that there might be programs that
rely on the start state not changing across file changes.
The second is that beginning with `flex' version 2.4, use of
`yyrestart()' is no longer required, so fixing the problem there
doesn't solve the more general problem.


File: flex.info, Node: How can I match C-style comments?, Next: The period isn't working the way I expected., Prev: Why doesn't yyrestart() set the start state back to INITIAL?, Up: FAQ

How can I match C-style comments?
=================================

You might be tempted to try something like this:


     "/*".*"*/"         // WRONG!

   or, worse, this:


     "/*"(.|\n)*"*/"    // WRONG!

   The above rules will eat too much input, and blow up on things like:


     /* a comment */ do_my_thing( "oops */" );

   Here is one way which allows you to track line information:


     %x IN_COMMENT
     %%
     <INITIAL>{
     "/*"              BEGIN(IN_COMMENT);
     }
     <IN_COMMENT>{
     "*/"      BEGIN(INITIAL);
     [^*\n]+   // eat comment in chunks
     "*"       // eat the lone star
     \n        yylineno++;
     }


File: flex.info, Node: The period isn't working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ

The '.' isn't working the way I expected.
=========================================

Here are some tips for using `.':

   * A common mistake is to place the grouping parenthesis AFTER an
     operator, when you really meant to place the parenthesis BEFORE
     the operator, e.g., you probably want this `(foo|bar)+' and NOT
     this `(foo|bar+)'.

     The first pattern matches the words `foo' or `bar' any number of
     times, e.g., it matches the text `barfoofoobarfoo'. The second
     pattern matches a single instance of `foo' or a single instance of
     `ba' followed by one or more `r's, e.g., it matches the text
     `barrrr'.

   * A `.' inside `[]''s just means a literal `.' (period), and NOT "any
     character except newline".

   * Remember that `.' matches any character EXCEPT `\n' (and `EOF').
     If you really want to match ANY character, including newlines,
     then use `(.|\n)'. Beware that the regex `(.|\n)+' will match your
     entire input!

   * Finally, if you want to match a literal `.' (a period), then use
     `[.]' or `"."'.


File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isn't working the way I expected., Up: FAQ

Can I get the flex manual in another format?
============================================

The `flex' source distribution includes a texinfo manual. You are free
to convert that texinfo into whatever format you desire. The `texinfo'
package includes tools for conversion to a number of formats.


File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ

Does there exist a "faster" NDFA->DFA algorithm?
================================================

There's no way around the potential exponential running time; it can
take you exponential time just to enumerate all of the DFA states. In
practice, though, the running time is closer to linear, or sometimes
quadratic.


File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ

How does flex compile the DFA so quickly?
=========================================

There are two big speed wins that `flex' uses:

  1. It analyzes the input rules to construct equivalence classes for
     those characters that always make the same transitions. It then
     rewrites the NFA using equivalence classes for transitions instead
     of characters.
     This cuts down the NFA->DFA computation time
     dramatically, to the point where, for uncompressed DFA tables, the
     DFA generation is often I/O bound in writing out the tables.

  2. It maintains hash values for previously computed DFA states, so
     testing whether a newly constructed DFA state is equivalent to a
     previously constructed state can be done very quickly, by first
     comparing hash values.


File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ

How can I use more than 8192 rules?
===================================

`Flex' is compiled with an upper limit of 8192 rules per scanner. If
you need more than 8192 rules in your scanner, you'll have to recompile
`flex' with the following changes in `flexdef.h':


     <   #define YY_TRAILING_MASK 0x2000
     <   #define YY_TRAILING_HEAD_MASK 0x4000
     --
     >   #define YY_TRAILING_MASK 0x20000000
     >   #define YY_TRAILING_HEAD_MASK 0x40000000

   This should work okay as long as your C compiler uses 32-bit
integers. But you might want to think about whether using such a huge
number of rules is the best way to solve your problem.

   The following may also be relevant:

   With luck, you should be able to increase the definitions in
flexdef.h for:


     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767

   recompile everything, and it'll all work. Flex only has these
16-bit-like values built into it because a long time ago it was
developed on a machine with 16-bit ints. I've given this advice to
others in the past but haven't heard back from them whether it worked
okay or not...
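   To see why the mask values set the rule limit: in scanners that use
variable trailing context, an entry of the accepting list
(`yy_acclist') holds a rule number combined with these flag bits, so
rule numbers must stay below the lowest flag bit. The following C
sketch illustrates this with the stock mask values; the helper
functions are hypothetical and are not part of `flex'.

```c
#include <stdbool.h>

/* Stock values from flexdef.h, as shown in the diff above. */
#define YY_TRAILING_MASK      0x2000
#define YY_TRAILING_HEAD_MASK 0x4000

/* A rule number fits only if it stays below the lowest flag bit. */
bool rule_number_fits(int rule)
{
    return rule >= 0 && rule < YY_TRAILING_MASK;
}

/* Mark an entry as belonging to a variable trailing-context rule. */
int tag_trailing(int rule)
{
    return rule | YY_TRAILING_MASK;
}

/* Recover the rule number by clearing both flag bits. */
int rule_from_entry(int entry)
{
    return entry & ~(YY_TRAILING_MASK | YY_TRAILING_HEAD_MASK);
}
```

   Rule 8192 (0x2000) would be indistinguishable from the flag bit
alone: tagging it and clearing the flags yields 0. With the 32-bit
masks from the diff, the same check allows rule numbers up to
0x1FFFFFFF.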


File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ

How do I abandon a file in the middle of a scan and switch to a new file?
=========================================================================

Just call `yyrestart(newfile)'. Be sure to reset the start state if you
want a "fresh start," since `yyrestart' does NOT reset the start state
back to `INITIAL'.


File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ

How do I execute code only during initialization (only before the first scan)?
==============================================================================

You can specify an initial action by defining the macro `YY_USER_INIT'
(though note that `yyout' may not be available at the time this macro
is executed). Or you can add to the beginning of your rules section:


     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }


File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ

How do I execute code at termination?
=====================================

You can specify an action for the `<<EOF>>' rule.


File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ

Where else can I find help?
===========================

You can find the flex homepage on the web at
`http://flex.sourceforge.net/'. See that page for details about flex
mailing lists as well.


File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ

Can I include comments in the "rules" section of the file?
==========================================================

Yes, just about anywhere you want to. *Note Comments in the Input::,
for the specific syntax.


File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ

I get an error about undefined yywrap().
========================================

You must supply a `yywrap()' function of your own, or link to `libfl.a'
(which provides one), or use


     %option noyywrap

in your source to say you don't want a `yywrap()' function.


File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ

How can I change the matching pattern at run time?
==================================================

You can't; the patterns are compiled into static tables when flex
builds the scanner.


File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ

How can I expand macros in the input?
=====================================

The best way to approach this problem is at a higher level, e.g., in
the parser.

   However, you can do this using multiple input buffers.


     %%
     macro/[a-z]+    {
                     /* Saw the macro "macro" followed by extra stuff. */
                     /* expand(), main_buffer and expansion_buffer are your own. */
                     main_buffer = YY_CURRENT_BUFFER;
                     expansion_buffer = yy_scan_string(expand(yytext));
                     yy_switch_to_buffer(expansion_buffer);
                     }

     <<EOF>>         {
                     if ( expansion_buffer )
                     {
                         /* We were doing an expansion; return to where
                          * we were. */
                         yy_switch_to_buffer(main_buffer);
                         yy_delete_buffer(expansion_buffer);
                         expansion_buffer = 0;
                     }
                     else
                         yyterminate();
                     }

   You will probably want a stack of expansion buffers to allow nested
macros. From the above, though, the idea should be clear.


File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ

How can I build a two-pass scanner?
===================================

One way to do it is to filter the first pass to a temporary file, then
process the temporary file on the second pass. You will probably see a
performance hit, due to all the disk I/O.

   When you need to look this far ahead, it almost always means that
the right solution is to build a parse tree of the entire input, then
walk it after the parse in order to generate the output. In a sense,
this is a two-pass approach, once through the text and once through the
parse tree, but the performance hit for the latter is usually an order
of magnitude smaller, since everything is already classified, in binary
format, and residing in memory.


File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ

How do I match any string not matched in the preceding rules?
=============================================================

One way to assign precedence is to place the more specific rules
first. If two rules would match the same input (the same sequence of
characters), then the first rule listed in the `flex' input wins, e.g.,


     %%
     foo[a-zA-Z_]+    return FOO_ID;
     bar[a-zA-Z_]+    return BAR_ID;
     [a-zA-Z_]+       return GENERIC_ID;

   Note that the rule `[a-zA-Z_]+' must come *after* the others. It
will match the same amount of text as the more specific rules, and in
that case the `flex' scanner will pick the first rule listed in your
scanner as the one to match.


File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ

I am trying to port code from AT&T lex that uses yysptr and yysbuf.
===================================================================

Those are internal variables pointing into the AT&T scanner's input
buffer. I imagine they're being manipulated in user versions of the
`input()' and `unput()' functions. If so, what you need to do is
analyze those functions to figure out what they're doing, and then
replace `input()' with an appropriate definition of `YY_INPUT'. You
shouldn't need to (and must not) replace `flex''s `unput()' function.


File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ

Is there a way to make flex treat NULL like a regular character?
================================================================

Yes, `\0' and `\x00' should both do the trick.
Perhaps you have an ancient version of `flex'. The latest release is
version 2.5.35.


File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesn't flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ

Whenever flex can not match the input it says "flex scanner jammed".
====================================================================

You need to add a rule that matches the otherwise-unmatched text, e.g.,


     %option yylineno
     %%
     [[a bunch of rules here]]

     .   printf("bad input character '%s' at line %d\n", yytext, yylineno);

   See `%option default' for more information.


File: flex.info, Node: Why doesn't flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ

Why doesn't flex have non-greedy operators like perl does?
==========================================================

A DFA can do a non-greedy match by stopping the first time it enters an
accepting state, instead of consuming input until it determines that no
further matching is possible (a "jam" state). This is actually easier
to implement than longest leftmost match (which flex does).

   But it's also much less useful than longest leftmost match. In
general, when you find yourself wishing for non-greedy matching, that's
usually a sign that you're trying to make the scanner do some parsing.
That's generally the wrong approach, since a scanner lacks the power to
do a decent job. It is better either to introduce a separate parser or
to split the scanner into multiple scanners using (exclusive) start
conditions.

   You might have a separate start state once you've seen the `BEGIN'.
In that state, you might then have a regex that will match `END' (to
kick you out of the state), and perhaps `(.|\n)' to get a single
character within the chunk ...

   This approach also has much better error-reporting properties.


File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesn't flex have non-greedy operators like perl does?, Up: FAQ

Memory leak - 16386 bytes allocated by malloc.
==============================================

UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
you did not call `yylex_destroy()'. If you are using an earlier version
of `flex', then read on.

   The leak is about 16426 bytes: 16386 bytes (8192 * 2 + 2) for the
read buffer, plus about 40 for `struct yy_buffer_state' (depending upon
alignment). The leak is in the non-reentrant C scanner only (NOT in the
reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
when you are done, the buffer is never freed.

   However, the leak won't multiply since the buffer is reused no
matter how many times you call `yylex()'.

   If you want to reclaim the memory when you are completely done
scanning, then you might try this:


     /* For non-reentrant C scanner only. */
     yy_delete_buffer(YY_CURRENT_BUFFER);
     yy_init = 1;

   Note: `yy_init' is an "internal variable", and this technique hasn't
been tested in this situation. It is possible that some other globals
may need resetting as well.


File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ

How do I track the byte offset for lseek()?
===========================================


     > We thought that it would be possible to have this number through the
     > evaluation of the following expression:
     >
     > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf

   While this is the right idea, it has two problems. The first is that
it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
during an invocation of `YY_INPUT' (or that your input source will
return less even though `YY_READ_BUF_SIZE' bytes were requested). The
second problem is that when refilling its internal buffer, `flex' keeps
some characters from the previous buffer (because usually it's in the
middle of a match, and needs those characters to construct `yytext' for
the match once it's done). Because of this, `yy_c_buf_p -
YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
already read from the current buffer.

   An alternative solution is to count the number of characters you've
matched since starting to scan. This can be done by using
`YY_USER_ACTION', with a counter `num_chars' that you declare yourself.
For example,


     #define YY_USER_ACTION num_chars += yyleng;

   (You need to be careful to update your bookkeeping if you use
`yymore()', `yyless()', `unput()', or `input()'.)


File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ

How do I use my own I/O classes in a C++ scanner?
=================================================

When the flex C++ scanning class rewrite finally happens, then this
sort of thing should become much easier.

   You can do this by passing the various functions (such as
`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then
dealing with your own I/O classes surreptitiously (i.e., stashing them
in special member variables). This works because the only assumption
the lexer makes about the iostreams is that they're ultimately passed
to `LexerInput()' and `LexerOutput()', which then do whatever is
necessary with them.


File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ

How do I skip as many chars as possible?
========================================

How do I skip as many chars as possible - without interfering with the
other patterns?

   In the example below, we want to skip over characters until we see
the phrase "endskip". The following will _NOT_ work correctly (do you
see why not?)


     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip  BEGIN(SKIP);
     ...
     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>.*            ;

   The problem is that the pattern `.*' will eat up the word "endskip":
flex always prefers the longest match, and `.*' matches the rest of the
line, including "endskip" and anything after it. The simplest (but
slow) fix is:


     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>.             ;

   A faster fix makes the second rule match more, without making it
match "endskip" plus something else. For example:


     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>[^e]+         ;
     <SKIP>.             ;  /* so you eat up e's, too */


File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ

deleteme00
==========


     QUESTION:
     When was flex born?

     Vern Paxson took over
     the Software Tools lex project from Jef Poskanzer in 1982. At that point it
     was written in Ratfor.
     Around 1987 or so, Paxson translated it into C, and
     a legend was born :-).


File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ

Are certain equivalent patterns faster than others?
===================================================


     To: Adoram Rogel <adoram@orna.hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
     Date: Wed, 18 Sep 96 10:51:02 PDT
     From: Vern Paxson <vern>

     [Note, the most recent flex release is 2.5.4, which you can get from
     ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]

     > 1. Using the pattern
     > ([Ff](oot)?)?[Nn](ote)?(\.)?
     > instead of
     > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
     > (in a very complicated flex program) caused the program to slow from
     > 300K+/min to 100K/min (no other changes were done).

     These two are not equivalent. For example, the first can match "footnote."
     but the second can only match "footnote". This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.

     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?

     From a performance point of view, they're equivalent (modulo presumably
     minor effects such as memory cache hit rates; and the presence of trailing
     context, see below). From a space point of view, the first is slightly
     preferable.

     > 3. I have a pattern that look like this:
     > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
     >
     > running yet another complicated program that includes the following rule:
     > <snext>{and}/{no4}{bb}{pats}
     >
     > gets me to "too complicated - over 32,000 states"...

     I can't tell from this example whether the trailing context is variable-length
     or fixed-length (it could be the latter if {and} is fixed-length). If it's
     variable length, which flex -p will tell you, then this reflects a basic
     performance problem, and if you can eliminate it by restructuring your
     scanner, you will see significant improvement.

     > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
     > 10 patterns and changed the rule to be 5 rules.
     > This did compile, but what is the rule of thumb here ?

     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use
     of the '|' operator automatically makes the pattern variable length, so in
     this case '[Ff]oot' is preferred to '(F|f)oot'.

     > 4. I changed a rule that looked like this:
     > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
     >
     > to the next 2 rules:
     > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
     > <snext8>{and}{bb}/{ROMAN} { BEGIN...
     >
     > Again, I understand the using [^...] will cause a great performance loss

     Actually, it doesn't cause any sort of performance loss. It's a surprising
     fact about regular expressions that they always match in linear time
     regardless of how complex they are.

     > but are there any specific rules about it ?

     See the "Performance Considerations" section of the man page, and also
     the example in MISC/fastwc/.

             Vern


File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ

Is backing up a big deal?
=========================


     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
     Date: Thu, 19 Sep 96 09:58:00 PDT
     From: Vern Paxson <vern>

     > a lot about the backing up problem.
     > I believe that there lies my biggest problem, and I'll try to improve
     > it.

     Since you have variable trailing context, this is a bigger performance
     problem. Fixing it is usually easier than fixing backing up, which in a
     complicated scanner (yours seems to fit the bill) can be extremely
     difficult to do correctly.

     You also don't mention what flags you are using for your scanner.
     -f makes a large speed difference, and -Cfe buys you nearly as much
     speed but the resulting scanner is considerably smaller.

     > I have an | operator in {and} and in {pats} so both of them are variable
     > length.

     -p should have reported this.

     > Is changing one of them to fixed-length is enough ?

     Yes.

     > Is it possible to change the 32,000 states limit ?

     Yes. I've appended instructions on how. Before you make this change,
     though, you should think about whether there are ways to fundamentally
     simplify your scanner - those are certainly preferable!

             Vern

     To increase the 32K limit (on a machine with 32 bit integers), you increase
     the magnitude of the following in flexdef.h:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767
     #define MAX_SHORT 32700

     Adding a 0 or two after each should do the trick.


File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ

Can I fake multi-byte character support?
========================================


     To: Heeman_Lee@hp.com
     Subject: Re: flex - multi-byte support?
     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
     Date: Fri, 04 Oct 1996 11:42:18 PDT
     From: Vern Paxson <vern>

     > I assume as long as my *.l file defines the
     > range of expected character code values (in octal format), flex will
     > scan the file and read multi-byte characters correctly. But I have no
     > confidence in this assumption.

     Your lack of confidence is justified - this won't work.

     Flex has in it a widespread assumption that the input is processed
     one byte at a time. Fixing this is on the to-do list, but is involved,
     so it won't happen any time soon. In the interim, the best I can suggest
     (unless you want to try fixing it yourself) is to write your rules in
     terms of pairs of bytes, using definitions in the first section:

         X       \xfe\xc2
         ...
         %%
         foo{X}bar       found_foo_fe_c2_bar();

     etc. Definitely a pain - sorry about that.

     By the way, the email address you used for me is ancient, indicating you
     have a very old version of flex. You can get the most recent, 2.5.4, from
     ftp.ee.lbl.gov.

             Vern


File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ

deleteme01
==========


     To: moleary@primus.com
     Subject: Re: Flex / Unicode compatibility question
     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
     Date: Tue, 22 Oct 1996 11:06:13 PDT
     From: Vern Paxson <vern>

     Unfortunately flex at the moment has a widespread assumption within it
     that characters are processed 8 bits at a time. I don't see any easy
     fix for this (other than writing your rules in terms of double characters -
     a pain). I also don't know of a wider lex, though you might try surfing
     the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
     Toolkit").

     Fixing flex to handle wider characters is on the long-term to-do list.
     But since flex is a strictly spare-time project these days, this probably
     won't happen for quite a while, unless someone else does it first.

             Vern


File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ

Can you discuss some flex internals?
====================================


     To: Johan Linde <jl@theophys.kth.se>
     Subject: Re: translation of flex
     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
     Date: Mon, 11 Nov 1996 10:33:50 PST
     From: Vern Paxson <vern>

     > I'm working for the Swedish team translating GNU program, and I'm currently
     > working with flex. I have a few questions about some of the messages which
     > I hope you can answer.

     All of the things you're wondering about, by the way, concerning flex
     internals - probably the only person who understands what they mean in
     English is me! So I wouldn't worry too much about getting them right.
     That said ...

     > #: main.c:545
     > msgid " %d protos created\n"
     >
     > Does proto mean prototype?

     Yes - prototypes of state compression tables.

     > #: main.c:539
     > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
     >
     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
     > However, 'template next-check entries' doesn't make much sense to me. To be
     > able to find a good translation I need to know a little bit more about it.

     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
     scanner tables. It involves creating two pairs of tables. The first has
     "base" and "default" entries, the second has "next" and "check" entries.
     The "base" entry is indexed by the current state and yields an index into
     the next/check table.
     The "default" entry gives what to do if the state
     transition isn't found in next/check. The "next" entry gives the next
     state to enter, but only if the "check" entry verifies that this entry is
     correct for the current state. Flex creates templates of series of
     next/check entries and then encodes differences from these templates as a
     way to compress the tables.

     > #: main.c:533
     > msgid " %d/%d base-def entries created\n"
     >
     > The same problem here for 'base-def'.

     See above.

             Vern


File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ

unput() messes up yy_at_bol
===========================


     To: Xinying Li <xli@npac.syr.edu>
     Subject: Re: FLEX ?
     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
     Date: Wed, 13 Nov 1996 19:51:54 PST
     From: Vern Paxson <vern>

     > "unput()" them to input flow, question occurs. If I do this after I scan
     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
     > means the carriage flag has gone.

     You can control this by calling yy_set_bol(). It's described in the manual.

     > And if in pre-reading it goes to the end of file, is anything done
     > to control the end of curren buffer and end of file?

     No, there's no way to put back an end-of-file.

     > By the way I am using flex 2.5.2 and using the "-l".

     The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
     2.5.3. You can get it from ftp.ee.lbl.gov.

             Vern


File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ

The | operator is not doing what I want
=======================================


     To: Alain.ISSARD@st.com
     Subject: Re: Start condition with FLEX
     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
     Date: Mon, 18 Nov 1996 10:41:34 PST
     From: Vern Paxson <vern>

     > I am not able to use the start condition scope and to use the | (OR) with
     > rules having start conditions.

     The problem is that if you use '|' as a regular expression operator, for
     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
     any blanks around it. If you instead want the special '|' *action* (which
     from your scanner appears to be the case), which is a way of giving two
     different rules the same action:

         foo     |
         bar     matched_foo_or_bar();

     then '|' *must* be separated from the first rule by whitespace and *must*
     be followed by a new line. You *cannot* write it as:

         foo | bar       matched_foo_or_bar();

     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.

     Your problems with start condition scope are simply due to syntax errors
     from your use of '|' later confusing flex.

     Let me know if you still have problems.

             Vern


File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ

Why can't flex understand this variable trailing context pattern?
=================================================================


     To: Gregory Margo <gmargo@newton.vip.best.com>
     Subject: Re: flex-2.5.3 bug report
     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
     Date: Sat, 23 Nov 1996 17:07:32 PST
     From: Vern Paxson <vern>

     > Enclosed is a lex file that "real" lex will process, but I cannot get
     > flex to process it. Could you try it and maybe point me in the right direction?

     Your problem is that some of the definitions in the scanner use the '/'
     trailing context operator, and have it enclosed in ()'s. Flex does not
     allow this operator to be enclosed in ()'s because doing so allows undefined
     regular expressions such as "(a/b)+". So the solution is to remove the
     parentheses. Note that you must also be building the scanner with the -l
     option for AT&T lex compatibility. Without this option, flex automatically
     encloses the definitions in parentheses.

             Vern


File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ

The ^ operator isn't working
============================


     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
     Subject: Re: Flex Bug ?
     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
     Date: Tue, 26 Nov 1996 11:15:05 PST
     From: Vern Paxson <vern>

     > In my lexer code, i have the line :
     > ^\*.*          { }
     >
     > Thus all lines starting with an astrix (*) are comment lines.
     > This does not work !

     I can't get this problem to reproduce - it works fine for me.
     Note though that if what you have is slightly different:

         COMMENT ^\*.*
         %%
         {COMMENT}       { }

     then it won't work, because flex pushes back macro definitions enclosed
     in ()'s, so the rule becomes

         (^\*.*)         { }

     and now that the '^' operator is not at the immediate beginning of the
     line, it's interpreted as just a regular character. You can avoid this
     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".

             Vern


File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ

Trailing context is getting confused with trailing optional patterns
====================================================================


     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: Flex 2.5.4 BOF ???
     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
     Date: Wed, 27 Nov 1996 10:56:25 PST
     From: Vern Paxson <vern>

     > Organization(s)?/[a-z]
     >
     > This matched "Organizations" (looking in debug mode, the trailing s
     > was matched with trailing context instead of the optional (s) in the
     > end of the word.

     That should only happen with lex. Flex can properly match this pattern.
     (That might be what you're saying, I'm just not sure.)

     > Is there a way to avoid this dangerous trailing context problem ?

     Unfortunately, there's no easy way. On the other hand, I don't see why
     it should be a problem. Lex's matching is clearly wrong, and I'd hope
     that usually the intent remains the same as expressed with the pattern,
     so flex's matching will be correct.

             Vern


File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ

Is flex GNU or not?
===================


     To: Cameron MacKinnon <mackin@interlog.com>
     Subject: Re: Flex documentation bug
     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
     Date: Sun, 01 Dec 1996 22:29:39 PST
     From: Vern Paxson <vern>

     > I'm not sure how or where to submit bug reports (documentation or
     > otherwise) for the GNU project stuff ...

     Well, strictly speaking flex isn't part of the GNU project. They just
     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me. Those sent to the GNU folks
     sometimes find their way to me, but some may fall through the cracks.

     > In GNU Info, under the section 'Start Conditions', and also in the man
     > page (mine's dated April '95) is a nice little snippet showing how to
     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
     > size. Unfortunately, no overflow checking is ever done ...

     This is already mentioned in the manual:

         Finally, here's an example of how to match C-style quoted
         strings using exclusive start conditions, including expanded
         escape sequences (but not including checking for a string
         that's too long):

     The reason for not doing the overflow checking is that it will needlessly
     clutter up an example whose main purpose is just to demonstrate how to
     use flex.

     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.

             Vern


File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ

ERASEME53
=========


     To: tsv@cs.UManitoba.CA
     Subject: Re: Flex (reg)..
     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
     Date: Thu, 06 Mar 1997 15:54:19 PST
     From: Vern Paxson <vern>

     > [:alpha:] ([:alnum:] | \\_)*

     If your rule really has embedded blanks as shown above, then it won't
     work, as the first blank delimits the rule from the action. (It wouldn't
     even compile ...) You need instead:

          [:alpha:]([:alnum:]|\\_)*

     and that should work fine - there's no restriction on what can go inside
     of ()'s except for the trailing context operator, '/'.

     Vern


File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ

I need to scan if-then-else blocks and while loops
==================================================


     To: "Mike Stolnicki" <mstolnic@ford.com>
     Subject: Re: FLEX help
     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
     Date: Fri, 30 May 1997 10:46:35 PDT
     From: Vern Paxson <vern>

     > We'd like to add "if-then-else", "while", and "for" statements to our
     > language ...
     > We've investigated many possible solutions. The one solution that seems
     > the most reasonable involves knowing the position of a TOKEN in yyin.

     I strongly advise you to instead build a parse tree (abstract syntax tree)
     and loop over that. You'll find this has major benefits in keeping
     your interpreter simple and extensible.

     That said, the functionality you mention for get_position and set_position
     has been on the to-do list for a while. As flex is a purely spare-time
     project for me, no guarantees when this will be added (in particular, it
     for sure won't be for many months to come).
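A minimal C++ sketch of the parse-tree approach, assuming a hypothetical toy language with assignment, +, < and while. The tree is built once (here by hand, standing in for yacc/bison actions), and the While node repeats its body by walking the same subtree again, so the scanner never has to reposition yyin. The node types and build_sum_loop() are illustrative names only, not part of flex or bison.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy-language AST: variables live in Env, and each node
// evaluates itself by walking its children.
using Env = std::map<std::string, long>;

struct Node {
    virtual ~Node() = default;
    virtual long eval(Env& env) const = 0;
};
using NodePtr = std::unique_ptr<Node>;

struct Num : Node {
    long value;
    explicit Num(long v) : value(v) {}
    long eval(Env&) const override { return value; }
};

struct Var : Node {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    long eval(Env& env) const override { return env[name]; }  // unset -> 0
};

struct BinOp : Node {
    char op;  // '+' for addition, '<' for less-than (yields 1 or 0)
    NodePtr lhs, rhs;
    BinOp(char o, NodePtr l, NodePtr r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    long eval(Env& env) const override {
        long a = lhs->eval(env);
        long b = rhs->eval(env);
        return op == '+' ? a + b : (a < b ? 1 : 0);
    }
};

struct Assign : Node {
    std::string name;
    NodePtr rhs;
    Assign(std::string n, NodePtr r) : name(std::move(n)), rhs(std::move(r)) {}
    long eval(Env& env) const override {
        long v = rhs->eval(env);
        env[name] = v;
        return v;
    }
};

struct Block : Node {
    std::vector<NodePtr> stmts;
    long eval(Env& env) const override {
        long last = 0;
        for (const auto& s : stmts) last = s->eval(env);
        return last;
    }
};

struct While : Node {
    NodePtr cond, body;
    While(NodePtr c, NodePtr b) : cond(std::move(c)), body(std::move(b)) {}
    long eval(Env& env) const override {
        // Each iteration re-walks the same subtree; no input is re-scanned.
        while (cond->eval(env) != 0) body->eval(env);
        return 0;
    }
};

// Builds, by hand, the tree a parser might produce for
//     while (i < 5) { s = s + i; i = i + 1; }
NodePtr build_sum_loop() {
    auto body = std::make_unique<Block>();
    body->stmts.push_back(std::make_unique<Assign>(
        "s", std::make_unique<BinOp>('+', std::make_unique<Var>("s"),
                                     std::make_unique<Var>("i"))));
    body->stmts.push_back(std::make_unique<Assign>(
        "i", std::make_unique<BinOp>('+', std::make_unique<Var>("i"),
                                     std::make_unique<Num>(1))));
    return std::make_unique<While>(
        std::make_unique<BinOp>('<', std::make_unique<Var>("i"),
                                std::make_unique<Num>(5)),
        std::move(body));
}
```

Evaluating the tree from build_sum_loop() against an empty Env leaves s equal to 10 and i equal to 5. In a real interpreter the parser's actions would allocate these nodes as each statement is reduced, and running a loop again costs only a tree walk.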

     Vern


File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ

ERASEME55
=========


     To: Colin Paul Adams <colin@colina.demon.co.uk>
     Subject: Re: Flex C++ classes and Bison
     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
     Date: Fri, 15 Aug 1997 10:48:19 PDT
     From: Vern Paxson <vern>

     > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
     > *parm)
     >
     > I have been trying to get this to work as a C++ scanner, but it does
     > not appear to be possible (warning that it matches no declarations in
     > yyFlexLexer, or something like that).
     >
     > Is this supposed to be possible, or is it being worked on (I DID
     > notice the comment that scanner classes are still experimental, so I'm
     > not too hopeful)?

     What you need to do is derive a subclass from yyFlexLexer that provides
     the above yylex() method, squirrels away lvalp and parm into member
     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.

     Vern


File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ

ERASEME56
=========


     To: Mikael.Latvala@lmf.ericsson.se
     Subject: Re: Possible mistake in Flex v2.5 document
     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
     Date: Fri, 05 Sep 1997 10:01:54 PDT
     From: Vern Paxson <vern>

     > In that example you show how to count comment lines when using
     > C style /* ... */ comments. My question is, shouldn't you take into
     > account a scenario where end of a comment marker occurs inside
     > character or string literals?

     The scanner certainly needs to also scan character and string literals.
     However it does that (there's an example in the man page for strings), the
     lexer will recognize the beginning of the literal before it runs across the
     embedded "/*". Consequently, it will finish scanning the literal before it
     even considers the possibility of matching "/*".

     Example:

          '([^']*|{ESCAPE_SEQUENCE})'

     will match all the text between the ''s (inclusive). So the lexer
     considers this as a token beginning at the first ', and doesn't even
     attempt to match other tokens inside it.

     I think this subtlety is not worth putting in the manual, as I suspect
     it would confuse more people than it would enlighten.

     Vern


File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ

ERASEME57
=========


     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
     Subject: Re: flex limitations
     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
     Date: Mon, 08 Sep 1997 11:38:08 PDT
     From: Vern Paxson <vern>

     > %%
     > [a-zA-Z]+  /* skip a line */
     >     { printf("got %s\n", yytext); }
     > %%

     What version of flex are you using? If I feed this to 2.5.4, it complains:

          "bug.l", line 5: EOF encountered inside an action
          "bug.l", line 5: unrecognized rule
          "bug.l", line 5: fatal parse error

     Not the world's greatest error message, but it manages to flag the problem.

     (With the introduction of start condition scopes, flex can't accommodate
     an action on a separate line, since it's ambiguous with an indented rule.)

     You can get 2.5.4 from ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ

Is there a repository for flex scanners?
========================================

Not that we know of. You might try asking on comp.compilers.


File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ

How can I conditionally compile or preprocess my flex input file?
=================================================================

Flex doesn't have a preprocessor like C does. You might try using m4,
or the C preprocessor plus a sed script to clean up the result.


File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ

Where can I find grammars for lex and yacc?
===========================================

In the sources for flex and bison.


File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ

I get an end-of-buffer message for each character scanned.
==========================================================

This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
"interactive", or if the streams library on your platform always
returns 1 for yyin->gcount().

   Solution: override LexerInput() with a version that returns whole
buffers.


File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ

unnamed-faq-62
==============


     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: Flex maximums
     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
     Date: Mon, 17 Nov 1997 17:16:15 PST
     From: Vern Paxson <vern>

     > I took a quick look into the flex-sources and altered some #defines in
     > flexdefs.h:
     >
     > #define INITIAL_MNS 64000
     > #define MNS_INCREMENT 1024000
     > #define MAXIMUM_MNS 64000

     The things to fix are to add a couple of zeroes to:

          #define JAMSTATE -32766 /* marks a reference to the state that always jams */
          #define MAXIMUM_MNS 31999
          #define BAD_SUBSCRIPT -32767
          #define MAX_SHORT 32700

     and, if you get complaints about too many rules, make the following change too:

          #define YY_TRAILING_MASK 0x200000
          #define YY_TRAILING_HEAD_MASK 0x400000

     - Vern


File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ

unnamed-faq-63
==============


     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: FLEX question regarding istream vs ifstream
     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
     Date: Mon, 15 Dec 1997 13:21:35 PST
     From: Vern Paxson <vern>

     > stdin_handle = YY_CURRENT_BUFFER;
     > ifstream fin( "aFile" );
     > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
     >
     > What I'm wanting to do, is pass the contents of a file thru one set
     > of rules and then pass stdin thru another set... It works great if, I
     > don't use the C++ classes. But since everything else that I'm doing is
     > in C++, I thought I'd be consistent.
     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as it's
     > first argument (as stated in the man page). However, fin is a ifstream
     > object. Any ideas on what I might be doing wrong? Any help would be
     > appreciated. Thanks!!

     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
     Then its type will be compatible with the expected istream*, because ifstream
     is derived from istream.

     Vern


File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ

unnamed-faq-64
==============


     To: Enda Fadian <fadiane@piercom.ie>
     Subject: Re: Question related to Flex man page?
     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
     Date: Tue, 16 Dec 1997 14:17:09 PST
     From: Vern Paxson <vern>

     > Can you explain to me what is ment by a long-jump in relation to flex?

     Using the longjmp() function while inside yylex() or a routine called by it.

     > what is the flex activation frame.

     Just yylex()'s stack frame.

     > As far as I can see yyrestart will bring me back to the sart of the input
     > file and using flex++ isnot really an option!

     No, yyrestart() doesn't imply a rewind, even though its name might sound
     like it does. It tells the scanner to flush its internal buffers and
     start reading from the given file at its present location.

     Vern


File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ

unnamed-faq-65
==============


     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
     Subject: Re: Need urgent Help
     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
     Date: Sun, 21 Dec 1997 21:30:46 PST
     From: Vern Paxson <vern>

     > /usr/lib/yaccpar: In function `int yyparse()':
     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
     >
     > ld: Undefined symbol
     >    _yylex
     >    _yyparse
     >    _yyin

     This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
     the fix is to explicitly insert some 'extern "C"' statements for the
     corresponding routines/symbols.
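A minimal sketch of such declarations, assuming the yacc-generated parser is compiled as C++ while lex.yy.c is compiled as C. Inside an extern "C" block the C++ compiler gives the names C linkage instead of mangling them, so the parser's references can match the _yylex and _yyin symbols in the C object file; the yylex declaration also supplies the prototype whose absence caused the "implicit declaration" warning. The stand-in definitions at the end are hypothetical placeholders that only let the sketch link on its own; in a real build they come from the flex-generated scanner and must be removed.

```cpp
#include <cstdio>

// Declarations for a C++ translation unit (for example, a yacc parser built
// with a C++ compiler) that uses symbols defined in a scanner compiled as C.
// extern "C" gives these names C linkage, so the C++ compiler does not
// mangle them and the linker can match them against lex.yy.o.
extern "C" {
    int yylex(void);
    extern FILE *yyin;
    int yyparse(void);  // only needed if yyparse itself was compiled as C
}

// Hypothetical stand-ins so this sketch links on its own. In a real build,
// delete these: yylex() and yyin are defined by the flex-generated scanner.
extern "C" {
    FILE *yyin = nullptr;
    int yylex(void) { return 0; }  // 0 tells a yacc parser "end of input"
}
```

Alternatively, compiling lex.yy.c with the same C++ compiler as the parser also avoids the mismatch, since every translation unit then mangles the names the same way.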

     Vern


File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ

unnamed-faq-66
==============


     To: mc0307@mclink.it
     Cc: gnu@prep.ai.mit.edu
     Subject: Re: [mc0307@mclink.it: Help request]
     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
     Date: Sun, 21 Dec 1997 22:33:37 PST
     From: Vern Paxson <vern>

     > This is my definition for float and integer types:
     > . . .
     > NZD          [1-9]
     > ...
     > I've tested my program on other lex version (on UNIX Sun Solaris an HP
     > UNIX) and it work well, so I think that my definitions are correct.
     > There are any differences between Lex and Flex?

     There are indeed differences, as discussed in the man page. The one
     you are probably running into is that when flex expands a name definition,
     it puts parentheses around the expansion, while lex does not. There's
     an example in the man page of how this can lead to different matching.
     Flex's behavior complies with the POSIX standard (or at least with the
     last POSIX draft I saw).

     Vern


File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ

unnamed-faq-67
==============


     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
     Subject: Re: Thanks
     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
     Date: Mon, 22 Dec 1997 14:35:05 PST
     From: Vern Paxson <vern>

     > Thank you very much for your help. I compile and link well with C++ while
     > declaring 'yylex ...' extern, But a little problem remains. I get a
     > segmentation default when executing ( I linked with lfl library) while it
     > works well when using LEX instead of flex. Do you have some ideas about the
     > reason for this ?

     The one possible reason for this that comes to mind is if you've defined
     yytext as "extern char yytext[]" (which is what lex uses) instead of
     "extern char *yytext" (which is what flex uses). If it's not that, then
     I'm afraid I don't know what the problem might be.

     Vern


File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ

unnamed-faq-68
==============


     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
     Subject: Re: flex 2.5: c++ scanners & start conditions
     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
     Date: Tue, 06 Jan 1998 19:19:30 PST
     From: Vern Paxson <vern>

     > The problem is that when I do this (using %option c++) start
     > conditions seem to not apply.

     The BEGIN macro modifies the yy_start variable. For C scanners, this
     is a static with scope visible through the whole file. For C++ scanners,
     it's a member variable, so it only has visible scope within a member
     function. Your lexbegin() routine is not a member function when you
     build a C++ scanner, so it's not modifying the correct yy_start. The
     diagnostic that indicates this is that you found you needed to add
     a declaration of yy_start in order to get your scanner to compile when
     using C++; instead, the correct fix is to make lexbegin() a member
     function (by deriving from yyFlexLexer).

     Vern


File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ

unnamed-faq-69
==============


     To: "Boris Zinin" <boris@ippe.rssi.ru>
     Subject: Re: current position in flex buffer
     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
     Date: Mon, 12 Jan 1998 12:03:15 PST
     From: Vern Paxson <vern>

     > The problem is how to determine the current position in flex active
     > buffer when a rule is matched....

     You will need to keep track of this explicitly, such as by redefining
     YY_USER_ACTION to count the number of characters matched.

     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ

unnamed-faq-70
==============


     To: Bik.Dhaliwal@bis.org
     Subject: Re: Flex question
     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
     Date: Tue, 27 Jan 1998 22:41:52 PST
     From: Vern Paxson <vern>

     > That requirement involves knowing
     > the character position at which a particular token was matched
     > in the lexer.

     The way you have to do this is by explicitly keeping track of where
     you are in the file, by counting the number of characters scanned
     for each token (available in yyleng). It may prove convenient to
     do this by redefining YY_USER_ACTION, as described in the manual.

     Vern


File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ

unnamed-faq-71
==============


     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
     Subject: Re: flex: how to control start condition from parser?
     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
     Date: Tue, 27 Jan 1998 22:45:37 PST
     From: Vern Paxson <vern>

     > It seems useful for the parser to be able to tell the lexer about such
     > context dependencies, because then they don't have to be limited to
     > local or sequential context.

     One way to do this is to have the parser call a stub routine that's
     included in the scanner's .l file, and consequently that has access to
The only ugliness is that the parser can't pass in the state 6305 it wants, because those aren't visible - but if you don't have many 6306 such states, then using a different set of names doesn't seem like 6307 to much of a burden. 6308 6309 While generating a .h file like you suggests is certainly cleaner, 6310 flex development has come to a virtual stand-still :-(, so a workaround 6311 like the above is much more pragmatic than waiting for a new feature. 6312 6313 Vern 6314 6315 6316File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ 6317 6318unnamed-faq-72 6319============== 6320 6321 6322 To: Barbara Denny <denny@3com.com> 6323 Subject: Re: freebsd flex bug? 6324 In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST. 6325 Date: Fri, 30 Jan 1998 12:42:32 PST 6326 From: Vern Paxson <vern> 6327 6328 > lex.yy.c:1996: parse error before `=' 6329 6330 This is the key, identifying this error. (It may help to pinpoint 6331 it by using flex -L, so it doesn't generate #line directives in its 6332 output.) I will bet you heavy money that you have a start condition 6333 name that is also a variable name, or something like that; flex spits 6334 out #define's for each start condition name, mapping them to a number, 6335 so you can wind up with: 6336 6337 %x foo 6338 %% 6339 ... 6340 %% 6341 void bar() 6342 { 6343 int foo = 3; 6344 } 6345 6346 and the penultimate will turn into "int 1 = 3" after C preprocessing, 6347 since flex will put "#define foo 1" in the generated scanner. 6348 6349 Vern 6350 6351 6352File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ 6353 6354unnamed-faq-73 6355============== 6356 6357 6358 To: Maurice Petrie <mpetrie@infoscigroup.com> 6359 Subject: Re: Lost flex .l file 6360 In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST. 
     Date: Mon, 02 Feb 1998 11:15:12 PST
     From: Vern Paxson <vern>

     > I am curious as to
     > whether there is a simple way to backtrack from the generated source to
     > reproduce the lost list of tokens we are searching on.

     In theory, it's straight-forward to go from the DFA representation
     back to a regular-expression representation - the two are equivalent.
     In practice, a huge headache, because you have to unpack all the tables
     back into a single DFA representation, and then write a program to munch
     on that and translate it into an RE.

     Sorry for the less-than-happy news ...

     Vern


File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ

unnamed-faq-74
==============


     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: Flex performance question
     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
     Date: Thu, 19 Feb 1998 08:48:51 PST
     From: Vern Paxson <vern>

     > What I have found, is that the smaller the data chunk, the faster the
     > program executes. This is the opposite of what I expected. Should this be
     > happening this way?

     This is exactly what will happen if your input file has embedded NULs.
     From the man page:

          A final note: flex is slow when matching NUL's, particularly
          when a token contains multiple NUL's. It's best to write
          rules which match short amounts of text if it's anticipated
          that the text will often include NUL's.

     So that's the first thing to look for.

     Vern


File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ

unnamed-faq-75
==============


     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: Flex performance question
     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
     Date: Thu, 19 Feb 1998 15:42:25 PST
     From: Vern Paxson <vern>

     So there are several problems.

     First, to go fast, you want to match as much text as possible, which
     your scanners don't in the case that what they're scanning is *not*
     a <RN> tag. So you want a rule like:

          [^<]+

     Second, C++ scanners are particularly slow if they're interactive,
     which they are by default. Using -B speeds it up by a factor of 3-4
     on my workstation.

     Third, C++ scanners that use the istream interface are slow, because
     of how poorly implemented istreams are. I built two versions of
     the following scanner:

          %%
          .*\n
          .*
          %%

     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
     The C++ istream version, using -B, takes 3.8 seconds.

     Vern


File: flex.info, Node: unnamed-faq-76, Next: unnamed-faq-77, Prev: unnamed-faq-75, Up: FAQ

unnamed-faq-76
==============


     To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
     Subject: Re: FLEX 2.5 & THE YEAR 2000
     In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
     Date: Wed, 03 Jun 1998 10:22:26 PDT
     From: Vern Paxson <vern>

     > I am researching the Y2K problem with General Electric R&D
     > and need to know if there are any known issues concerning
     > the above mentioned software and Y2K regardless of version.

     There shouldn't be; all it ever does with the date is ask the system
     for it and then print it out.

     Vern


File: flex.info, Node: unnamed-faq-77, Next: unnamed-faq-78, Prev: unnamed-faq-76, Up: FAQ

unnamed-faq-77
==============


     To: "Hans Dermot Doran" <htd@ibhdoran.com>
     Subject: Re: flex problem
     In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
     Date: Tue, 21 Jul 1998 14:23:34 PDT
     From: Vern Paxson <vern>

     > To overcome this, I gets() the stdin into a string and lex the string. The
     > string is lexed OK except that the end of string isn't lexed properly
     > (yy_scan_string()), that is the lexer dosn't recognise the end of string.

     Flex doesn't contain mechanisms for recognizing buffer endpoints. But if
     you use fgets instead (which you should anyway, to protect against buffer
     overflows), then the final \n will be preserved in the string, and you can
     scan that in order to find the end of the string.

     Vern


File: flex.info, Node: unnamed-faq-78, Next: unnamed-faq-79, Prev: unnamed-faq-77, Up: FAQ

unnamed-faq-78
==============


     To: soumen@almaden.ibm.com
     Subject: Re: Flex++ 2.5.3 instance member vs. static member
     In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
     Date: Tue, 28 Jul 1998 01:10:34 PDT
     From: Vern Paxson <vern>

     > %{
     > int mylineno = 0;
     > %}
     > ws      [ \t]+
     > alpha   [A-Za-z]
     > dig     [0-9]
     > %%
     >
     > Now you'd expect mylineno to be a member of each instance of class
     > yyFlexLexer, but is this the case? A look at the lex.yy.cc file seems to
     > indicate otherwise; unless I am missing something the declaration of
     > mylineno seems to be outside any class scope.
     >
     > How will this work if I want to run a multi-threaded application with each
     > thread creating a FlexLexer instance?

     Derive your own subclass and make mylineno a member variable of it.

     Vern


File: flex.info, Node: unnamed-faq-79, Next: unnamed-faq-80, Prev: unnamed-faq-78, Up: FAQ

unnamed-faq-79
==============


     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: More than 32K states change hangs
     In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
     Date: Tue, 04 Aug 1998 22:28:45 PDT
     From: Vern Paxson <vern>

     > Vern Paxson,
     >
     > I followed your advice, posted on Usenet bu you, and emailed to me
     > personally by you, on how to overcome the 32K states limit. I'm running
     > on Linux machines.
     > I took the full source of version 2.5.4 and did the following changes in
     > flexdef.h:
     > #define JAMSTATE -327660
     > #define MAXIMUM_MNS 319990
     > #define BAD_SUBSCRIPT -327670
     > #define MAX_SHORT 327000
     >
     > and compiled.
     > All looked fine, including check and bigcheck, so I installed.

     Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
     archives I see that I did indeed recommend doing so. Try setting it back
     to 32700; that should suffice so that you no longer need -Ca. If it still
     hangs, then the interesting question is - where?

     > Compiling the same hanged program with a out-of-the-box (RedHat 4.2
     > distribution of Linux)
     > flex 2.5.4 binary works.

     Since Linux comes with source code, you should diff it against what
     you have to see what problems they missed.

     > Should I always compile with the -Ca option now ? even short and simple
     > filters ?

     No, definitely not. It's meant to be for those situations where you
     absolutely must squeeze every last cycle out of your scanner.

     Vern


File: flex.info, Node: unnamed-faq-80, Next: unnamed-faq-81, Prev: unnamed-faq-79, Up: FAQ

unnamed-faq-80
==============


     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
     Subject: Re: flex output for static code portion
     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
     Date: Mon, 17 Aug 1998 23:57:42 PDT
     From: Vern Paxson <vern>

     > I would like to use flex under the hood to generate a binary file
     > containing the data structures that control the parse.

     This has been on the wish-list for a long time. In principle it's
     straight-forward - you redirect mkdata() et al's I/O to another file,
     and modify the skeleton to have a start-up function that slurps these
     into dynamic arrays. The concerns are (1) the scanner generation code
     is hairy and full of corner cases, so it's easy to get surprised when
     going down this path :-( ; and (2) being careful about buffering so
     that when the tables change you make sure the scanner starts in the
     correct state and reading at the right point in the input file.

     > I was wondering if you know of anyone who has used flex in this way.

     I don't - but it seems like a reasonable project to undertake (unlike
     numerous other flex tweaks :-).

     Vern


File: flex.info, Node: unnamed-faq-81, Next: unnamed-faq-82, Prev: unnamed-faq-80, Up: FAQ

unnamed-faq-81
==============


     Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
             by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
             for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
     Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
             by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
             for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
     Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
     Subject: "flex scanner push-back overflow"
     To: vern@ee.lbl.gov
     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
     X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
     X-Mailer: ELM [version 2.4ME+ PL28 (25)]
     MIME-Version: 1.0
     Content-Type: text/plain; charset=US-ASCII
     Content-Transfer-Encoding: 7bit

     Hi Vern,

     Yesterday, I encountered a strange problem: I use the macro processor m4
     to include some lengthy lists into a .l file. Following is a flex macro
     definition that causes some serious pain in my neck:

     AUTHOR ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])

     The complete list contains about 10kB. When I try to "flex" this file
     (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
     some of the predefined values in flexdefs.h) I get the error:

          myflex/flex -8 sentag.tmp.l
          flex scanner push-back overflow

     When I remove the slashes in the macro definition everything works fine.
     As I understand it, the double quotes escape the slash-character so it
     really means "/" and not "trailing context". Furthermore, I tried to
     escape the slashes with backslashes, but with no use, the same error message
     appeared when flexing the code.

     Do you have an idea what's going on here?

     Greetings from Germany,
     Georg
     --
     Georg Rehm georg@cl-ki.uni-osnabrueck.de
     Institute for Semantic Information Processing, University of Osnabrueck, FRG


File: flex.info, Node: unnamed-faq-82, Next: unnamed-faq-83, Prev: unnamed-faq-81, Up: FAQ

unnamed-faq-82
==============


     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: "flex scanner push-back overflow"
     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
     Date: Thu, 20 Aug 1998 07:05:35 PDT
     From: Vern Paxson <vern>

     > myflex/flex -8 sentag.tmp.l
     > flex scanner push-back overflow

     Flex itself uses a flex scanner. That scanner is running out of buffer
     space when it tries to unput() the humongous macro you've defined. When
     you remove the '/'s, you make it small enough so that it fits in the buffer;
     removing spaces would do the same thing.

     The fix is either to rethink why you're using such a big macro (perhaps
     there's another/better way to do it), or to rebuild flex's own scan.c
     with a larger value for

          #define YY_BUF_SIZE 16384

     - Vern


File: flex.info, Node: unnamed-faq-83, Next: unnamed-faq-84, Prev: unnamed-faq-82, Up: FAQ

unnamed-faq-83
==============


     To: Jan Kort <jan@research.techforce.nl>
     Subject: Re: Flex
     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
     Date: Sat, 05 Sep 1998 00:59:49 PDT
     From: Vern Paxson <vern>

     > %%
     >
     > "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
     > ^\n             { fprintf(stderr, "empty line\n"); }
     > .               { }
     > \n              { fprintf(stderr, "new line\n"); }
     >
     > %%
     > -- input ---------------------------------------
     > TEST1
     > -- output --------------------------------------
     > TEST1
     > empty line
     > ------------------------------------------------

     IMHO, it's not clear whether or not this is in fact a bug. It depends
     on whether you view yyless() as backing up in the input stream, or as
     pushing new characters onto the beginning of the input stream. Flex
     interprets it as the latter (for implementation convenience, I'll admit),
     and so considers the newline as in fact matching at the beginning of a
     line, as after all the last token scanned an entire line and so the
     scanner is now at the beginning of a new line.

     I agree that this is counter-intuitive for yyless(), given its
     functional description (it's less so for unput(), depending on whether
     you're unput()'ing new text or scanned text). But I don't plan to
     change it any time soon, as it's a pain to do so. Consequently,
     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
     your scanner into the behavior you desire.

     Sorry for the less-than-completely-satisfactory answer.

     Vern


File: flex.info, Node: unnamed-faq-84, Next: unnamed-faq-85, Prev: unnamed-faq-83, Up: FAQ

unnamed-faq-84
==============


     To: Patrick Krusenotto <krusenot@mac-info-link.de>
     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
     Date: Thu, 24 Sep 1998 23:28:43 PDT
     From: Vern Paxson <vern>

     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
     > trying to make my scanner restart with a new file after my parser stops
     > with a parse error. When my compiler restarts, the parser always
     > receives the token after the token (in the old file!) that caused the
     > parser error.

     I suspect the problem is that your parser has read ahead in order
     to attempt to resolve an ambiguity, and when it's restarted it picks
     up with that token rather than reading a fresh one. If you're using
     yacc, then the special "error" production can sometimes be used to
     consume tokens in an attempt to get the parser into a consistent state.

     Vern


File: flex.info, Node: unnamed-faq-85, Next: unnamed-faq-86, Prev: unnamed-faq-84, Up: FAQ

unnamed-faq-85
==============


     To: Henric Jungheim <junghelh@pe-nelson.com>
     Subject: Re: flex 2.5.4a
     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
     Date: Tue, 27 Oct 1998 16:50:14 PST
     From: Vern Paxson <vern>

     > This brings up a feature request: How about a command line
     > option to specify the filename when reading from stdin? That way one
     > doesn't need to create a temporary file in order to get the "#line"
     > directives to make sense.

     Use -o combined with -t (per the man page description of -o).

     > P.S., Is there any simple way to use non-blocking IO to parse multiple
     > streams?

     Simple, no.

     One approach might be to return a magic character on EWOULDBLOCK and
     have a rule

          .*<magic-character>     // put back .*, eat magic character

     This is off the top of my head, not sure it'll work.

     Vern


File: flex.info, Node: unnamed-faq-86, Next: unnamed-faq-87, Prev: unnamed-faq-85, Up: FAQ

unnamed-faq-86
==============


     To: "Repko, Billy D" <billy.d.repko@intel.com>
     Subject: Re: Compiling scanners
     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
     Date: Thu, 14 Jan 1999 00:25:30 PST
     From: Vern Paxson <vern>

     > It appears that maybe it cannot find the lfl library.

     The Makefile in the distribution builds it, so you should have it.
     It's exceedingly trivial, just a main() that calls yylex() and
     a yywrap() that always returns 1.

     > %%
     >       \n      ++num_lines; ++num_chars;
     >       .       ++num_chars;

     You can't indent your rules like this - that's where the errors are coming
     from. Flex copies indented text to the output file; that's how you do things
     like

             int num_lines_seen = 0;

     to declare local variables.

     Vern


File: flex.info, Node: unnamed-faq-87, Next: unnamed-faq-88, Prev: unnamed-faq-86, Up: FAQ

unnamed-faq-87
==============


     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
     Subject: Re: flex input buffer
     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
     Date: Tue, 09 Feb 1999 21:03:37 PST
     From: Vern Paxson <vern>

     > In the flex.skl file the size of the default input buffers is set. Can you
     > explain why this size is set and why it is such a high number?

     It's large to optimize performance when scanning large files. You can
     safely make it a lot lower if needed.

     Vern


File: flex.info, Node: unnamed-faq-88, Next: unnamed-faq-90, Prev: unnamed-faq-87, Up: FAQ

unnamed-faq-88
==============


     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
     Subject: Re: Flex error message
     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
     Date: Thu, 25 Feb 1999 00:11:31 PST
     From: Vern Paxson <vern>

     > I'm extending a larger scanner written in Flex and I keep running into
     > problems. More specifically, I get the error message:
     > "flex: input rules are too complicated (>= 32000 NFA states)"

     Increase the definitions in flexdef.h for:

         #define JAMSTATE -32766 /* marks a reference to the state that always jams */
         #define MAXIMUM_MNS 31999
         #define BAD_SUBSCRIPT -32767

     then recompile everything, and it should all work.

     Vern


File: flex.info, Node: unnamed-faq-90, Next: unnamed-faq-91, Prev: unnamed-faq-88, Up: FAQ

unnamed-faq-90
==============


     To: "Dmitriy Goldobin" <gold@ems.chel.su>
     Subject: Re: FLEX trouble
     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
     Date: Tue, 01 Jun 1999 00:15:07 PDT
     From: Vern Paxson <vern>

     >   I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
     > but rule "/*"(.|\n)*"*/" don't work ?

     The second of these will have to scan the entire input stream (because
     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
     it ends with "*/", terminating the comment. That potentially will overflow
     the input buffer.

     >   More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
     > 'unrecognized rule'.

     You can't use the '/' operator inside parentheses. It's not clear
     what "(a/b)*" actually means.

     >   I now use workaround with state <comment>, but single-rule is
     > better, i think.

     Single-rule is nice but will always have the problem of either setting
     restrictions on comments (like not allowing multi-line comments) and/or
     running the risk of consuming the entire input stream, as noted above.

     Vern


File: flex.info, Node: unnamed-faq-91, Next: unnamed-faq-92, Prev: unnamed-faq-90, Up: FAQ

unnamed-faq-91
==============


     To: vern@ee.lbl.gov
     Subject: A question on flex C++ scanner
     Date: Tue, 15 Jun 1999 08:55:43 -0700
     From: "Aki Niimura" <neko@my-deja.com>

     Dear Dr. Paxson,

     I have been using flex for years.
     It works very well on many projects.
     In most cases, I used it to generate a scanner in C.
     However, for one project I needed to generate a scanner
     in C++. Thanks to your enhancement, flex did
     the job.

     Currently, I'm working on enhancing my previous project.
     I need to deal with multiple input streams (recursive
     inclusion) in this scanner (C++).
     I did a similar thing for another scanner (C) as you
     explained in your documentation.

     The generated scanner (C++) has the necessary methods:
     - switch_to_buffer(struct yy_buffer_state *b)
     - yy_create_buffer(istream *is, int sz)
     - yy_delete_buffer(struct yy_buffer_state *b)

     However, I couldn't figure out how to access current
     buffer (yy_current_buffer).

     yy_current_buffer is a protected member of yyFlexLexer.
     I can't access it directly.
     Then, I thought yy_create_buffer() with is = 0 might
     return the current stream buffer. But it seems not, as far
     as I checked the source (flex 2.5.4).

     I went through the Web in addition to the Flex documentation.
     However, it hasn't been successful, so far.

     It is not my intention to bother you, but can you
     comment about how to obtain the current stream buffer?

     Your response would be highly appreciated.

     Best regards,
     Aki Niimura


File: flex.info, Node: unnamed-faq-92, Next: unnamed-faq-93, Prev: unnamed-faq-91, Up: FAQ

unnamed-faq-92
==============


     To: neko@my-deja.com
     Subject: Re: A question on flex C++ scanner
     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
     Date: Tue, 15 Jun 1999 09:04:24 PDT
     From: Vern Paxson <vern>

     > However, I couldn't figure out how to access current
     > buffer (yy_current_buffer).

     Derive your own subclass from yyFlexLexer.

     Vern


File: flex.info, Node: unnamed-faq-93, Next: unnamed-faq-94, Prev: unnamed-faq-92, Up: FAQ

unnamed-faq-93
==============


     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
     Subject: Re: You're the man to see?
     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
     Date: Wed, 23 Jun 1999 09:01:40 PDT
     From: Vern Paxson <vern>

     > I hope you can help me. I am using Flex and Bison to produce an interpreted
     > language. All goes well until I try to implement an IF statement or
     > a WHILE. I cannot get this to work as the parser parses all the conditions,
     > e.g. the TRUE and FALSE conditions, to check for a rule match. So I cannot
     > make a decision!!

     You need to use the parser to build a parse tree (= abstract syntax tree),
     and when that's all done you recursively evaluate the tree, binding variables
     to values at that time.

     Vern


File: flex.info, Node: unnamed-faq-94, Next: unnamed-faq-95, Prev: unnamed-faq-93, Up: FAQ

unnamed-faq-94
==============


     To: Petr Danecek <petr@ics.cas.cz>
     Subject: Re: flex - question
     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
     Date: Fri, 02 Jul 1999 16:52:13 PDT
     From: Vern Paxson <vern>

     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponential
     > growth.

     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours have, because at any given time the scanner can
     be in the middle of all sorts of combinations of the different
     rules) blow up exponentially.

     For your rules, there is an easy fix. Change the ".*" that comes after
     the directory name to "[^ ]*". With that in place, the rules are no
     longer nearly so ambiguous, because then once one of the directories
     has been matched, no other can be matched (since they all require a
     leading blank).

     If that's not an acceptable solution, then you can enter a start state
     to pick up the .*\n after each directory is matched.

     Also note that for speed, you'll want to add a ".*" rule at the end;
     otherwise, lines that don't match any of the patterns will be matched
     very slowly, a character at a time.

     Vern


File: flex.info, Node: unnamed-faq-95, Next: unnamed-faq-96, Prev: unnamed-faq-94, Up: FAQ

unnamed-faq-95
==============


     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
     Date: Thu, 08 Jul 1999 08:20:39 PDT
     From: Vern Paxson <vern>

     > I was hoping you could help me with my problem.
     >
     > I tried compiling (gnu)flex on a Solaris 2.4 machine
     > but when I ran make (after configure) I got an error.
     >
     > --------------------------------------------------------------
     > gcc -c -I. -I. -g -O parse.c
     > ./flex -t -p ./scan.l >scan.c
     > sh: ./flex: not found
     > *** Error code 1
     > make: Fatal error: Command failed for target `scan.c'
     > -------------------------------------------------------------
     >
     > What's strange to me is that I'm only
     > trying to install flex now. I then edited the Makefile
     > and changed where it says "FLEX = flex" to "FLEX = lex"
     > ( lex: the native Solaris one ) but then it complains about
     > the "-p" option. Is there any way I can compile flex without
     > using flex or lex?
     >
     > Thanks so much for your time.

     You managed to step on the bootstrap sequence, which first copies
     initscan.c to scan.c in order to build flex. Try fetching a fresh
     distribution from ftp.ee.lbl.gov. (Or you can first try removing
     ".bootstrap" and doing a make again.)

     Vern


File: flex.info, Node: unnamed-faq-96, Next: unnamed-faq-97, Prev: unnamed-faq-95, Up: FAQ

unnamed-faq-96
==============


     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
     Date: Fri, 09 Jul 1999 00:27:20 PDT
     From: Vern Paxson <vern>

     > First I removed .bootstrap (and ran make) - no luck. I downloaded the
     > software but I still have the same problem. Is there anything else I
     > could try?

     Try:

             cp initscan.c scan.c
             touch scan.c
             make scan.o

     If this last tries to first build scan.c from scan.l using ./flex, then
     your "make" is broken, in which case compile scan.c to scan.o by hand.

     Vern


File: flex.info, Node: unnamed-faq-97, Next: unnamed-faq-98, Prev: unnamed-faq-96, Up: FAQ

unnamed-faq-97
==============


     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
     Subject: Re: Error
     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
     Date: Tue, 20 Jul 1999 00:18:26 PDT
     From: Vern Paxson <vern>

     > I am getting a compilation error. The error is given as "unknown symbol- yylex".

     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls yylex() on an
     instance of the scanner class (e.g., "scanner->yylex()").

     Vern


File: flex.info, Node: unnamed-faq-98, Next: unnamed-faq-99, Prev: unnamed-faq-97, Up: FAQ

unnamed-faq-98
==============


     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
     Subject: Re: lex
     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
     Date: Tue, 23 Nov 1999 15:54:30 PST
     From: Vern Paxson <vern>

     Well, your problem is the

         switch (yybgin-yysvec-1) {      /* witchcraft */

     at the beginning of lex rules. "witchcraft" == "non-portable". It's
     assuming knowledge of the AT&T lex's internal variables.

     For flex, you can probably do the equivalent using a switch on YYSTATE.

     Vern


File: flex.info, Node: unnamed-faq-99, Next: unnamed-faq-100, Prev: unnamed-faq-98, Up: FAQ

unnamed-faq-99
==============


     To: archow@hss.hns.com
     Subject: Re: Regarding distribution of flex and yacc based grammars
     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
     Date: Wed, 22 Dec 1999 01:56:24 PST
     From: Vern Paxson <vern>

     > When we provide the customer with an object code distribution, is it
     > necessary for us to provide source
     > for the generated C files from flex and bison since they are generated by
     > flex and bison ?

     For flex, no. I don't know what the current state of this is for bison.

     > Also, is there any requirement for us to necessarily provide source for
     > the grammar files which are fed into flex and bison ?

     Again, for flex, no.

     See the file "COPYING" in the flex distribution for the legalese.

     Vern


File: flex.info, Node: unnamed-faq-100, Next: unnamed-faq-101, Prev: unnamed-faq-99, Up: FAQ

unnamed-faq-100
===============


     To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
     Subject: Re: Flex, and self referencing rules
     In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
     Date: Sat, 19 Feb 2000 18:33:16 PST
     From: Vern Paxson <vern>

     > However, I do not use unput anywhere. I do use self-referencing
     > rules like this:
     >
     > UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})

     You can't do this - flex is *not* a parser like yacc (which does indeed
     allow recursion), it is a scanner that's confined to regular expressions.

     Vern


File: flex.info, Node: unnamed-faq-101, Next: What is the difference between YYLEX_PARAM and YY_DECL?, Prev: unnamed-faq-100, Up: FAQ

unnamed-faq-101
===============


     To: slg3@lehigh.edu (SAMUEL L. GULDEN)
     Subject: Re: Flex problem
     In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
     Date: Thu, 02 Mar 2000 23:00:46 PST
     From: Vern Paxson <vern>

     If this is exactly your program:

     > digit   [0-9]
     > digits  {digit}+
     > whitespace      [ \t\n]+
     >
     > %%
     > "["     { printf("open_brac\n");}
     > "]"     { printf("close_brac\n");}
     > "+"     { printf("addop\n");}
     > "*"     { printf("multop\n");}
     > {digits}        { printf("NUMBER = %s\n", yytext);}
     > whitespace ;

     then the problem is that the last rule needs to be "{whitespace}" !

     Vern


File: flex.info, Node: What is the difference between YYLEX_PARAM and YY_DECL?, Next: Why do I get "conflicting types for yylex" error?, Prev: unnamed-faq-101, Up: FAQ

What is the difference between YYLEX_PARAM and YY_DECL?
=======================================================

YYLEX_PARAM is not a flex symbol. It is for Bison. It tells Bison to
pass extra parameters when it calls yylex() from the parser.

   YY_DECL is the Flex declaration of yylex. The default is similar to
this:


     #define YY_DECL int yylex ()


File: flex.info, Node: Why do I get "conflicting types for yylex" error?, Next: How do I access the values set in a Flex action from within a Bison action?, Prev: What is the difference between YYLEX_PARAM and YY_DECL?, Up: FAQ

Why do I get "conflicting types for yylex" error?
=================================================

This is a compiler error regarding a generated Bison parser, not a Flex
scanner. It means you need a prototype of yylex() at the top of the
Bison file. Be sure the prototype matches YY_DECL.


File: flex.info, Node: How do I access the values set in a Flex action from within a Bison action?, Prev: Why do I get "conflicting types for yylex" error?, Up: FAQ

How do I access the values set in a Flex action from within a Bison action?
===========================================================================

With $1, $2, $3, etc. These are called "Semantic Values" in the Bison
manual. See *Note Top: (bison)Top.


File: flex.info, Node: Appendices, Next: Indices, Prev: FAQ, Up: Top

Appendix A Appendices
*********************

* Menu:

* Makefiles and Flex::
* Bison Bridge::
* M4 Dependency::
* Common Patterns::


File: flex.info, Node: Makefiles and Flex, Next: Bison Bridge, Prev: Appendices, Up: Appendices

A.1 Makefiles and Flex
======================

In this appendix, we provide tips for writing Makefiles to build your
scanners.

   In a traditional build environment, we say that the `.c' files are
the sources, and the `.o' files are the intermediate files. When using
`flex', however, the `.l' files are the sources, and the generated `.c'
files (along with the `.o' files) are the intermediate files. This
requires you to carefully plan your Makefile.

   Modern `make' programs understand that `foo.l' is intended to
generate `lex.yy.c' or `foo.c', and will behave accordingly(1)(2). The
following Makefile does not explicitly instruct `make' how to build
`scan.c' from `scan.l'. Instead, it relies on the implicit rules of the
`make' program to build the intermediate file, `scan.c':


     # Basic Makefile -- relies on implicit rules
     # Creates "myprogram" from "scan.l" and "myprogram.c"
     #
     LEX=flex
     myprogram: scan.o myprogram.o
     scan.o: scan.l

   For simple cases, the above may be sufficient. For other cases, you
may have to explicitly instruct `make' how to build your scanner.
The following is an example of a Makefile containing explicit rules:


     # Basic Makefile -- provides explicit rules
     # Creates "myprogram" from "scan.l" and "myprogram.c"
     #
     LEX=flex
     myprogram: scan.o myprogram.o
     	$(CC) -o $@ $(LDFLAGS) $^

     myprogram.o: myprogram.c
     	$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

     scan.o: scan.c
     	$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

     scan.c: scan.l
     	$(LEX) $(LFLAGS) -o $@ $^

     clean:
     	$(RM) *.o scan.c

   Notice in the above example that `scan.c' is in the `clean' target.
This is because we consider the file `scan.c' to be an intermediate
file.

   Finally, we provide a realistic example of a `flex' scanner used
with a `bison' parser(3). There is a tricky problem we have to deal
with. Since a `flex' scanner will typically include a header file
(e.g., `y.tab.h') generated by the parser, we need to be sure that the
header file is generated BEFORE the scanner is compiled. We handle this
case in the following example:


     # Makefile example -- scanner and parser.
     # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
     #
     LEX     = flex
     YACC    = bison -y
     YFLAGS  = -d
     objects = scan.o parse.o myprogram.o

     myprogram: $(objects)
     scan.o: scan.l parse.c
     parse.o: parse.y
     myprogram.o: myprogram.c

   In the above example, notice the line


     scan.o: scan.l parse.c

which lists the file `parse.c' (the generated parser) as a dependency
of `scan.o'. We want to ensure that the parser is created before the
scanner is compiled, and the above line seems to do the trick. Feel
free to experiment with your specific implementation of `make'.

   For more details on writing Makefiles, see *Note Top: (make)Top.

   ---------- Footnotes ----------

   (1) GNU `make' and GNU `automake' are two such programs that provide
implicit rules for flex-generated scanners.

   (2) GNU `automake' may generate code to execute flex in
lex-compatible mode, or to stdout. If this is not what you want, then
you should provide an explicit rule in your Makefile.am.

   (3) This example also applies to yacc parsers.


File: flex.info, Node: Bison Bridge, Next: M4 Dependency, Prev: Makefiles and Flex, Up: Appendices

A.2 C Scanners with Bison Parsers
=================================

This section describes the `flex' features useful when integrating
`flex' with `GNU bison'(1). Skip this section if you are not using
`bison' with your scanner. Here we discuss only the `flex' half of the
`flex' and `bison' pair. We do not discuss `bison' in any detail. For
more information about generating `bison' parsers, see *Note Top:
(bison)Top.

   A `bison'-compatible scanner is generated by declaring `%option
bison-bridge' or by supplying `--bison-bridge' when invoking `flex'
from the command line. This instructs `flex' that the macro `yylval'
may be used. The data type for `yylval', `YYSTYPE', is typically
defined in a header file, included in section 1 of the `flex' input
file. For a list of functions and macros available, *Note
bison-functions::.

   The declaration of yylex becomes


     int yylex ( YYSTYPE * lvalp, yyscan_t scanner );

   If `%option bison-locations' is specified, then the declaration
becomes


     int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );

   Note that the macros `yylval' and `yylloc' evaluate to pointers.
Support for `yylloc' is optional in `bison', so it is optional in
`flex' as well. The following is an example of a `flex' scanner that is
compatible with `bison'.


     /* Scanner for "C" assignment statements... sort of. */
     %{
     #include <stdlib.h>  /* atoi() */
     #include <string.h>  /* strdup() */
     #include "y.tab.h"   /* Generated by bison. */
     %}

     %option reentrant bison-bridge bison-locations
     %%

     [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
     [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
     "="|";"       { return yytext[0];}
     .  {}
     %%

   As you can see, there really is no magic here. We just use `yylval'
as we would any other variable. The data type of `yylval' is generated
by `bison', and included in the file `y.tab.h'. Here is the
corresponding `bison' parser:


     /* Parser to convert "C" assignments to lisp. */
     %{
     /* Pass the argument to yyparse through to yylex. */
     #define YYPARSE_PARAM scanner
     #define YYLEX_PARAM   scanner
     %}
     %locations
     %pure_parser
     %union {
         int num;
         char* str;
     }
     %token <str> STRING
     %token <num> NUMBER
     %%
     assignment:
         STRING '=' NUMBER ';' {
             printf( "(setf %s %d)", $1, $3 );
        }
     ;

   ---------- Footnotes ----------

   (1) The features described here are purely optional, and are by no
means the only way to use flex with bison. We merely provide some glue
to ease development of your parser-scanner pair.


File: flex.info, Node: M4 Dependency, Next: Common Patterns, Prev: Bison Bridge, Up: Appendices

A.3 M4 Dependency
=================

The macro processor `m4'(1) must be installed wherever flex is
installed. `flex' invokes `m4', found by searching the directories in
the `PATH' environment variable. Any code you place in section 1 or in
the actions will be sent through m4. Please follow these rules to
protect your code from unwanted `m4' processing.

   * Do not use symbols that begin with `m4_', such as `m4_define'
     or `m4_include', since those are reserved for `m4' macro names.
     If for some reason you need m4_ as a prefix, use a preprocessor
     #define to get your symbol past m4 unmangled.

   * Do not use the strings `[[' or `]]' anywhere in your code. The
     former is not valid in C, except within comments and strings, but
     the latter is valid in code such as `x[y[z]]'. The solution is
     simple. To get the literal string `"]]"', use `"]""]"'. To get the
     array notation `x[y[z]]', use `x[y[z] ]'. Flex will attempt to
     detect these sequences in user code, and escape them. However,
     it's best to avoid this complexity where possible, by removing
     such sequences from your code.


   `m4' is only required at the time you run `flex'. The generated
scanner is ordinary C or C++, and does _not_ require `m4'.

   ---------- Footnotes ----------

   (1) The use of m4 is subject to change in future revisions of flex.
It is not part of the public API of flex. Do not depend on it.


File: flex.info, Node: Common Patterns, Prev: M4 Dependency, Up: Appendices

A.4 Common Patterns
===================

This appendix provides examples of common regular expressions you might
use in your scanner.

* Menu:

* Numbers::
* Identifiers::
* Quoted Constructs::
* Addresses::


File: flex.info, Node: Numbers, Next: Identifiers, Up: Common Patterns

A.4.1 Numbers
-------------

C99 decimal constant
     `([[:digit:]]{-}[0])[[:digit:]]*'

C99 hexadecimal constant
     `0[xX][[:xdigit:]]+'

C99 octal constant
     `0[01234567]*'

C99 floating point constant

          dseq       ([[:digit:]]+)
          dseq_opt   ([[:digit:]]*)
          frac       (({dseq_opt}"."{dseq})|{dseq}".")
          exp        ([eE][+-]?{dseq})
          exp_opt    ({exp}?)
          fsuff      [flFL]
          fsuff_opt  ({fsuff}?)
          hpref      (0[xX])
          hdseq      ([[:xdigit:]]+)
          hdseq_opt  ([[:xdigit:]]*)
          hfrac      (({hdseq_opt}"."{hdseq})|({hdseq}"."))
          bexp       ([pP][+-]?{dseq})
          dfc        (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
          hfc        (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))

          c99_floating_point_constant  ({dfc}|{hfc})

     See C99 section 6.4.4.2 for the gory details.



File: flex.info, Node: Identifiers, Next: Quoted Constructs, Prev: Numbers, Up: Common Patterns

A.4.2 Identifiers
-----------------

C99 Identifier

          ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
          nondigit   [_[:alpha:]]
          c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*

     Technically, the above pattern does not encompass all possible C99
     identifiers, since C99 allows for "implementation-defined"
     characters. In practice, C compilers follow the above pattern,
     with the addition of the `$' character.

UTF-8 Encoded Unicode Code Point

          [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})



File: flex.info, Node: Quoted Constructs, Next: Addresses, Prev: Identifiers, Up: Common Patterns

A.4.3 Quoted Constructs
-----------------------

C99 String Literal
     `L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([01234567]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'

C99 Comment
     `("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'

     Note that in C99, a `//'-style comment may be split across lines,
     and, contrary to popular belief, does not include the trailing
     `\n' character.

     A better way to scan `/* */' comments is by line, rather than
     matching possibly huge comments all at once.
     This will allow you to scan comments of unlimited length, as long
     as line breaks appear at sane intervals. This is also more
     efficient when used with automatic line number processing. *Note
     option-yylineno::. The rules below assume that `COMMENT' has been
     declared as a start condition (for example, with `%x COMMENT').


          <INITIAL>{
          "/*"          BEGIN(COMMENT);
          }
          <COMMENT>{
          "*"+"/"       BEGIN(INITIAL);
          [^*\n]+       ;
          "*"+[^*/\n]*  ;
          \n            ;
          }



File: flex.info, Node: Addresses, Prev: Quoted Constructs, Up: Common Patterns

A.4.4 Addresses
---------------

IPv4 Address
     `(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))'

IPv6 Address

          hex4         ([[:xdigit:]]{1,4})
          hexseq       ({hex4}(":"{hex4})*)
          hexpart      ({hexseq}|({hexseq}"::"({hexseq}?))|("::"({hexseq}?)))
          IPv4address  (([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))
          IPv6address  ({hexpart}(":"{IPv4address})?)

     See RFC2373 for details.

URI
     `(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'

     This pattern is nearly useless, since it allows just about any
     character to appear in a URI, including spaces and control
     characters. See RFC2396 for details.



File: flex.info, Node: Indices, Prev: Appendices, Up: Top

Indices
*******

* Menu:

* Concept Index::
* Index of Functions and Macros::
* Index of Variables::
* Index of Data Types::
* Index of Hooks::
* Index of Scanner Options::
