This is flex.info, produced by makeinfo version 6.1 from flex.texi.

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex). Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


File: flex.info, Node: Top, Next: Copyright, Prev: (dir), Up: (dir)

flex
****

This manual describes 'flex', a tool for generating programs that
perform pattern-matching on text. The manual includes both tutorial and
reference sections.

   This edition of 'The flex Manual' documents 'flex' version 2.6.4. It
was last updated on 6 May 2017.

   This manual was written by Vern Paxson, Will Estes and John Millaway.

* Menu:

* Copyright::
* Reporting Bugs::
* Introduction::
* Simple Examples::
* Format::
* Patterns::
* Matching::
* Actions::
* Generated Scanner::
* Start Conditions::
* Multiple Input Buffers::
* EOF::
* Misc Macros::
* User Values::
* Yacc::
* Scanner Options::
* Performance::
* Cxx::
* Reentrant::
* Lex and Posix::
* Memory Management::
* Serialized Tables::
* Diagnostics::
* Limitations::
* Bibliography::
* FAQ::
* Appendices::
* Indices::

 -- The Detailed Node Listing --

Format of the Input File

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::

Scanner Options

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

Reentrant C Scanners

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::

The Reentrant API in Detail

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::

Memory Management

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::

Serialized Tables

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::

FAQ

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::

Appendices

* Makefiles and Flex::
* Bison Bridge::
* M4 Dependency::
* Common Patterns::

Indices

* Concept Index::
* Index of Functions and Macros::
* Index of Variables::
* Index of Data Types::
* Index of Hooks::
* Index of Scanner Options::



File: flex.info, Node: Copyright, Next: Reporting Bugs, Prev: Top, Up: Top

1 Copyright
***********

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.


File: flex.info, Node: Reporting Bugs, Next: Introduction, Prev: Copyright, Up: Top

2 Reporting Bugs
****************

If you find a bug in 'flex', please report it using GitHub's issue
tracking facility at <https://github.com/westes/flex/issues/>


File: flex.info, Node: Introduction, Next: Simple Examples, Prev: Reporting Bugs, Up: Top

3 Introduction
**************

'flex' is a tool for generating "scanners". A scanner is a program
which recognizes lexical patterns in text. The 'flex' program reads the
given input files, or its standard input if no file names are given, for
a description of a scanner to generate. The description is in the form
of pairs of regular expressions and C code, called "rules". 'flex'
generates as output a C source file, 'lex.yy.c' by default, which
defines a routine 'yylex()'. This file can be compiled and linked with
the flex runtime library to produce an executable. When the executable
is run, it analyzes its input for occurrences of the regular
expressions. Whenever it finds one, it executes the corresponding C
code.


File: flex.info, Node: Simple Examples, Next: Format, Prev: Introduction, Up: Top

4 Some Simple Examples
**********************

First some simple examples to get the flavor of how one uses 'flex'.

   The following 'flex' input specifies a scanner which, when it
encounters the string 'username', replaces it with the user's login
name:

     %%
     username    printf( "%s", getlogin() );

   By default, any text not matched by a 'flex' scanner is copied to the
output, so the net effect of this scanner is to copy its input file to
its output with each occurrence of 'username' expanded. In this input,
there is just one rule. 'username' is the "pattern" and the 'printf' is
the "action". The '%%' symbol marks the beginning of the rules.

   Here's another simple example:

             int num_lines = 0, num_chars = 0;

     %%
     \n      ++num_lines; ++num_chars;
     .       ++num_chars;

     %%

     int main()
             {
             yylex();
             printf( "# of lines = %d, # of chars = %d\n",
                     num_lines, num_chars );
             }

   This scanner counts the number of characters and the number of lines
in its input. It produces no output other than the final report on the
character and line counts. The first line declares two globals,
'num_lines' and 'num_chars', which are accessible both inside 'yylex()'
and in the 'main()' routine declared after the second '%%'.
There are two rules, one which matches a newline ('\n') and increments
both the line count and the character count, and one which matches any
character other than a newline (indicated by the '.' regular
expression).

   A somewhat more complicated example:

     /* scanner for a toy Pascal-like language */

     %{
     /* need this for the calls to atoi() and atof() below */
     #include <stdlib.h>
     %}

     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*

     %%

     {DIGIT}+    {
                 printf( "An integer: %s (%d)\n", yytext,
                         atoi( yytext ) );
                 }

     {DIGIT}+"."{DIGIT}*        {
                 printf( "A float: %s (%g)\n", yytext,
                         atof( yytext ) );
                 }

     if|then|begin|end|procedure|function        {
                 printf( "A keyword: %s\n", yytext );
                 }

     {ID}        printf( "An identifier: %s\n", yytext );

     "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

     "{"[^{}\n]*"}"     /* eat up one-line comments */

     [ \t\n]+          /* eat up whitespace */

     .           printf( "Unrecognized character: %s\n", yytext );

     %%

     int main( int argc, char **argv )
         {
         ++argv, --argc;  /* skip over program name */
         if ( argc > 0 )
                 yyin = fopen( argv[0], "r" );
         else
                 yyin = stdin;

         yylex();
         }

   This is the beginning of a simple scanner for a language like
Pascal. It identifies different types of "tokens" and reports on what
it has seen.

   The details of this example will be explained in the following
sections.


File: flex.info, Node: Format, Next: Patterns, Prev: Simple Examples, Up: Top

5 Format of the Input File
**************************

The 'flex' input file consists of three sections, separated by a line
containing only '%%'.

     definitions
     %%
     rules
     %%
     user code

* Menu:

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::


File: flex.info, Node: Definitions Section, Next: Rules Section, Prev: Format, Up: Format

5.1 Format of the Definitions Section
=====================================

The "definitions section" contains declarations of simple "name"
definitions to simplify the scanner specification, and declarations of
"start conditions", which are explained in a later section.

   Name definitions have the form:

     name definition

   The 'name' is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash). The
definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line. The
definition can subsequently be referred to using '{name}', which will
expand to '(definition)'. For example,

     DIGIT    [0-9]
     ID       [a-z][a-z0-9]*

   Defines 'DIGIT' to be a regular expression which matches a single
digit, and 'ID' to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to

     {DIGIT}+"."{DIGIT}*

   is identical to

     ([0-9])+"."([0-9])*

   and matches one-or-more digits followed by a '.' followed by
zero-or-more digits.

   An unindented comment (i.e., a line beginning with '/*') is copied
verbatim to the output up to the next '*/'.

   Any _indented_ text or text enclosed in '%{' and '%}' is also copied
verbatim to the output (with the %{ and %} symbols removed). The %{ and
%} symbols must appear unindented on lines by themselves.

   A '%top' block is similar to a '%{' ... '%}' block, except that the
code in a '%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1). The '%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code. The single characters, '{' and '}'
are used to delimit the '%top' block, as shown in the example below:

     %top{
         /* This code goes at the "top" of the generated file. */
         #include <stdint.h>
         #include <inttypes.h>
     }

   Multiple '%top' blocks are allowed, and their order is preserved.

   ---------- Footnotes ----------

   (1) Actually, 'yyIN_HEADER' is defined before the '%top' block.
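
   The '{name}' expansion described in this section can be sketched in
plain C. The code below is only an illustration, not how 'flex' itself
is implemented: the table 'defs' and the functions 'lookup()' and
'expand_names()' are hypothetical names invented for this example, and
the sketch ignores quoting, so a '{name}' inside '"..."' would also be
expanded. It shows why the parentheses matter: each reference becomes
'(definition)', so a following operator such as '+' applies to the whole
definition.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical definition table mirroring the DIGIT/ID example above. */
struct name_def { const char *name; const char *definition; };
static const struct name_def defs[] = {
    { "DIGIT", "[0-9]" },
    { "ID",    "[a-z][a-z0-9]*" },
};

/* A name starts with a letter or '_' and continues with
   letters, digits, '_' or '-'. */
static int is_name_start(int c) { return isalpha(c) || c == '_'; }
static int is_name_char(int c)  { return isalnum(c) || c == '_' || c == '-'; }

/* Return the definition for name[0..len), or NULL if it is not defined. */
static const char *lookup(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof defs / sizeof defs[0]; ++i)
        if (strlen(defs[i].name) == len &&
            strncmp(defs[i].name, name, len) == 0)
            return defs[i].definition;
    return NULL;
}

/* Copy pattern to out, replacing each known {name} with (definition).
   Other uses of braces, such as r{2,5}, are copied unchanged.
   Returns 0 on success, -1 if out is too small. */
int expand_names(const char *pattern, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    while (*pattern != '\0') {
        if (*pattern == '{' && is_name_start((unsigned char)pattern[1])) {
            const char *start = pattern + 1;
            const char *end = start;
            while (is_name_char((unsigned char)*end))
                ++end;
            const char *def =
                (*end == '}') ? lookup(start, (size_t)(end - start)) : NULL;
            if (def != NULL) {
                size_t n = strlen(def);
                if (o + n + 2 >= outsz)   /* '(' + def + ')' + NUL */
                    return -1;
                out[o++] = '(';
                memcpy(out + o, def, n);
                o += n;
                out[o++] = ')';
                pattern = end + 1;
                continue;
            }
        }
        if (o + 1 >= outsz)
            return -1;
        out[o++] = *pattern++;
    }
    out[o] = '\0';
    return 0;
}
```

   Calling 'expand_names()' on the pattern from the text reproduces the
expansion shown above, with every reference wrapped in parentheses.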


File: flex.info, Node: Rules Section, Next: User Code Section, Prev: Definitions Section, Up: Format

5.2 Format of the Rules Section
===============================

The "rules" section of the 'flex' input contains a series of rules of
the form:

     pattern   action

   where the pattern must be unindented and the action must begin on the
same line. *Note Patterns::, for a further description of patterns and
actions.

   In the rules section, any indented or %{ %} enclosed text appearing
before the first rule may be used to declare variables which are local
to the scanning routine and (after the declarations) code which is to be
executed whenever the scanning routine is entered. Other indented or %{
%} text in the rule section is still copied to the output, but its
meaning is not well-defined and it may well cause compile-time errors
(this feature is present for POSIX compliance. *Note Lex and Posix::,
for other such features).

   Any _indented_ text or text enclosed in '%{' and '%}' is copied
verbatim to the output (with the %{ and %} symbols removed). The %{ and
%} symbols must appear unindented on lines by themselves.
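
   As a rough analogy for the text placed before the first rule, the
hand-written C routine below shows where such declarations and code end
up relative to the matching logic. It is a sketch only and not output
produced by 'flex'; 'toy_scan()' and 'entries' are hypothetical names
standing in for the scanning routine and for user code.

```c
#include <string.h>

/* Illustration only: counts how often the routine is entered. */
static int entries = 0;

/* Hypothetical stand-in for the generated scanning routine. */
int toy_scan(const char *input)
{
    /* Text placed before the first rule: a declaration that is local
       to the routine and initialised afresh on every call. */
    int vowels = 0;

    /* Text after the declarations: runs each time the routine is
       entered. */
    ++entries;

    /* Stand-in for the rules: count lowercase vowels. */
    for (const char *p = input; *p != '\0'; ++p)
        if (strchr("aeiou", *p) != NULL)
            ++vowels;

    return vowels;
}
```

   Because 'vowels' is local, its value does not carry over between
calls, whereas the file-scope 'entries' does.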


File: flex.info, Node: User Code Section, Next: Comments in the Input, Prev: Rules Section, Up: Format

5.3 Format of the User Code Section
===================================

The user code section is simply copied to 'lex.yy.c' verbatim. It is
used for companion routines which call or are called by the scanner.
The presence of this section is optional; if it is missing, the second
'%%' in the input file may be skipped, too.


File: flex.info, Node: Comments in the Input, Prev: User Code Section, Up: Format

5.4 Comments in the Input
=========================

Flex supports C-style comments, that is, anything between '/*' and '*/'
is considered a comment. Whenever flex encounters a comment, it copies
the entire comment verbatim to the generated source code. Comments may
appear just about anywhere, but with the following exceptions:

   * Comments may not appear in the Rules Section wherever flex is
     expecting a regular expression. This means comments may not appear
     at the beginning of a line, or immediately following a list of
     scanner states.
   * Comments may not appear on an '%option' line in the Definitions
     Section.

   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
'/*'. This rule will work anywhere in the input file.

   All the comments in the following example are valid:

     %{
     /* code block */
     %}

     /* Definitions Section */
     %x STATE_X

     %%
         /* Rules Section */
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     <STATE_X>{
     ruleC   ECHO;
     ruleD   ECHO;
     %{
     /* code block */
     %}
     }
     %%
     /* User Code Section */



File: flex.info, Node: Patterns, Next: Matching, Prev: Format, Up: Top

6 Patterns
**********

The patterns in the input (see *note Rules Section::) are written using
an extended set of regular expressions. These are:

'x'
     match the character 'x'

'.'
     any character (byte) except newline

'[xyz]'
     a "character class"; in this case, the pattern matches either an
     'x', a 'y', or a 'z'

'[abj-oZ]'
     a "character class" with a range in it; matches an 'a', a 'b', any
     letter from 'j' through 'o', or a 'Z'

'[^A-Z]'
     a "negated character class", i.e., any character but those in the
     class.
     In this case, any character EXCEPT an uppercase letter.

'[^A-Z\n]'
     any character EXCEPT an uppercase letter or a newline

'[a-z]{-}[aeiou]'
     the lowercase consonants

'r*'
     zero or more r's, where r is any regular expression

'r+'
     one or more r's

'r?'
     zero or one r's (that is, "an optional r")

'r{2,5}'
     anywhere from two to five r's

'r{2,}'
     two or more r's

'r{4}'
     exactly 4 r's

'{name}'
     the expansion of the 'name' definition (*note Format::).

'"[xyz]\"foo"'
     the literal string: '[xyz]"foo'

'\X'
     if X is 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C
     interpretation of '\x'. Otherwise, a literal 'X' (used to escape
     operators such as '*')

'\0'
     a NUL character (ASCII code 0)

'\123'
     the character with octal value 123

'\x2a'
     the character with hexadecimal value 2a

'(r)'
     match an 'r'; parentheses are used to override precedence (see
     below)

'(?r-s:pattern)'
     apply option 'r' and omit option 's' while interpreting pattern.
     Options may be zero or more of the characters 'i', 's', or 'x'.

     'i' means case-insensitive.  '-i' means case-sensitive.

     's' alters the meaning of the '.' syntax to match any single byte
     whatsoever.  '-s' alters the meaning of '.' to match any byte
     except '\n'.

     'x' ignores comments and whitespace in patterns.  Whitespace is
     ignored unless it is backslash-escaped, contained within '""'s, or
     appears inside a character class.

     The following are all valid:

          (?:foo)         same as  (foo)
          (?i:ab7)        same as  ([aA][bB]7)
          (?-i:ab)        same as  (ab)
          (?s:.)          same as  [\x00-\xFF]
          (?-s:.)         same as  [^\n]
          (?ix-s: a . b)  same as  ([Aa][^\n][bB])
          (?x:a  b)       same as  ("ab")
          (?x:a\ b)       same as  ("a b")
          (?x:a" "b)      same as  ("a b")
          (?x:a[ ]b)      same as  ("a b")
          (?x:a
              /* comment */
              b
              c)          same as  (abc)

'(?# comment )'
     omit everything within '()'.  The first ')' character encountered
     ends the pattern.  It is not possible for the comment to contain
     a ')' character.  The comment may span lines.

'rs'
     the regular expression 'r' followed by the regular expression 's';
     called "concatenation"

'r|s'
     either an 'r' or an 's'

'r/s'
     an 'r' but only if it is followed by an 's'.  The text matched by
     's' is included when determining whether this rule is the longest
     match, but is then returned to the input before the action is
     executed.  So the action only sees the text matched by 'r'.  This
     type of pattern is called "trailing context".  (There are some
     combinations of 'r/s' that flex cannot match correctly.  *Note
     Limitations::, regarding dangerous trailing context.)

'^r'
     an 'r', but only at the beginning of a line (i.e., when just
     starting to scan, or right after a newline has been scanned).

'r$'
     an 'r', but only at the end of a line (i.e., just before a
     newline).  Equivalent to 'r/\n'.

     Note that 'flex''s notion of "newline" is exactly whatever the C
     compiler used to compile 'flex' interprets '\n' as; in particular,
     on some DOS systems you must either filter out '\r's in the input
     yourself, or explicitly use 'r/\r\n' for 'r$'.

'<s>r'
     an 'r', but only in start condition 's' (see *note Start
     Conditions:: for discussion of start conditions).

'<s1,s2,s3>r'
     same, but in any of start conditions 's1', 's2', or 's3'.

'<*>r'
     an 'r' in any start condition, even an exclusive one.

'<<EOF>>'
     an end-of-file.

'<s1,s2><<EOF>>'
     an end-of-file when in start condition 's1' or 's2'

   Note that inside of a character class, all regular expression
operators lose their special meaning except escape ('\') and the
character class operators, '-', ']', and, at the beginning of the
class, '^'.

   The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence (see special note on the
precedence of the repeat operator, '{}', under the documentation for the
'--posix' POSIX compliance option).  For example,

     foo|bar*

   is the same as

     (foo)|(ba(r*))

   since the '*' operator has higher precedence than concatenation, and
concatenation higher than alternation ('|').  This pattern therefore
matches _either_ the string 'foo' _or_ the string 'ba' followed by
zero-or-more 'r''s.
To match 'foo' or zero-or-more repetitions of the
string 'bar', use:

     foo|(bar)*

   And to match a sequence of zero or more repetitions of 'foo' and
'bar':

     (foo|bar)*

   In addition to characters and ranges of characters, character classes
can also contain "character class expressions".  These are expressions
enclosed inside '[:' and ':]' delimiters (which themselves must appear
between the '[' and ']' of the character class; other elements may
occur inside the character class, too).  The valid expressions are:

     [:alnum:] [:alpha:] [:blank:]
     [:cntrl:] [:digit:] [:graph:]
     [:lower:] [:print:] [:punct:]
     [:space:] [:upper:] [:xdigit:]

   These expressions all designate a set of characters equivalent to the
corresponding standard C 'isXXX' function.  For example, '[:alnum:]'
designates those characters for which 'isalnum()' returns true, i.e.,
any alphabetic or numeric character.  Some systems don't provide
'isblank()', so flex defines '[:blank:]' as a blank or a tab.

   For example, the following character classes are all equivalent:

     [[:alnum:]]
     [[:alpha:][:digit:]]
     [[:alpha:][0-9]]
     [a-zA-Z0-9]

   A word of caution.  Character classes are expanded immediately when
seen in the 'flex' input.
This means the character classes are
sensitive to the locale in which 'flex' is executed, and the resulting
scanner will not be sensitive to the runtime locale.  This may or may
not be desirable.

   * If your scanner is case-insensitive (the '-i' flag), then
     '[:upper:]' and '[:lower:]' are equivalent to '[:alpha:]'.

   * Character classes with ranges, such as '[a-Z]', should be used with
     caution in a case-insensitive scanner if the range spans upper or
     lowercase characters.  Flex does not know if you want to fold all
     upper and lowercase characters together, or if you want the literal
     numeric range specified (with no case folding).  When in doubt,
     flex will assume that you meant the literal numeric range, and will
     issue a warning.  The exception to this rule is a character range
     such as '[a-z]' or '[S-W]' where it is obvious that you want
     case-folding to occur.  Here are some examples with the '-i' flag
     enabled:

     Range        Result      Literal Range        Alternate Range
     '[a-t]'      ok          '[a-tA-T]'
     '[A-T]'      ok          '[a-tA-T]'
     '[A-t]'      ambiguous   '[A-Z\[\\\]_`a-t]'   '[a-tA-T]'
     '[_-{]'      ambiguous   '[_`a-z{]'           '[_`a-zA-Z{]'
     '[@-C]'      ambiguous   '[@ABC]'             '[@A-Z\[\\\]_`abc]'

   * A negated character class such as the example '[^A-Z]' above _will_
     match a newline unless '\n' (or an equivalent escape sequence) is
     one of the characters explicitly present in the negated character
     class (e.g., '[^A-Z\n]').
     This is unlike how many other regular
     expression tools treat negated character classes, but unfortunately
     the inconsistency is historically entrenched.  Matching newlines
     means that a pattern like '[^"]*' can match the entire input unless
     there's another quote in the input.

     Flex allows negation of character class expressions by prepending
     '^' to the POSIX character class name.

          [:^alnum:] [:^alpha:] [:^blank:]
          [:^cntrl:] [:^digit:] [:^graph:]
          [:^lower:] [:^print:] [:^punct:]
          [:^space:] [:^upper:] [:^xdigit:]

     Flex will issue a warning if the expressions '[:^upper:]' and
     '[:^lower:]' appear in a case-insensitive scanner, since their
     meaning is unclear.  The current behavior is to skip them entirely,
     but this may change without notice in future revisions of flex.

   * The '{-}' operator computes the difference of two character
     classes.  For example, '[a-c]{-}[b-z]' represents all the
     characters in the class '[a-c]' that are not in the class '[b-z]'
     (which in this case, is just the single character 'a').  The '{-}'
     operator is left associative, so '[abc]{-}[b]{-}[c]' is the same as
     '[a]'.  Be careful not to accidentally create an empty set, which
     will never match.

   * The '{+}' operator computes the union of two character classes.
     For example, '[a-z]{+}[0-9]' is the same as '[a-z0-9]'.
     This operator is useful when preceded by the result of a difference
     operation, as in '[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
     equivalent to '[A-Zq]' in the "C" locale.

   * A rule can have at most one instance of trailing context (the '/'
     operator or the '$' operator).  The start condition, '^', and
     '<<EOF>>' patterns can only occur at the beginning of a pattern,
     and, like '/' and '$', cannot be grouped inside parentheses.  A '^'
     which does not occur at the beginning of a rule or a '$' which does
     not occur at the end of a rule loses its special properties and is
     treated as a normal character.

   * The following are invalid:

          foo/bar$
          <sc1>foo<sc2>bar

     Note that the first of these can be written 'foo/bar\n'.

   * The following will result in '$' or '^' being treated as a normal
     character:

          foo|(bar$)
          foo|^bar

     If the desired meaning is a 'foo' or a 'bar'-followed-by-a-newline,
     the following could be used (the special '|' action is explained
     below, *note Actions::):

          foo      |
          bar$     /* action goes here */

     A similar trick will work for matching a 'foo' or a
     'bar'-at-the-beginning-of-a-line.
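
   The set arithmetic behind the '{-}' and '{+}' operators described
above can be checked in plain C.  The following is only a sketch and is
not part of flex: the function names are made up for illustration, each
character class is modelled as a predicate over byte values, and the
code assumes the "C" locale (the locale a C program starts in) and an
ASCII-compatible character set.

```c
#include <ctype.h>
#include <stdbool.h>

/* [a-z]{-}[aeiou]: in the first class but not in the second. */
bool lower_consonant(int c)
{
    bool in_a_z    = (c >= 'a' && c <= 'z');   /* assumes ASCII ordering */
    bool in_vowels = (c == 'a' || c == 'e' || c == 'i' ||
                      c == 'o' || c == 'u');
    return in_a_z && !in_vowels;
}

/* [[:alpha:]]{-}[[:lower:]]{+}[q], applied left to right. */
bool alpha_minus_lower_plus_q(int c)
{
    bool difference = isalpha(c) && !islower(c);
    return difference || c == 'q';
}

/* [A-Zq], the class the text gives as equivalent in the "C" locale. */
bool upper_or_q(int c)
{
    return (c >= 'A' && c <= 'Z') || c == 'q';
}

/* Number of byte values on which the two classes above disagree. */
int count_mismatches(void)
{
    int c, bad = 0;
    for (c = 0; c < 256; ++c)
        if (alpha_minus_lower_plus_q(c) != upper_or_q(c))
            ++bad;
    return bad;
}
```

   Under the stated assumptions 'count_mismatches()' should return 0.
In another locale 'isalpha()' may accept further characters, which is
the locale sensitivity the word of caution above refers to.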


File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top

7 How the Input Is Matched
**************************

When the generated scanner is run, it analyzes its input looking for
strings which match any of its patterns.  If it finds more than one
match, it takes the one matching the most text (for trailing context
rules, this includes the length of the trailing part, even though it
will then be returned to the input).  If it finds two or more matches of
the same length, the rule listed first in the 'flex' input file is
chosen.

   Once the match is determined, the text corresponding to the match
(called the "token") is made available in the global character pointer
'yytext', and its length in the global integer 'yyleng'.  The "action"
corresponding to the matched pattern is then executed (*note Actions::),
and then the remaining input is scanned for another match.

   If no match is found, then the "default rule" is executed: the next
character in the input is considered matched and copied to the standard
output.  Thus, the simplest valid 'flex' input is:

     %%

   which generates a scanner that simply copies its input (one character
at a time) to its output.

   Note that 'yytext' can be defined in two different ways: either as a
character _pointer_ or as a character _array_.
You can control which
definition 'flex' uses by including one of the special directives
'%pointer' or '%array' in the first (definitions) section of your flex
input.  The default is '%pointer', unless you use the '-l' lex
compatibility option, in which case 'yytext' will be an array.  The
advantage of using '%pointer' is substantially faster scanning and no
buffer overflow when matching very large tokens (unless you run out of
dynamic memory).  The disadvantage is that you are restricted in how
your actions can modify 'yytext' (*note Actions::), and calls to the
'unput()' function destroy the present contents of 'yytext', which can
be a considerable porting headache when moving between different 'lex'
versions.

   The advantage of '%array' is that you can then modify 'yytext' to
your heart's content, and calls to 'unput()' do not destroy 'yytext'
(*note Actions::).  Furthermore, existing 'lex' programs sometimes
access 'yytext' externally using declarations of the form:

     extern char yytext[];

   This definition is erroneous when used with '%pointer', but correct
for '%array'.

   The '%array' declaration defines 'yytext' to be an array of 'YYLMAX'
characters, which defaults to a fairly large value.  You can change the
size by simply #define'ing 'YYLMAX' to a different value in the first
section of your 'flex' input.  As mentioned above, with '%pointer'
'yytext' grows dynamically to accommodate large tokens.
While this means
your '%pointer' scanner can accommodate very large tokens (such as
matching entire blocks of comments), bear in mind that each time the
scanner must resize 'yytext' it also must rescan the entire token from
the beginning, so matching such tokens can prove slow.  'yytext'
presently does _not_ dynamically grow if a call to 'unput()' results in
too much text being pushed back; instead, a run-time error results.

   Also note that you cannot use '%array' with C++ scanner classes
(*note Cxx::).


File: flex.info,  Node: Actions,  Next: Generated Scanner,  Prev: Matching,  Up: Top

8 Actions
*********

Each pattern in a rule has a corresponding "action", which can be any
arbitrary C statement.  The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action.  If the
action is empty, then when the pattern is matched the input token is
simply discarded.  For example, here is the specification for a program
which deletes all occurrences of 'zap me' from its input:

     %%
     "zap me"

   This example will copy all other characters in the input to the
output since they will be matched by the default rule.
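
   The effect of the 'zap me' scanner can be pictured as an ordinary C
loop.  This is a hand-written stand-in under simplifying assumptions,
not the code 'flex' generates: the hypothetical function 'zap_filter()'
works on a NUL-terminated string instead of 'yyin', and writes to a
caller-supplied buffer instead of the standard output.

```c
#include <string.h>

/* Emulates the specification
 *
 *     %%
 *     "zap me"
 *
 * `out` must have room for strlen(in) + 1 bytes.  Returns the number
 * of characters written, not counting the terminating NUL. */
size_t zap_filter(const char *in, char *out)
{
    static const char pat[] = "zap me";
    const size_t patlen = sizeof pat - 1;
    size_t n = 0;

    while (*in != '\0') {
        if (strncmp(in, pat, patlen) == 0) {
            in += patlen;        /* rule matched, empty action: discard */
        } else {
            out[n++] = *in++;    /* default rule: copy one character */
        }
    }
    out[n] = '\0';
    return n;
}
```

   The 'else' branch plays the part of the default rule: whenever no
pattern matches at the current position, exactly one character is
consumed and copied.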

   Here is a program which compresses multiple blanks and tabs down to a
single blank, and throws away whitespace found at the end of a line:

     %%
     [ \t]+        putchar( ' ' );
     [ \t]+$       /* ignore this token */

   If the action contains a '{', then the action spans till the
balancing '}' is found, and the action may cross multiple lines.  'flex'
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with '%{' and will
consider the action to be all the text up to the next '%}' (regardless
of ordinary braces inside the action).

   An action consisting solely of a vertical bar ('|') means "same as
the action for the next rule".  See below for an illustration.

   Actions can include arbitrary C code, including 'return' statements
to return a value to whatever routine called 'yylex()'.  Each time
'yylex()' is called it continues processing tokens from where it last
left off until it either reaches the end of the file or executes a
return.

   Actions are free to modify 'yytext' except for lengthening it (adding
characters to its end would overwrite later characters in the input
stream).  This however does not apply when using '%array' (*note
Matching::).  In that case, 'yytext' may be freely modified in any way.
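
   The blank-compressing scanner shown earlier in this section relies on
the longest-match rule: '[ \t]+$' is shorthand for '[ \t]+/\n', and the
trailing newline counts toward the match length, so that rule wins
whenever a run of blanks is followed by a newline.  The C sketch below
imitates this behaviour.  It is an illustration only; the function name
'squeeze_blanks()' is hypothetical, and it works on strings rather than
on 'yyin' and the standard output.

```c
#include <stddef.h>

/* Emulates the specification
 *
 *     %%
 *     [ \t]+     putchar( ' ' );
 *     [ \t]+$    (empty action)
 *
 * `out` must have room for strlen(in) + 1 bytes.  Returns the number
 * of characters written, not counting the terminating NUL. */
size_t squeeze_blanks(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            while (*in == ' ' || *in == '\t')
                ++in;                 /* consume the whole run */
            if (*in != '\n')
                out[n++] = ' ';       /* first rule: emit one blank */
            /* Otherwise the second rule wins and the run is dropped.
               The newline was only trailing context, so it is still
               in the input and is copied by the default rule. */
        } else {
            out[n++] = *in++;         /* default rule */
        }
    }
    out[n] = '\0';
    return n;
}
```

   Note that a run of blanks at the very end of the input, with no
newline after it, is not matched by the second rule and is therefore
reduced to a single blank rather than removed.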

   Actions are free to modify 'yyleng' except they should not do so if
the action also includes use of 'yymore()' (see below).

   There are a number of special directives which can be included within
an action:

'ECHO'
     copies yytext to the scanner's output.

'BEGIN'
     followed by the name of a start condition places the scanner in the
     corresponding start condition (see below).

'REJECT'
     directs the scanner to proceed on to the "second best" rule which
     matched the input (or a prefix of the input).  The rule is chosen
     as described above in *note Matching::, and 'yytext' and 'yyleng'
     set up appropriately.  It may either be one which matched as much
     text as the originally chosen rule but came later in the 'flex'
     input file, or one which matched less text.  For example, the
     following will both count the words in the input and call the
     routine 'special()' whenever 'frob' is seen:

                  int word_count = 0;
          %%

          frob        special(); REJECT;
          [^ \t\n]+   ++word_count;

     Without the 'REJECT', any occurrences of 'frob' in the input would
     not be counted as words, since the scanner normally executes only
     one action per token.  Multiple uses of 'REJECT' are allowed, each
     one finding the next best choice to the currently active rule.
     For example, when the following scanner scans the token 'abcd', it
     will write 'abcdabcaba' to the output:

          %%
          a        |
          ab       |
          abc      |
          abcd     ECHO; REJECT;
          .|\n     /* eat up any unmatched character */

     The first three rules share the fourth's action since they use the
     special '|' action.

     'REJECT' is a particularly expensive feature in terms of scanner
     performance; if it is used in _any_ of the scanner's actions it
     will slow down _all_ of the scanner's matching.  Furthermore,
     'REJECT' cannot be used with the '-Cf' or '-CF' options (*note
     Scanner Options::).

     Note also that unlike the other special actions, 'REJECT' is a
     _branch_.  Code immediately following it in the action will _not_
     be executed.

'yymore()'
     tells the scanner that the next time it matches a rule, the
     corresponding token should be _appended_ onto the current value of
     'yytext' rather than replacing it.  For example, given the input
     'mega-kludge' the following will write 'mega-mega-kludge' to the
     output:

          %%
          mega-    ECHO; yymore();
          kludge   ECHO;

     First 'mega-' is matched and echoed to the output.
     Then 'kludge'
     is matched, but the previous 'mega-' is still hanging around at the
     beginning of 'yytext' so the 'ECHO' for the 'kludge' rule will
     actually write 'mega-kludge'.

   Two notes regarding use of 'yymore()'.  First, 'yymore()' depends on
the value of 'yyleng' correctly reflecting the size of the current
token, so you must not modify 'yyleng' if you are using 'yymore()'.
Second, the presence of 'yymore()' in the scanner's action entails a
minor performance penalty in the scanner's matching speed.

   'yyless(n)' returns all but the first 'n' characters of the current
token back to the input stream, where they will be rescanned when the
scanner looks for the next match.  'yytext' and 'yyleng' are adjusted
appropriately (e.g., 'yyleng' will now be equal to 'n').  For example,
on the input 'foobar' the following will write out 'foobarbar':

     %%
     foobar    ECHO; yyless(3);
     [a-z]+    ECHO;

   An argument of 0 to 'yyless()' will cause the entire current input
string to be scanned again.  Unless you've changed how the scanner will
subsequently process its input (using 'BEGIN', for example), this will
result in an endless loop.

   Note that 'yyless()' is a macro and can only be used in the flex
input file, not from other source files.

   'unput(c)' puts the character 'c' back onto the input stream.  It
will be the next character scanned.
The following action will take the
current token and cause it to be rescanned enclosed in parentheses.

     {
     int i;
     /* Copy yytext because unput() trashes yytext */
     char *yycopy = strdup( yytext );
     unput( ')' );
     for ( i = yyleng - 1; i >= 0; --i )
         unput( yycopy[i] );
     unput( '(' );
     free( yycopy );
     }

   Note that since each 'unput()' puts the given character back at the
_beginning_ of the input stream, pushing back strings must be done
back-to-front.

   An important potential problem when using 'unput()' is that if you
are using '%pointer' (the default), a call to 'unput()' _destroys_ the
contents of 'yytext', starting with its rightmost character and
devouring one character to the left with each call.  If you need the
value of 'yytext' preserved after a call to 'unput()' (as in the above
example), you must either first copy it elsewhere, or build your scanner
using '%array' instead (*note Matching::).

   Finally, note that you cannot put back 'EOF' to attempt to mark the
input stream with an end-of-file.

   'input()' reads the next character from the input stream.
For
example, the following is one way to eat up C comments:

     %%
     "/*"        {
                 int c;

                 for ( ; ; )
                     {
                     while ( (c = input()) != '*' &&
                             c != EOF )
                         ;    /* eat up text of comment */

                     if ( c == '*' )
                         {
                         while ( (c = input()) == '*' )
                             ;
                         if ( c == '/' )
                             break;    /* found the end */
                         }

                     if ( c == EOF )
                         {
                         error( "EOF in comment" );
                         break;
                         }
                     }
                 }

   (Note that if the scanner is compiled using 'C++', then 'input()' is
instead referred to as 'yyinput()', in order to avoid a name clash with
the 'C++' stream by the name of 'input'.)

   'YY_FLUSH_BUFFER;' flushes the scanner's internal buffer so that the
next time the scanner attempts to match a token, it will first refill
the buffer using 'YY_INPUT()' (*note Generated Scanner::).  This action
is a special case of the more general 'yy_flush_buffer()' function,
described below (*note Multiple Input Buffers::).

   'yyterminate()' can be used in lieu of a return statement in an
action.  It terminates the scanner and returns a 0 to the scanner's
caller, indicating "all done".  By default, 'yyterminate()' is also
called when an end-of-file is encountered.
It is a macro and may be
redefined.


File: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top

9 The Generated Scanner
***********************

The output of 'flex' is the file 'lex.yy.c', which contains the scanning
routine 'yylex()', a number of tables used by it for matching tokens,
and a number of auxiliary routines and macros.  By default, 'yylex()' is
declared as follows:

         int yylex()
             {
             ... various definitions and the actions in here ...
             }

   (If your environment supports function prototypes, then it will be
'int yylex( void )'.)  This definition may be changed by defining the
'YY_DECL' macro.  For example, you could use:

         #define YY_DECL float lexscan( a, b ) float a, b;

   to give the scanning routine the name 'lexscan', returning a float,
and taking two floats as arguments.  Note that if you give arguments to
the scanning routine using a K&R-style/non-prototyped function
declaration, you must terminate the definition with a semi-colon (;).

   'flex' generates 'C99' function definitions by default.  Flex used to
have the ability to generate obsolete, er, 'traditional', function
definitions.  This was to support bootstrapping gcc on old systems.
Unfortunately, traditional definitions prevent us from using any
standard data types smaller than int (such as short, char, or bool) as
function arguments.  Furthermore, support for traditional definitions
added extra complexity in the skeleton file.  For this reason, current
versions of 'flex' generate standard C99 code only, leaving K&R-style
functions to the historians.

   Whenever 'yylex()' is called, it scans tokens from the global input
file 'yyin' (which defaults to stdin).  It continues until it either
reaches an end-of-file (at which point it returns the value 0) or one of
its actions executes a 'return' statement.

   If the scanner reaches an end-of-file, subsequent calls are undefined
unless either 'yyin' is pointed at a new input file (in which case
scanning continues from that file), or 'yyrestart()' is called.
'yyrestart()' takes one argument, a 'FILE *' pointer (which can be NULL,
if you've set up 'YY_INPUT' to scan from a source other than 'yyin'),
and initializes 'yyin' for scanning from that file.  Essentially there
is no difference between just assigning 'yyin' to a new input file or
using 'yyrestart()' to do so; the latter is available for compatibility
with previous versions of 'flex', and because it can be used to switch
input files in the middle of scanning.  It can also be used to throw
away the current input buffer, by calling it with an argument of 'yyin';
but it would be better to use 'YY_FLUSH_BUFFER' (*note Actions::).
Note that 'yyrestart()' does _not_ reset the start condition to
'INITIAL' (*note Start Conditions::).

   If 'yylex()' stops scanning due to executing a 'return' statement in
one of the actions, the scanner may then be called again and it will
resume scanning where it left off.

   By default (and for purposes of efficiency), the scanner uses
block-reads rather than simple 'getc()' calls to read characters from
'yyin'. The nature of how it gets its input can be controlled by
defining the 'YY_INPUT' macro. The calling sequence for 'YY_INPUT()' is
'YY_INPUT(buf,result,max_size)'. Its action is to place up to
'max_size' characters in the character array 'buf' and return in the
integer variable 'result' either the number of characters read or the
constant 'YY_NULL' (0 on Unix systems) to indicate 'EOF'. The default
'YY_INPUT' reads from the global file-pointer 'yyin'.

   Here is a sample definition of 'YY_INPUT' (in the definitions section
of the input file):

     %{
     #define YY_INPUT(buf,result,max_size) \
         { \
         int c = getchar(); \
         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
         }
     %}

   This definition will change the input processing to occur one
character at a time.

   When the scanner receives an end-of-file indication from YY_INPUT, it
then checks the 'yywrap()' function.
If 'yywrap()' returns false
(zero), then it is assumed that the function has gone ahead and set up
'yyin' to point to another input file, and scanning continues. If it
returns true (non-zero), then the scanner terminates, returning 0 to its
caller. Note that in either case, the start condition remains
unchanged; it does _not_ revert to 'INITIAL'.

   If you do not supply your own version of 'yywrap()', then you must
either use '%option noyywrap' (in which case the scanner behaves as
though 'yywrap()' returned 1), or you must link with '-lfl' to obtain
the default version of the routine, which always returns 1.

   For scanning from in-memory buffers (e.g., scanning strings), see
*note Scanning Strings::. *Note Multiple Input Buffers::.

   The scanner writes its 'ECHO' output to the 'yyout' global (default,
'stdout'), which may be redefined by the user simply by assigning it to
some other 'FILE' pointer.


File: flex.info, Node: Start Conditions, Next: Multiple Input Buffers, Prev: Generated Scanner, Up: Top

10 Start Conditions
*******************

'flex' provides a mechanism for conditionally activating rules. Any
rule whose pattern is prefixed with '<sc>' will only be active when the
scanner is in the "start condition" named 'sc'. For example,

     <STRING>[^"]*        { /* eat up the string body ... */
                 ...
                 }

   will be active only when the scanner is in the 'STRING' start
condition, and

     <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
                 ...
                 }

   will be active only when the current start condition is either
'INITIAL', 'STRING', or 'QUOTE'.

   Start conditions are declared in the definitions (first) section of
the input using unindented lines beginning with either '%s' or '%x'
followed by a list of names. The former declares "inclusive" start
conditions, the latter "exclusive" start conditions. A start condition
is activated using the 'BEGIN' action. Until the next 'BEGIN' action is
executed, rules with the given start condition will be active and rules
with other start conditions will be inactive. If the start condition is
inclusive, then rules with no start conditions at all will also be
active. If it is exclusive, then _only_ rules qualified with the start
condition will be active. A set of rules contingent on the same
exclusive start condition describes a scanner which is independent of
any of the other rules in the 'flex' input. Because of this, exclusive
start conditions make it easy to specify "mini-scanners" which scan
portions of the input that are syntactically different from the rest
(e.g., comments).
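
   The activation rule described above can be restated as a small C
predicate. This is only an illustrative model and not code taken from
'flex' or its generated scanners: the names 'struct rule',
'rule_is_active', 'sc_is_exclusive' and the 'SC_*' constants are
invented for the sketch, and start conditions are represented as small
integers with 0 standing for 'INITIAL'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not flex internals: start conditions as small
 * integers, with 0 standing for INITIAL. */
enum { SC_INITIAL = 0, SC_STRING = 1, SC_COMMENT = 2, SC_COUNT = 3 };

/* true for conditions declared with %x, false for %s and for INITIAL */
static const bool sc_is_exclusive[SC_COUNT] = { false, false, true };

struct rule {
    const int *conds;   /* start conditions named in <...>, or NULL */
    size_t nconds;      /* 0 means the rule has no <...> prefix */
};

/* Is rule `r` active while the scanner is in start condition `current`? */
static bool rule_is_active(const struct rule *r, int current)
{
    assert(current >= 0 && current < SC_COUNT);

    if (r->nconds == 0)
        /* Unqualified rules are active in INITIAL and in every
         * inclusive condition, but never in an exclusive one. */
        return !sc_is_exclusive[current];

    /* Qualified rules are active only in the conditions they name. */
    for (size_t i = 0; i < r->nconds; i++)
        if (r->conds[i] == current)
            return true;
    return false;
}
```

   In this model a rule without a prefix is reported active in
'SC_INITIAL' and 'SC_STRING' but not in the exclusive 'SC_COMMENT',
which is the behaviour that makes exclusive conditions suitable for
"mini-scanners".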

   If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two. The set of rules:

     %s example
     %%

     <example>foo   do_something();

     bar            something_else();

   is equivalent to

     %x example
     %%

     <example>foo   do_something();

     <INITIAL,example>bar    something_else();

   Without the '<INITIAL,example>' qualifier, the 'bar' pattern in the
second example wouldn't be active (i.e., couldn't match) when in start
condition 'example'. If we just used '<example>' to qualify 'bar',
though, then it would only be active in 'example' and not in 'INITIAL',
while in the first example it's active in both, because in the first
example the 'example' start condition is an inclusive ('%s') start
condition.

   Also note that the special start-condition specifier '<*>' matches
every start condition. Thus, the above example could also have been
written:

     %x example
     %%

     <example>foo   do_something();

     <*>bar    something_else();

   The default rule (to 'ECHO' any unmatched character) remains active
in start conditions.
It is equivalent to:

     <*>.|\n     ECHO;

   'BEGIN(0)' returns to the original state where only the rules with no
start conditions are active. This state can also be referred to as the
start-condition 'INITIAL', so 'BEGIN(INITIAL)' is equivalent to
'BEGIN(0)'. (The parentheses around the start condition name are not
required but are considered good style.)

   'BEGIN' actions can also be given as indented code at the beginning
of the rules section. For example, the following will cause the scanner
to enter the 'SPECIAL' start condition whenever 'yylex()' is called and
the global variable 'enter_special' is true:

             int enter_special;

     %x SPECIAL
     %%
             if ( enter_special )
                 BEGIN(SPECIAL);

     <SPECIAL>blahblahblah
     ...more rules follow...

   To illustrate the uses of start conditions, here is a scanner which
provides two different interpretations of a string like '123.456'. By
default it will treat it as three tokens, the integer '123', a dot
('.'), and the integer '456'.
But if the string is preceded earlier in
the line by the string 'expect-floats' it will treat it as a single
token, the floating-point number '123.456':

     %{
     #include <math.h>
     #include <stdlib.h>   /* atof(), atoi() */
     %}
     %s expect

     %%
     expect-floats        BEGIN(expect);

     <expect>[0-9]+"."[0-9]+      {
                 printf( "found a float, = %f\n",
                         atof( yytext ) );
                 }
     <expect>\n           {
                 /* that's the end of the line, so
                  * we need another "expect-floats"
                  * before we'll recognize any more
                  * floats
                  */
                 BEGIN(INITIAL);
                 }

     [0-9]+      {
                 printf( "found an integer, = %d\n",
                         atoi( yytext ) );
                 }

     "."         printf( "found a dot\n" );

   Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This scanner goes to a bit of trouble to match as much text as
possible with each rule.
In general, when attempting to write a
high-speed scanner, try to match as much as possible in each rule, as
it's a big win.

   Note that start condition names are really integer values and can be
stored as such. Thus, the above could be extended in the following
fashion:

     %x comment foo
     %%
             int line_num = 1;
             int comment_caller;

     "/*"         {
                  comment_caller = INITIAL;
                  BEGIN(comment);
                  }

     ...

     <foo>"/*"    {
                  comment_caller = foo;
                  BEGIN(comment);
                  }

     <comment>[^*\n]*        /* eat anything that's not a '*' */
     <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(comment_caller);

   Furthermore, you can access the current start condition using the
integer-valued 'YY_START' macro. For example, the above assignments to
'comment_caller' could instead be written

     comment_caller = YY_START;

   Flex provides 'YYSTATE' as an alias for 'YY_START' (since that is
what's used by AT&T 'lex').

   For historical reasons, start conditions do not have their own
name-space within the generated scanner. The start condition names are
unmodified in the generated scanner and generated header.
*Note option-header::. *Note option-prefix::.

   Finally, here's an example of how to match C-style quoted strings
using exclusive start conditions, including expanded escape sequences
(but not including checking for a string that's too long):

     %x str

     %%
             char string_buf[MAX_STR_CONST];
             char *string_buf_ptr;


     \"      string_buf_ptr = string_buf; BEGIN(str);

     <str>\"        { /* saw closing quote - all done */
             BEGIN(INITIAL);
             *string_buf_ptr = '\0';
             /* return string constant token type and
              * value to parser
              */
             }

     <str>\n        {
             /* error - unterminated string constant */
             /* generate error message */
             }

     <str>\\[0-7]{1,3} {
             /* octal escape sequence */
             unsigned int result;   /* "%o" stores an unsigned int */

             (void) sscanf( yytext + 1, "%o", &result );

             if ( result > 0xff )
                     {
                     /* error, constant is out-of-bounds */
                     }

             *string_buf_ptr++ = result;
             }

     <str>\\[0-9]+ {
             /* generate error - bad escape sequence; something
              * like '\48' or '\0777777'
              */
             }

     <str>\\n  *string_buf_ptr++ = '\n';
     <str>\\t  *string_buf_ptr++ = '\t';
     <str>\\r  *string_buf_ptr++ = '\r';
     <str>\\b  *string_buf_ptr++ = '\b';
     <str>\\f  *string_buf_ptr++ = '\f';

     <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

     <str>[^\\\n\"]+        {
             char *yptr = yytext;

             while ( *yptr )
                     *string_buf_ptr++ = *yptr++;
             }

   Often, such as in some of the examples above, you wind up writing a
whole bunch of rules all preceded by the same start condition(s). Flex
makes this a little easier and cleaner by introducing a notion of start
condition "scope". A start condition scope is begun with:

     <SCs>{

   where '<SCs>' is a list of one or more start conditions. Inside the
start condition scope, every rule automatically has the prefix '<SCs>'
applied to it, until a '}' which matches the initial '{'. So, for
example,

     <ESC>{
         "\\n"   return '\n';
         "\\r"   return '\r';
         "\\f"   return '\f';
         "\\0"   return '\0';
     }

   is equivalent to:

     <ESC>"\\n"  return '\n';
     <ESC>"\\r"  return '\r';
     <ESC>"\\f"  return '\f';
     <ESC>"\\0"  return '\0';

   Start condition scopes may be nested.

   The following routines are available for manipulating stacks of start
conditions:

 -- Function: void yy_push_state ( int 'new_state' )
     pushes the current start condition onto the top of the start
     condition stack and switches to 'new_state' as though you had used
     'BEGIN new_state' (recall that start condition names are also
     integers).

 -- Function: void yy_pop_state ()
     pops the top of the stack and switches to it via 'BEGIN'.

 -- Function: int yy_top_state ()
     returns the top of the stack without altering the stack's contents.

   The start condition stack grows dynamically and so has no built-in
size limitation. If memory is exhausted, program execution aborts.

   To use start condition stacks, your scanner must include a '%option
stack' directive (*note Scanner Options::).


File: flex.info, Node: Multiple Input Buffers, Next: EOF, Prev: Start Conditions, Up: Top

11 Multiple Input Buffers
*************************

Some scanners (such as those which support "include" files) require
reading from several input streams. As 'flex' scanners do a large
amount of buffering, one cannot control where the next input will be
read from by simply writing a 'YY_INPUT()' which is sensitive to the
scanning context.
'YY_INPUT()' is only called when the scanner reaches
the end of its buffer, which may be a long time after scanning a
statement such as an 'include' statement which requires switching the
input source.

   To negotiate these sorts of problems, 'flex' provides a mechanism for
creating and switching between multiple input buffers. An input buffer
is created by using:

 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )

   which takes a 'FILE' pointer and a size and creates a buffer
associated with the given file and large enough to hold 'size'
characters (when in doubt, use 'YY_BUF_SIZE' for the size). It returns
a 'YY_BUFFER_STATE' handle, which may then be passed to other routines
(see below). The 'YY_BUFFER_STATE' type is a pointer to an opaque
'struct yy_buffer_state' structure, so you may safely initialize
'YY_BUFFER_STATE' variables to '((YY_BUFFER_STATE) 0)' if you wish, and
also refer to the opaque structure in order to correctly declare input
buffers in source files other than that of your scanner. Note that the
'FILE' pointer in the call to 'yy_create_buffer' is only used as the
value of 'yyin' seen by 'YY_INPUT'. If you redefine 'YY_INPUT()' so it
no longer uses 'yyin', then you can safely pass a NULL 'FILE' pointer to
'yy_create_buffer'.
You select a particular buffer to scan from using:

 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )

   The above function switches the scanner's input buffer so subsequent
tokens will come from 'new_buffer'. Note that 'yy_switch_to_buffer()'
may be used by 'yywrap()' to set things up for continued scanning,
instead of opening a new file and pointing 'yyin' at it. If you are
looking for a stack of input buffers, then you want to use
'yypush_buffer_state()' instead of this function. Note also that
switching input sources via either 'yy_switch_to_buffer()' or 'yywrap()'
does _not_ change the start condition.

 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )

   is used to reclaim the storage associated with a buffer. ('buffer'
can be NULL, in which case the routine does nothing.) To push a buffer
onto the buffer stack that flex maintains, use:

 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )

   This function pushes the new buffer state onto an internal stack.
The pushed state becomes the new current state. The stack is maintained
by flex and will grow as required. This function is intended to be used
instead of 'yy_switch_to_buffer', when you want to change states, but
preserve the current state for later use.

 -- Function: void yypop_buffer_state ( )

   This function removes the current state from the top of the stack,
and deletes it by calling 'yy_delete_buffer'. The next state on the
stack, if any, becomes the new current state.

 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )

   This function discards the buffer's contents, so the next time the
scanner attempts to match a token from the buffer, it will first fill
the buffer anew using 'YY_INPUT()'.

 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )

   is an alias for 'yy_create_buffer()', provided for compatibility with
the C++ use of 'new' and 'delete' for creating and destroying dynamic
objects.

   The 'YY_CURRENT_BUFFER' macro returns a 'YY_BUFFER_STATE' handle to
the current buffer. It should not be used as an lvalue.

   Here are two examples of using these features for writing a scanner
which expands include files (the '<<EOF>>' feature is discussed below).

   This first example uses yypush_buffer_state and yypop_buffer_state.
Flex maintains the stack internally.

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl
     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));

             BEGIN(INITIAL);
             }

     <<EOF>> {
             yypop_buffer_state();

             if ( !YY_CURRENT_BUFFER )
                 {
                 yyterminate();
                 }
             }

   The second example, below, does the same thing as the previous
example did, but manages its own input buffer stack manually (instead of
letting flex do it).

     /* the "incl" state is used for picking up the name
      * of an include file
      */
     %x incl

     %{
     #define MAX_INCLUDE_DEPTH 10
     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
     int include_stack_ptr = 0;
     %}

     %%
     include             BEGIN(incl);

     [a-z]+              ECHO;
     [^a-z\n]*\n?        ECHO;

     <incl>[ \t]*      /* eat the whitespace */
     <incl>[^ \t\n]+   { /* got the include file name */
             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                 {
                 fprintf( stderr, "Includes nested too deeply\n" );
                 exit( 1 );
                 }

             include_stack[include_stack_ptr++] =
                 YY_CURRENT_BUFFER;

             yyin = fopen( yytext, "r" );

             if ( ! yyin )
                 error( ... );

             yy_switch_to_buffer(
                 yy_create_buffer( yyin, YY_BUF_SIZE ) );

             BEGIN(INITIAL);
             }

     <<EOF>> {
             /* the stack is empty only at the end of the
              * outermost file, where the pointer drops below 0
              */
             if ( --include_stack_ptr < 0 )
                 {
                 yyterminate();
                 }

             else
                 {
                 yy_delete_buffer( YY_CURRENT_BUFFER );
                 yy_switch_to_buffer(
                      include_stack[include_stack_ptr] );
                 }
             }

   The following routines are available for setting up input buffers for
scanning in-memory strings instead of files. All of them create a new
input buffer for scanning the string, and return a corresponding
'YY_BUFFER_STATE' handle (which you should delete with
'yy_delete_buffer()' when done with it). They also switch to the new
buffer using 'yy_switch_to_buffer()', so the next call to 'yylex()' will
start scanning the string.

 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
     scans a NUL-terminated string.

 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len
          )
     scans 'len' bytes (including possibly 'NUL's) starting at location
     'bytes'.

   Note that both of these functions create and scan a _copy_ of the
string or bytes. (This may be desirable, since 'yylex()' modifies the
contents of the buffer it is scanning.) You can avoid the copy by
using:

 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
          size)
     which scans in place the buffer starting at 'base', consisting of
     'size' bytes, the last two bytes of which _must_ be
     'YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
     scanned; thus, scanning consists of 'base[0]' through
     'base[size-3]', inclusive.

   If you fail to set up 'base' in this manner (i.e., forget the final
two 'YY_END_OF_BUFFER_CHAR' bytes), then 'yy_scan_buffer()' returns a
NULL pointer instead of creating a new input buffer.

 -- Data type: yy_size_t
     is an integral type to which you can cast an integer expression
     reflecting the size of the buffer.
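
   The layout that 'yy_scan_buffer()' expects is easy to get wrong, so
here is a minimal sketch of preparing such a buffer in plain C. The
helper 'make_scan_buffer' is a hypothetical name invented for this
example and is not part of 'flex'; the calls shown in the trailing
comment assume they are made from within a generated scanner, where
'yy_scan_buffer()' and 'yy_delete_buffer()' are available.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of flex: copy `len` bytes of `text`
 * into a freshly allocated buffer followed by two NUL bytes, which is
 * the layout yy_scan_buffer() requires.  The total size (len + 2) is
 * stored in *size_out.  Returns NULL if allocation fails. */
static char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    assert(text != NULL && size_out != NULL);

    char *base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';       /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';   /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}

/* Inside a flex-generated scanner one would then write, roughly:
 *
 *     size_t size;
 *     char *base = make_scan_buffer(text, strlen(text), &size);
 *     YY_BUFFER_STATE b = yy_scan_buffer(base, (yy_size_t) size);
 *     ... call yylex() ...
 *     yy_delete_buffer(b);
 *     free(base);
 */
```

   Because the buffer is scanned in place, the caller keeps ownership
of 'base' in this sketch and frees it only after the buffer state has
been deleted.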


File: flex.info, Node: EOF, Next: Misc Macros, Prev: Multiple Input Buffers, Up: Top

12 End-of-File Rules
********************

The special rule '<<EOF>>' indicates actions which are to be taken when
an end-of-file is encountered and 'yywrap()' returns non-zero (i.e.,
indicates no further files to process). The action must finish by doing
one of the following things:

   * assigning 'yyin' to a new input file (in previous versions of
     'flex', after doing the assignment you had to call the special
     action 'YY_NEW_FILE'. This is no longer necessary.)

   * executing a 'return' statement;

   * executing the special 'yyterminate()' action.

   * or, switching to a new buffer using 'yy_switch_to_buffer()' as
     shown in the example above.

   <<EOF>> rules may not be used with other patterns; they may only be
qualified with a list of start conditions. If an unqualified <<EOF>>
rule is given, it applies to _all_ start conditions which do not already
have <<EOF>> actions. To specify an <<EOF>> rule for only the initial
start condition, use:

     <INITIAL><<EOF>>

   These rules are useful for catching things like unclosed comments.
An example:

     %x quote
     %%

     ...other rules for dealing with quotes...

     <quote><<EOF>>   {
              error( "unterminated quote" );
              yyterminate();
              }
     <<EOF>>  {
              if ( *++filelist )
                  yyin = fopen( *filelist, "r" );
              else
                 yyterminate();
              }


File: flex.info, Node: Misc Macros, Next: User Values, Prev: EOF, Up: Top

13 Miscellaneous Macros
***********************

The macro 'YY_USER_ACTION' can be defined to provide an action which is
always executed prior to the matched rule's action. For example, it
could be #define'd to call a routine to convert yytext to lower-case.
When 'YY_USER_ACTION' is invoked, the variable 'yy_act' gives the number
of the matched rule (rules are numbered starting with 1). Suppose you
want to profile how often each of your rules is matched. The following
would do the trick:

     #define YY_USER_ACTION ++ctr[yy_act]

   where 'ctr' is an array to hold the counts for the different rules.
Note that the macro 'YY_NUM_RULES' gives the total number of rules
(including the default rule), even if you use '-s', so a correct
declaration for 'ctr' is:

     int ctr[YY_NUM_RULES];

   The macro 'YY_USER_INIT' may be defined to provide an action which is
always executed before the first scan (and before the scanner's internal
initializations are done).
For example, it could be used to call a
routine to read in a data table or open a logging file.

   The macro 'yy_set_interactive(is_interactive)' can be used to control
whether the current buffer is considered "interactive".  An interactive
buffer is processed more slowly, but must be used when the scanner's
input source is indeed interactive to avoid problems due to waiting to
fill buffers (see the discussion of the '-I' flag in *note Scanner
Options::).  A non-zero value in the macro invocation marks the buffer
as interactive, a zero value as non-interactive.  Note that use of this
macro overrides '%option always-interactive' or '%option
never-interactive' (*note Scanner Options::).  'yy_set_interactive()'
must be invoked prior to beginning to scan the buffer that is (or is
not) to be considered interactive.

   The macro 'yy_set_bol(at_bol)' can be used to control whether the
current buffer's scanning context for the next token match is done as
though at the beginning of a line.  A non-zero macro argument makes
rules anchored with '^' active, while a zero argument makes '^' rules
inactive.

   The macro 'YY_AT_BOL()' returns true if the next token scanned from
the current buffer will have '^' rules active, false otherwise.

   In the generated scanner, the actions are all gathered in one large
switch statement and separated using 'YY_BREAK', which may be redefined.
By default, it is simply a 'break', to separate each rule's action from
the following rule's.  Redefining 'YY_BREAK' allows, for example, C++
users to #define YY_BREAK to do nothing (while being very careful that
every rule ends with a 'break' or a 'return'!) to avoid
unreachable-statement warnings: when a rule's action ends with
'return', the 'YY_BREAK' that follows it can never be reached.


File: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top

14 Values Available To the User
*******************************

This chapter summarizes the various values available to the user in the
rule actions.

'char *yytext'
     holds the text of the current token.  It may be modified but not
     lengthened (you cannot append characters to the end).

     If the special directive '%array' appears in the first section of
     the scanner description, then 'yytext' is instead declared 'char
     yytext[YYLMAX]', where 'YYLMAX' is a macro definition that you can
     redefine in the first section if you don't like the default value
     (generally 8KB). Using '%array' results in somewhat slower
     scanners, but the value of 'yytext' becomes immune to calls to
     'unput()', which potentially destroy its value when 'yytext' is a
     character pointer.  The opposite of '%array' is '%pointer', which
     is the default.

     You cannot use '%array' when generating C++ scanner classes (the
     '-+' flag).

'int yyleng'
     holds the length of the current token.

'FILE *yyin'
     is the file which by default 'flex' reads from.  It may be
     redefined but doing so only makes sense before scanning begins or
     after an EOF has been encountered.  Changing it in the midst of
     scanning will have unexpected results since 'flex' buffers its
     input; use 'yyrestart()' instead.  Once scanning terminates because
     an end-of-file has been seen, you can point 'yyin' at the new
     input file and then call the scanner again to continue scanning.

'void yyrestart( FILE *new_file )'
     may be called to point 'yyin' at the new input file.  The
     switch-over to the new file is immediate (any previously
     buffered-up input is lost).  Note that calling 'yyrestart()' with
     'yyin' as an argument thus throws away the current input buffer and
     continues scanning the same input file.

'FILE *yyout'
     is the file to which 'ECHO' actions are done.  It can be reassigned
     by the user.

'YY_CURRENT_BUFFER'
     returns a 'YY_BUFFER_STATE' handle to the current buffer.

'YY_START'
     returns an integer value corresponding to the current start
     condition.  You can subsequently use this value with 'BEGIN' to
     return to that start condition.
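
   The values summarized in this chapter can be seen working together
in the following minimal sketch.  It is an illustration only, not part
of 'flex' itself: the start condition 'comment', the variable
'saved_state' and the message format are all hypothetical names chosen
for the example.  It saves 'YY_START' before entering a comment,
restores it with 'BEGIN' afterwards, and reports each word through
'yyout' using 'yytext' and 'yyleng'.

```lex
%option noyywrap
%x comment

%{
#include <stdio.h>
/* 'saved_state' is a hypothetical helper used only in this sketch. */
static int saved_state;
%}

%%

"/*"            { saved_state = YY_START; BEGIN(comment); }
<comment>"*/"   { BEGIN(saved_state);  /* return to the saved condition */ }
<comment>.|\n   { /* discard comment text */ }

[a-zA-Z]+       { fprintf( yyout, "word '%s' (%d chars)\n",
                           yytext, (int) yyleng ); }
.|\n            { /* ignore everything else */ }

%%

int main( void )
    {
    yyout = stderr;     /* yyout may be reassigned by the user */
    return yylex();     /* reads from yyin, which defaults to stdin */
    }
```

   Because 'comment' is an exclusive start condition, the word rule is
inactive inside comments, so only text outside '/* ... */' is
reported.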


File: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top

15 Interfacing with Yacc
************************

One of the main uses of 'flex' is as a companion to the 'yacc'
parser-generator.  'yacc' parsers expect to call a routine named
'yylex()' to find the next input token.  The routine is supposed to
return the type of the next token as well as putting any associated
value in the global 'yylval'.  To use 'flex' with 'yacc', one specifies
the '-d' option to 'yacc' to instruct it to generate the file 'y.tab.h'
containing definitions of all the '%tokens' appearing in the 'yacc'
input.  This file is then included in the 'flex' scanner.  For example,
if one of the tokens is 'TOK_NUMBER', part of the scanner might look
like:

     %{
     #include <stdlib.h>
     #include "y.tab.h"
     %}

     %%

     [0-9]+          yylval = atoi( yytext ); return TOK_NUMBER;


File: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top

16 Scanner Options
******************

The various 'flex' options are categorized by function in the following
menu.  If you want to look up a particular option by name, *Note Index of
Scanner Options::.

* Menu:

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

   Even though there are many scanner options, a typical scanner might
only specify the following options:

     %option   8bit reentrant bison-bridge
     %option   warn nodefault
     %option   yylineno
     %option   outfile="scanner.c" header-file="scanner.h"

   The first line specifies the general type of scanner we want.  The
second line specifies that we are being careful.  The third line asks
flex to track line numbers.  The last line tells flex what to name the
files.  (The options can be specified in any order.  We just divided
them.)

   'flex' also provides a mechanism for controlling options within the
scanner specification itself, rather than from the flex command-line.
This is done by including '%option' directives in the first section of
the scanner specification.  You can specify multiple options with a
single '%option' directive, and multiple directives in the first section
of your flex input file.

   Most options are given simply as names, optionally preceded by the
word 'no' (with no intervening whitespace) to negate their meaning.
The names are the same as their long-option equivalents (but without
the leading '--').

   'flex' scans your rule actions to determine whether you use the
'REJECT' or 'yymore()' features.  The 'REJECT' and 'yymore' options are
available to override its decision as to whether you use the options,
either by setting them (e.g., '%option reject') to indicate the feature
is indeed used, or unsetting them to indicate it actually is not used
(e.g., '%option noyymore').

   A number of options are available for lint purists who want to
suppress the appearance of unneeded routines in the generated scanner.
Each of the following, if unset (e.g., '%option nounput'), results in
the corresponding routine not appearing in the generated scanner:

     input, unput
     yy_push_state, yy_pop_state, yy_top_state
     yy_scan_buffer, yy_scan_bytes, yy_scan_string

     yyget_extra, yyset_extra, yyget_leng, yyget_text,
     yyget_lineno, yyset_lineno, yyget_in, yyset_in,
     yyget_out, yyset_out, yyget_lval, yyset_lval,
     yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

   (though 'yy_push_state()' and friends won't appear anyway unless you
use '%option stack').
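
   As a rough illustration of the '%option' mechanism described in
this section, the sketch below combines several directives in the
first section of a scanner specification, including 'no'-prefixed
forms that suppress unused routines.  The start condition 'str' and
the rules themselves are hypothetical and exist only to give the
options something to act on.

```lex
%option warn nodefault
%option noyywrap
%option noinput nounput
%option stack noyy_top_state

%x str

%%

\"              { yy_push_state( str ); }
<str>\"         { yy_pop_state(); }
<str>[^"\n]+    { /* string body */ }
<str>\n         { yy_pop_state();  /* unterminated string */ }

[^"]+           { /* text outside strings */ }

%%

int main( void )
    {
    return yylex();
    }
```

   Since 'nodefault' removes the default rule, the rules are written
so that every character is matched in both start conditions.  '%option
stack' makes 'yy_push_state()' and 'yy_pop_state()' available, while
'noyy_top_state' leaves out the one stack routine this scanner never
calls.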


File: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options

16.1 Options for Specifying Filenames
=====================================

'--header-file=FILE, '%option header-file="FILE"''
     instructs flex to write a C header to 'FILE'.  This file contains
     function prototypes, extern variables, and types used by the
     scanner.  Only the external API is exported by the header file.
     Many macros that are usable from within scanner actions are not
     exported to the header file.  This is due to namespace problems and
     the goal of a clean external API.

     While in the header, the macro 'yyIN_HEADER' is defined, where 'yy'
     is substituted with the appropriate prefix.

     The '--header-file' option is not compatible with the '--c++'
     option, since the C++ scanner provides its own header in
     'yyFlexLexer.h'.

'-oFILE, --outfile=FILE, '%option outfile="FILE"''
     directs flex to write the scanner to the file 'FILE' instead of
     'lex.yy.c'.  If you combine '--outfile' with the '--stdout' option,
     then the scanner is written to 'stdout' but its '#line' directives
     (see the '-L' option) refer to the file 'FILE'.

'-t, --stdout, '%option stdout''
     instructs 'flex' to write the scanner it generates to standard
     output instead of 'lex.yy.c'.

'-SFILE, --skel=FILE'
     overrides the default skeleton file from which 'flex' constructs
     its scanners.  You'll never need this option unless you are doing
     'flex' maintenance or development.

'--tables-file=FILE'
     Write serialized scanner DFA tables to FILE. The generated scanner
     will not contain the tables, and requires them to be loaded at
     runtime.  *Note serialization::.

'--tables-verify'
     This option is for flex development.  We document it here in case
     you stumble upon it by accident or in case you suspect some
     inconsistency in the serialized tables.  Flex will serialize the
     scanner DFA tables but will also generate the in-code tables as it
     normally does.  At runtime, the scanner will verify that the
     serialized tables match the in-code tables, instead of loading
     them.


File: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options

16.2 Options Affecting Scanner Behavior
=======================================

'-i, --case-insensitive, '%option case-insensitive''
     instructs 'flex' to generate a "case-insensitive" scanner.  The
     case of letters given in the 'flex' input patterns will be ignored,
     and tokens in the input will be matched regardless of case.
     The matched text given in 'yytext' will have the preserved case
     (i.e., it will not be folded).  For tricky behavior, see *note case
     and character ranges::.

'-l, --lex-compat, '%option lex-compat''
     turns on maximum compatibility with the original AT&T 'lex'
     implementation.  Note that this does not mean _full_ compatibility.
     Use of this option costs a considerable amount of performance, and
     it cannot be used with the '--c++', '--full', '--fast', '-Cf', or
     '-CF' options.  For details on the compatibilities it provides, see
     *note Lex and Posix::.  This option also results in the name
     'YY_FLEX_LEX_COMPAT' being '#define''d in the generated scanner.

'-B, --batch, '%option batch''
     instructs 'flex' to generate a "batch" scanner, the opposite of
     _interactive_ scanners generated by '--interactive' (see below).
     In general, you use '-B' when you are _certain_ that your scanner
     will never be used interactively, and you want to squeeze a
     _little_ more performance out of it.  If your goal is instead to
     squeeze out a _lot_ more performance, you should be using the '-Cf'
     or '-CF' options, which turn on '--batch' automatically anyway.

'-I, --interactive, '%option interactive''
     instructs 'flex' to generate an interactive scanner.  An
     interactive scanner is one that only looks ahead to decide what
     token has been matched if it absolutely must.
     It turns out that always looking one extra character ahead, even if
     the scanner has already seen enough text to disambiguate the
     current token, is a bit faster than only looking ahead when
     necessary.  But scanners that always look ahead give dreadful
     interactive performance; for example, when a user types a newline,
     it is not recognized as a newline token until they enter _another_
     token, which often means typing in another whole line.

     'flex' scanners default to 'interactive' unless you use the '-Cf'
     or '-CF' table-compression options (*note Performance::).  That's
     because if you're looking for high-performance you should be using
     one of these options, so if you didn't, 'flex' assumes you'd rather
     trade off a bit of run-time performance for intuitive interactive
     behavior.  Note also that you _cannot_ use '--interactive' in
     conjunction with '-Cf' or '-CF'.  Thus, this option is not really
     needed; it is on by default for all those cases in which it is
     allowed.

     You can force a scanner to _not_ be interactive by using '--batch'.

'-7, --7bit, '%option 7bit''
     instructs 'flex' to generate a 7-bit scanner, i.e., one which can
     only recognize 7-bit characters in its input.  The advantage of
     using '--7bit' is that the scanner's tables can be up to half the
     size of those generated using the '--8bit' option.  The
     disadvantage is that such scanners often hang or crash if their
     input contains an 8-bit character.

     Note, however, that unless you generate your scanner using the
     '-Cf' or '-CF' table compression options, use of '--7bit' will save
     only a small amount of table space, and make your scanner
     considerably less portable.  'Flex''s default behavior is to
     generate an 8-bit scanner unless you use '-Cf' or '-CF', in
     which case 'flex' defaults to generating 7-bit scanners unless your
     site was always configured to generate 8-bit scanners (as will
     often be the case with non-USA sites).  You can tell whether flex
     generated a 7-bit or an 8-bit scanner by inspecting the flag
     summary in the '--verbose' output as described above.

     Note that if you use '-Cfe' or '-CFe' 'flex' still defaults to
     generating an 8-bit scanner, since usually with these compression
     options full 8-bit tables are not much more expensive than 7-bit
     tables.

'-8, --8bit, '%option 8bit''
     instructs 'flex' to generate an 8-bit scanner, i.e., one which can
     recognize 8-bit characters.  This flag is only needed for scanners
     generated using '-Cf' or '-CF', as otherwise flex defaults to
     generating an 8-bit scanner anyway.

     See the discussion of '--7bit' above for 'flex''s default behavior
     and the tradeoffs between 7-bit and 8-bit scanners.

'--default, '%option default''
     generate the default rule.

'--always-interactive, '%option always-interactive''
     instructs flex to generate a scanner which always considers its
     input _interactive_.  Normally, on each new input file the scanner
     calls 'isatty()' in an attempt to determine whether the scanner's
     input source is interactive and thus should be read a character at
     a time.  When this option is used, however, then no such call is
     made.

'--never-interactive, '%option never-interactive''
     instructs flex to generate a scanner which never considers its
     input interactive.  This is the opposite of 'always-interactive'.

'-X, --posix, '%option posix''
     turns on maximum compatibility with the POSIX 1003.2-1992
     definition of 'lex'.  Since 'flex' was originally designed to
     implement the POSIX definition of 'lex' this generally involves
     very few changes in behavior.  At the current writing the known
     differences between 'flex' and the POSIX standard are:

        * In POSIX and AT&T 'lex', the repeat operator, '{}', has lower
          precedence than concatenation (thus 'ab{3}' yields 'ababab').
          Most POSIX utilities use an Extended Regular Expression (ERE)
          precedence that has the precedence of the repeat operator
          higher than concatenation (which causes 'ab{3}' to yield
          'abbb').  By default, 'flex' places the precedence of the
          repeat operator higher than concatenation which matches the
          ERE processing of other POSIX utilities.
          When either '--posix' or '-l' is specified, 'flex' will use
          the traditional AT&T and POSIX-compliant precedence for the
          repeat operator where concatenation has higher precedence
          than the repeat operator.

'--stack, '%option stack''
     enables the use of start condition stacks (*note Start
     Conditions::).

'--stdinit, '%option stdinit''
     if set (i.e., '%option stdinit') initializes 'yyin' and 'yyout' to
     'stdin' and 'stdout', instead of the default of 'NULL'.  Some
     existing 'lex' programs depend on this behavior, even though it is
     not compliant with ANSI C, which does not require 'stdin' and
     'stdout' to be compile-time constant.  In a reentrant scanner,
     however, this is not a problem since initialization is performed in
     'yylex_init' at runtime.

'--yylineno, '%option yylineno''
     directs 'flex' to generate a scanner that maintains the number of
     the current line read from its input in the global variable
     'yylineno'.  This option is implied by '%option lex-compat'.  In a
     reentrant C scanner, the macro 'yylineno' is accessible regardless
     of the value of '%option yylineno', however, its value is not
     modified by 'flex' unless '%option yylineno' is enabled.

'--yywrap, '%option yywrap''
     if unset (i.e., '--noyywrap'), makes the scanner not call
     'yywrap()' upon an end-of-file, but simply assume that there are no
     more files to scan (until the user points 'yyin' at a new file and
     calls 'yylex()' again).


File: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options

16.3 Code-Level And API Options
===============================

'--ansi-definitions, '%option ansi-definitions''
     Deprecated, ignored

'--ansi-prototypes, '%option ansi-prototypes''
     Deprecated, ignored

'--bison-bridge, '%option bison-bridge''
     instructs flex to generate a C scanner that is meant to be called
     by a 'GNU bison' parser.  The scanner has minor API changes for
     'bison' compatibility.  In particular, the declaration of 'yylex'
     is modified to take an additional parameter, 'yylval'.  *Note Bison
     Bridge::.

'--bison-locations, '%option bison-locations''
     instructs flex that 'GNU bison' '%locations' are being used.  This
     means 'yylex' will be passed an additional parameter, 'yylloc'.
     This option implies '%option bison-bridge'.  *Note Bison Bridge::.

'-L, --noline, '%option noline''
     instructs 'flex' not to generate '#line' directives.
     Without this option, 'flex' peppers the generated scanner with
     '#line' directives so error messages in the actions will be
     correctly located with respect to either the original 'flex' input
     file (if the errors are due to code in the input file), or
     'lex.yy.c' (if the errors are 'flex''s fault - you should report
     these sorts of errors to the email address given in *note Reporting
     Bugs::).

'-R, --reentrant, '%option reentrant''
     instructs flex to generate a reentrant C scanner.  The generated
     scanner may safely be used in a multi-threaded environment.  The
     API for a reentrant scanner is different than for a non-reentrant
     scanner (*note Reentrant::).  Because of the API difference between
     reentrant and non-reentrant 'flex' scanners, non-reentrant flex
     code must be modified before it is suitable for use with this
     option.  This option is not compatible with the '--c++' option.

     The option '--reentrant' does not affect the performance of the
     scanner.

'-+, --c++, '%option c++''
     specifies that you want flex to generate a C++ scanner class.
     *Note Cxx::, for details.

'--array, '%option array''
     specifies that you want 'yytext' to be an array instead of a
     'char *'.

'--pointer, '%option pointer''
     specifies that 'yytext' should be a 'char *', not an array.  The
     default is 'char *'.

'-PPREFIX, --prefix=PREFIX, '%option prefix="PREFIX"''
     changes the default 'yy' prefix used by 'flex' for all
     globally-visible variable and function names to instead be
     'PREFIX'.  For example, '--prefix=foo' changes the name of 'yytext'
     to 'footext'.  It also changes the name of the default output file
     from 'lex.yy.c' to 'lex.foo.c'.  Here is a partial list of the
     names affected:

              yy_create_buffer
              yy_delete_buffer
              yy_flex_debug
              yy_init_buffer
              yy_flush_buffer
              yy_load_buffer_state
              yy_switch_to_buffer
              yyin
              yyleng
              yylex
              yylineno
              yyout
              yyrestart
              yytext
              yywrap
              yyalloc
              yyrealloc
              yyfree

     (If you are using a C++ scanner, then only 'yywrap' and
     'yyFlexLexer' are affected.)  Within your scanner itself, you can
     still refer to the global variables and functions using either
     version of their name; but externally, they have the modified name.

     This option lets you easily link together multiple 'flex' programs
     into the same executable.
     Note, though, that using this option also renames 'yywrap()', so
     you now _must_ either provide your own (appropriately-named)
     version of the routine for your scanner, or use '%option noyywrap',
     as linking with '-lfl' no longer provides one for you by default.

'--main, '%option main''
     directs flex to provide a default 'main()' program for the scanner,
     which simply calls 'yylex()'.  This option implies 'noyywrap' (see
     below).

'--nounistd, '%option nounistd''
     suppresses inclusion of the non-ANSI header file 'unistd.h'.  This
     option is meant to target environments in which 'unistd.h' does not
     exist.  Be aware that certain options may cause flex to generate
     code that relies on functions normally found in 'unistd.h', (e.g.
     'isatty()', 'read()'.)  If you wish to use these functions, you
     will have to inform your compiler where to find them.  *Note
     option-always-interactive::.  *Note option-read::.

'--yyclass=NAME, '%option yyclass="NAME"''
     only applies when generating a C++ scanner (the '--c++' option).
     It informs 'flex' that you have derived 'NAME' as a subclass of
     'yyFlexLexer', so 'flex' will place your actions in the member
     function 'NAME::yylex()' instead of 'yyFlexLexer::yylex()'.  It
     also generates a 'yyFlexLexer::yylex()' member function that emits
     a run-time error (by invoking 'yyFlexLexer::LexerError()') if
     called.  *Note Cxx::.
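
   The interaction between '--prefix' and 'yywrap()' described in this
section can be sketched as follows.  This is only an illustration under
stated assumptions: the prefix 'foo', the single number rule and the
'main()' driver are hypothetical.  Inside the specification the
familiar 'yy' names still work, while code outside the generated file
would have to use the renamed entry point 'foolex()'.

```lex
%option prefix="foo"
%option noyywrap

%{
#include <stdio.h>
%}

%%

[0-9]+      { printf( "number: %s\n", yytext );  /* really footext */ }
.|\n        { /* skip everything else */ }

%%

/* Externally the scanner's entry point is foolex(), not yylex(). */
int main( void )
    {
    return foolex();
    }
```

   '%option noyywrap' is used here because, as noted above, '-lfl'
does not supply a 'foowrap()'.  A second scanner built with a
different prefix could then be linked into the same executable without
its global names clashing with this one.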


File: flex.info, Node: Options for Scanner Speed and Size, Next: Debugging Options, Prev: Code-Level And API Options, Up: Scanner Options

16.4 Options for Scanner Speed and Size
=======================================

'-C[aefFmr]'
     controls the degree of table compression and, more generally,
     trade-offs between small scanners and fast scanners.

     '-C'
          A lone '-C' specifies that the scanner tables should be
          compressed but neither equivalence classes nor
          meta-equivalence classes should be used.

     '-Ca, --align, '%option align''
          ("align") instructs flex to trade off larger tables in the
          generated scanner for faster performance because the elements
          of the tables are better aligned for memory access and
          computation. On some RISC architectures, fetching and
          manipulating longwords is more efficient than with
          smaller-sized units such as shortwords. This option can
          quadruple the size of the tables used by your scanner.

     '-Ce, --ecs, '%option ecs''
          directs 'flex' to construct "equivalence classes", i.e., sets
          of characters which have identical lexical properties (for
          example, if the only appearance of digits in the 'flex' input
          is in the character class "[0-9]" then the digits '0', '1',
          ..., '9' will all be put in the same equivalence class).
          Equivalence classes usually give dramatic reductions in the
          final table/object file sizes (typically a factor of 2-5) and
          are pretty cheap performance-wise (one array look-up per
          character scanned).

     '-Cf'
          specifies that the "full" scanner tables should be generated -
          'flex' should not compress the tables by taking advantage of
          similar transition functions for different states.

     '-CF'
          specifies that the alternate fast scanner representation
          (described above under the '--fast' flag) should be used.
          This option cannot be used with '--c++'.

     '-Cm, --meta-ecs, '%option meta-ecs''
          directs 'flex' to construct "meta-equivalence classes", which
          are sets of equivalence classes (or characters, if equivalence
          classes are not being used) that are commonly used together.
          Meta-equivalence classes are often a big win when using
          compressed tables, but they have a moderate performance impact
          (one or two 'if' tests and one array look-up per character
          scanned).

     '-Cr, --read, '%option read''
          causes the generated scanner to _bypass_ use of the standard
          I/O library ('stdio') for input. Instead of calling 'fread()'
          or 'getc()', the scanner will use the 'read()' system call,
          resulting in a performance gain which varies from system to
          system, but in general is probably negligible unless you are
          also using '-Cf' or '-CF'.
          Using '-Cr' can cause strange behavior if, for example, you
          read from 'yyin' using 'stdio' prior to calling the scanner
          (because the scanner will miss whatever text your previous
          reads left in the 'stdio' input buffer). '-Cr' has no effect
          if you define 'YY_INPUT()' (*note Generated Scanner::).

     The options '-Cf' or '-CF' and '-Cm' do not make sense together -
     there is no opportunity for meta-equivalence classes if the table
     is not being compressed. Otherwise the options may be freely
     mixed, and are cumulative.

     The default setting is '-Cem', which specifies that 'flex' should
     generate equivalence classes and meta-equivalence classes. This
     setting provides the highest degree of table compression. You can
     trade off faster-executing scanners at the cost of larger tables
     with the following generally being true:

          slowest & smallest
                -Cem
                -Cm
                -Ce
                -C
                -C{f,F}e
                -C{f,F}
                -C{f,F}a
          fastest & largest

     Note that scanners with the smallest tables are usually generated
     and compiled the quickest, so during development you will usually
     want to use the default, maximal compression.

     '-Cfe' is often a good compromise between speed and size for
     production scanners.
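
   To make the idea behind '-Ce' concrete, here is a small sketch of how
characters can be grouped into equivalence classes. It is an
illustration only and is not taken from the 'flex' sources: the names
'build_ecs', 'demo_ecs' and 'NCHARS' are hypothetical, and the grouping
rule (two bytes share a class when they belong to exactly the same
character sets) is a simplification of what a real generator has to
consider.

```c
#define NCHARS 256

/* Illustration only, not flex's actual algorithm: two byte values land
   in the same equivalence class when they are members of exactly the
   same character sets.  sets[s][c] is non-zero when byte c belongs to
   set s.  Fills ec[c] with a class id and returns the class count. */
static int build_ecs(unsigned char sets[][NCHARS], int nsets, int ec[NCHARS])
{
    int rep[NCHARS];    /* rep[k] is the first byte seen in class k */
    int nclasses = 0;

    for (int c = 0; c < NCHARS; c++) {
        int k;
        for (k = 0; k < nclasses; k++) {
            int s;
            for (s = 0; s < nsets; s++)
                if (!sets[s][c] != !sets[s][rep[k]])
                    break;      /* membership differs in set s */
            if (s == nsets)
                break;          /* same membership everywhere: reuse class k */
        }
        if (k == nclasses)
            rep[nclasses++] = c;    /* no match: start a new class */
        ec[c] = k;
    }
    return nclasses;
}

/* Builds the membership table for the two sets [0-9] and [a-z] and
   returns the number of equivalence classes found. */
static int demo_ecs(int ec[NCHARS])
{
    static unsigned char sets[2][NCHARS];

    for (int c = '0'; c <= '9'; c++)
        sets[0][c] = 1;
    for (int c = 'a'; c <= 'z'; c++)
        sets[1][c] = 1;
    return build_ecs(sets, 2, ec);
}
```

   With only '[0-9]' and '[a-z]' in play, the 256 byte values collapse
into three classes (digits, lower-case letters, everything else), which
is why the transition tables can shrink so much at the cost of one extra
array look-up per character.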

'-f, --full, '%option full''
     specifies "fast scanner". No table compression is done and 'stdio'
     is bypassed. The result is large but fast. This option is
     equivalent to '-Cfr'.

'-F, --fast, '%option fast''
     specifies that the _fast_ scanner table representation should be
     used (and 'stdio' bypassed). This representation is about as fast
     as the full table representation '--full', and for some sets of
     patterns will be considerably smaller (and for others, larger). In
     general, if the pattern set contains both _keywords_ and a
     catch-all, _identifier_ rule, such as in the set:

          "case"    return TOK_CASE;
          "switch"  return TOK_SWITCH;
          ...
          "default" return TOK_DEFAULT;
          [a-z]+    return TOK_ID;

     then you're better off using the full table representation. If
     only the _identifier_ rule is present and you then use a hash table
     or some such to detect the keywords, you're better off using
     '--fast'.

     This option is equivalent to '-CFr'. It cannot be used with
     '--c++'.
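
   The "hash table or some such" approach mentioned for '--fast' could
look roughly like the sketch below. It swaps the hash table for a sorted
table searched with the standard 'bsearch()' function, which is simpler
to show. The token names come from the example above; their numeric
values, the 'struct keyword' type and the function names are assumptions
made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Token codes: the names come from the example above; the numeric
   values are arbitrary assumptions for this sketch. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name (strcmp order) for bsearch() to work. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

/* bsearch() comparator: 'key' is the text being looked up, 'elem'
   points at one entry of the keywords table. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct keyword *)elem)->name);
}

/* Returns the keyword's token code, or TOK_ID if 'text' is not a keyword. */
static int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
                                       sizeof keywords / sizeof keywords[0],
                                       sizeof keywords[0], compare_keyword);
    return kw ? kw->token : TOK_ID;
}
```

   The scanner would then keep only the identifier rule and have its
action return 'classify_identifier(yytext)', so the keywords no longer
appear as patterns at all.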


File: flex.info, Node: Debugging Options, Next: Miscellaneous Options, Prev: Options for Scanner Speed and Size, Up: Scanner Options

16.5 Debugging Options
======================

'-b, --backup, '%option backup''
     Generate backing-up information to 'lex.backup'. This is a list of
     scanner states which require backing up and the input characters on
     which they do so. By adding rules one can remove backing-up
     states. If _all_ backing-up states are eliminated and '-Cf' or
     '-CF' is used, the generated scanner will run faster (see the
     '--perf-report' flag). Only users who wish to squeeze every last
     cycle out of their scanners need worry about this option. (*note
     Performance::).

'-d, --debug, '%option debug''
     makes the generated scanner run in "debug" mode. Whenever a
     pattern is recognized and the global variable 'yy_flex_debug' is
     non-zero (which is the default), the scanner will write to 'stderr'
     a line of the form:

          --accepting rule at line 53 ("the matched text")

     The line number refers to the location of the rule in the file
     defining the scanner (i.e., the file that was fed to flex).
     Messages are also generated when the scanner backs up, accepts the
     default rule, reaches the end of its input buffer (or encounters a
     NUL; at this point, the two look the same as far as the scanner's
     concerned), or reaches an end-of-file.

'-p, --perf-report, '%option perf-report''
     generates a performance report to 'stderr'. The report consists of
     comments regarding features of the 'flex' input file which will
     cause a serious loss of performance in the resulting scanner. If
     you give the flag twice, you will also get comments regarding
     features that lead to minor performance losses.

     Note that the use of 'REJECT' and of variable trailing context
     (*note Limitations::) entails a substantial performance penalty;
     use of 'yymore()', the '^' operator, and the '--interactive' flag
     entail minor performance penalties.

'-s, --nodefault, '%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to 'stdout') to be suppressed. If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.

'-T, --trace, '%option trace''
     makes 'flex' run in "trace" mode. It will generate a lot of
     messages to 'stderr' concerning the form of the input and the
     resultant non-deterministic and deterministic finite automata.
     This option is mostly for use in maintaining 'flex'.

'-w, --nowarn, '%option nowarn''
     suppresses warning messages.

'-v, --verbose, '%option verbose''
     specifies that 'flex' should write to 'stderr' a summary of
     statistics regarding the scanner it generates. Most of the
     statistics are meaningless to the casual 'flex' user, but the first
     line identifies the version of 'flex' (same as reported by
     '--version'), and the next line the flags used when generating the
     scanner, including those that are on by default.

'--warn, '%option warn''
     warns about certain things. In particular, if the default rule can
     be matched but no default rule has been given, 'flex' will warn
     you. We recommend using this option always.


File: flex.info, Node: Miscellaneous Options, Prev: Debugging Options, Up: Scanner Options

16.6 Miscellaneous Options
==========================

'-c'
     A do-nothing option included for POSIX compliance.

'-h, -?, --help'
     generates a "help" summary of 'flex''s options to 'stdout' and then
     exits.

'-n'
     Another do-nothing option included for POSIX compliance.

'-V, --version'
     prints the version number to 'stdout' and exits.


File: flex.info, Node: Performance, Next: Cxx, Prev: Scanner Options, Up: Top

17 Performance Considerations
*****************************

The main design goal of 'flex' is that it generate high-performance
scanners. It has been optimized for dealing well with large sets of
rules. Aside from the effects on scanner speed of the table compression
'-C' options outlined above, there are a number of options/actions which
degrade performance. These are, from most expensive to least:

     REJECT
     arbitrary trailing context

     pattern sets that require backing up
     %option yylineno
     %array

     %option interactive
     %option always-interactive

     ^ beginning-of-line operator
     yymore()

   with the first two being quite expensive and the last two being
quite cheap. Note also that 'unput()' is implemented as a routine call
that potentially does quite a bit of work, while 'yyless()' is a
quite-cheap macro. So if you are just putting back some excess text you
scanned, use 'yyless()'.

   'REJECT' should be avoided at all costs when performance is
important. It is a particularly expensive option.

   There is one case when '%option yylineno' can be expensive.
That is when your patterns match long tokens that could _possibly_
contain a newline character. There is no performance penalty for rules
that cannot possibly match newlines, since flex does not need to check
them for newlines. In general, you should avoid rules such as '[^f]+',
which match very long tokens, including newlines, and may possibly match
your entire file! A better approach is to separate '[^f]+' into two
rules:

     %option yylineno
     %%
         [^f\n]+
         \n+

   The above scanner does not incur a performance penalty.

   Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner. In principle, one begins by
using the '-b' flag to generate a 'lex.backup' file.
For example, on the input:

     %%
     foo        return TOK_KEYWORD;
     foobar     return TOK_KEYWORD;

   the file looks like:

     State #6 is non-accepting -
      associated rule line numbers:
            2       3
      out-transitions: [ o ]
      jam-transitions: EOF [ \001-n  p-\177 ]

     State #8 is non-accepting -
      associated rule line numbers:
            3
      out-transitions: [ a ]
      jam-transitions: EOF [ \001-`  b-\177 ]

     State #9 is non-accepting -
      associated rule line numbers:
            3
      out-transitions: [ r ]
      jam-transitions: EOF [ \001-q  s-\177 ]

     Compressed tables always back up.

   The first few lines tell us that there's a scanner state in which it
can make a transition on an 'o' but not on any other character, and that
in that state the currently scanned text does not match any rule. The
state occurs when trying to match the rules found at lines 2 and 3 in
the input file. If the scanner is in that state and then reads
something other than an 'o', it will have to back up to find a rule
which is matched. With a bit of head-scratching one can see that this
must be the state it's in when it has seen 'fo'.
When this has happened, if anything other than another 'o' is seen, the
scanner will have to back up to simply match the 'f' (by the default
rule).

   The comment regarding State #8 indicates there's a problem when
'foob' has been scanned. Indeed, on any character other than an 'a',
the scanner will have to back up to accept "foo". Similarly, the
comment for State #9 concerns when 'fooba' has been scanned and an 'r'
does not follow.

   The final comment reminds us that there's no point going to all the
trouble of removing backing up from the rules unless we're using '-Cf'
or '-CF', since there's no performance gain doing so with compressed
scanners.

   The way to remove the backing up is to add "error" rules:

     %%
     foo         return TOK_KEYWORD;
     foobar      return TOK_KEYWORD;

     fooba       |
     foob        |
     fo          {
                 /* false alarm, not really a keyword */
                 return TOK_ID;
                 }

   Eliminating backing up among a list of keywords can also be done
using a "catch-all" rule:

     %%
     foo         return TOK_KEYWORD;
     foobar      return TOK_KEYWORD;

     [a-z]+      return TOK_ID;

   This is usually the best solution when appropriate.

   Backing up messages tend to cascade.
With a complicated set of rules it's not uncommon to get hundreds of
messages. If one can decipher them, though, it often only takes a dozen
or so rules to eliminate the backing up (though it's easy to make a
mistake and have an error rule accidentally match a valid token; a
possible future 'flex' feature will be to automatically add rules to
eliminate backing up).

   It's important to keep in mind that you gain the benefits of
eliminating backing up only if you eliminate _every_ instance of backing
up. Leaving just one means you gain nothing.

   _Variable_ trailing context (where both the leading and trailing
parts do not have a fixed length) entails almost the same performance
loss as 'REJECT' (i.e., substantial). So when possible a rule like:

     %%
     mouse|rat/(cat|dog)   run();

   is better written:

     %%
     mouse/cat|dog         run();
     rat/cat|dog           run();

   or as

     %%
     mouse|rat/cat         run();
     mouse|rat/dog         run();

   Note that here the special '|' action does _not_ provide any savings,
and can even make things worse (*note Limitations::).

   Another area where the user can increase a scanner's performance (and
one that's easier to implement) arises from the fact that the longer the
tokens matched, the faster the scanner will run.
This is because with long tokens the processing of most input characters
takes place in the (short) inner scanning loop, and does not often have
to go through the additional work of setting up the scanning environment
(e.g., 'yytext') for the action. Recall the scanner for C comments:

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*
     <comment>"*"+[^*/\n]*
     <comment>\n             ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   This could be sped up by writing it as:

     %x comment
     %%
             int line_num = 1;

     "/*"         BEGIN(comment);

     <comment>[^*\n]*
     <comment>[^*\n]*\n      ++line_num;
     <comment>"*"+[^*/\n]*
     <comment>"*"+[^*/\n]*\n ++line_num;
     <comment>"*"+"/"        BEGIN(INITIAL);

   Now instead of each newline requiring the processing of another
action, recognizing the newlines is distributed over the other rules to
keep the matched text as long as possible. Note that _adding_ rules
does _not_ slow down the scanner! The speed of the scanner is
independent of the number of rules or (modulo the considerations given
at the beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.

   A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line and
with no other extraneous characters, and recognize all the keywords. A
natural first approach is:

     %%
     asm      |
     auto     |
     break    |
     ... etc ...
     volatile |
     while    /* it's a keyword */

     .|\n     /* it's not a keyword */

   To eliminate the back-tracking, introduce a catch-all rule:

     %%
     asm      |
     auto     |
     break    |
     ... etc ...
     volatile |
     while    /* it's a keyword */

     [a-z]+   |
     .|\n     /* it's not a keyword */

   Now, if it's guaranteed that there's exactly one word per line, then
we can reduce the total number of matches by half by merging in the
recognition of newlines with that of the other tokens:

     %%
     asm\n      |
     auto\n     |
     break\n    |
     ... etc ...
     volatile\n |
     while\n    /* it's a keyword */

     [a-z]+\n   |
     .|\n       /* it's not a keyword */

   One has to be careful here, as we have now reintroduced backing up
into the scanner.
In particular, while _we_ know that there will never be any characters
in the input stream other than letters or newlines, 'flex' can't figure
this out, and it will plan for possibly needing to back up when it has
scanned a token like 'auto' and then the next character is something
other than a newline or a letter. Previously it would then just match
the 'auto' rule and be done, but now it has no 'auto' rule, only an
'auto\n' rule. To eliminate the possibility of backing up, we could
either duplicate all rules but without final newlines, or, since we
never expect to encounter such an input and therefore don't care how
it's classified, we can introduce one more catch-all rule, this one
which doesn't include a newline:

     %%
     asm\n      |
     auto\n     |
     break\n    |
     ... etc ...
     volatile\n |
     while\n    /* it's a keyword */

     [a-z]+\n   |
     [a-z]+     |
     .|\n       /* it's not a keyword */

   Compiled with '-Cf', this is about as fast as one can get a 'flex'
scanner to go for this particular problem.

   A final note: 'flex' is slow when matching 'NUL's, particularly when
a token contains multiple 'NUL's. It's best to write rules which match
_short_ amounts of text if it's anticipated that the text will often
include 'NUL's.

   Another final note regarding performance: as mentioned in *note
Matching::, dynamically resizing 'yytext' to accommodate huge tokens is
a slow process because it presently requires that the (huge) token be
rescanned from the beginning. Thus if performance is vital, you should
attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters per token.


File: flex.info, Node: Cxx, Next: Reentrant, Prev: Performance, Up: Top

18 Generating C++ Scanners
**************************

*IMPORTANT*: the present form of the scanning class is _experimental_
and may change considerably between major releases.

   'flex' provides two different ways to generate scanners for use with
C++. The first way is to simply compile a scanner generated by 'flex'
using a C++ compiler instead of a C compiler. You should not encounter
any compilation errors (*note Reporting Bugs::). You can then use C++
code in your rule actions instead of C code. Note that the default
input source for your scanner remains 'yyin', and default echoing is
still done to 'yyout'. Both of these remain 'FILE *' variables and not
C++ _streams_.

   You can also use 'flex' to generate a C++ scanner class, using the
'-+' option (or, equivalently, '%option c++'), which is automatically
specified if the name of the 'flex' executable ends in a '+', such as
'flex++'. When using this option, 'flex' defaults to generating the
scanner to the file 'lex.yy.cc' instead of 'lex.yy.c'. The generated
scanner includes the header file 'FlexLexer.h', which defines the
interface to two C++ classes.

   The first class in 'FlexLexer.h', 'FlexLexer', provides an abstract
base class defining the general scanner class interface. It provides
the following member functions:

'const char* YYText()'
     returns the text of the most recently matched token, the equivalent
     of 'yytext'.

'int YYLeng()'
     returns the length of the most recently matched token, the
     equivalent of 'yyleng'.

'int lineno() const'
     returns the current input line number (see '%option yylineno'), or
     '1' if '%option yylineno' was not used.

'void set_debug( int flag )'
     sets the debugging flag for the scanner, equivalent to assigning to
     'yy_flex_debug' (*note Scanner Options::). Note that you must
     build the scanner using '%option debug' to include debugging
     information in it.

'int debug() const'
     returns the current setting of the debugging flag.

   Also provided are member functions equivalent to
'yy_switch_to_buffer()', 'yy_create_buffer()' (though the first argument
is an 'istream&' object reference and not a 'FILE*'),
'yy_flush_buffer()', 'yy_delete_buffer()', and 'yyrestart()' (again, the
first argument is an 'istream&' object reference).

   The second class defined in 'FlexLexer.h' is 'yyFlexLexer', which is
derived from 'FlexLexer'. It defines the following additional member
functions:

'yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
'yyFlexLexer( istream& arg_yyin, ostream& arg_yyout )'
     constructs a 'yyFlexLexer' object using the given streams for input
     and output. If not specified, the streams default to 'cin' and
     'cout', respectively. 'yyFlexLexer' does not take ownership of its
     stream arguments. It's up to the user to ensure the streams
     pointed to remain alive at least as long as the 'yyFlexLexer'
     instance.

'virtual int yylex()'
     performs the same role as 'yylex()' does for ordinary 'flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value. If you derive a subclass 'S' from
     'yyFlexLexer' and want to access the member functions and variables
     of 'S' inside 'yylex()', then you need to use '%option yyclass="S"'
     to inform 'flex' that you will be using that subclass instead of
     'yyFlexLexer'.
     In this case, rather than generating
     'yyFlexLexer::yylex()', 'flex' generates 'S::yylex()' (and also
     generates a dummy 'yyFlexLexer::yylex()' that calls
     'yyFlexLexer::LexerError()' if called).

'virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
'virtual void switch_streams(istream& new_in, ostream& new_out)'
     reassigns 'yyin' to 'new_in' (if non-null) and 'yyout' to 'new_out'
     (if non-null), deleting the previous input buffer if 'yyin' is
     reassigned.

'int yylex( istream* new_in, ostream* new_out = 0 )'
'int yylex( istream& new_in, ostream& new_out )'
     first switches the input streams via 'switch_streams( new_in,
     new_out )' and then returns the value of 'yylex()'.

   In addition, 'yyFlexLexer' defines the following protected virtual
functions which you can redefine in derived classes to tailor the
scanner:

'virtual int LexerInput( char* buf, int max_size )'
     reads up to 'max_size' characters into 'buf' and returns the number
     of characters read.  To indicate end-of-input, return 0 characters.
     Note that 'interactive' scanners (see the '-B' and '-I' flags in
     *note Scanner Options::) define the macro 'YY_INTERACTIVE'.  If you
     redefine 'LexerInput()' and need to take different actions
     depending on whether or not the scanner might be scanning an
     interactive input source, you can test for the presence of this
     name via '#ifdef' statements.

'virtual void LexerOutput( const char* buf, int size )'
     writes out 'size' characters from the buffer 'buf', which, while
     'NUL'-terminated, may also contain internal 'NUL's if the scanner's
     rules can match text with 'NUL's in them.

'virtual void LexerError( const char* msg )'
     reports a fatal error message.  The default version of this
     function writes the message to the stream 'cerr' and exits.

   Note that a 'yyFlexLexer' object contains its _entire_ scanning
state.  Thus you can use such objects to create reentrant scanners, but
see also *note Reentrant::.  You can instantiate multiple instances of
the same 'yyFlexLexer' class, and you can also combine multiple C++
scanner classes together in the same program using the '-P' option
discussed above.

   Finally, note that the '%array' feature is not available to C++
scanner classes; you must use '%pointer' (the default).

   Here is an example of a simple C++ scanner:

         // An example of using the flex C++ scanner class.

     %{
     #include <iostream>
     using namespace std;
     int mylineno = 0;
     %}

     %option noyywrap c++

     string  \"[^\n"]+\"

     ws      [ \t]+

     alpha   [A-Za-z]
     dig     [0-9]
     name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
     num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
     num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
     number  {num1}|{num2}

     %%

     {ws}    /* skip blanks and tabs */

     "/*"    {
             int c;

             while((c = yyinput()) != 0)
                 {
                 if(c == '\n')
                     ++mylineno;

                 else if(c == '*')
                     {
                     if((c = yyinput()) == '/')
                         break;
                     else
                         unput(c);
                     }
                 }
             }

     {number}  cout << "number " << YYText() << '\n';

     \n        mylineno++;

     {name}    cout << "name " << YYText() << '\n';

     {string}  cout << "string " << YYText() << '\n';

     %%

     // This include is required if main() is in another source file.
     //#include <FlexLexer.h>

     int main( int /* argc */, char** /* argv */ )
     {
         FlexLexer* lexer = new yyFlexLexer;
         while(lexer->yylex() != 0)
             ;
         return 0;
     }

   If you want to create multiple (different) lexer classes, you use the
'-P' flag (or the 'prefix=' option) to rename each 'yyFlexLexer' to some
other 'xxFlexLexer'.  You then can include '<FlexLexer.h>' in your other
sources once per lexer class, first renaming 'yyFlexLexer' as follows:

     #undef yyFlexLexer
     #define yyFlexLexer xxFlexLexer
     #include <FlexLexer.h>

     #undef yyFlexLexer
     #define yyFlexLexer zzFlexLexer
     #include <FlexLexer.h>

   if, for example, you used '%option prefix="xx"' for one of your
scanners and '%option prefix="zz"' for the other.


File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top

19 Reentrant C Scanners
***********************

'flex' has the ability to generate a reentrant C scanner.  This is
accomplished by specifying '%option reentrant' ('-R').  The generated
scanner is both portable and safe to use in one or more separate
threads of control.  The most common use for reentrant scanners is from
within multi-threaded applications.
Any thread may create and execute a
reentrant 'flex' scanner without the need for synchronization with other
threads.

* Menu:

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::


File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant

19.1 Uses for Reentrant Scanners
================================

Besides multi-threaded applications, there are other uses for a
reentrant scanner.  For example, you could scan two or more files
simultaneously to implement a 'diff' at the token level (i.e., instead
of at the character level):

     /* Example of maintaining more than one active scanner. */

     /* Declared outside the loop so the 'while' test can see them. */
     int tok1, tok2;

     do {
         tok1 = yylex( scanner_1 );
         tok2 = yylex( scanner_2 );

         if( tok1 != tok2 )
             printf("Files are different.");

     } while ( tok1 && tok2 );

   Another use for a reentrant scanner is recursion.  (Note that a
recursive scanner can also be created using a non-reentrant scanner and
buffer states.  *Note Multiple Input Buffers::.)

   The following crude scanner supports the 'eval' command by invoking
another instance of itself.

     /* Example of recursive invocation. */

     %option reentrant

     %%
     "eval(".+")"  {
                       yyscan_t scanner;
                       YY_BUFFER_STATE buf;

                       yylex_init( &scanner );
                       yytext[yyleng-1] = ' ';

                       buf = yy_scan_string( yytext + 5, scanner );
                       yylex( scanner );

                       yy_delete_buffer(buf,scanner);
                       yylex_destroy( scanner );
                  }
     ...
     %%


File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant

19.2 An Overview of the Reentrant API
=====================================

The API for reentrant scanners differs from that for non-reentrant
scanners.  Here is a quick overview of the API:

   * '%option reentrant' must be specified.

   * All functions take one additional argument: 'yyscanner'.

   * All global variables are replaced by their macro equivalents.  (We
     tell you this because it may be important to you during debugging.)

   * 'yylex_init' and 'yylex_destroy' must be called before and after
     'yylex', respectively.

   * Accessor methods (get/set functions) provide access to common
     'flex' variables.

   * User-specific data can be stored in 'yyextra'.


File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant

19.3 Reentrant Example
======================

First, an example of a reentrant scanner:

     /* This scanner prints "//" comments. */

     %option reentrant stack noyywrap
     %x COMMENT

     %%

     "//"                 yy_push_state( COMMENT, yyscanner);
     .|\n

     <COMMENT>\n          yy_pop_state( yyscanner );
     <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);

     %%

     int main ( int argc, char * argv[] )
     {
         yyscan_t scanner;

         yylex_init ( &scanner );
         yylex ( scanner );
         yylex_destroy ( scanner );
         return 0;
     }


File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant

19.4 The Reentrant API in Detail
================================

Here are the things you need to do or know to use the reentrant C API of
'flex'.

* Menu:

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::


File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail

19.4.1 Declaring a Scanner As Reentrant
---------------------------------------

'%option reentrant' ('--reentrant') must be specified.

   Notice that '%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, 'flex'
would have happily generated a non-reentrant scanner without
complaining.  You may explicitly specify '%option noreentrant' if you
do _not_ want a reentrant scanner, although it is not necessary.  The
default is to generate a non-reentrant scanner.


File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail

19.4.2 The Extra Argument
-------------------------

All functions take one additional argument: 'yyscanner'.

   Notice that the calls to 'yy_push_state' and 'yy_pop_state' both have
an argument, 'yyscanner', that is not present in a non-reentrant
scanner.
Here are the declarations of 'yy_push_state' and
'yy_pop_state' in the reentrant scanner:

     static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
     static void yy_pop_state  ( yyscan_t yyscanner ) ;

   Notice that the argument 'yyscanner' appears in the declaration of
both functions.  In fact, all 'flex' functions in a reentrant scanner
have this additional argument.  It is always the last argument in the
argument list, it is always of type 'yyscan_t' (which is typedef'd to
'void *') and it is always named 'yyscanner'.  As you may have guessed,
'yyscanner' is a pointer to an opaque data structure encapsulating the
current state of the scanner.  For a list of function declarations, see
*note Reentrant Functions::.  Note that preprocessor macros, such as
'BEGIN', 'ECHO', and 'REJECT', do not take this additional argument.


File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail

19.4.3 Global Variables Replaced By Macros
------------------------------------------

All global variables in traditional flex have been replaced by macro
equivalents.

   Note that in the above example, 'yyout' and 'yytext' are not plain
variables.  These are macros that will expand to their equivalent
lvalue.  All of the familiar 'flex' globals have been replaced by their
macro equivalents.
In particular, 'yytext', 'yyleng', 'yylineno',
'yyin', 'yyout', 'yyextra', 'yylval', and 'yylloc' are macros.  You may
safely use these macros in actions as if they were plain variables.  We
only tell you this so you don't expect to link to these variables
externally.  Currently, each macro expands to a member of an internal
struct, e.g.,

     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

   One important thing to remember about 'yytext' and friends is that,
because 'yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions.  You must use an accessor method, e.g., 'yyget_text', to
accomplish this.  (See below.)


File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail

19.4.4 Init and Destroy Functions
---------------------------------

'yylex_init' and 'yylex_destroy' must be called before and after
'yylex', respectively.

     int yylex_init ( yyscan_t * ptr_yy_globals ) ;
     int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
     int yylex ( yyscan_t yyscanner ) ;
     int yylex_destroy ( yyscan_t yyscanner ) ;

   The function 'yylex_init' must be called before calling any other
function.
The argument to 'yylex_init' is the address of an
uninitialized pointer to be filled in by 'yylex_init', overwriting any
previous contents.  The function 'yylex_init_extra' may be used instead,
taking as its first argument a variable of type 'YY_EXTRA_TYPE'.  See
the section on yyextra, below, for more details.

   The value stored in 'ptr_yy_globals' should thereafter be passed to
'yylex' and 'yylex_destroy'.  Flex does not save the argument passed to
'yylex_init', so it is safe to pass the address of a local pointer to
'yylex_init' so long as it remains in scope for the duration of all
calls to the scanner, up to and including the call to 'yylex_destroy'.

   The function 'yylex' should be familiar to you by now.  The reentrant
version takes one argument, which is the value returned (via an
argument) by 'yylex_init'.  Otherwise, it behaves the same as the
non-reentrant version of 'yylex'.

   Both 'yylex_init' and 'yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case errno is set to one of the
following values:

   * ENOMEM Memory allocation error.  *Note memory-management::.
   * EINVAL Invalid argument.

   The function 'yylex_destroy' should be called to free resources used
by the scanner.  After 'yylex_destroy' is called, the contents of
'yyscanner' should not be used.  Of course, there is no need to destroy
a scanner if you plan to reuse it.
A 'flex' scanner (both reentrant and
non-reentrant) may be restarted by calling 'yyrestart'.

   Below is an example of a program that creates a scanner, uses it,
then destroys it when done:

     int main ()
     {
         yyscan_t scanner;
         int tok;

         yylex_init(&scanner);

         while ((tok=yylex(scanner)) > 0)
             printf("tok=%d yytext=%s\n", tok, yyget_text(scanner));

         yylex_destroy(scanner);
         return 0;
     }


File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail

19.4.5 Accessing Variables with Reentrant Scanners
--------------------------------------------------

Accessor methods (get/set functions) provide access to common 'flex'
variables.

   Many scanners that you build will be part of a larger project.
Portions of your project will need access to 'flex' values, such as
'yytext'.  In a non-reentrant scanner, these values are global, so there
is no problem accessing them.  However, in a reentrant scanner, there
are no global 'flex' values.  You cannot access them directly.
Instead, you must access 'flex' values using accessor methods (get/set
functions).  Each accessor method is named 'yyget_NAME' or 'yyset_NAME',
where 'NAME' is the name of the 'flex' variable you want.
For example:

     /* Set the last character of yytext to NUL. */
     void chop ( yyscan_t scanner )
     {
         int len = yyget_leng( scanner );
         yyget_text( scanner )[len - 1] = '\0';
     }

   The above code may be called from within an action like this:

     %%
     .+\n    { chop( yyscanner );}

   You may find that '%option header-file' is particularly useful for
generating prototypes of all the accessor functions.  *Note
option-header::.


File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail

19.4.6 Extra Data
-----------------

User-specific data can be stored in 'yyextra'.

   In a reentrant scanner, it is unwise to use global variables to
communicate with or maintain state between different pieces of your
program.  However, you may need access to external data or invoke
external functions from within the scanner actions.  Likewise, you may
need to pass information to your scanner (e.g., open file descriptors,
or database connections).  In a non-reentrant scanner, the only way to
do this would be through the use of global variables.  'Flex' allows you
to store arbitrary, "extra" data in a scanner.
This data is accessible
through the accessor methods 'yyget_extra' and 'yyset_extra' from
outside the scanner, and through the shortcut macro 'yyextra' from
within the scanner itself.  They are defined as follows:

     #define YY_EXTRA_TYPE  void*
     YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
     void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);

   In addition, an extra form of 'yylex_init' is provided,
'yylex_init_extra'.  This function is provided so that the yyextra value
can be accessed from within the very first yyalloc, used to allocate the
scanner itself.

   By default, 'YY_EXTRA_TYPE' is defined as type 'void *'.  You may
redefine this type using '%option extra-type="your_type"' in the
scanner:

     /* An example of overriding YY_EXTRA_TYPE.
      */
     %{
     #include <sys/stat.h>
     #include <unistd.h>
     %}
     %option reentrant
     %option extra-type="struct stat *"
     %%

     __filesize__     printf( "%ld", (long) yyextra->st_size  );
     __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
     %%
     void scan_file( char* filename )
     {
         yyscan_t scanner;
         struct stat buf;
         FILE *in;

         in = fopen( filename, "r" );
         stat( filename, &buf );

         /* The extra type is 'struct stat *', so pass the address. */
         yylex_init_extra( &buf, &scanner );
         yyset_in( in, scanner );
         yylex( scanner );
         yylex_destroy( scanner );

         fclose( in );
     }


File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail

19.4.7 About yyscan_t
---------------------

'yyscan_t' is defined as:

     typedef void* yyscan_t;

   It is initialized by 'yylex_init()' to point to an internal
structure.  You should never access this value directly.  In particular,
you should never attempt to free it (use 'yylex_destroy()' instead).
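
   The opaque-handle idea behind 'yyscan_t' can be sketched in plain C.
The following is only an illustration under stated assumptions: the
names 'demo_scan_t', 'demo_guts', 'demo_init' and 'demo_destroy' are
hypothetical and are not what 'flex' generates.  The sketch shows a
'void *' handle that is filled in by an init function (returning zero
on success, or non-zero with 'errno' set on failure) and released only
by the matching destroy function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names throughout: this is NOT flex's generated code. */
typedef void *demo_scan_t;          /* opaque handle, like yyscan_t */

struct demo_guts {                  /* private state behind the handle */
    int   lineno_r;
    void *extra_r;
};

/* Fill in *ptr with a fresh handle.  Returns 0 on success, or non-zero
   with errno set, mirroring the contract described for yylex_init. */
int demo_init(demo_scan_t *ptr)
{
    struct demo_guts *g;

    if (ptr == NULL) {
        errno = EINVAL;
        return 1;
    }
    g = calloc(1, sizeof *g);
    if (g == NULL) {
        errno = ENOMEM;
        return 1;
    }
    g->lineno_r = 1;
    *ptr = g;
    return 0;
}

/* Release the private state; the handle must not be used afterwards. */
int demo_destroy(demo_scan_t scanner)
{
    free(scanner);
    return 0;
}
```

   A caller would declare a 'demo_scan_t', pass its address to
'demo_init', and later hand the handle to 'demo_destroy' rather than
calling 'free' on it, since only the library knows what the handle
points to.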


File: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant

19.5 Functions and Macros Available in Reentrant C Scanners
===========================================================

The following functions are available in a reentrant scanner:

     char *yyget_text ( yyscan_t scanner );
     int yyget_leng ( yyscan_t scanner );
     FILE *yyget_in ( yyscan_t scanner );
     FILE *yyget_out ( yyscan_t scanner );
     int yyget_lineno ( yyscan_t scanner );
     YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
     int yyget_debug ( yyscan_t scanner );

     void yyset_debug ( int flag, yyscan_t scanner );
     void yyset_in ( FILE * in_str , yyscan_t scanner );
     void yyset_out ( FILE * out_str , yyscan_t scanner );
     void yyset_lineno ( int line_number , yyscan_t scanner );
     void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );

   There are no "set" functions for yytext and yyleng.  This is
intentional.

   The following macro shortcuts are available in actions in a reentrant
scanner:

     yytext
     yyleng
     yyin
     yyout
     yylineno
     yyextra
     yy_flex_debug

   In a reentrant C scanner, support for yylineno is always present
(i.e., you may access yylineno), but the value is never modified by
'flex' unless '%option yylineno' is enabled.  This is to allow the user
to maintain the line count independently of 'flex'.

   The following functions and macros are made available when '%option
bison-bridge' ('--bison-bridge') is specified:

     YYSTYPE * yyget_lval ( yyscan_t scanner );
     void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
     yylval

   The following functions and macros are made available when '%option
bison-locations' ('--bison-locations') is specified:

     YYLTYPE *yyget_lloc ( yyscan_t scanner );
     void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
     yylloc

   Support for yylval assumes that 'YYSTYPE' is a valid type.  Support
for yylloc assumes that 'YYLTYPE' is a valid type.  Typically, these
types are generated by 'bison', and are included in section 1 of the
'flex' input.
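
   The split between accessor functions and macro shortcuts listed in
this section can be sketched as follows.  This is a minimal
illustration, not the code 'flex' emits: every 'demo_' name is
hypothetical, and the struct layout is an assumption made for the
example.  The point is that a shortcut macro only works where a
variable holding the handle is in scope (as 'yyscanner' is inside an
action), whereas an accessor works anywhere because the handle is
passed explicitly.  As with 'yytext', the sketch offers no "set"
function for the token text.

```c
/* Hypothetical names throughout: this is NOT flex's generated code. */
typedef void *demo_scan_t;

struct demo_guts {
    const char *text_r;             /* current token text */
    int         lineno_r;           /* current line number */
};

/* Shortcut macros: usable only where a variable named 'scanner' is in
   scope, the way yytext and yylineno rely on 'yyscanner' in actions. */
#define demo_text   (((struct demo_guts *)scanner)->text_r)
#define demo_lineno (((struct demo_guts *)scanner)->lineno_r)

/* Accessors: usable anywhere, given the handle. */
const char *demo_get_text(demo_scan_t scanner)   { return demo_text; }
int         demo_get_lineno(demo_scan_t scanner) { return demo_lineno; }
void        demo_set_lineno(int n, demo_scan_t scanner) { demo_lineno = n; }

/* Stand-in for a rule action: it receives the handle as 'scanner' and
   can therefore use the shortcut macros as if they were variables. */
void demo_action(demo_scan_t scanner, const char *matched)
{
    demo_text = matched;
    demo_lineno++;
}
```

   Code outside the stand-in action would call 'demo_get_text' or
'demo_get_lineno' with the handle, just as code outside a real action
calls 'yyget_text' or 'yyget_lineno'.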


File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top

20 Incompatibilities with Lex and Posix
***************************************

'flex' is a rewrite of the AT&T Unix _lex_ tool (the two implementations
do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to both implementations.  'flex' is fully
compliant with the POSIX 'lex' specification, except that when using
'%pointer' (the default), a call to 'unput()' destroys the contents of
'yytext', which is counter to the POSIX specification.  In this section
we discuss all of the known areas of incompatibility between 'flex',
AT&T 'lex', and the POSIX specification.  'flex''s '-l' option turns on
maximum compatibility with the original AT&T 'lex' implementation, at
the cost of a major loss in the generated scanner's performance.  We
note below which incompatibilities can be overcome using the '-l'
option.  'flex' is fully compatible with 'lex' with the following
exceptions:

   * The undocumented 'lex' scanner internal variable 'yylineno' is not
     supported unless '-l' or '%option yylineno' is used.

   * 'yylineno' should be maintained on a per-buffer basis, rather than
     a per-scanner (single global variable) basis.

   * 'yylineno' is not part of the POSIX specification.

   * The 'input()' routine is not redefinable, though it may be called
     to read characters following whatever has been matched by a rule.
     If 'input()' encounters an end-of-file the normal 'yywrap()'
     processing is done.  A "real" end-of-file is returned by 'input()'
     as 'EOF'.

   * Input is instead controlled by defining the 'YY_INPUT()' macro.

   * The 'flex' restriction that 'input()' cannot be redefined is in
     accordance with the POSIX specification, which simply does not
     specify any way of controlling the scanner's input other than by
     making an initial assignment to 'yyin'.

   * The 'unput()' routine is not redefinable.  This restriction is in
     accordance with POSIX.

   * 'flex' scanners are not as reentrant as 'lex' scanners.  In
     particular, if you have an interactive scanner and an interrupt
     handler which long-jumps out of the scanner, and the scanner is
     subsequently called again, you may get the following message:

          fatal flex scanner internal error--end of buffer missed

     To reenter the scanner, first use:

          yyrestart( yyin );

     Note that this call will throw away any buffered input; usually
     this isn't a problem with an interactive scanner.  *Note
     Reentrant::, for 'flex''s reentrant API.

   * Also note that 'flex' C++ scanner classes _are_ reentrant, so if
     using C++ is an option for you, you should use them instead.  *Note
     Cxx::, and *note Reentrant:: for details.

   * 'output()' is not supported.  Output from the ECHO macro is done to
     the file-pointer 'yyout' (default 'stdout').

   * 'output()' is not part of the POSIX specification.

   * 'lex' does not support exclusive start conditions (%x), though they
     are in the POSIX specification.

   * When definitions are expanded, 'flex' encloses them in parentheses.
     With 'lex', the following:

          NAME    [A-Z][A-Z0-9]*
          %%
          foo{NAME}?      printf( "Found it\n" );
          %%

     will not match the string 'foo' because when the macro is expanded
     the rule is equivalent to 'foo[A-Z][A-Z0-9]*?' and the precedence
     is such that the '?' is associated with '[A-Z0-9]*'.  With 'flex',
     the rule will be expanded to 'foo([A-Z][A-Z0-9]*)?' and so the
     string 'foo' will match.

   * Note that if the definition begins with '^' or ends with '$' then
     it is _not_ expanded with parentheses, to allow these operators to
     appear in definitions without losing their special meanings.  But
     the '<s>', '/', and '<<EOF>>' operators cannot be used in a 'flex'
     definition.

   * Using '-l' results in the 'lex' behavior of no parentheses around
     the definition.

   * The POSIX specification is that the definition be enclosed in
     parentheses.

   * Some implementations of 'lex' allow a rule's action to begin on a
     separate line, if the rule's pattern has trailing whitespace:

          %%
          foo|bar<space here>
            { foobar_action();}

     'flex' does not support this feature.

   * The 'lex' '%r' (generate a Ratfor scanner) option is not supported.
     It is not part of the POSIX specification.

   * After a call to 'unput()', _yytext_ is undefined until the next
     token is matched, unless the scanner was built using '%array'.
     This is not the case with 'lex' or the POSIX specification.  The
     '-l' option does away with this incompatibility.

   * The precedence of the '{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of 'lex' interpret 'abc{1,3}' as
     "match one, two, or three occurrences of 'abc'", whereas 'flex'
     interprets it as "match 'ab' followed by one, two, or three
     occurrences of 'c'".  The '-l' and '--posix' options do away with
     this incompatibility.

   * The precedence of the '^' operator is different.
     'lex' interprets '^foo|bar' as "match either 'foo' at the beginning
     of a line, or 'bar' anywhere", whereas 'flex' interprets it as
     "match either 'foo' or 'bar' if they come at the beginning of a
     line".  The latter is in agreement with the POSIX specification.

   * The special table-size declarations such as '%a' supported by 'lex'
     are not required by 'flex' scanners.  'flex' ignores them.

   * The name 'FLEX_SCANNER' is '#define'd so scanners may be written
     for use with either 'flex' or 'lex'.  Scanners also include
     'YY_FLEX_MAJOR_VERSION', 'YY_FLEX_MINOR_VERSION' and
     'YY_FLEX_SUBMINOR_VERSION' indicating which version of 'flex'
     generated the scanner.  For example, for the 2.5.22 release, these
     defines would be 2, 5 and 22 respectively.  If the version of
     'flex' being used is a beta version, then the symbol 'FLEX_BETA' is
     defined.

   * The symbols '[[' and ']]' in the code sections of the input may
     conflict with the m4 delimiters.  *Note M4 Dependency::.
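
   The 'FLEX_SCANNER' and 'YY_FLEX_*_VERSION' macros mentioned in the
list above lend themselves to conditional compilation.  The following is
a minimal sketch of code that could sit in the user code section of a
scanner meant to build with either tool.  The helper names
'scanner_generator' and 'scanner_version' are hypothetical and are not
part of 'flex'; only the macros come from the generated scanner.

```c
/* Report which tool generated the enclosing scanner.
 * FLEX_SCANNER is defined only inside flex-generated code, so when this
 * is compiled anywhere else the #else branch is taken. */
static const char *scanner_generator(void)
{
#ifdef FLEX_SCANNER
    return "flex";
#else
    return "lex";
#endif
}

/* Store the generator version in the three out-parameters.
 * Returns 1 when the version macros are available (flex), and 0
 * otherwise, in which case the outputs are set to 0. */
static int scanner_version(int *major, int *minor, int *subminor)
{
#ifdef FLEX_SCANNER
    *major    = YY_FLEX_MAJOR_VERSION;
    *minor    = YY_FLEX_MINOR_VERSION;
    *subminor = YY_FLEX_SUBMINOR_VERSION;
    return 1;
#else
    *major = *minor = *subminor = 0;
    return 0;
#endif
}
```

   Because the checks happen in the preprocessor, the same source can be
fed to either generator without referencing symbols that only one of
them defines.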

   The following 'flex' features are not included in 'lex' or the POSIX
specification:

   * C++ scanners
   * %option
   * start condition scopes
   * start condition stacks
   * interactive/non-interactive scanners
   * yy_scan_string() and friends
   * yyterminate()
   * yy_set_interactive()
   * yy_set_bol()
   * YY_AT_BOL()
   * <<EOF>>
   * <*>
   * YY_DECL
   * YY_START
   * YY_USER_ACTION
   * YY_USER_INIT
   * #line directives
   * %{}'s around actions
   * reentrant C API
   * multiple actions on a line
   * almost all of the 'flex' command-line options

   The feature "multiple actions on a line" refers to the fact that with
'flex' you can put multiple actions on the same line, separated with
semicolons, while with 'lex', the following:

         foo    handle_foo(); ++num_foos_seen;

   is (rather surprisingly) truncated to

         foo    handle_foo();

   'flex' does not truncate the action.  Actions that are not enclosed
in braces are simply terminated at the end of the line.


File: flex.info, Node: Memory Management, Next: Serialized Tables, Prev: Lex and Posix, Up: Top

21 Memory Management
********************

This chapter describes how flex handles dynamic memory, and how you can
override the default behavior.

* Menu:

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::


File: flex.info, Node: The Default Memory Management, Next: Overriding The Default Memory Management, Prev: Memory Management, Up: Memory Management

21.1 The Default Memory Management
==================================

Flex allocates dynamic memory during initialization, and once in a while
from within a call to yylex().  Initialization takes place during the
first call to yylex().  Thereafter, flex may reallocate more memory if
it needs to enlarge a buffer.  As of version 2.5.9 Flex will clean up
all memory when you call 'yylex_destroy'.  *Note faq-memory-leak::.

   Flex allocates dynamic memory for several purposes, listed below: (1)

16kB for the input buffer.
     Flex allocates memory for the character buffer used to perform
     pattern matching.  Flex must read ahead from the input stream and
     store it in a large character buffer.
     This buffer is typically the largest chunk of dynamic memory flex
     consumes.  This buffer will grow if necessary, doubling the size
     each time.  Flex frees this memory when you call yylex_destroy().
     The default size of this buffer (16384 bytes) is almost always too
     large.  The ideal size for this buffer is the length of the longest
     token expected, in bytes, plus a little more.  Flex will allocate a
     few extra bytes for housekeeping.  Currently, to override the size
     of the input buffer you must '#define YY_BUF_SIZE' to whatever
     number of bytes you want.  We don't plan to change this in the near
     future, but we reserve the right to do so if we ever add a more
     robust memory management API.

64kB for the REJECT state.  This will only be allocated if you use REJECT.
     The size is large enough to hold the same number of states as
     characters in the input buffer.  If you override the size of the
     input buffer (via 'YY_BUF_SIZE'), then you automatically override
     the size of this buffer as well.

100 bytes for the start condition stack.
     Flex allocates memory for the start condition stack.  This is the
     stack used for pushing start states, i.e., with yy_push_state().
     It will grow if necessary.  Since the states are simply integers,
     this stack doesn't consume much memory.  This stack is not present
     if '%option stack' is not specified.  You will rarely need to tune
     this buffer.  The ideal size for this stack is the maximum depth
     expected.
     The memory for this stack is automatically destroyed when you call
     yylex_destroy().  *Note option-stack::.

40 bytes for each YY_BUFFER_STATE.
     Flex allocates memory for each YY_BUFFER_STATE.  The buffer state
     itself is about 40 bytes, plus an additional large character buffer
     (described above).  The initial buffer state is created during
     initialization, and another is created with each call to
     yy_create_buffer().  You can't tune the size of this, but you can
     tune the character buffer as described above.  Any buffer state
     that you explicitly create by calling yy_create_buffer() is _NOT_
     destroyed automatically.  You must call yy_delete_buffer() to free
     the memory.  The exception to this rule is that flex will delete
     the current buffer automatically when you call yylex_destroy().  If
     you delete the current buffer, be sure to set it to NULL.  That
     way, flex will not try to delete the buffer a second time (possibly
     crashing your program!).  At the time of this writing, flex does
     not provide a growable stack for the buffer states.  You have to
     manage that yourself.  *Note Multiple Input Buffers::.

84 bytes for the reentrant scanner guts
     Flex allocates about 84 bytes for the reentrant scanner structure
     when you call yylex_init().  It is destroyed when the user calls
     yylex_destroy().

   ---------- Footnotes ----------

   (1) The quantities given here are approximate, and may vary due to
host architecture, compiler configuration, or due to future enhancements
to flex.


File: flex.info, Node: Overriding The Default Memory Management, Next: A Note About yytext And Memory, Prev: The Default Memory Management, Up: Memory Management

21.2 Overriding The Default Memory Management
=============================================

Flex calls the functions 'yyalloc', 'yyrealloc', and 'yyfree' when it
needs to allocate or free memory.  By default, these functions are
wrappers around the standard C functions, 'malloc', 'realloc', and
'free', respectively.  You can override the default implementations by
telling flex that you will provide your own implementations.

   To override the default implementations, you must do two things:

  1. Suppress the default implementations by specifying one or more of
     the following options:

        * '%option noyyalloc'
        * '%option noyyrealloc'
        * '%option noyyfree'

  2. Provide your own implementation of the following functions: (1)

          // For a non-reentrant scanner
          void * yyalloc (size_t bytes);
          void * yyrealloc (void * ptr, size_t bytes);
          void   yyfree (void * ptr);

          // For a reentrant scanner
          void * yyalloc (size_t bytes, void * yyscanner);
          void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
          void   yyfree (void * ptr, void * yyscanner);

   In the following example, we will override all three memory routines.
We assume that there is a custom allocator with garbage collection.  In
order to make this example interesting, we will use a reentrant scanner,
passing a pointer to the custom allocator through 'yyextra'.

     %{
     #include "some_allocator.h"
     %}

     /* Suppress the default implementations. */
     %option noyyalloc noyyrealloc noyyfree
     %option reentrant

     /* Initialize the allocator. */
     %{
     #define YY_EXTRA_TYPE  struct allocator*
     #define YY_USER_INIT  yyextra = allocator_create();
     %}

     %%
     .|\n   ;
     %%

     /* Provide our own implementations. */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyextra, bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         /* Pass the old block along so its contents can be preserved. */
         return allocator_realloc (yyextra, ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }


   ---------- Footnotes ----------

   (1) It is not necessary to override all (or any) of the memory
management routines.  You may, for example, override 'yyrealloc', but
not 'yyfree' or 'yyalloc'.


File: flex.info, Node: A Note About yytext And Memory, Prev: Overriding The Default Memory Management, Up: Memory Management

21.3 A Note About yytext And Memory
===================================

When flex finds a match, 'yytext' points to the first character of the
match in the input buffer.  The string itself is part of the input
buffer, and is _NOT_ allocated separately.  The value of yytext will be
overwritten the next time yylex() is called.  In short, the value of
yytext is only valid from within the matched rule's action.

   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead.  In order to preserve yytext,
you will have to copy it with strdup() or a similar function.
But this introduces some headache because your parser is now responsible
for freeing the copy of yytext.  If you use a yacc or bison parser
(commonly used with flex), you will discover that the error recovery
mechanisms can cause memory to be leaked.

   To prevent memory leaks from strdup'd yytext, you will have to track
the memory somehow.  Our experience has shown that a garbage collection
mechanism or a pooled memory mechanism will save you a lot of grief when
writing parsers.


File: flex.info, Node: Serialized Tables, Next: Diagnostics, Prev: Memory Management, Up: Top

22 Serialized Tables
********************

A 'flex' scanner has the ability to save the DFA tables to a file, and
load them at runtime when needed.  The motivation for this feature is to
reduce the runtime memory footprint.  Traditionally, these tables have
been compiled into the scanner as C arrays, and are sometimes quite
large.  Since the tables are compiled into the scanner, the memory used
by the tables can never be freed.  This is a waste of memory, especially
if an application uses several scanners, but none of them at the same
time.

   The serialization feature allows the tables to be loaded at runtime,
before scanning begins.  The tables may be discarded when scanning is
finished.

* Menu:

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::


File: flex.info, Node: Creating Serialized Tables, Next: Loading and Unloading Serialized Tables, Prev: Serialized Tables, Up: Serialized Tables

22.1 Creating Serialized Tables
===============================

You may create a scanner with serialized tables by specifying:

         %option tables-file=FILE
     or
         --tables-file=FILE

   These options instruct flex to save the DFA tables to the file FILE.
The tables will _not_ be embedded in the generated scanner.  The scanner
will not function on its own.  The scanner will be dependent upon the
serialized tables.  You must load the tables from this file at runtime
before you can scan anything.

   If you do not specify a filename to '--tables-file', the tables will
be saved to 'lex.yy.tables', where 'yy' is the appropriate prefix.

   If your project uses several different scanners, you can concatenate
the serialized tables into one file, and flex will find the correct set
of tables, using the scanner prefix as part of the lookup key.
An example follows:

     $ flex --tables-file --prefix=cpp cpp.l
     $ flex --tables-file --prefix=c   c.l
     $ cat lex.cpp.tables lex.c.tables  >  all.tables

   The above example created two scanners, 'cpp' and 'c'.  Since we did
not specify a filename, the tables were serialized to 'lex.cpp.tables'
and 'lex.c.tables', respectively.  Then, we concatenated the two files
together into 'all.tables', which we will distribute with our project.
At runtime, we will open the file and tell flex to load the tables from
it.  Flex will find the correct tables automatically.  (See next
section).


File: flex.info, Node: Loading and Unloading Serialized Tables, Next: Tables File Format, Prev: Creating Serialized Tables, Up: Serialized Tables

22.2 Loading and Unloading Serialized Tables
============================================

If you've built your scanner with '%option tables-file', then you must
load the scanner tables at runtime.  This can be accomplished with the
following function:

 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
     Locates scanner tables in the stream pointed to by FP and loads
     them.  Memory for the tables is allocated via 'yyalloc'.  You must
     call this function before the first call to 'yylex'.  The argument
     SCANNER only appears in the reentrant scanner.  This function
     returns '0' (zero) on success, or non-zero on error.
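
   As an illustration of the call sequence described above, here is a
minimal sketch of a wrapper that opens a tables file and hands the
stream to a loader.  The wrapper 'load_tables_from' and the typedef
'tables_loader_fn' are hypothetical names introduced for this example.
The loader is passed in as a function pointer so that the sketch
compiles on its own; in a non-reentrant scanner you would pass
'yytables_fload'.  The sketch also assumes the stream is no longer
needed once the loader has returned.

```c
#include <stdio.h>

/* Same shape as the non-reentrant 'yytables_fload': takes an open
 * stream and returns 0 on success, non-zero on error. */
typedef int (*tables_loader_fn)(FILE *fp);

/* Open PATH, run LOADER on it, and close the stream again.
 * Returns 0 on success, 1 if the file could not be opened, and 2 if
 * the loader reported an error.  Call this before the first yylex(). */
static int load_tables_from(const char *path, tables_loader_fn loader)
{
    FILE *fp = fopen(path, "rb");
    int rc;

    if (fp == NULL)
        return 1;

    rc = loader(fp);
    fclose(fp);

    return rc == 0 ? 0 : 2;
}
```

   With the concatenated file from the previous section, the call would
look like 'load_tables_from ("all.tables", yytables_fload)'.  A
reentrant scanner needs a variant that also forwards the SCANNER
argument.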

   The loaded tables are *not* automatically destroyed (unloaded) when
you call 'yylex_destroy'.  The reason is that you may create several
scanners of the same type (in a reentrant scanner), each of which needs
access to these tables.  To avoid a nasty memory leak, you must call the
following function:

 -- Function: int yytables_destroy ([yyscan_t SCANNER])
     Unloads the scanner tables.  The tables must be loaded again before
     you can scan any more data.  The argument SCANNER only appears in
     the reentrant scanner.  This function returns '0' (zero) on
     success, or non-zero on error.

   *The functions 'yytables_fload' and 'yytables_destroy' are not
thread-safe.*  You must ensure that these functions are called exactly
once (for each scanner type) in a threaded program, before any thread
calls 'yylex'.  After the tables are loaded, they are never written to,
and no thread protection is required from then on, until you destroy
them.


File: flex.info, Node: Tables File Format, Prev: Loading and Unloading Serialized Tables, Up: Serialized Tables

22.3 Tables File Format
=======================

This section defines the file format of serialized 'flex' tables.

   The tables format allows for one or more sets of tables to be
specified, where each set corresponds to a given scanner.  Scanners are
indexed by name, as described below.
The file format is as follows:

                      TABLE SET 1
                     +-------------------------------+
             Header  | uint32          th_magic;     |
                     | uint32          th_hsize;     |
                     | uint32          th_ssize;     |
                     | uint16          th_flags;     |
                     | char            th_version[]; |
                     | char            th_name[];    |
                     | uint8           th_pad64[];   |
                     +-------------------------------+
             Table 1 | uint16          td_id;        |
                     | uint16          td_flags;     |
                     | uint32          td_hilen;     |
                     | uint32          td_lolen;     |
                     | void            td_data[];    |
                     | uint8           td_pad64[];   |
                     +-------------------------------+
             Table 2 |                               |
                .    .                               .
                .    .                               .
                .    .                               .
                .    .                               .
             Table n |                               |
                     +-------------------------------+
                      TABLE SET 2
                           .
                           .
                           .
                      TABLE SET N

   The above diagram shows that a complete set of tables consists of a
header followed by multiple individual tables.  Furthermore, multiple
complete sets may be present in the same file, each set with its own
header and tables.  The sets are contiguous in the file.  The only way
to know if another set follows is to check the next four bytes for the
magic number (or check for EOF).  The header and tables sections are
padded to 64-bit boundaries.  Below we describe each field in detail.
This format does not specify how the scanner will expand the given data,
e.g., data may be serialized as int8, but expanded to an int32 array at
runtime.  This is to reduce the size of the serialized data where
possible.  Remember, _all integer values are in network byte order_.

Fields of a table header:

'th_magic'
     Magic number, always 0xF13C57B1.

'th_hsize'
     Size of this entire header, in bytes, including all fields plus any
     padding.

'th_ssize'
     Size of this entire set, in bytes, including the header, all
     tables, plus any padding.

'th_flags'
     Bit flags for this table set.  Currently unused.

'th_version[]'
     Flex version as a NUL-terminated string, e.g., '2.5.13a'.  This is
     the version of flex that was used to create the serialized tables.

'th_name[]'
     Contains the name of this table set.  The default is 'yytables',
     and is prefixed accordingly, e.g., 'footables'.  Must be
     NUL-terminated.

'th_pad64[]'
     Zero or more NUL bytes, padding the entire header to the next
     64-bit boundary as calculated from the beginning of the header.

Fields of a table:

'td_id'
     Specifies the table identifier.
     Possible values are:
     'YYTD_ID_ACCEPT (0x01)'
          'yy_accept'
     'YYTD_ID_BASE (0x02)'
          'yy_base'
     'YYTD_ID_CHK (0x03)'
          'yy_chk'
     'YYTD_ID_DEF (0x04)'
          'yy_def'
     'YYTD_ID_EC (0x05)'
          'yy_ec'
     'YYTD_ID_META (0x06)'
          'yy_meta'
     'YYTD_ID_NUL_TRANS (0x07)'
          'yy_NUL_trans'
     'YYTD_ID_NXT (0x08)'
          'yy_nxt'.  This array may be two dimensional.  See the
          'td_hilen' field below.
     'YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
          'yy_rule_can_match_eol'
     'YYTD_ID_START_STATE_LIST (0x0A)'
          'yy_start_state_list'.  This array is handled specially
          because it is an array of pointers to structs.  See the
          'td_flags' field below.
     'YYTD_ID_TRANSITION (0x0B)'
          'yy_transition'.  This array is handled specially because it
          is an array of structs.  See the 'td_lolen' field below.
     'YYTD_ID_ACCLIST (0x0C)'
          'yy_acclist'

'td_flags'
     Bit flags describing how to interpret the data in 'td_data'.  The
     data arrays are one-dimensional by default, but may be two
     dimensional as specified in the 'td_hilen' field.

     'YYTD_DATA8 (0x01)'
          The data is serialized as an array of type int8.
     'YYTD_DATA16 (0x02)'
          The data is serialized as an array of type int16.
     'YYTD_DATA32 (0x04)'
          The data is serialized as an array of type int32.
     'YYTD_PTRANS (0x08)'
          The data is a list of indexes of entries in the expanded
          'yy_transition' array. Each index should be expanded to a
          pointer to the corresponding entry in the 'yy_transition'
          array. We count on the fact that the 'yy_transition' array
          has already been seen.
     'YYTD_STRUCT (0x10)'
          The data is a list of yy_trans_info structs, each of which
          consists of two integers. There is no padding between struct
          elements or between structs. The type of each member is
          determined by the 'YYTD_DATA*' bits.

'td_hilen'
     If 'td_hilen' is non-zero, then the data is a two-dimensional
     array. Otherwise, the data is a one-dimensional array. 'td_hilen'
     contains the number of elements in the higher dimensional array,
     and 'td_lolen' contains the number of elements in the lowest
     dimension.

     Conceptually, 'td_data' is either 'sometype td_data[td_lolen]', or
     'sometype td_data[td_hilen][td_lolen]', where 'sometype' is
     specified by the 'td_flags' field. It is possible for both
     'td_lolen' and 'td_hilen' to be zero, in which case 'td_data' is a
     zero length array, and no data is loaded, i.e., this table is
     simply skipped. Flex does not currently generate tables of zero
     length.

'td_lolen'
     Specifies the number of elements in the lowest dimension array.
     If this is a one-dimensional array, then it is simply the number of
     elements in this array. The element size is determined by the
     'td_flags' field.

'td_data[]'
     The table data. This array may be a one- or two-dimensional array,
     of type 'int8', 'int16', 'int32', 'struct yy_trans_info', or
     'struct yy_trans_info*', depending upon the values in the
     'td_flags', 'td_hilen', and 'td_lolen' fields.

'td_pad64[]'
     Zero or more NULL bytes, padding the entire table to the next
     64-bit boundary as calculated from the beginning of this table.


File: flex.info, Node: Diagnostics, Next: Limitations, Prev: Serialized Tables, Up: Top

23 Diagnostics
**************

The following is a list of 'flex' diagnostic messages:

   * 'warning, rule cannot be matched' indicates that the given rule
     cannot be matched because it follows other rules that will always
     match the same text as it. For example, in the following 'foo'
     cannot be matched because it comes after an identifier "catch-all"
     rule:

          [a-z]+    got_identifier();
          foo       got_foo();

     Using 'REJECT' in a scanner suppresses this warning.

   * 'warning, -s option given but default rule can be matched' means
     that it is possible (perhaps only in a particular start condition)
     that the default rule (match any single character) is the only one
     that will match a particular input. Since '-s' was given,
     presumably this is not intended.

   * 'reject_used_but_not_detected undefined' or
     'yymore_used_but_not_detected undefined'. These errors can occur
     at compile time. They indicate that the scanner uses 'REJECT' or
     'yymore()' but that 'flex' failed to notice the fact, meaning that
     'flex' scanned the first two sections looking for occurrences of
     these actions and failed to find any, but somehow you snuck some in
     (via a #include file, for example). Use '%option reject' or
     '%option yymore' to indicate to 'flex' that you really do use these
     features.

   * 'flex scanner jammed'. A scanner compiled with '-s' has
     encountered an input string which wasn't matched by any of its
     rules. This error can also occur due to internal problems.

   * 'token too large, exceeds YYLMAX'. Your scanner uses '%array' and
     one of its rules matched a string longer than the 'YYLMAX' constant
     (8K bytes by default). You can increase the value by #define'ing
     'YYLMAX' in the definitions section of your 'flex' input.

   * 'scanner requires -8 flag to use the character 'x''.
     Your scanner specification includes recognizing the 8-bit
     character ''x'' and you did not specify the -8 flag, and your
     scanner defaulted to 7-bit because you used the '-Cf' or '-CF'
     table compression options. See the discussion of the '-7' flag,
     *note Scanner Options::, for details.

   * 'flex scanner push-back overflow'. You used 'unput()' to push back
     so much text that the scanner's buffer could not hold both the
     pushed-back text and the current token in 'yytext'. Ideally the
     scanner should dynamically resize the buffer in this case, but at
     present it does not.

   * 'input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'. The scanner was working on matching an extremely large
     token and needed to expand the input buffer. This doesn't work
     with scanners that use 'REJECT'.

   * 'fatal flex scanner internal error--end of buffer missed'. This
     can occur in a scanner which is reentered after a long-jump has
     jumped out (or over) the scanner's activation frame. Before
     reentering the scanner, use:
          yyrestart( yyin );
     or, as noted above, switch to using the C++ scanner class.

   * 'too many start conditions in <> construct!' You listed more start
     conditions in a <> construct than exist (so you must have listed at
     least one of them twice).
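
   The recovery step described for the 'end of buffer missed' message
can be sketched in plain C. This is only an illustration under stated
assumptions: 'scan_with_recovery', 'scan_recover' and the 'demo_*' stubs
are hypothetical names, and the two function pointers stand in for
'yylex()' and a wrapper around 'yyrestart( yyin )' so that the sketch
compiles without a generated scanner.

```c
#include <setjmp.h>

/* Jump target used to abandon a scan from deep inside an action. */
static jmp_buf scan_recover;

typedef int  (*scan_fn)(void);   /* stands in for yylex() */
typedef void (*reset_fn)(void);  /* stands in for yyrestart(yyin) */

/* Run one scan.  If the scan is abandoned via longjmp(scan_recover, 1),
 * reset the scanner before anyone re-enters it, then report -1. */
int scan_with_recovery(scan_fn scan, reset_fn reset)
{
    if (setjmp(scan_recover) != 0) {
        reset();
        return -1;
    }
    return scan();
}

/* Hypothetical demo stubs: a scan that bails out, and a reset counter. */
static int demo_resets = 0;
static void demo_reset(void) { demo_resets++; }
static int demo_aborting_scan(void) { longjmp(scan_recover, 1); return 0; }
```

   In a real program the reset callback would call 'yyrestart( yyin )',
so that the scanner's buffer state is rebuilt before the next call to
'yylex()'.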


File: flex.info, Node: Limitations, Next: Bibliography, Prev: Diagnostics, Up: Top

24 Limitations
**************

   * Some trailing context patterns cannot be properly matched and
     generate warning messages ('dangerous trailing context'). These
     are patterns where the ending of the first part of the rule matches
     the beginning of the second part, such as 'zx*/xy*', where the 'x*'
     matches the 'x' at the beginning of the trailing context. (Note
     that the POSIX draft states that the text matched by such patterns
     is undefined.)

   * For some trailing context rules, parts which are actually
     fixed-length are not recognized as such, leading to the
     abovementioned performance loss. In particular, parts using '|' or
     '{n}' (such as 'foo{3}') are always considered variable-length.

   * Combining trailing context with the special '|' action can result
     in _fixed_ trailing context being turned into the more expensive
     _variable_ trailing context. For example, in the following:

          %%
          abc      |
          xyz/def

   * Use of 'unput()' invalidates yytext and yyleng, unless the '%array'
     directive or the '-l' option has been used.

   * Pattern-matching of 'NUL's is substantially slower than matching
     other characters.

   * Dynamic resizing of the input buffer is slow, as it entails
     rescanning all the text matched so far by the current (generally
     huge) token.

   * Due to both buffering of input and read-ahead, you cannot intermix
     calls to '<stdio.h>' routines, such as 'getchar()', with 'flex'
     rules and expect it to work. Call 'input()' instead.

   * The total number of table entries listed by the '-v' flag excludes
     the table entries needed to determine what rule has been matched.
     The number of entries is equal to the number of DFA states if the
     scanner does not use 'REJECT', and somewhat greater than the number
     of states if it does.

   * 'REJECT' cannot be used with the '-f' or '-F' options.

   * The 'flex' internal algorithms need documentation.


File: flex.info, Node: Bibliography, Next: FAQ, Prev: Limitations, Up: Top

25 Additional Reading
*********************

You may wish to read more about the following programs:
   * lex
   * yacc
   * sed
   * awk

   The following books may contain material of interest:

   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
Associates. Be sure to get the 2nd edition.

   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_

   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
Techniques and Tools_, Addison-Wesley (1986). Describes the
pattern-matching techniques used by 'flex' (deterministic finite
automata).


File: flex.info, Node: FAQ, Next: Appendices, Prev: Bibliography, Up: Top

FAQ
***

From time to time, the 'flex' maintainer receives certain questions.
Rather than repeat answers to well-understood problems, we publish them
here.

* Menu:

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::


File: flex.info, Node: When was flex born?, Next: How do I expand backslash-escape sequences in C-style quoted strings?, Up: FAQ

When was flex born?
===================

Vern Paxson took over the 'Software Tools' lex project from Jef
Poskanzer in 1982. At that point it was written in Ratfor. Around 1987
or so, Paxson translated it into C, and a legend was born :-).


File: flex.info, Node: How do I expand backslash-escape sequences in C-style quoted strings?, Next: Why do flex scanners call fileno if it is not ANSI compatible?, Prev: When was flex born?, Up: FAQ

How do I expand backslash-escape sequences in C-style quoted strings?
=====================================================================

A key point when scanning quoted strings is that you cannot (easily)
write a single rule that will precisely match the string if you allow
things like embedded escape sequences and newlines.
If you try to match
strings with a single rule then you'll wind up having to rescan the
string anyway to find any escape sequences.

   Instead you can use exclusive start conditions and a set of rules,
one for matching non-escaped text, one for matching a single escape, one
for matching an embedded newline, and one for recognizing the end of the
string. Each of these rules is then faced with the question of where to
put its intermediary results. The best solution is for the rules to
append their local value of 'yytext' to the end of a "string literal"
buffer. A rule like the escape-matcher will append to the buffer the
meaning of the escape sequence rather than the literal text in 'yytext'.
In this way, 'yytext' does not need to be modified at all.


File: flex.info, Node: Why do flex scanners call fileno if it is not ANSI compatible?, Next: Does flex support recursive pattern definitions?, Prev: How do I expand backslash-escape sequences in C-style quoted strings?, Up: FAQ

Why do flex scanners call fileno if it is not ANSI compatible?
==============================================================

Flex scanners call 'fileno()' in order to get the file descriptor
corresponding to 'yyin'. The file descriptor may be passed to
'isatty()' or 'read()', depending upon which '%options' you specified.
If your system does not have 'fileno()' support, to get rid of the
'read()' call, do not specify '%option read'.
To get rid of the
'isatty()' call, you must specify one of '%option always-interactive' or
'%option never-interactive'.


File: flex.info, Node: Does flex support recursive pattern definitions?, Next: How do I skip huge chunks of input (tens of megabytes) while using flex?, Prev: Why do flex scanners call fileno if it is not ANSI compatible?, Up: FAQ

Does flex support recursive pattern definitions?
================================================

e.g.,

     %%
     block   "{"({block}|{statement})*"}"

   No. You cannot have recursive definitions. The pattern-matching
power of regular expressions in general (and therefore flex scanners,
too) is limited. In particular, regular expressions cannot "balance"
parentheses to an arbitrary degree. For example, it's impossible to
write a regular expression that matches all strings containing the same
number of '{'s as '}'s. For more powerful pattern matching, you need a
parser, such as 'GNU bison'.


File: flex.info, Node: How do I skip huge chunks of input (tens of megabytes) while using flex?, Next: Flex is not matching my patterns in the same order that I defined them., Prev: Does flex support recursive pattern definitions?, Up: FAQ

How do I skip huge chunks of input (tens of megabytes) while using flex?
========================================================================

Use 'fseek()' (or 'lseek()') to position yyin, then call 'yyrestart()'.


File: flex.info, Node: Flex is not matching my patterns in the same order that I defined them., Next: My actions are executing out of order or sometimes not at all., Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?, Up: FAQ

Flex is not matching my patterns in the same order that I defined them.
=======================================================================

'flex' picks the rule that matches the most text (i.e., the longest
possible input string). This is because 'flex' uses an entirely
different matching technique ("deterministic finite automata") that
actually does all of the matching simultaneously, in parallel. (Seems
impossible, but it's actually a fairly simple technique once you
understand the principles.)

   A side-effect of this parallel matching is that when the input
matches more than one rule, 'flex' scanners pick the rule that matched
the _most_ text. This is explained further in the manual, in the
section *Note Matching::.
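
   The selection policy described above (longest match wins, and a tie
goes to the rule listed first) can be modelled with a small C sketch.
It is not how 'flex' is implemented: a generated scanner runs a single
automaton rather than trying rules one by one. The two matchers and
the name 'pick_rule' are hypothetical and exist only for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule 0: the literal text "data_".  Returns the match length, or 0. */
static size_t match_keyword(const char *s)
{
    return strncmp(s, "data_", 5) == 0 ? 5 : 0;
}

/* Rule 1: an identifier, [a-z_]+ .  Returns the match length, or 0. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Choose the rule with the longest match at the start of s.  Because
 * only a strictly longer match replaces the current best, a tie keeps
 * the rule that was listed first.  Returns the rule index, or -1 if
 * nothing matches; *len receives the length of the winning match. */
int pick_rule(const char *s, size_t *len)
{
    size_t (*rules[])(const char *) = { match_keyword, match_identifier };
    int best = -1;

    *len = 0;
    for (int i = 0; i < 2; i++) {
        size_t n = rules[i](s);
        if (n > *len) {
            *len = n;
            best = i;
        }
    }
    return best;
}
```

   With these two rules, the input 'data_block' is claimed by the
identifier rule, since it matches ten characters against five, even
though the 'data_' rule is listed first. The input 'data_' on its own
is a five-character tie, which goes to the first rule.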

   If you want 'flex' to choose a shorter match, then you can work
around this behavior by expanding your short rule to match more text,
then put back the extra:

     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;

   Another fix would be to make the second rule active only during the
'<BLOCKIDSTATE>' start condition, and make that start condition
exclusive by declaring it with '%x' instead of '%s'.

   A final fix is to change the input language so that the ambiguity for
'data_' is removed, by adding characters to it that don't match the
identifier rule, or by removing characters (such as '_') from the
identifier rule so it no longer matches 'data_'. (Of course, you might
also not have the option of changing the input language.)


File: flex.info, Node: My actions are executing out of order or sometimes not at all., Next: How can I have multiple input sources feed into the same scanner at the same time?, Prev: Flex is not matching my patterns in the same order that I defined them., Up: FAQ

My actions are executing out of order or sometimes not at all.
==============================================================

Most likely, you have (in error) placed the opening '{' of the action
block on a different line than the rule, e.g.,

     ^(foo|bar)
     {  <<<--- WRONG!

     }

   'flex' requires that the opening '{' of an action associated with a
rule begin on the same line as does the rule. You need instead to write
your rules as follows:

     ^(foo|bar)  {  // CORRECT!

     }


File: flex.info, Node: How can I have multiple input sources feed into the same scanner at the same time?, Next: Can I build nested parsers that work with the same input file?, Prev: My actions are executing out of order or sometimes not at all., Up: FAQ

How can I have multiple input sources feed into the same scanner at the same time?
==================================================================================

If ...
   * your scanner is free of backtracking (verified using 'flex''s '-b'
     flag),
   * AND you run your scanner interactively ('-I' option; default unless
     using special table compression options),
   * AND you feed it one character at a time by redefining 'YY_INPUT' to
     do so,

   then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking). This means you
can safely use 'select()' at that point and only call 'yylex()' for
another token if 'select()' indicates there's data available.

   That is, move the 'select()' out from the input function to a point
where it determines whether 'yylex()' gets called for the next token.
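
   A minimal sketch of that gate, assuming a POSIX system: the helper
name 'input_ready' and its millisecond timeout parameter are
hypothetical, while 'select()' and the 'FD_*' macros are the standard
POSIX interfaces. A caller would invoke 'yylex()' only when the helper
returns 1 for the descriptor behind 'yyin'.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Report whether fd has data to read, waiting at most timeout_ms.
 * Returns 1 if readable, 0 if the wait timed out, -1 on error. */
int input_ready(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

   The same call can watch several descriptors at once by adding each
of them to the set before calling 'select()', which is what lets one
loop serve the scanner and the other input sources.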

   With this approach, you will still have problems if your input can
arrive piecemeal; 'select()' could inform you that the beginning of a
token is available, you call 'yylex()' to get it, but it winds up
blocking waiting for the later characters in the token.

   Here's another way: Move your input multiplexing inside of
'YY_INPUT'. That is, whenever 'YY_INPUT' is called, it 'select()''s to
see where input is available. If input is available for the scanner, it
reads and returns the next byte. If input is available from another
source, it calls whatever function is responsible for reading from that
source. (If no input is available, it blocks until some input is
available.) I've used this technique in an interpreter I wrote that
both reads keyboard input using a 'flex' scanner and IPC traffic from
sockets, and it works fine.


File: flex.info, Node: Can I build nested parsers that work with the same input file?, Next: How can I match text only at the end of a file?, Prev: How can I have multiple input sources feed into the same scanner at the same time?, Up: FAQ

Can I build nested parsers that work with the same input file?
==============================================================

This is not going to work without some additional effort. The reason is
that 'flex' block-buffers the input it reads from 'yyin'.
This means
that the "outermost" 'yylex()', when called, will automatically slurp up
the first 8K of input available on yyin, and subsequent calls to other
'yylex()''s won't see that input. You might be tempted to work around
this problem by redefining 'YY_INPUT' to only return a small amount of
text, but it turns out that that approach is quite difficult. Instead,
the best solution is to combine all of your scanners into one large
scanner, using a different exclusive start condition for each.


File: flex.info, Node: How can I match text only at the end of a file?, Next: How can I make REJECT cascade across start condition boundaries?, Prev: Can I build nested parsers that work with the same input file?, Up: FAQ

How can I match text only at the end of a file?
===============================================

There is no way to write a rule which is "match this text, but only if
it comes at the end of the file". You can fake it, though, if you
happen to have a character lying around that you don't allow in your
input. Then you redefine 'YY_INPUT' to call your own routine which, if
it sees an 'EOF', returns the magic character first (and remembers to
return a real 'EOF' next time it's called).
Then you could write:

     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */


File: flex.info, Node: How can I make REJECT cascade across start condition boundaries?, Next: Why cant I use fast or full tables with interactive mode?, Prev: How can I match text only at the end of a file?, Up: FAQ

How can I make REJECT cascade across start condition boundaries?
================================================================

You can do this as follows. Suppose you have a start condition 'A', and
after exhausting all of the possible matches in '<A>', you want to try
matches in '<INITIAL>'. Then you could use the following:

     %x A
     %%
     <A>rule_that_is_long    ...; REJECT;
     <A>rule                 ...; REJECT; /* shorter rule */
     <A>etc.
     ...
     <A>.|\n  {
         /* Shortest and last rule in <A>, so
          * cascaded REJECTs will eventually
          * wind up matching this rule. We want
          * to now switch to the initial state
          * and try matching from there instead.
          */
         yyless(0);    /* put back matched text */
         BEGIN(INITIAL);
     }


File: flex.info, Node: Why cant I use fast or full tables with interactive mode?, Next: How much faster is -F or -f than -C?, Prev: How can I make REJECT cascade across start condition boundaries?, Up: FAQ

Why can't I use fast or full tables with interactive mode?
==========================================================

One of the assumptions flex makes is that interactive applications are
inherently slow (they're waiting on a human after all). It has to do
with how the scanner detects that it must be finished scanning a token.
For interactive scanners, after scanning each character the current
state is looked up in a table (essentially) to see whether there's a
chance of another input character possibly extending the length of the
match. If not, the scanner halts. For non-interactive scanners, the
end-of-token test is much simpler, basically a compare with 0, so no
memory bus cycles. Since the test occurs in the innermost scanning
loop, one would like to make it go as fast as possible.

   Still, it seems reasonable to allow the user to choose to trade off a
bit of performance in this area to gain the corresponding flexibility.
There might be another reason, though, why fast scanners don't support
the interactive option.


File: flex.info, Node: How much faster is -F or -f than -C?, Next: If I have a simple grammar cant I just parse it with flex?, Prev: Why cant I use fast or full tables with interactive mode?, Up: FAQ

How much faster is -F or -f than -C?
====================================

Much faster (factor of 2-3).


File: flex.info, Node: If I have a simple grammar cant I just parse it with flex?, Next: Why doesn't yyrestart() set the start state back to INITIAL?, Prev: How much faster is -F or -f than -C?, Up: FAQ

If I have a simple grammar can't I just parse it with flex?
===========================================================

Is your grammar recursive? That's almost always a sign that you're
better off using a parser/scanner rather than just trying to use a
scanner alone.


File: flex.info, Node: Why doesn't yyrestart() set the start state back to INITIAL?, Next: How can I match C-style comments?, Prev: If I have a simple grammar cant I just parse it with flex?, Up: FAQ

Why doesn't yyrestart() set the start state back to INITIAL?
============================================================

There are two reasons. The first is that there might be programs that
rely on the start state not changing across file changes. The second is
that beginning with 'flex' version 2.4, use of 'yyrestart()' is no
longer required, so fixing the problem there doesn't solve the more
general problem.


File: flex.info, Node: How can I match C-style comments?, Next: The period isn't working the way I expected., Prev: Why doesn't yyrestart() set the start state back to INITIAL?, Up: FAQ

How can I match C-style comments?
=================================

You might be tempted to try something like this:

     "/*".*"*/"       // WRONG!

   or, worse, this:

     "/*"(.|\n)*"*/"   // WRONG!

   The above rules will eat too much input, and blow up on things like:

     /* a comment */ do_my_thing( "oops */" );

   Here is one way which allows you to track line information:

     <INITIAL>{
     "/*"              BEGIN(IN_COMMENT);
     }
     <IN_COMMENT>{
     "*/"      BEGIN(INITIAL);
     [^*\n]+   // eat comment in chunks
     "*"       // eat the lone star
     \n        yylineno++;
     }


File: flex.info, Node: The period isn't working the way I expected., Next: Can I get the flex manual in another format?, Prev: How can I match C-style comments?, Up: FAQ

The '.' isn't working the way I expected.
=========================================

Here are some tips for using '.':

   * A common mistake is to place the grouping parenthesis AFTER an
     operator, when you really meant to place the parenthesis BEFORE the
     operator, e.g., you probably want this '(foo|bar)+' and NOT this
     '(foo|bar+)'.

     The first pattern matches the words 'foo' or 'bar' any number of
     times, e.g., it matches the text 'barfoofoobarfoo'.
     The second
     pattern matches a single instance of 'foo' or a single instance of
     'bar' followed by one or more 'r's, e.g., it matches the text
     'barrrr'.
   * A '.' inside '[]''s just means a literal '.' (period), and NOT "any
     character except newline".
   * Remember that '.' matches any character EXCEPT '\n' (and 'EOF').
     If you really want to match ANY character, including newlines, then
     use '(.|\n)'. Beware that the regex '(.|\n)+' will match your entire
     input!
   * Finally, if you want to match a literal '.' (a period), then use
     '[.]' or '"."'.


File: flex.info, Node: Can I get the flex manual in another format?, Next: Does there exist a "faster" NDFA->DFA algorithm?, Prev: The period isn't working the way I expected., Up: FAQ

Can I get the flex manual in another format?
============================================

The 'flex' source distribution includes a texinfo manual. You are free
to convert that texinfo into whatever format you desire. The 'texinfo'
package includes tools for conversion to a number of formats.


File: flex.info, Node: Does there exist a "faster" NDFA->DFA algorithm?, Next: How does flex compile the DFA so quickly?, Prev: Can I get the flex manual in another format?, Up: FAQ

Does there exist a "faster" NDFA->DFA algorithm?
================================================

There's no way around the potential exponential running time - it can
take you exponential time just to enumerate all of the DFA states. In
practice, though, the running time is closer to linear, or sometimes
quadratic.


File: flex.info, Node: How does flex compile the DFA so quickly?, Next: How can I use more than 8192 rules?, Prev: Does there exist a "faster" NDFA->DFA algorithm?, Up: FAQ

How does flex compile the DFA so quickly?
=========================================

There are two big speed wins that 'flex' uses:

  1. It analyzes the input rules to construct equivalence classes for
     those characters that always make the same transitions. It then
     rewrites the NFA using equivalence classes for transitions instead
     of characters. This cuts down the NFA->DFA computation time
     dramatically, to the point where, for uncompressed DFA tables, the
     DFA generation is often I/O bound in writing out the tables.
  2. It maintains hash values for previously computed DFA states, so
     testing whether a newly constructed DFA state is equivalent to a
     previously constructed state can be done very quickly, by first
     comparing hash values.
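
   The second point can be pictured with a small C sketch. It is an
illustration only, not code taken from 'flex': the 'struct dfa_state'
layout, the FNV-1a style hash, and every function name below are
assumptions made for this example. A DFA state is modelled as a sorted
set of NFA state numbers, and the cached hash lets most unequal pairs be
rejected without looking at the sets themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a DFA state: a sorted, duplicate-free set of
 * NFA state numbers plus a hash computed once when the state is built. */
struct dfa_state {
    const int     *nfa_states;
    size_t         count;
    unsigned long  hash;
};

/* FNV-1a style hash over the NFA state numbers (any decent hash works). */
static unsigned long hash_nfa_set(const int *states, size_t count)
{
    unsigned long h = 2166136261UL;
    size_t i;

    for (i = 0; i < count; i++) {
        h ^= (unsigned long)states[i];
        h *= 16777619UL;
    }
    return h;
}

/* Build a state descriptor; `states` must be non-NULL and stay alive. */
static struct dfa_state make_dfa_state(const int *states, size_t count)
{
    struct dfa_state s;

    assert(states != NULL);
    s.nfa_states = states;
    s.count = count;
    s.hash = hash_nfa_set(states, count);
    return s;
}

/* Cheap test first: differing hashes or sizes prove the sets differ.
 * Only when both agree is the element-by-element comparison needed. */
static int same_dfa_state(const struct dfa_state *a,
                          const struct dfa_state *b)
{
    if (a->hash != b->hash || a->count != b->count)
        return 0;
    return memcmp(a->nfa_states, b->nfa_states,
                  a->count * sizeof(int)) == 0;
}
```

   Equal hashes do not prove equality, which is why the sketch still
falls back to 'memcmp()' on a hash match; the saving comes from the many
comparisons that never get that far.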


File: flex.info, Node: How can I use more than 8192 rules?, Next: How do I abandon a file in the middle of a scan and switch to a new file?, Prev: How does flex compile the DFA so quickly?, Up: FAQ

How can I use more than 8192 rules?
===================================

'Flex' is compiled with an upper limit of 8192 rules per scanner. If
you need more than 8192 rules in your scanner, you'll have to recompile
'flex' with the following changes in 'flexdef.h':

     <    #define YY_TRAILING_MASK 0x2000
     <    #define YY_TRAILING_HEAD_MASK 0x4000
     --
     >    #define YY_TRAILING_MASK 0x20000000
     >    #define YY_TRAILING_HEAD_MASK 0x40000000

   This should work okay as long as your C compiler uses 32 bit
integers. But you might want to think about whether using such a huge
number of rules is the best way to solve your problem.

   The following may also be relevant:

   With luck, you should be able to increase the definitions in
flexdef.h for:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767

   recompile everything, and it'll all work. Flex only has these
16-bit-like values built into it because a long time ago it was
developed on a machine with 16-bit ints.
I've given this advice to
others in the past but haven't heard back from them whether it worked
okay or not...


File: flex.info, Node: How do I abandon a file in the middle of a scan and switch to a new file?, Next: How do I execute code only during initialization (only before the first scan)?, Prev: How can I use more than 8192 rules?, Up: FAQ

How do I abandon a file in the middle of a scan and switch to a new file?
=========================================================================

Just call 'yyrestart(newfile)'. Be sure to reset the start state if you
want a "fresh start", since 'yyrestart' does NOT reset the start state
back to 'INITIAL'.


File: flex.info, Node: How do I execute code only during initialization (only before the first scan)?, Next: How do I execute code at termination?, Prev: How do I abandon a file in the middle of a scan and switch to a new file?, Up: FAQ

How do I execute code only during initialization (only before the first scan)?
==============================================================================

You can specify an initial action by defining the macro 'YY_USER_INIT'
(though note that 'yyout' may not be available at the time this macro is
executed). Or you can add to the beginning of your rules section:

     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }


File: flex.info, Node: How do I execute code at termination?, Next: Where else can I find help?, Prev: How do I execute code only during initialization (only before the first scan)?, Up: FAQ

How do I execute code at termination?
=====================================

You can specify an action for the '<<EOF>>' rule.


File: flex.info, Node: Where else can I find help?, Next: Can I include comments in the "rules" section of the file?, Prev: How do I execute code at termination?, Up: FAQ

Where else can I find help?
===========================

You can find the flex homepage on the web at
<http://flex.sourceforge.net/>. See that page for details about flex
mailing lists as well.


File: flex.info, Node: Can I include comments in the "rules" section of the file?, Next: I get an error about undefined yywrap()., Prev: Where else can I find help?, Up: FAQ

Can I include comments in the "rules" section of the file?
==========================================================

Yes, just about anywhere you want to. See the manual for the specific
syntax.


File: flex.info, Node: I get an error about undefined yywrap()., Next: How can I change the matching pattern at run time?, Prev: Can I include comments in the "rules" section of the file?, Up: FAQ

I get an error about undefined yywrap().
========================================

You must supply a 'yywrap()' function of your own, or link to 'libfl.a'
(which provides one), or use

     %option noyywrap

   in your source to say you don't want a 'yywrap()' function.


File: flex.info, Node: How can I change the matching pattern at run time?, Next: How can I expand macros in the input?, Prev: I get an error about undefined yywrap()., Up: FAQ

How can I change the matching pattern at run time?
==================================================

You can't, it's compiled into a static table when flex builds the
scanner.


File: flex.info, Node: How can I expand macros in the input?, Next: How can I build a two-pass scanner?, Prev: How can I change the matching pattern at run time?, Up: FAQ

How can I expand macros in the input?
=====================================

The best way to approach this problem is at a higher level, e.g., in the
parser.

   However, you can do this using multiple input buffers.

     %%
     macro/[a-z]+	{
     /* Saw the macro "macro" followed by extra stuff. */
     main_buffer = YY_CURRENT_BUFFER;
     expansion_buffer = yy_scan_string(expand(yytext));
     yy_switch_to_buffer(expansion_buffer);
     }

     <<EOF>>	{
     if ( expansion_buffer )
     {
     // We were doing an expansion, return to where
     // we were.
     yy_switch_to_buffer(main_buffer);
     yy_delete_buffer(expansion_buffer);
     expansion_buffer = 0;
     }
     else
     yyterminate();
     }

   You probably will want a stack of expansion buffers to allow nested
macros. From the above though hopefully the idea is clear.


File: flex.info, Node: How can I build a two-pass scanner?, Next: How do I match any string not matched in the preceding rules?, Prev: How can I expand macros in the input?, Up: FAQ

How can I build a two-pass scanner?
===================================

One way to do it is to filter the first pass to a temporary file, then
process the temporary file on the second pass. You will probably see a
performance hit, due to all the disk I/O.
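
   The temporary-file approach might look roughly like the following C
sketch. It uses only standard C I/O and is not scanner code: the
function names 'first_pass()' and 'second_pass()' are hypothetical, and
the upper-casing filter and line counter are stand-ins for whatever the
two passes really do. In a 'flex' program each pass would normally be a
call to 'yylex()', with 'yyin' pointed at the temporary file for the
second one.

```c
#include <ctype.h>
#include <stdio.h>

/* First pass (stand-in): copy `in` to an anonymous temporary file,
 * upper-casing every character.  Returns the temporary file rewound to
 * its start, or NULL if it could not be created. */
static FILE *first_pass(FILE *in)
{
    FILE *tmp = tmpfile();
    int c;

    if (tmp == NULL)
        return NULL;
    while ((c = getc(in)) != EOF)
        putc(toupper(c), tmp);
    rewind(tmp);                /* the second pass reads from the top */
    return tmp;
}

/* Second pass (stand-in): count the newlines in the filtered text. */
static long second_pass(FILE *tmp)
{
    long lines = 0;
    int c;

    while ((c = getc(tmp)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}
```

   Every byte is written to disk and read back once, which is where the
performance hit mentioned above comes from.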

   When you need to look ahead far forward like this, it almost always
means that the right solution is to build a parse tree of the entire
input, then walk it after the parse in order to generate the output. In
a sense, this is a two-pass approach, once through the text and once
through the parse tree, but the performance hit for the latter is
usually an order of magnitude smaller, since everything is already
classified, in binary format, and residing in memory.


File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ

How do I match any string not matched in the preceding rules?
=============================================================

One way to assign precedence is to place the more specific rules first.
If two rules would match the same input (same sequence of characters)
then the first rule listed in the 'flex' input wins, e.g.,

     %%
     foo[a-zA-Z_]+    return FOO_ID;
     bar[a-zA-Z_]+    return BAR_ID;
     [a-zA-Z_]+       return GENERIC_ID;

   Note that the rule '[a-zA-Z_]+' must come *after* the others. It
will match the same amount of text as the more specific rules, and in
that case the 'flex' scanner will pick the first rule listed in your
scanner as the one to match.


File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ

I am trying to port code from AT&T lex that uses yysptr and yysbuf.
===================================================================

Those are internal variables pointing into the AT&T scanner's input
buffer. I imagine they're being manipulated in user versions of the
'input()' and 'unput()' functions. If so, what you need to do is
analyze those functions to figure out what they're doing, and then
replace 'input()' with an appropriate definition of 'YY_INPUT'. You
shouldn't need to (and must not) replace 'flex''s 'unput()' function.


File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ

Is there a way to make flex treat NULL like a regular character?
================================================================

Yes, '\0' and '\x00' should both do the trick. Perhaps you have an
ancient version of 'flex'. The latest release is version 2.6.4.


File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesn't flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ

Whenever flex can not match the input it says "flex scanner jammed".
====================================================================

You need to add a rule that matches the otherwise-unmatched text, e.g.,

     %option yylineno
     %%
     [[a bunch of rules here]]

     .	printf("bad input character '%s' at line %d\n", yytext, yylineno);

   See '%option default' for more information.


File: flex.info, Node: Why doesn't flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ

Why doesn't flex have non-greedy operators like perl does?
==========================================================

A DFA can do a non-greedy match by stopping the first time it enters an
accepting state, instead of consuming input until it determines that no
further matching is possible (a "jam" state). This is actually easier
to implement than longest leftmost match (which flex does).

   But it's also much less useful than longest leftmost match.
In
general, when you find yourself wishing for non-greedy matching, that's
usually a sign that you're trying to make the scanner do some parsing.
That's generally the wrong approach, since it lacks the power to do a
decent job. Better is to either introduce a separate parser, or to
split the scanner into multiple scanners using (exclusive) start
conditions.

   You might have a separate start state once you've seen the 'BEGIN'.
In that state, you might then have a regex that will match 'END' (to
kick you out of the state), and perhaps '(.|\n)' to get a single
character within the chunk ...

   This approach also has much better error-reporting properties.


File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesn't flex have non-greedy operators like perl does?, Up: FAQ

Memory leak - 16386 bytes allocated by malloc.
==============================================

UPDATED 2002-07-10: As of 'flex' version 2.5.9, this leak means that you
did not call 'yylex_destroy()'. If you are using an earlier version of
'flex', then read on.

   The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the
read-buffer, and about 40 for 'struct yy_buffer_state' (depending upon
alignment). The leak is in the non-reentrant C scanner only (NOT in the
reentrant scanner, NOT in the C++ scanner).
Since 'flex' doesn't know
when you are done, the buffer is never freed.

   However, the leak won't multiply since the buffer is reused no matter
how many times you call 'yylex()'.

   If you want to reclaim the memory when you are completely done
scanning, then you might try this:

     /* For non-reentrant C scanner only. */
     yy_delete_buffer(YY_CURRENT_BUFFER);
     yy_init = 1;

   Note: 'yy_init' is an "internal variable", and hasn't been tested in
this situation. It is possible that some other globals may need
resetting as well.


File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ

How do I track the byte offset for lseek()?
===========================================

     > We thought that it would be possible to have this number through the
     > evaluation of the following expression:
     >
     > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf

   While this is the right idea, it has two problems. The first is that
it's possible that 'flex' will request less than 'YY_READ_BUF_SIZE'
during an invocation of 'YY_INPUT' (or that your input source will
return less even though 'YY_READ_BUF_SIZE' bytes were requested).
The
second problem is that when refilling its internal buffer, 'flex' keeps
some characters from the previous buffer (because usually it's in the
middle of a match, and needs those characters to construct 'yytext' for
the match once it's done). Because of this, 'yy_c_buf_p -
YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
already read from the current buffer.

   An alternative solution is to count the number of characters you've
matched since starting to scan. This can be done by using
'YY_USER_ACTION'. For example,

     #define YY_USER_ACTION num_chars += yyleng;

   (You need to be careful to update your bookkeeping if you use
'yymore()', 'yyless()', 'unput()', or 'input()'.)


File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ

How do I use my own I/O classes in a C++ scanner?
=================================================

When the flex C++ scanning class rewrite finally happens, then this sort
of thing should become much easier.

   You can do this by passing the various functions (such as
'LexerInput()' and 'LexerOutput()') NULL 'iostream*''s, and then dealing
with your own I/O classes surreptitiously (i.e., stashing them in
special member variables).
This works because the only assumption about
the lexer regarding what's done with the iostream's is that they're
ultimately passed to 'LexerInput()' and 'LexerOutput()', which then do
whatever is necessary with them.


File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ

How do I skip as many chars as possible?
========================================

How do I skip as many chars as possible - without interfering with the
other patterns?

   In the example below, we want to skip over characters until we see
the phrase "endskip". The following will _NOT_ work correctly (do you
see why not?)

     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip   BEGIN(SKIP);
     ...
     <SKIP>"endskip"       BEGIN(INITIAL);
     <SKIP>.*             ;

   The problem is that the pattern .* will eat up the word "endskip."
The simplest (but slow) fix is:

     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.              ;

   The fix involves making the second rule match more, without making it
match "endskip" plus something else. So for example:

     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>[^e]+          ;
     <SKIP>.		        ;/* so you eat up e's, too */


File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ

deleteme00
==========

     QUESTION:
     When was flex born?

     Vern Paxson took over
     the Software Tools lex project from Jef Poskanzer in 1982. At that point it
     was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
     a legend was born :-).


File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ

Are certain equivalent patterns faster than others?
===================================================

     To: Adoram Rogel <adoram@orna.hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
     Date: Wed, 18 Sep 96 10:51:02 PDT
     From: Vern Paxson <vern>

     [Note, the most recent flex release is 2.5.4, which you can get from
     ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]

     > 1. Using the pattern
     > ([Ff](oot)?)?[Nn](ote)?(\.)?
     > instead of
     > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
     > (in a very complicated flex program) caused the program to slow from
     > 300K+/min to 100K/min (no other changes were done).

     These two are not equivalent. For example, the first can match "footnote."
     but the second can only match "footnote". This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.

     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?

     From a performance point of view, they're equivalent (modulo presumably
     minor effects such as memory cache hit rates; and the presence of trailing
     context, see below). From a space point of view, the first is slightly
     preferable.

     > 3. I have a pattern that look like this:
     > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
     >
     > running yet another complicated program that includes the following rule:
     > <snext>{and}/{no4}{bb}{pats}
     >
     > gets me to "too complicated - over 32,000 states"...

     I can't tell from this example whether the trailing context is variable-length
     or fixed-length (it could be the latter if {and} is fixed-length).
     If it's variable length, which flex -p will tell you, then this reflects
     a basic performance problem, and if you can eliminate it by restructuring
     your scanner, you will see significant improvement.

     > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
     > 10 patterns and changed the rule to be 5 rules.
     > This did compile, but what is the rule of thumb here ?

     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern have a fixed length. Use
     of the '|' operator automatically makes the pattern variable length, so in
     this case '[Ff]oot' is preferred to '(F|f)oot'.

     > 4. I changed a rule that looked like this:
     > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
     >
     > to the next 2 rules:
     > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
     > <snext8>{and}{bb}/{ROMAN} { BEGIN...
     >
     > Again, I understand the using [^...] will cause a great performance loss

     Actually, it doesn't cause any sort of performance loss. It's a surprising
     fact about regular expressions that they always match in linear time
     regardless of how complex they are.

     > but are there any specific rules about it ?

     See the "Performance Considerations" section of the man page, and also
     the example in MISC/fastwc/.

     Vern


File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ

Is backing up a big deal?
=========================

     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
     Date: Thu, 19 Sep 96 09:58:00 PDT
     From: Vern Paxson <vern>

     > a lot about the backing up problem.
     > I believe that there lies my biggest problem, and I'll try to improve
     > it.

     Since you have variable trailing context, this is a bigger performance
     problem. Fixing it is usually easier than fixing backing up, which in a
     complicated scanner (yours seems to fit the bill) can be extremely
     difficult to do correctly.

     You also don't mention what flags you are using for your scanner.
     -f makes a large speed difference, and -Cfe buys you nearly as much
     speed but the resulting scanner is considerably smaller.

     > I have an | operator in {and} and in {pats} so both of them are variable
     > length.

     -p should have reported this.

     > Is changing one of them to fixed-length is enough ?

     Yes.

     > Is it possible to change the 32,000 states limit ?

     Yes. I've appended instructions on how. Before you make this change,
     though, you should think about whether there are ways to fundamentally
     simplify your scanner - those are certainly preferable!

     Vern

     To increase the 32K limit (on a machine with 32 bit integers), you increase
     the magnitude of the following in flexdef.h:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767
     #define MAX_SHORT 32700

     Adding a 0 or two after each should do the trick.


File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ

Can I fake multi-byte character support?
========================================

     To: Heeman_Lee@hp.com
     Subject: Re: flex - multi-byte support?
     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
     Date: Fri, 04 Oct 1996 11:42:18 PDT
     From: Vern Paxson <vern>

     > I assume as long as my *.l file defines the
     > range of expected character code values (in octal format), flex will
     > scan the file and read multi-byte characters correctly. But I have no
     > confidence in this assumption.

     Your lack of confidence is justified - this won't work.

     Flex has in it a widespread assumption that the input is processed
     one byte at a time. Fixing this is on the to-do list, but is involved,
     so it won't happen any time soon. In the interim, the best I can suggest
     (unless you want to try fixing it yourself) is to write your rules in
     terms of pairs of bytes, using definitions in the first section:

         X               \xfe\xc2
         ...
         %%
         foo{X}bar       found_foo_fe_c2_bar();

     etc. Definitely a pain - sorry about that.

     By the way, the email address you used for me is ancient, indicating you
     have a very old version of flex. You can get the most recent, 2.5.4, from
     ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ

deleteme01
==========

     To: moleary@primus.com
     Subject: Re: Flex / Unicode compatibility question
     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
     Date: Tue, 22 Oct 1996 11:06:13 PDT
     From: Vern Paxson <vern>

     Unfortunately flex at the moment has a widespread assumption within it
     that characters are processed 8 bits at a time.
     I don't see any easy
     fix for this (other than writing your rules in terms of double characters -
     a pain). I also don't know of a wider lex, though you might try surfing
     the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
     Toolkit").

     Fixing flex to handle wider characters is on the long-term to-do list.
     But since flex is a strictly spare-time project these days, this probably
     won't happen for quite a while, unless someone else does it first.

     Vern


File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ

Can you discuss some flex internals?
====================================

     To: Johan Linde <jl@theophys.kth.se>
     Subject: Re: translation of flex
     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
     Date: Mon, 11 Nov 1996 10:33:50 PST
     From: Vern Paxson <vern>

     > I'm working for the Swedish team translating GNU program, and I'm currently
     > working with flex. I have a few questions about some of the messages which
     > I hope you can answer.

     All of the things you're wondering about, by the way, concerning flex
     internals - probably the only person who understands what they mean in
     English is me! So I wouldn't worry too much about getting them right.
     That said ...

     > #: main.c:545
     > msgid " %d protos created\n"
     >
     > Does proto mean prototype?

     Yes - prototypes of state compression tables.

     > #: main.c:539
     > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
     >
     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
     > However, 'template next-check entries' doesn't make much sense to me. To be
     > able to find a good translation I need to know a little bit more about it.

     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
     scanner tables. It involves creating two pairs of tables. The first has
     "base" and "default" entries, the second has "next" and "check" entries.
     The "base" entry is indexed by the current state and yields an index into
     the next/check table. The "default" entry gives what to do if the state
     transition isn't found in next/check. The "next" entry gives the next
     state to enter, but only if the "check" entry verifies that this entry is
     correct for the current state. Flex creates templates of series of
     next/check entries and then encodes differences from these templates as a
     way to compress the tables.

     > #: main.c:533
     > msgid " %d/%d base-def entries created\n"
     >
     > The same problem here for 'base-def'.

     See above.
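
The base/default/next/check lookup described in the reply above can be
sketched in C roughly as follows. This is only an illustration and not
flex's generated code or part of the original message: the table
contents, the state numbering, and the names 'next_state' and
'matches_ab' are assumptions made up for a tiny automaton over {a, b}
that accepts just the string "ab".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical automaton over the alphabet {a, b} accepting only "ab".
 * State 0 is a dead state, 1 is the start state, 3 is accepting. */
enum { NUM_STATES = 4, NUM_SYMBOLS = 2, DEAD = 0, START = 1, ACCEPT = 3 };

/* base_tbl[s]: where state s's row begins inside nxt_tbl[]/chk_tbl[]. */
static const int base_tbl[NUM_STATES] = { 0, 2, 2, 3 };
/* def_tbl[s]: state to fall back to when s has no entry of its own. */
static const int def_tbl[NUM_STATES] = { 0, 0, 0, 0 };
/* nxt_tbl[i] is valid for state s only when chk_tbl[i] == s. */
static const int nxt_tbl[] = { 0, 0, 2, 3, 0 };
static const int chk_tbl[] = { 0, 0, 1, 2, -1 };

/* Return the state reached from `state` on `symbol` (0 = 'a', 1 = 'b'). */
int next_state(int state, int symbol)
{
    assert(state >= 0 && state < NUM_STATES);
    assert(symbol >= 0 && symbol < NUM_SYMBOLS);
    /* Follow the default chain until some state owns the entry.
     * The dead state owns a full row, so the loop always ends. */
    while (chk_tbl[base_tbl[state] + symbol] != state)
        state = def_tbl[state];
    return nxt_tbl[base_tbl[state] + symbol];
}

/* Return 1 if text is exactly "ab", 0 otherwise. */
int matches_ab(const char *text)
{
    int state = START;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] != 'a' && text[i] != 'b')
            return 0;   /* symbol outside the toy alphabet */
        state = next_state(state, text[i] - 'a');
    }
    return state == ACCEPT;
}
```

In this sketch the rows for states 1 and 2 overlap inside the shared
arrays, so the five 'nxt_tbl' entries stand in for a full 4 x 2
transition table; the "check" entry is what keeps one state from
picking up a neighbour's transition.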

     Vern


File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ

unput() messes up yy_at_bol
===========================

     To: Xinying Li <xli@npac.syr.edu>
     Subject: Re: FLEX ?
     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
     Date: Wed, 13 Nov 1996 19:51:54 PST
     From: Vern Paxson <vern>

     > "unput()" them to input flow, question occurs. If I do this after I scan
     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
     > means the carriage flag has gone.

     You can control this by calling yy_set_bol(). It's described in the manual.

     > And if in pre-reading it goes to the end of file, is anything done
     > to control the end of curren buffer and end of file?

     No, there's no way to put back an end-of-file.

     > By the way I am using flex 2.5.2 and using the "-l".

     The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
     2.5.3. You can get it from ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ

The | operator is not doing what I want
=======================================

     To: Alain.ISSARD@st.com
     Subject: Re: Start condition with FLEX
     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
     Date: Mon, 18 Nov 1996 10:41:34 PST
     From: Vern Paxson <vern>

     > I am not able to use the start condition scope and to use the | (OR) with
     > rules having start conditions.

     The problem is that if you use '|' as a regular expression operator, for
     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
     any blanks around it. If you instead want the special '|' *action* (which
     from your scanner appears to be the case), which is a way of giving two
     different rules the same action:

         foo |
         bar     matched_foo_or_bar();

     then '|' *must* be separated from the first rule by whitespace and *must*
     be followed by a new line. You *cannot* write it as:

         foo | bar       matched_foo_or_bar();

     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.

     Your problems with start condition scope are simply due to syntax errors
     from your use of '|' later confusing flex.

     Let me know if you still have problems.

     Vern


File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ

Why can't flex understand this variable trailing context pattern?
=================================================================

     To: Gregory Margo <gmargo@newton.vip.best.com>
     Subject: Re: flex-2.5.3 bug report
     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
     Date: Sat, 23 Nov 1996 17:07:32 PST
     From: Vern Paxson <vern>

     > Enclosed is a lex file that "real" lex will process, but I cannot get
     > flex to process it. Could you try it and maybe point me in the right direction?

     Your problem is that some of the definitions in the scanner use the '/'
     trailing context operator, and have it enclosed in ()'s. Flex does not
     allow this operator to be enclosed in ()'s because doing so allows undefined
     regular expressions such as "(a/b)+". So the solution is to remove the
     parentheses.
     Note that you must also be building the scanner with the -l
     option for AT&T lex compatibility. Without this option, flex automatically
     encloses the definitions in parentheses.

     Vern


File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ

The ^ operator isn't working
============================

     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
     Subject: Re: Flex Bug ?
     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
     Date: Tue, 26 Nov 1996 11:15:05 PST
     From: Vern Paxson <vern>

     > In my lexer code, i have the line :
     > ^\*.* { }
     >
     > Thus all lines starting with an astrix (*) are comment lines.
     > This does not work !

     I can't get this problem to reproduce - it works fine for me. Note
     though that if what you have is slightly different:

         COMMENT ^\*.*
         %%
         {COMMENT}       { }

     then it won't work, because flex pushes back macro definitions enclosed
     in ()'s, so the rule becomes

         (^\*.*)         { }

     and now that the '^' operator is not at the immediate beginning of the
     line, it's interpreted as just a regular character.
     You can avoid this
     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".

     Vern


File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ

Trailing context is getting confused with trailing optional patterns
====================================================================

     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: Flex 2.5.4 BOF ???
     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
     Date: Wed, 27 Nov 1996 10:56:25 PST
     From: Vern Paxson <vern>

     > Organization(s)?/[a-z]
     >
     > This matched "Organizations" (looking in debug mode, the trailing s
     > was matched with trailing context instead of the optional (s) in the
     > end of the word.

     That should only happen with lex. Flex can properly match this pattern.
     (That might be what you're saying, I'm just not sure.)

     > Is there a way to avoid this dangerous trailing context problem ?

     Unfortunately, there's no easy way. On the other hand, I don't see why
     it should be a problem. Lex's matching is clearly wrong, and I'd hope
     that usually the intent remains the same as expressed with the pattern,
     so flex's matching will be correct.

     Vern


File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ

Is flex GNU or not?
===================

     To: Cameron MacKinnon <mackin@interlog.com>
     Subject: Re: Flex documentation bug
     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
     Date: Sun, 01 Dec 1996 22:29:39 PST
     From: Vern Paxson <vern>

     > I'm not sure how or where to submit bug reports (documentation or
     > otherwise) for the GNU project stuff ...

     Well, strictly speaking flex isn't part of the GNU project. They just
     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me. Those sent to the GNU folks
     sometimes find their way to me, but some may drop between the cracks.

     > In GNU Info, under the section 'Start Conditions', and also in the man
     > page (mine's dated April '95) is a nice little snippet showing how to
     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
     > size. Unfortunately, no overflow checking is ever done ...

     This is already mentioned in the manual:

         Finally, here's an example of how to match C-style quoted
         strings using exclusive start conditions, including expanded
         escape sequences (but not including checking for a string
         that's too long):

     The reason for not doing the overflow checking is that it will needlessly
     clutter up an example whose main purpose is just to demonstrate how to
     use flex.

     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.

     Vern


File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ

ERASEME53
=========

     To: tsv@cs.UManitoba.CA
     Subject: Re: Flex (reg)..
     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
     Date: Thu, 06 Mar 1997 15:54:19 PST
     From: Vern Paxson <vern>

     > [:alpha:] ([:alnum:] | \\_)*

     If your rule really has embedded blanks as shown above, then it won't
     work, as the first blank delimits the rule from the action. (It wouldn't
     even compile ...) You need instead:

     [:alpha:]([:alnum:]|\\_)*

     and that should work fine - there's no restriction on what can go inside
     of ()'s except for the trailing context operator, '/'.

     Vern


File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ

I need to scan if-then-else blocks and while loops
==================================================

     To: "Mike Stolnicki" <mstolnic@ford.com>
     Subject: Re: FLEX help
     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
     Date: Fri, 30 May 1997 10:46:35 PDT
     From: Vern Paxson <vern>

     > We'd like to add "if-then-else", "while", and "for" statements to our
     > language ...
     > We've investigated many possible solutions. The one solution that seems
     > the most reasonable involves knowing the position of a TOKEN in yyin.

     I strongly advise you to instead build a parse tree (abstract syntax tree)
     and loop over that instead. You'll find this has major benefits in keeping
     your interpreter simple and extensible.

     That said, the functionality you mention for get_position and set_position
     have been on the to-do list for a while. As flex is a purely spare-time
     project for me, no guarantees when this will be added (in particular, it
     for sure won't be for many months to come).

     Vern


File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ

ERASEME55
=========

     To: Colin Paul Adams <colin@colina.demon.co.uk>
     Subject: Re: Flex C++ classes and Bison
     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
     Date: Fri, 15 Aug 1997 10:48:19 PDT
     From: Vern Paxson <vern>

     > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
     > *parm)
     >
     > I have been trying to get this to work as a C++ scanner, but it does
     > not appear to be possible (warning that it matches no declarations in
     > yyFlexLexer, or something like that).
     >
     > Is this supposed to be possible, or is it being worked on (I DID
     > notice the comment that scanner classes are still experimental, so I'm
     > not too hopeful)?

     What you need to do is derive a subclass from yyFlexLexer that provides
     the above yylex() method, squirrels away lvalp and parm into member
     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.

     Vern


File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ

ERASEME56
=========

     To: Mikael.Latvala@lmf.ericsson.se
     Subject: Re: Possible mistake in Flex v2.5 document
     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
     Date: Fri, 05 Sep 1997 10:01:54 PDT
     From: Vern Paxson <vern>

     > In that example you show how to count comment lines when using
     > C style /* ... */ comments. My question is, shouldn't you take into
     > account a scenario where end of a comment marker occurs inside
     > character or string literals?

     The scanner certainly needs to also scan character and string literals.
     However it does that (there's an example in the man page for strings), the
     lexer will recognize the beginning of the literal before it runs across the
     embedded "/*". Consequently, it will finish scanning the literal before it
     even considers the possibility of matching "/*".

     Example:

         '([^']*|{ESCAPE_SEQUENCE})'

     will match all the text between the ''s (inclusive). So the lexer
     considers this as a token beginning at the first ', and doesn't even
     attempt to match other tokens inside it.
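
The same left-to-right behaviour can be mimicked by a hand-written loop.
The C sketch below is an illustration only, not flex output and not part
of the original reply: it counts C-style comments while treating each
quoted literal as a single token, so a comment marker inside a literal
is never looked at. The function name 'count_comments' and its simple
escape handling are assumptions made up for this example.

```c
#include <stddef.h>

/* Count C-style comments in src, treating '...' and "..." literals as
 * single tokens so that a comment opener inside a literal is ignored. */
int count_comments(const char *src)
{
    int comments = 0;
    size_t i = 0;

    while (src[i] != '\0') {
        if (src[i] == '"' || src[i] == '\'') {
            /* A literal starts here: consume it up to its closing quote. */
            char quote = src[i++];
            while (src[i] != '\0' && src[i] != quote) {
                if (src[i] == '\\' && src[i + 1] != '\0')
                    i++;        /* step over the backslash of an escape */
                i++;
            }
            if (src[i] == quote)
                i++;            /* step past the closing quote */
        } else if (src[i] == '/' && src[i + 1] == '*') {
            /* A comment starts here: consume it up to the closing marker. */
            comments++;
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;         /* step past the closing marker */
        } else {
            i++;
        }
    }
    return comments;
}
```

Because the literal branch is entered at the opening quote and runs to
the closing quote, the comment branch is never tried on the characters
in between, which is the point made in the reply.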
57483c3a7b76Schristos 57493c3a7b76Schristos I think this subtlety is not worth putting in the manual, as I suspect 57503c3a7b76Schristos it would confuse more people than it would enlighten. 57513c3a7b76Schristos 57523c3a7b76Schristos Vern 57533c3a7b76Schristos 57543c3a7b76Schristos 57553c3a7b76SchristosFile: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ 57563c3a7b76Schristos 57573c3a7b76SchristosERASEME57 57583c3a7b76Schristos========= 57593c3a7b76Schristos 57603c3a7b76Schristos To: "Marty Leisner" <leisner@sdsp.mc.xerox.com> 57613c3a7b76Schristos Subject: Re: flex limitations 57623c3a7b76Schristos In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT. 57633c3a7b76Schristos Date: Mon, 08 Sep 1997 11:38:08 PDT 57643c3a7b76Schristos From: Vern Paxson <vern> 57653c3a7b76Schristos 57663c3a7b76Schristos > %% 57673c3a7b76Schristos > [a-zA-Z]+ /* skip a line */ 57683c3a7b76Schristos > { printf("got %s\n", yytext); } 57693c3a7b76Schristos > %% 57703c3a7b76Schristos 57713c3a7b76Schristos What version of flex are you using? If I feed this to 2.5.4, it complains: 57723c3a7b76Schristos 57733c3a7b76Schristos "bug.l", line 5: EOF encountered inside an action 57743c3a7b76Schristos "bug.l", line 5: unrecognized rule 57753c3a7b76Schristos "bug.l", line 5: fatal parse error 57763c3a7b76Schristos 57773c3a7b76Schristos Not the world's greatest error message, but it manages to flag the problem. 57783c3a7b76Schristos 57793c3a7b76Schristos (With the introduction of start condition scopes, flex can't accommodate 57803c3a7b76Schristos an action on a separate line, since it's ambiguous with an indented rule.) 57813c3a7b76Schristos 57823c3a7b76Schristos You can get 2.5.4 from ftp.ee.lbl.gov. 
57833c3a7b76Schristos 57843c3a7b76Schristos Vern 57853c3a7b76Schristos 57863c3a7b76Schristos 57873c3a7b76SchristosFile: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ 57883c3a7b76Schristos 57893c3a7b76SchristosIs there a repository for flex scanners? 57903c3a7b76Schristos======================================== 57913c3a7b76Schristos 57923c3a7b76SchristosNot that we know of. You might try asking on comp.compilers. 57933c3a7b76Schristos 57943c3a7b76Schristos 57953c3a7b76SchristosFile: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ 57963c3a7b76Schristos 57973c3a7b76SchristosHow can I conditionally compile or preprocess my flex input file? 57983c3a7b76Schristos================================================================= 57993c3a7b76Schristos 58003c3a7b76SchristosFlex doesn't have a preprocessor like C does. You might try using m4, 58013c3a7b76Schristosor the C preprocessor plus a sed script to clean up the result. 58023c3a7b76Schristos 58033c3a7b76Schristos 58043c3a7b76SchristosFile: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ 58053c3a7b76Schristos 58063c3a7b76SchristosWhere can I find grammars for lex and yacc? 58073c3a7b76Schristos=========================================== 58083c3a7b76Schristos 58093c3a7b76SchristosIn the sources for flex and bison. 
58103c3a7b76Schristos 58113c3a7b76Schristos 58123c3a7b76SchristosFile: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ 58133c3a7b76Schristos 58143c3a7b76SchristosI get an end-of-buffer message for each character scanned. 58153c3a7b76Schristos========================================================== 58163c3a7b76Schristos 58173c3a7b76SchristosThis will happen if your LexerInput() function returns only one 58183c3a7b76Schristoscharacter at a time, which can happen either if your scanner is 581930da1778Schristos"interactive", or if the streams library on your platform always returns 582030da1778Schristos1 for yyin->gcount(). 58213c3a7b76Schristos 58223c3a7b76Schristos Solution: override LexerInput() with a version that returns whole 58233c3a7b76Schristosbuffers. 58243c3a7b76Schristos 58253c3a7b76Schristos 58263c3a7b76SchristosFile: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ 58273c3a7b76Schristos 58283c3a7b76Schristosunnamed-faq-62 58293c3a7b76Schristos============== 58303c3a7b76Schristos 58313c3a7b76Schristos To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE 58323c3a7b76Schristos Subject: Re: Flex maximums 58333c3a7b76Schristos In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST. 
58343c3a7b76Schristos Date: Mon, 17 Nov 1997 17:16:15 PST 58353c3a7b76Schristos From: Vern Paxson <vern> 58363c3a7b76Schristos 58373c3a7b76Schristos > I took a quick look into the flex-sources and altered some #defines in 58383c3a7b76Schristos > flexdefs.h: 58393c3a7b76Schristos > 58403c3a7b76Schristos > #define INITIAL_MNS 64000 58413c3a7b76Schristos > #define MNS_INCREMENT 1024000 58423c3a7b76Schristos > #define MAXIMUM_MNS 64000 58433c3a7b76Schristos 58443c3a7b76Schristos The things to fix are to add a couple of zeroes to: 58453c3a7b76Schristos 58463c3a7b76Schristos #define JAMSTATE -32766 /* marks a reference to the state that always jams */ 58473c3a7b76Schristos #define MAXIMUM_MNS 31999 58483c3a7b76Schristos #define BAD_SUBSCRIPT -32767 58493c3a7b76Schristos #define MAX_SHORT 32700 58503c3a7b76Schristos 58513c3a7b76Schristos and, if you get complaints about too many rules, make the following change too: 58523c3a7b76Schristos 58533c3a7b76Schristos #define YY_TRAILING_MASK 0x200000 58543c3a7b76Schristos #define YY_TRAILING_HEAD_MASK 0x400000 58553c3a7b76Schristos 58563c3a7b76Schristos - Vern 58573c3a7b76Schristos 58583c3a7b76Schristos 58593c3a7b76SchristosFile: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ 58603c3a7b76Schristos 58613c3a7b76Schristosunnamed-faq-63 58623c3a7b76Schristos============== 58633c3a7b76Schristos 58643c3a7b76Schristos To: jimmey@lexis-nexis.com (Jimmey Todd) 58653c3a7b76Schristos Subject: Re: FLEX question regarding istream vs ifstream 58663c3a7b76Schristos In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST. 
58673c3a7b76Schristos Date: Mon, 15 Dec 1997 13:21:35 PST 58683c3a7b76Schristos From: Vern Paxson <vern> 58693c3a7b76Schristos 58703c3a7b76Schristos > stdin_handle = YY_CURRENT_BUFFER; 58713c3a7b76Schristos > ifstream fin( "aFile" ); 58723c3a7b76Schristos > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) ); 58733c3a7b76Schristos > 58743c3a7b76Schristos > What I'm wanting to do, is pass the contents of a file thru one set 58753c3a7b76Schristos > of rules and then pass stdin thru another set... It works great if, I 58763c3a7b76Schristos > don't use the C++ classes. But since everything else that I'm doing is 58773c3a7b76Schristos > in C++, I thought I'd be consistent. 58783c3a7b76Schristos > 58793c3a7b76Schristos > The problem is that 'yy_create_buffer' is expecting an istream* as it's 58803c3a7b76Schristos > first argument (as stated in the man page). However, fin is a ifstream 58813c3a7b76Schristos > object. Any ideas on what I might be doing wrong? Any help would be 58823c3a7b76Schristos > appreciated. Thanks!! 58833c3a7b76Schristos 58843c3a7b76Schristos You need to pass &fin, to turn it into an ifstream* instead of an ifstream. 58853c3a7b76Schristos Then its type will be compatible with the expected istream*, because ifstream 58863c3a7b76Schristos is derived from istream. 58873c3a7b76Schristos 58883c3a7b76Schristos Vern 58893c3a7b76Schristos 58903c3a7b76Schristos 58913c3a7b76SchristosFile: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ 58923c3a7b76Schristos 58933c3a7b76Schristosunnamed-faq-64 58943c3a7b76Schristos============== 58953c3a7b76Schristos 58963c3a7b76Schristos To: Enda Fadian <fadiane@piercom.ie> 58973c3a7b76Schristos Subject: Re: Question related to Flex man page? 58983c3a7b76Schristos In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST. 
58993c3a7b76Schristos Date: Tue, 16 Dec 1997 14:17:09 PST 59003c3a7b76Schristos From: Vern Paxson <vern> 59013c3a7b76Schristos 59023c3a7b76Schristos > Can you explain to me what is ment by a long-jump in relation to flex? 59033c3a7b76Schristos 59043c3a7b76Schristos Using the longjmp() function while inside yylex() or a routine called by it. 59053c3a7b76Schristos 59063c3a7b76Schristos > what is the flex activation frame. 59073c3a7b76Schristos 59083c3a7b76Schristos Just yylex()'s stack frame. 59093c3a7b76Schristos 59103c3a7b76Schristos > As far as I can see yyrestart will bring me back to the sart of the input 59113c3a7b76Schristos > file and using flex++ isnot really an option! 59123c3a7b76Schristos 59133c3a7b76Schristos No, yyrestart() doesn't imply a rewind, even though its name might sound 59143c3a7b76Schristos like it does. It tells the scanner to flush its internal buffers and 59153c3a7b76Schristos start reading from the given file at its present location. 59163c3a7b76Schristos 59173c3a7b76Schristos Vern 59183c3a7b76Schristos 59193c3a7b76Schristos 59203c3a7b76SchristosFile: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ 59213c3a7b76Schristos 59223c3a7b76Schristosunnamed-faq-65 59233c3a7b76Schristos============== 59243c3a7b76Schristos 59253c3a7b76Schristos To: hassan@larc.info.uqam.ca (Hassan Alaoui) 59263c3a7b76Schristos Subject: Re: Need urgent Help 59273c3a7b76Schristos In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST. 
59283c3a7b76Schristos Date: Sun, 21 Dec 1997 21:30:46 PST 59293c3a7b76Schristos From: Vern Paxson <vern> 59303c3a7b76Schristos 59313c3a7b76Schristos > /usr/lib/yaccpar: In function `int yyparse()': 59323c3a7b76Schristos > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)' 59333c3a7b76Schristos > 59343c3a7b76Schristos > ld: Undefined symbol 59353c3a7b76Schristos > _yylex 59363c3a7b76Schristos > _yyparse 59373c3a7b76Schristos > _yyin 59383c3a7b76Schristos 59393c3a7b76Schristos This is a known problem with Solaris C++ (and/or Solaris yacc). I believe 59403c3a7b76Schristos the fix is to explicitly insert some 'extern "C"' statements for the 59413c3a7b76Schristos corresponding routines/symbols. 59423c3a7b76Schristos 59433c3a7b76Schristos Vern 59443c3a7b76Schristos 59453c3a7b76Schristos 59463c3a7b76SchristosFile: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ 59473c3a7b76Schristos 59483c3a7b76Schristosunnamed-faq-66 59493c3a7b76Schristos============== 59503c3a7b76Schristos 59513c3a7b76Schristos To: mc0307@mclink.it 59523c3a7b76Schristos Cc: gnu@prep.ai.mit.edu 59533c3a7b76Schristos Subject: Re: [mc0307@mclink.it: Help request] 59543c3a7b76Schristos In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST. 59553c3a7b76Schristos Date: Sun, 21 Dec 1997 22:33:37 PST 59563c3a7b76Schristos From: Vern Paxson <vern> 59573c3a7b76Schristos 59583c3a7b76Schristos > This is my definition for float and integer types: 59593c3a7b76Schristos > . . . 59603c3a7b76Schristos > NZD [1-9] 59613c3a7b76Schristos > ... 59623c3a7b76Schristos > I've tested my program on other lex version (on UNIX Sun Solaris an HP 59633c3a7b76Schristos > UNIX) and it work well, so I think that my definitions are correct. 59643c3a7b76Schristos > There are any differences between Lex and Flex? 59653c3a7b76Schristos 59663c3a7b76Schristos There are indeed differences, as discussed in the man page. 
The one 59673c3a7b76Schristos you are probably running into is that when flex expands a name definition, 59683c3a7b76Schristos it puts parentheses around the expansion, while lex does not. There's 59693c3a7b76Schristos an example in the man page of how this can lead to different matching. 59703c3a7b76Schristos Flex's behavior complies with the POSIX standard (or at least with the 59713c3a7b76Schristos last POSIX draft I saw). 59723c3a7b76Schristos 59733c3a7b76Schristos Vern 59743c3a7b76Schristos 59753c3a7b76Schristos 59763c3a7b76SchristosFile: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ 59773c3a7b76Schristos 59783c3a7b76Schristosunnamed-faq-67 59793c3a7b76Schristos============== 59803c3a7b76Schristos 59813c3a7b76Schristos To: hassan@larc.info.uqam.ca (Hassan Alaoui) 59823c3a7b76Schristos Subject: Re: Thanks 59833c3a7b76Schristos In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST. 59843c3a7b76Schristos Date: Mon, 22 Dec 1997 14:35:05 PST 59853c3a7b76Schristos From: Vern Paxson <vern> 59863c3a7b76Schristos 59873c3a7b76Schristos > Thank you very much for your help. I compile and link well with C++ while 59883c3a7b76Schristos > declaring 'yylex ...' extern, But a little problem remains. I get a 59893c3a7b76Schristos > segmentation default when executing ( I linked with lfl library) while it 59903c3a7b76Schristos > works well when using LEX instead of flex. Do you have some ideas about the 59913c3a7b76Schristos > reason for this ? 59923c3a7b76Schristos 59933c3a7b76Schristos The one possible reason for this that comes to mind is if you've defined 59943c3a7b76Schristos yytext as "extern char yytext[]" (which is what lex uses) instead of 59953c3a7b76Schristos "extern char *yytext" (which is what flex uses). If it's not that, then 59963c3a7b76Schristos I'm afraid I don't know what the problem might be. 
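Why the two declarations are not interchangeable can be sketched in plain C. The names below (`lex_style_yytext`, `flex_style_yytext`, `LEX_YYLMAX`) are hypothetical stand-ins, not the symbols a real scanner defines; the sketch only shows that an array symbol names the character storage itself, while a pointer symbol names an address stored elsewhere. Code that declares the symbol one way while it is defined the other way reads the wrong kind of bytes, which is one plausible route to a crash.

```c
#include <stddef.h>

#define LEX_YYLMAX 200   /* hypothetical buffer size for the lex-style array */

/* lex-style definition: the symbol is the character buffer itself. */
char lex_style_yytext[LEX_YYLMAX] = "token";

/* flex-style definition: the symbol is a pointer to a buffer kept elsewhere. */
static char flex_buffer[] = "token";
char *flex_style_yytext = flex_buffer;

/* Size of the object that each symbol actually names. */
size_t lex_style_object_size(void)
{
    return sizeof lex_style_yytext;     /* the whole array */
}

size_t flex_style_object_size(void)
{
    return sizeof flex_style_yytext;    /* just one pointer */
}
```

Both symbols can be indexed the same way in source code, which is why the mismatch compiles and links cleanly when the declaration lives in a different file from the definition.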
59973c3a7b76Schristos 59983c3a7b76Schristos Vern 59993c3a7b76Schristos 60003c3a7b76Schristos 60013c3a7b76SchristosFile: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ 60023c3a7b76Schristos 60033c3a7b76Schristosunnamed-faq-68 60043c3a7b76Schristos============== 60053c3a7b76Schristos 60063c3a7b76Schristos To: "Bart Niswonger" <NISWONGR@almaden.ibm.com> 60073c3a7b76Schristos Subject: Re: flex 2.5: c++ scanners & start conditions 60083c3a7b76Schristos In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST. 60093c3a7b76Schristos Date: Tue, 06 Jan 1998 19:19:30 PST 60103c3a7b76Schristos From: Vern Paxson <vern> 60113c3a7b76Schristos 60123c3a7b76Schristos > The problem is that when I do this (using %option c++) start 60133c3a7b76Schristos > conditions seem to not apply. 60143c3a7b76Schristos 60153c3a7b76Schristos The BEGIN macro modifies the yy_start variable. For C scanners, this 60163c3a7b76Schristos is a static with scope visible through the whole file. For C++ scanners, 60173c3a7b76Schristos it's a member variable, so it only has visible scope within a member 60183c3a7b76Schristos function. Your lexbegin() routine is not a member function when you 60193c3a7b76Schristos build a C++ scanner, so it's not modifying the correct yy_start. The 60203c3a7b76Schristos diagnostic that indicates this is that you found you needed to add 60213c3a7b76Schristos a declaration of yy_start in order to get your scanner to compile when 60223c3a7b76Schristos using C++; instead, the correct fix is to make lexbegin() a member 60233c3a7b76Schristos function (by deriving from yyFlexLexer). 
60243c3a7b76Schristos 60253c3a7b76Schristos Vern 60263c3a7b76Schristos 60273c3a7b76Schristos 60283c3a7b76SchristosFile: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ 60293c3a7b76Schristos 60303c3a7b76Schristosunnamed-faq-69 60313c3a7b76Schristos============== 60323c3a7b76Schristos 60333c3a7b76Schristos To: "Boris Zinin" <boris@ippe.rssi.ru> 60343c3a7b76Schristos Subject: Re: current position in flex buffer 60353c3a7b76Schristos In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST. 60363c3a7b76Schristos Date: Mon, 12 Jan 1998 12:03:15 PST 60373c3a7b76Schristos From: Vern Paxson <vern> 60383c3a7b76Schristos 60393c3a7b76Schristos > The problem is how to determine the current position in flex active 60403c3a7b76Schristos > buffer when a rule is matched.... 60413c3a7b76Schristos 60423c3a7b76Schristos You will need to keep track of this explicitly, such as by redefining 60433c3a7b76Schristos YY_USER_ACTION to count the number of characters matched. 60443c3a7b76Schristos 60453c3a7b76Schristos The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov. 60463c3a7b76Schristos 60473c3a7b76Schristos Vern 60483c3a7b76Schristos 60493c3a7b76Schristos 60503c3a7b76SchristosFile: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ 60513c3a7b76Schristos 60523c3a7b76Schristosunnamed-faq-70 60533c3a7b76Schristos============== 60543c3a7b76Schristos 60553c3a7b76Schristos To: Bik.Dhaliwal@bis.org 60563c3a7b76Schristos Subject: Re: Flex question 60573c3a7b76Schristos In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST. 60583c3a7b76Schristos Date: Tue, 27 Jan 1998 22:41:52 PST 60593c3a7b76Schristos From: Vern Paxson <vern> 60603c3a7b76Schristos 60613c3a7b76Schristos > That requirement involves knowing 60623c3a7b76Schristos > the character position at which a particular token was matched 60633c3a7b76Schristos > in the lexer. 
60643c3a7b76Schristos 60653c3a7b76Schristos The way you have to do this is by explicitly keeping track of where 60663c3a7b76Schristos you are in the file, by counting the number of characters scanned 60673c3a7b76Schristos for each token (available in yyleng). It may prove convenient to 60683c3a7b76Schristos do this by redefining YY_USER_ACTION, as described in the manual. 60693c3a7b76Schristos 60703c3a7b76Schristos Vern 60713c3a7b76Schristos 60723c3a7b76Schristos 60733c3a7b76SchristosFile: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ 60743c3a7b76Schristos 60753c3a7b76Schristosunnamed-faq-71 60763c3a7b76Schristos============== 60773c3a7b76Schristos 60783c3a7b76Schristos To: Vladimir Alexiev <vladimir@cs.ualberta.ca> 60793c3a7b76Schristos Subject: Re: flex: how to control start condition from parser? 60803c3a7b76Schristos In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST. 60813c3a7b76Schristos Date: Tue, 27 Jan 1998 22:45:37 PST 60823c3a7b76Schristos From: Vern Paxson <vern> 60833c3a7b76Schristos 60843c3a7b76Schristos > It seems useful for the parser to be able to tell the lexer about such 60853c3a7b76Schristos > context dependencies, because then they don't have to be limited to 60863c3a7b76Schristos > local or sequential context. 60873c3a7b76Schristos 60883c3a7b76Schristos One way to do this is to have the parser call a stub routine that's 60893c3a7b76Schristos included in the scanner's .l file, and consequently that has access to 60903c3a7b76Schristos BEGIN. The only ugliness is that the parser can't pass in the state 60913c3a7b76Schristos it wants, because those aren't visible - but if you don't have many 60923c3a7b76Schristos such states, then using a different set of names doesn't seem like 60933c3a7b76Schristos too much of a burden. 
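A minimal sketch of such a stub, written as stand-alone C so it compiles without flex. Everything named here is an assumption made for illustration: the mode enum, `lexer_set_mode`, the `TYPE_NAMES` start condition, and the simplified `BEGIN` and `yy_start` stand-ins (a generated scanner defines its own versions). In a real project the stub would sit in the user-code section of the .l file.

```c
/* What the parser sees: its own set of names (hypothetical). */
enum lexer_mode { LEXER_MODE_NORMAL, LEXER_MODE_TYPE_NAMES };
void lexer_set_mode(enum lexer_mode mode);

/*
 * Simplified stand-ins for what a generated scanner provides.
 * They exist here only so the sketch is self-contained.
 */
#define INITIAL 0
#define TYPE_NAMES 1                 /* hypothetical start condition */
static int yy_start = INITIAL;
#define BEGIN(state) (yy_start = (state))

/* The stub: maps the parser's names onto start conditions via BEGIN. */
void lexer_set_mode(enum lexer_mode mode)
{
    switch (mode) {
    case LEXER_MODE_TYPE_NAMES:
        BEGIN(TYPE_NAMES);
        break;
    case LEXER_MODE_NORMAL:
    default:
        BEGIN(INITIAL);
        break;
    }
}

/* Helper for inspection only (hypothetical). */
int lexer_current_start_condition(void)
{
    return yy_start;
}
```

The parser would call `lexer_set_mode(LEXER_MODE_TYPE_NAMES)` from a grammar action and never needs to know the start condition numbers.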
60943c3a7b76Schristos 60953c3a7b76Schristos While generating a .h file like you suggest is certainly cleaner, 60963c3a7b76Schristos flex development has come to a virtual stand-still :-(, so a workaround 60973c3a7b76Schristos like the above is much more pragmatic than waiting for a new feature. 60983c3a7b76Schristos 60993c3a7b76Schristos Vern 61003c3a7b76Schristos 61013c3a7b76Schristos 61023c3a7b76SchristosFile: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ 61033c3a7b76Schristos 61043c3a7b76Schristosunnamed-faq-72 61053c3a7b76Schristos============== 61063c3a7b76Schristos 61073c3a7b76Schristos To: Barbara Denny <denny@3com.com> 61083c3a7b76Schristos Subject: Re: freebsd flex bug? 61093c3a7b76Schristos In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST. 61103c3a7b76Schristos Date: Fri, 30 Jan 1998 12:42:32 PST 61113c3a7b76Schristos From: Vern Paxson <vern> 61123c3a7b76Schristos 61133c3a7b76Schristos > lex.yy.c:1996: parse error before `=' 61143c3a7b76Schristos 61153c3a7b76Schristos This is the key, identifying this error. (It may help to pinpoint 61163c3a7b76Schristos it by using flex -L, so it doesn't generate #line directives in its 61173c3a7b76Schristos output.) I will bet you heavy money that you have a start condition 61183c3a7b76Schristos name that is also a variable name, or something like that; flex spits 61193c3a7b76Schristos out #define's for each start condition name, mapping them to a number, 61203c3a7b76Schristos so you can wind up with: 61213c3a7b76Schristos 61223c3a7b76Schristos %x foo 61233c3a7b76Schristos %% 61243c3a7b76Schristos ... 61253c3a7b76Schristos %% 61263c3a7b76Schristos void bar() 61273c3a7b76Schristos { 61283c3a7b76Schristos int foo = 3; 61293c3a7b76Schristos } 61303c3a7b76Schristos 61313c3a7b76Schristos and the penultimate line will turn into "int 1 = 3" after C preprocessing, 61323c3a7b76Schristos since flex will put "#define foo 1" in the generated scanner. 
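One way out, sketched in plain C under the assumption that the generated scanner contains the `#define foo 1` described above: keep the start condition name and rename the variable. The name `foo_count` is hypothetical. The preprocessor replaces only whole identifiers, so `foo_count` is left alone while a bare `foo` still expands to 1.

```c
/* Stand-in for the line the generated scanner would contain for "%x foo". */
#define foo 1

/*
 * The colliding declaration, kept as a comment because it cannot compile:
 *
 *     int foo = 3;      preprocesses to:  int 1 = 3;
 */

/* Same function with a variable name that is not a start condition name. */
int bar(void)
{
    int foo_count = 3;          /* whole identifier, so no macro expansion */
    return foo_count + foo;     /* the bare name here expands to the constant 1 */
}
```

The other direction works as well: giving start conditions distinctive names (for example all upper case) makes an accidental clash with ordinary variables much less likely.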
61333c3a7b76Schristos 61343c3a7b76Schristos Vern 61353c3a7b76Schristos 61363c3a7b76Schristos 61373c3a7b76SchristosFile: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ 61383c3a7b76Schristos 61393c3a7b76Schristosunnamed-faq-73 61403c3a7b76Schristos============== 61413c3a7b76Schristos 61423c3a7b76Schristos To: Maurice Petrie <mpetrie@infoscigroup.com> 61433c3a7b76Schristos Subject: Re: Lost flex .l file 61443c3a7b76Schristos In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST. 61453c3a7b76Schristos Date: Mon, 02 Feb 1998 11:15:12 PST 61463c3a7b76Schristos From: Vern Paxson <vern> 61473c3a7b76Schristos 61483c3a7b76Schristos > I am curious as to 61493c3a7b76Schristos > whether there is a simple way to backtrack from the generated source to 61503c3a7b76Schristos > reproduce the lost list of tokens we are searching on. 61513c3a7b76Schristos 61523c3a7b76Schristos In theory, it's straight-forward to go from the DFA representation 61533c3a7b76Schristos back to a regular-expression representation - the two are isomorphic. 61543c3a7b76Schristos In practice, a huge headache, because you have to unpack all the tables 61553c3a7b76Schristos back into a single DFA representation, and then write a program to munch 61563c3a7b76Schristos on that and translate it into an RE. 61573c3a7b76Schristos 61583c3a7b76Schristos Sorry for the less-than-happy news ... 61593c3a7b76Schristos 61603c3a7b76Schristos Vern 61613c3a7b76Schristos 61623c3a7b76Schristos 61633c3a7b76SchristosFile: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ 61643c3a7b76Schristos 61653c3a7b76Schristosunnamed-faq-74 61663c3a7b76Schristos============== 61673c3a7b76Schristos 61683c3a7b76Schristos To: jimmey@lexis-nexis.com (Jimmey Todd) 61693c3a7b76Schristos Subject: Re: Flex performance question 61703c3a7b76Schristos In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. 
61713c3a7b76Schristos Date: Thu, 19 Feb 1998 08:48:51 PST 61723c3a7b76Schristos From: Vern Paxson <vern> 61733c3a7b76Schristos 61743c3a7b76Schristos > What I have found, is that the smaller the data chunk, the faster the 61753c3a7b76Schristos > program executes. This is the opposite of what I expected. Should this be 61763c3a7b76Schristos > happening this way? 61773c3a7b76Schristos 61783c3a7b76Schristos This is exactly what will happen if your input file has embedded NULs. 61793c3a7b76Schristos From the man page: 61803c3a7b76Schristos 61813c3a7b76Schristos A final note: flex is slow when matching NUL's, particularly 61823c3a7b76Schristos when a token contains multiple NUL's. It's best to write 61833c3a7b76Schristos rules which match short amounts of text if it's anticipated 61843c3a7b76Schristos that the text will often include NUL's. 61853c3a7b76Schristos 61863c3a7b76Schristos So that's the first thing to look for. 61873c3a7b76Schristos 61883c3a7b76Schristos Vern 61893c3a7b76Schristos 61903c3a7b76Schristos 61913c3a7b76SchristosFile: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ 61923c3a7b76Schristos 61933c3a7b76Schristosunnamed-faq-75 61943c3a7b76Schristos============== 61953c3a7b76Schristos 61963c3a7b76Schristos To: jimmey@lexis-nexis.com (Jimmey Todd) 61973c3a7b76Schristos Subject: Re: Flex performance question 61983c3a7b76Schristos In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. 61993c3a7b76Schristos Date: Thu, 19 Feb 1998 15:42:25 PST 62003c3a7b76Schristos From: Vern Paxson <vern> 62013c3a7b76Schristos 62023c3a7b76Schristos So there are several problems. 62033c3a7b76Schristos 62043c3a7b76Schristos First, to go fast, you want to match as much text as possible, which 62053c3a7b76Schristos your scanners don't in the case that what they're scanning is *not* 62063c3a7b76Schristos a <RN> tag. 
So you want a rule like: 62073c3a7b76Schristos 62083c3a7b76Schristos [^<]+ 62093c3a7b76Schristos 62103c3a7b76Schristos Second, C++ scanners are particularly slow if they're interactive, 62113c3a7b76Schristos which they are by default. Using -B speeds it up by a factor of 3-4 62123c3a7b76Schristos on my workstation. 62133c3a7b76Schristos 62143c3a7b76Schristos Third, C++ scanners that use the istream interface are slow, because 62153c3a7b76Schristos of how poorly implemented istream's are. I built two versions of 62163c3a7b76Schristos the following scanner: 62173c3a7b76Schristos 62183c3a7b76Schristos %% 62193c3a7b76Schristos .*\n 62203c3a7b76Schristos .* 62213c3a7b76Schristos %% 62223c3a7b76Schristos 62233c3a7b76Schristos and the C version inhales a 2.5MB file on my workstation in 0.8 seconds. 62243c3a7b76Schristos The C++ istream version, using -B, takes 3.8 seconds. 62253c3a7b76Schristos 62263c3a7b76Schristos Vern 62273c3a7b76Schristos 62283c3a7b76Schristos 62293c3a7b76SchristosFile: flex.info, Node: unnamed-faq-76, Next: unnamed-faq-77, Prev: unnamed-faq-75, Up: FAQ 62303c3a7b76Schristos 62313c3a7b76Schristosunnamed-faq-76 62323c3a7b76Schristos============== 62333c3a7b76Schristos 62343c3a7b76Schristos To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com> 62353c3a7b76Schristos Subject: Re: FLEX 2.5 & THE YEAR 2000 62363c3a7b76Schristos In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT. 62373c3a7b76Schristos Date: Wed, 03 Jun 1998 10:22:26 PDT 62383c3a7b76Schristos From: Vern Paxson <vern> 62393c3a7b76Schristos 62403c3a7b76Schristos > I am researching the Y2K problem with General Electric R&D 62413c3a7b76Schristos > and need to know if there are any known issues concerning 62423c3a7b76Schristos > the above mentioned software and Y2K regardless of version. 62433c3a7b76Schristos 62443c3a7b76Schristos There shouldn't be, all it ever does with the date is ask the system 62453c3a7b76Schristos for it and then print it out. 
62463c3a7b76Schristos 62473c3a7b76Schristos Vern 62483c3a7b76Schristos 62493c3a7b76Schristos 62503c3a7b76SchristosFile: flex.info, Node: unnamed-faq-77, Next: unnamed-faq-78, Prev: unnamed-faq-76, Up: FAQ 62513c3a7b76Schristos 62523c3a7b76Schristosunnamed-faq-77 62533c3a7b76Schristos============== 62543c3a7b76Schristos 62553c3a7b76Schristos To: "Hans Dermot Doran" <htd@ibhdoran.com> 62563c3a7b76Schristos Subject: Re: flex problem 62573c3a7b76Schristos In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT. 62583c3a7b76Schristos Date: Tue, 21 Jul 1998 14:23:34 PDT 62593c3a7b76Schristos From: Vern Paxson <vern> 62603c3a7b76Schristos 62613c3a7b76Schristos > To overcome this, I gets() the stdin into a string and lex the string. The 62623c3a7b76Schristos > string is lexed OK except that the end of string isn't lexed properly 62633c3a7b76Schristos > (yy_scan_string()), that is the lexer dosn't recognise the end of string. 62643c3a7b76Schristos 62653c3a7b76Schristos Flex doesn't contain mechanisms for recognizing buffer endpoints. But if 62663c3a7b76Schristos you use fgets instead (which you should anyway, to protect against buffer 62673c3a7b76Schristos overflows), then the final \n will be preserved in the string, and you can 62683c3a7b76Schristos scan that in order to find the end of the string. 62693c3a7b76Schristos 62703c3a7b76Schristos Vern 62713c3a7b76Schristos 62723c3a7b76Schristos 62733c3a7b76SchristosFile: flex.info, Node: unnamed-faq-78, Next: unnamed-faq-79, Prev: unnamed-faq-77, Up: FAQ 62743c3a7b76Schristos 62753c3a7b76Schristosunnamed-faq-78 62763c3a7b76Schristos============== 62773c3a7b76Schristos 62783c3a7b76Schristos To: soumen@almaden.ibm.com 62793c3a7b76Schristos Subject: Re: Flex++ 2.5.3 instance member vs. static member 62803c3a7b76Schristos In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT. 
62813c3a7b76Schristos Date: Tue, 28 Jul 1998 01:10:34 PDT 62823c3a7b76Schristos From: Vern Paxson <vern> 62833c3a7b76Schristos 62843c3a7b76Schristos > %{ 62853c3a7b76Schristos > int mylineno = 0; 62863c3a7b76Schristos > %} 62873c3a7b76Schristos > ws [ \t]+ 62883c3a7b76Schristos > alpha [A-Za-z] 62893c3a7b76Schristos > dig [0-9] 62903c3a7b76Schristos > %% 62913c3a7b76Schristos > 62923c3a7b76Schristos > Now you'd expect mylineno to be a member of each instance of class 62933c3a7b76Schristos > yyFlexLexer, but is this the case? A look at the lex.yy.cc file seems to 62943c3a7b76Schristos > indicate otherwise; unless I am missing something the declaration of 62953c3a7b76Schristos > mylineno seems to be outside any class scope. 62963c3a7b76Schristos > 62973c3a7b76Schristos > How will this work if I want to run a multi-threaded application with each 62983c3a7b76Schristos > thread creating a FlexLexer instance? 62993c3a7b76Schristos 63003c3a7b76Schristos Derive your own subclass and make mylineno a member variable of it. 63013c3a7b76Schristos 63023c3a7b76Schristos Vern 63033c3a7b76Schristos 63043c3a7b76Schristos 63053c3a7b76SchristosFile: flex.info, Node: unnamed-faq-79, Next: unnamed-faq-80, Prev: unnamed-faq-78, Up: FAQ 63063c3a7b76Schristos 63073c3a7b76Schristosunnamed-faq-79 63083c3a7b76Schristos============== 63093c3a7b76Schristos 63103c3a7b76Schristos To: Adoram Rogel <adoram@hybridge.com> 63113c3a7b76Schristos Subject: Re: More than 32K states change hangs 63123c3a7b76Schristos In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT. 63133c3a7b76Schristos Date: Tue, 04 Aug 1998 22:28:45 PDT 63143c3a7b76Schristos From: Vern Paxson <vern> 63153c3a7b76Schristos 63163c3a7b76Schristos > Vern Paxson, 63173c3a7b76Schristos > 63183c3a7b76Schristos > I followed your advice, posted on Usenet bu you, and emailed to me 63193c3a7b76Schristos > personally by you, on how to overcome the 32K states limit. I'm running 63203c3a7b76Schristos > on Linux machines. 
63213c3a7b76Schristos > I took the full source of version 2.5.4 and did the following changes in 63223c3a7b76Schristos > flexdef.h: 63233c3a7b76Schristos > #define JAMSTATE -327660 63243c3a7b76Schristos > #define MAXIMUM_MNS 319990 63253c3a7b76Schristos > #define BAD_SUBSCRIPT -327670 63263c3a7b76Schristos > #define MAX_SHORT 327000 63273c3a7b76Schristos > 63283c3a7b76Schristos > and compiled. 63293c3a7b76Schristos > All looked fine, including check and bigcheck, so I installed. 63303c3a7b76Schristos 63313c3a7b76Schristos Hmmm, you shouldn't increase MAX_SHORT, though looking through my email 63323c3a7b76Schristos archives I see that I did indeed recommend doing so. Try setting it back 63333c3a7b76Schristos to 32700; that should suffice that you no longer need -Ca. If it still 63343c3a7b76Schristos hangs, then the interesting question is - where? 63353c3a7b76Schristos 63363c3a7b76Schristos > Compiling the same hanged program with a out-of-the-box (RedHat 4.2 63373c3a7b76Schristos > distribution of Linux) 63383c3a7b76Schristos > flex 2.5.4 binary works. 63393c3a7b76Schristos 63403c3a7b76Schristos Since Linux comes with source code, you should diff it against what 63413c3a7b76Schristos you have to see what problems they missed. 63423c3a7b76Schristos 63433c3a7b76Schristos > Should I always compile with the -Ca option now ? even short and simple 63443c3a7b76Schristos > filters ? 63453c3a7b76Schristos 63463c3a7b76Schristos No, definitely not. It's meant to be for those situations where you 63473c3a7b76Schristos absolutely must squeeze every last cycle out of your scanner. 

     Vern


File: flex.info, Node: unnamed-faq-80, Next: unnamed-faq-81, Prev: unnamed-faq-79, Up: FAQ

unnamed-faq-80
==============

     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
     Subject: Re: flex output for static code portion
     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
     Date: Mon, 17 Aug 1998 23:57:42 PDT
     From: Vern Paxson <vern>

     > I would like to use flex under the hood to generate a binary file
     > containing the data structures that control the parse.

     This has been on the wish-list for a long time. In principle it's
     straight-forward - you redirect mkdata() et al's I/O to another file,
     and modify the skeleton to have a start-up function that slurps these
     into dynamic arrays. The concerns are (1) the scanner generation code
     is hairy and full of corner cases, so it's easy to get surprised when
     going down this path :-( ; and (2) being careful about buffering so
     that when the tables change you make sure the scanner starts in the
     correct state and reading at the right point in the input file.

     > I was wondering if you know of anyone who has used flex in this way.

     I don't - but it seems like a reasonable project to undertake (unlike
     numerous other flex tweaks :-).
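
The start-up function described above could look roughly like the
sketch below. It is not flex code: the file layout (a 32-bit element
count followed by raw 16-bit entries) and the names 'save_table' and
'load_table' are assumptions made for illustration, and a real version
would need one such table for each array that mkdata() writes.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Write one scanner table as: 32-bit count, then `count` raw 16-bit entries.
// (Hypothetical layout; host byte order, so the file is not portable.)
void save_table(const std::string &path, const std::vector<std::int16_t> &table)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        throw std::runtime_error("cannot open " + path + " for writing");
    const std::uint32_t count = static_cast<std::uint32_t>(table.size());
    out.write(reinterpret_cast<const char *>(&count), sizeof count);
    out.write(reinterpret_cast<const char *>(table.data()),
              static_cast<std::streamsize>(count * sizeof(std::int16_t)));
    if (!out)
        throw std::runtime_error("write failed on " + path);
}

// Start-up helper: slurp the table back into a dynamically sized array.
std::vector<std::int16_t> load_table(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path + " for reading");
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char *>(&count), sizeof count);
    if (!in)
        throw std::runtime_error("truncated header in " + path);
    std::vector<std::int16_t> table(count);
    in.read(reinterpret_cast<char *>(table.data()),
            static_cast<std::streamsize>(count * sizeof(std::int16_t)));
    if (!in)
        throw std::runtime_error("truncated table in " + path);
    return table;
}
```

The sketch covers only the table I/O; the second concern raised above,
keeping the scanner state and input position consistent when tables
change, is not addressed by it.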

     Vern


File: flex.info, Node: unnamed-faq-81, Next: unnamed-faq-82, Prev: unnamed-faq-80, Up: FAQ

unnamed-faq-81
==============

     Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
             by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
             for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
     Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
             by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
             for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
     Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
     Subject: "flex scanner push-back overflow"
     To: vern@ee.lbl.gov
     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
     X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
     X-Mailer: ELM [version 2.4ME+ PL28 (25)]
     MIME-Version: 1.0
     Content-Type: text/plain; charset=US-ASCII
     Content-Transfer-Encoding: 7bit

     Hi Vern,

     Yesterday, I encountered a strange problem: I use the macro processor m4
     to include some lengthy lists into a .l file. Following is a flex macro
     definition that causes some serious pain in my neck:

     AUTHOR ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])

     The complete list contains about 10kB. When I try to "flex" this file
     (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
     some of the predefined values in flexdef.h)) I get the error:

     myflex/flex -8 sentag.tmp.l
     flex scanner push-back overflow

     When I remove the slashes in the macro definition everything works fine.
     As I understand it, the double quotes escape the slash-character so it
     really means "/" and not "trailing context". Furthermore, I tried to
     escape the slashes with backslashes, but to no avail; the same error message
     appeared when flexing the code.

     Do you have an idea what's going on here?

     Greetings from Germany,
     Georg
     --
     Georg Rehm georg@cl-ki.uni-osnabrueck.de
     Institute for Semantic Information Processing, University of Osnabrueck, FRG


File: flex.info, Node: unnamed-faq-82, Next: unnamed-faq-83, Prev: unnamed-faq-81, Up: FAQ

unnamed-faq-82
==============

     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: "flex scanner push-back overflow"
     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
     Date: Thu, 20 Aug 1998 07:05:35 PDT
     From: Vern Paxson <vern>

     > myflex/flex -8 sentag.tmp.l
     > flex scanner push-back overflow

     Flex itself uses a flex scanner. That scanner is running out of buffer
     space when it tries to unput() the humongous macro you've defined. When
     you remove the '/'s, you make it small enough so that it fits in the buffer;
     removing spaces would do the same thing.
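
The failure mode can be pictured with a toy model: a buffer with a
fixed amount of room for pushed-back characters. The class below is a
hypothetical illustration, not flex's actual buffer code;
'PushbackBuffer' and its capacity are assumptions, and real flex
reports the condition with the "flex scanner push-back overflow"
message rather than a C++ exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of a scanner buffer with a fixed amount of push-back room.
class PushbackBuffer {
public:
    explicit PushbackBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Push one character back; fail once the fixed capacity is used up.
    void unput(char c)
    {
        if (stack_.size() >= capacity_)
            throw std::overflow_error("push-back overflow");
        stack_.push_back(c);
    }

    // Push a whole string back, last character first, so that it would be
    // re-read in its original order.
    void unput_all(const std::string &text)
    {
        for (std::size_t i = text.size(); i > 0; --i)
            unput(text[i - 1]);
    }

    std::size_t size() const { return stack_.size(); }

private:
    std::size_t capacity_;
    std::vector<char> stack_;
};
```

In this model a short definition fits, while an expansion longer than
the capacity fails part-way through, which is why shrinking the macro
by any means (dropping slashes or spaces) makes the error go away.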

     The fix is either to rethink why you're using such a big macro (perhaps
     there's another/better way to do it), or to rebuild flex's own
     scan.c with a larger value for

     #define YY_BUF_SIZE 16384

     - Vern


File: flex.info, Node: unnamed-faq-83, Next: unnamed-faq-84, Prev: unnamed-faq-82, Up: FAQ

unnamed-faq-83
==============

     To: Jan Kort <jan@research.techforce.nl>
     Subject: Re: Flex
     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
     Date: Sat, 05 Sep 1998 00:59:49 PDT
     From: Vern Paxson <vern>

     > %%
     >
     > "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
     > ^\n             { fprintf(stderr, "empty line\n"); }
     > .               { }
     > \n              { fprintf(stderr, "new line\n"); }
     >
     > %%
     > -- input ---------------------------------------
     > TEST1
     > -- output --------------------------------------
     > TEST1
     > empty line
     > ------------------------------------------------

     IMHO, it's not clear whether or not this is in fact a bug. It depends
     on whether you view yyless() as backing up in the input stream, or as
     pushing new characters onto the beginning of the input stream.
     Flex interprets it as the latter (for implementation convenience, I'll
     admit), and so considers the newline as in fact matching at the
     beginning of a line, as after all the last token scanned an entire line
     and so the scanner is now at the beginning of a new line.

     I agree that this is counter-intuitive for yyless(), given its
     functional description (it's less so for unput(), depending on whether
     you're unput()'ing new text or scanned text). But I don't plan to
     change it any time soon, as it's a pain to do so. Consequently,
     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
     your scanner into the behavior you desire.

     Sorry for the less-than-completely-satisfactory answer.

     Vern


File: flex.info, Node: unnamed-faq-84, Next: unnamed-faq-85, Prev: unnamed-faq-83, Up: FAQ

unnamed-faq-84
==============

     To: Patrick Krusenotto <krusenot@mac-info-link.de>
     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
     Date: Thu, 24 Sep 1998 23:28:43 PDT
     From: Vern Paxson <vern>

     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
     > trying to make my scanner restart with a new file after my parser stops
     > with a parse error.
     > When my compiler restarts, the parser always receives the token
     > after the token (in the old file!) that caused the parser error.

     I suspect the problem is that your parser has read ahead in order
     to attempt to resolve an ambiguity, and when it's restarted it picks
     up with that token rather than reading a fresh one. If you're using
     yacc, then the special "error" production can sometimes be used to
     consume tokens in an attempt to get the parser into a consistent state.

     Vern


File: flex.info, Node: unnamed-faq-85, Next: unnamed-faq-86, Prev: unnamed-faq-84, Up: FAQ

unnamed-faq-85
==============

     To: Henric Jungheim <junghelh@pe-nelson.com>
     Subject: Re: flex 2.5.4a
     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
     Date: Tue, 27 Oct 1998 16:50:14 PST
     From: Vern Paxson <vern>

     > This brings up a feature request: How about a command line
     > option to specify the filename when reading from stdin? That way one
     > doesn't need to create a temporary file in order to get the "#line"
     > directives to make sense.

     Use -o combined with -t (per the man page description of -o).

     > P.S., Is there any simple way to use non-blocking IO to parse multiple
     > streams?

     Simple, no.

     One approach might be to return a magic character on EWOULDBLOCK and
     have a rule

     .*<magic-character> // put back .*, eat magic character

     This is off the top of my head, not sure it'll work.

     Vern


File: flex.info, Node: unnamed-faq-86, Next: unnamed-faq-87, Prev: unnamed-faq-85, Up: FAQ

unnamed-faq-86
==============

     To: "Repko, Billy D" <billy.d.repko@intel.com>
     Subject: Re: Compiling scanners
     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
     Date: Thu, 14 Jan 1999 00:25:30 PST
     From: Vern Paxson <vern>

     > It appears that maybe it cannot find the lfl library.

     The Makefile in the distribution builds it, so you should have it.
     It's exceedingly trivial, just a main() that calls yylex() and
     a yywrap() that always returns 1.

     > %%
     >       \n      ++num_lines; ++num_chars;
     >       .       ++num_chars;

     You can't indent your rules like this - that's where the errors are coming
     from. Flex copies indented text to the output file; it's how you do things
     like

         int num_lines_seen = 0;

     to declare local variables.

     Vern


File: flex.info, Node: unnamed-faq-87, Next: unnamed-faq-88, Prev: unnamed-faq-86, Up: FAQ

unnamed-faq-87
==============

     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
     Subject: Re: flex input buffer
     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
     Date: Tue, 09 Feb 1999 21:03:37 PST
     From: Vern Paxson <vern>

     > In the flex.skl file the size of the default input buffers is set. Can you
     > explain why this size is set and why it is such a high number.

     It's large to optimize performance when scanning large files. You can
     safely make it a lot lower if needed.

     Vern


File: flex.info, Node: unnamed-faq-88, Next: unnamed-faq-90, Prev: unnamed-faq-87, Up: FAQ

unnamed-faq-88
==============

     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
     Subject: Re: Flex error message
     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
     Date: Thu, 25 Feb 1999 00:11:31 PST
     From: Vern Paxson <vern>

     > I'm extending a larger scanner written in Flex and I keep running into
     > problems.
     > More specifically, I get the error message:
     > "flex: input rules are too complicated (>= 32000 NFA states)"

     Increase the definitions in flexdef.h for:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767

     recompile everything, and it should all work.

     Vern


File: flex.info, Node: unnamed-faq-90, Next: unnamed-faq-91, Prev: unnamed-faq-88, Up: FAQ

unnamed-faq-90
==============

     To: "Dmitriy Goldobin" <gold@ems.chel.su>
     Subject: Re: FLEX trouble
     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
     Date: Tue, 01 Jun 1999 00:15:07 PDT
     From: Vern Paxson <vern>

     > I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
     > but rule "/*"(.|\n)*"*/" don't work ?

     The second of these will have to scan the entire input stream (because
     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
     it ends with "*/", terminating the comment. That potentially will overflow
     the input buffer.

     > More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
     > 'unrecognized rule'.

     You can't use the '/' operator inside parentheses. It's not clear
     what "(a/b)*" actually means.
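
To make the buffering concern above concrete: a start condition
consumes a comment piece by piece instead of matching it as one token.
The hand-written C++ sketch below imitates that two-state idea outside
of flex. It is an illustration only, not flex output; the function name
'strip_comments' is made up here, and the sketch ignores string
literals and other places where "/*" would not open a comment.

```cpp
#include <cstddef>
#include <string>

// Remove /* ... */ comments with a two-state machine, one character at a
// time, so the whole comment never has to be held as a single token.
std::string strip_comments(const std::string &input)
{
    enum class State { Code, Comment };
    State state = State::Code;
    std::string output;

    for (std::size_t i = 0; i < input.size(); ++i) {
        const char c = input[i];
        const char next = (i + 1 < input.size()) ? input[i + 1] : '\0';

        if (state == State::Code) {
            if (c == '/' && next == '*') {
                state = State::Comment;   // comparable to BEGIN(comment)
                ++i;                      // skip the '*'
            } else {
                output += c;
            }
        } else {
            if (c == '*' && next == '/') {
                state = State::Code;      // comparable to BEGIN(INITIAL)
                ++i;                      // skip the '/'
            }
            // any other character inside the comment is discarded
        }
    }
    return output;
}
```

Because each step looks at no more than two characters, multi-line
comments cost no more buffer space than single-line ones.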

     > I now use workaround with state <comment>, but single-rule is
     > better, I think.

     Single-rule is nice but will always have the problem of either setting
     restrictions on comments (like not allowing multi-line comments) and/or
     running the risk of consuming the entire input stream, as noted above.

     Vern


File: flex.info, Node: unnamed-faq-91, Next: unnamed-faq-92, Prev: unnamed-faq-90, Up: FAQ

unnamed-faq-91
==============

     Received: from mc-qout4.whowhere.com (mc-qout4.whowhere.com [209.185.123.18])
             by ee.lbl.gov (8.9.3/8.9.3) with SMTP id IAA05100
             for <vern@ee.lbl.gov>; Tue, 15 Jun 1999 08:56:06 -0700 (PDT)
     Received: from Unknown/Local ([?.?.?.?]) by my-deja.com; Tue Jun 15 08:55:43 1999
     To: vern@ee.lbl.gov
     Date: Tue, 15 Jun 1999 08:55:43 -0700
     From: "Aki Niimura" <neko@my-deja.com>
     Message-ID: <KNONDOHDOBGAEAAA@my-deja.com>
     Mime-Version: 1.0
     Cc:
     X-Sent-Mail: on
     Reply-To:
     X-Mailer: MailCity Service
     Subject: A question on flex C++ scanner
     X-Sender-Ip: 12.72.207.61
     Organization: My Deja Email (http://www.my-deja.com:80)
     Content-Type: text/plain; charset=us-ascii
     Content-Transfer-Encoding: 7bit

     Dear Dr. Paxson,

     I have been using flex for years.
     It works very well on many projects.
     In most cases, I used it to generate a scanner in the C language.
     However, on one project I needed to generate a scanner
     in the C++ language. Thanks to your enhancement, flex did
     the job.

     Currently, I'm working on enhancing my previous project.
     I need to deal with multiple input streams (recursive
     inclusion) in this scanner (C++).
     I did a similar thing for another scanner (C) as you
     explained in your documentation.

     The generated scanner (C++) has the necessary methods:
     - switch_to_buffer(struct yy_buffer_state *b)
     - yy_create_buffer(istream *is, int sz)
     - yy_delete_buffer(struct yy_buffer_state *b)

     However, I couldn't figure out how to access the current
     buffer (yy_current_buffer).

     yy_current_buffer is a protected member of yyFlexLexer.
     I can't access it directly.
     Then, I thought yy_create_buffer() with is = 0 might
     return the current stream buffer. But it seems not, as far
     as I checked the source. (flex 2.5.4)

     I went through the Web in addition to the Flex documentation.
     However, it hasn't been successful, so far.

     It is not my intention to bother you, but, can you
     comment about how to obtain the current stream buffer?

     Your response would be highly appreciated.

     Best regards,
     Aki Niimura

     --== Sent via Deja.com http://www.deja.com/ ==--
     Share what you know. Learn what you don't.


File: flex.info, Node: unnamed-faq-92, Next: unnamed-faq-93, Prev: unnamed-faq-91, Up: FAQ

unnamed-faq-92
==============

     To: neko@my-deja.com
     Subject: Re: A question on flex C++ scanner
     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
     Date: Tue, 15 Jun 1999 09:04:24 PDT
     From: Vern Paxson <vern>

     > However, I couldn't figure out how to access current
     > buffer (yy_current_buffer).

     Derive your own subclass from yyFlexLexer.

     Vern


File: flex.info, Node: unnamed-faq-93, Next: unnamed-faq-94, Prev: unnamed-faq-92, Up: FAQ

unnamed-faq-93
==============

     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
     Subject: Re: You're the man to see?
     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
     Date: Wed, 23 Jun 1999 09:01:40 PDT
     From: Vern Paxson <vern>

     > I hope you can help me. I am using Flex and Bison to produce an interpreted
     > language. However all goes well until I try to implement an IF statement or
     > a WHILE.
     > I cannot get this to work as the parser parses all the conditions
     > eg. the TRUE and FALSE conditions to check for a rule match. So I cannot
     > make a decision!!

     You need to use the parser to build a parse tree (= abstract syntax tree),
     and when that's all done you recursively evaluate the tree, binding variables
     to values at that time.

     Vern


File: flex.info, Node: unnamed-faq-94, Next: unnamed-faq-95, Prev: unnamed-faq-93, Up: FAQ

unnamed-faq-94
==============

     To: Petr Danecek <petr@ics.cas.cz>
     Subject: Re: flex - question
     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
     Date: Fri, 02 Jul 1999 16:52:13 PDT
     From: Vern Paxson <vern>

     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponential
     > growth.

     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours has because at any given time the scanner can
     be in the middle of all sorts of combinations of the different
     rules) blow up exponentially.

     For your rules, there is an easy fix. Change the ".*" that comes after
     the directory name to "[^ ]*".
     With that in place, the rules are no longer nearly so ambiguous,
     because then once one of the directories has been matched, no other
     can be matched (since they all require a leading blank).

     If that's not an acceptable solution, then you can enter a start state
     to pick up the .*\n after each directory is matched.

     Also note that for speed, you'll want to add a ".*" rule at the end,
     otherwise input that doesn't match any of the patterns will be matched
     very slowly, a character at a time.

     Vern


File: flex.info, Node: unnamed-faq-95, Next: unnamed-faq-96, Prev: unnamed-faq-94, Up: FAQ

unnamed-faq-95
==============

     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
     Date: Thu, 08 Jul 1999 08:20:39 PDT
     From: Vern Paxson <vern>

     > I was hoping you could help me with my problem.
     >
     > I tried compiling (gnu)flex on a Solaris 2.4 machine
     > but when I ran make (after configure) I got an error.
     >
     > --------------------------------------------------------------
     > gcc -c -I. -I. -g -O parse.c
     > ./flex -t -p ./scan.l >scan.c
     > sh: ./flex: not found
     > *** Error code 1
     > make: Fatal error: Command failed for target `scan.c'
     > -------------------------------------------------------------
     >
     > What's strange to me is that I'm only
     > trying to install flex now. I then edited the Makefile
     > and changed where it says "FLEX = flex" to "FLEX = lex"
     > ( lex: the native Solaris one ) but then it complains about
     > the "-p" option. Is there any way I can compile flex without
     > using flex or lex?
     >
     > Thanks so much for your time.

     You managed to step on the bootstrap sequence, which first copies
     initscan.c to scan.c in order to build flex. Try fetching a fresh
     distribution from ftp.ee.lbl.gov. (Or you can first try removing
     ".bootstrap" and doing a make again.)

     Vern


File: flex.info, Node: unnamed-faq-96, Next: unnamed-faq-97, Prev: unnamed-faq-95, Up: FAQ

unnamed-faq-96
==============

     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
     Date: Fri, 09 Jul 1999 00:27:20 PDT
     From: Vern Paxson <vern>

     > First I removed .bootstrap (and ran make) - no luck.
     > I downloaded the software but I still have the same problem. Is there
     > anything else I could try?

     Try:

     cp initscan.c scan.c
     touch scan.c
     make scan.o

     If this last tries to first build scan.c from scan.l using ./flex, then
     your "make" is broken, in which case compile scan.c to scan.o by hand.

     Vern


File: flex.info, Node: unnamed-faq-97, Next: unnamed-faq-98, Prev: unnamed-faq-96, Up: FAQ

unnamed-faq-97
==============

     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
     Subject: Re: Error
     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
     Date: Tue, 20 Jul 1999 00:18:26 PDT
     From: Vern Paxson <vern>

     > I am getting a compilation error. The error is given as "unknown symbol- yylex".

     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls an instance,
     "scanner", of the scanner class (e.g., "scanner->yylex()").
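
A glue function of the kind described above might look like the
following sketch. To keep it self-contained, 'DemoLexer' is a
hypothetical stand-in for the generated scanner class (normally
'yyFlexLexer' from 'FlexLexer.h', or a class derived from it), and the
token codes it replays are arbitrary; only the shape of the free
yylex() function is the point.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the flex-generated C++ scanner class; it simply replays a
// fixed list of token codes instead of scanning real input.
class DemoLexer {
public:
    explicit DemoLexer(std::vector<int> tokens) : tokens_(std::move(tokens)) {}

    // Same shape as the scanner's member function: next token, 0 at end.
    int yylex()
    {
        return pos_ < tokens_.size() ? tokens_[pos_++] : 0;
    }

private:
    std::vector<int> tokens_;
    std::size_t pos_ = 0;
};

// The scanner instance the parser should read from (set before parsing).
static DemoLexer *scanner = nullptr;

// Glue function: the free yylex() that a yacc/bison parser calls,
// forwarding each request to the scanner object.
int yylex()
{
    return scanner->yylex();
}
```

If the parser is compiled as C rather than C++, the glue function
would also need C linkage (extern "C") so that the parser's reference
to yylex resolves at link time.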

     Vern


File: flex.info, Node: unnamed-faq-98, Next: unnamed-faq-99, Prev: unnamed-faq-97, Up: FAQ

unnamed-faq-98
==============

     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
     Subject: Re: lex
     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
     Date: Tue, 23 Nov 1999 15:54:30 PST
     From: Vern Paxson <vern>

     Well, your problem is the

     switch (yybgin-yysvec-1) { /* witchcraft */

     at the beginning of lex rules. "witchcraft" == "non-portable". It's
     assuming knowledge of the AT&T lex's internal variables.

     For flex, you can probably do the equivalent using a switch on YYSTATE.

     Vern


File: flex.info, Node: unnamed-faq-99, Next: unnamed-faq-100, Prev: unnamed-faq-98, Up: FAQ

unnamed-faq-99
==============

     To: archow@hss.hns.com
     Subject: Re: Regarding distribution of flex and yacc based grammars
     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
     Date: Wed, 22 Dec 1999 01:56:24 PST
     From: Vern Paxson <vern>

     > When we provide the customer with an object code distribution, is it
     > necessary for us to provide source
     > for the generated C files from flex and bison since they are generated by
     > flex and bison ?

     For flex, no.  I don't know what the current state of this is for bison.

     > Also, is there any requirement for us to necessarily provide source for
     > the grammar files which are fed into flex and bison ?

     Again, for flex, no.

     See the file "COPYING" in the flex distribution for the legalese.

             Vern


File: flex.info,  Node: unnamed-faq-100,  Next: unnamed-faq-101,  Prev: unnamed-faq-99,  Up: FAQ

unnamed-faq-100
===============

     To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
     Subject: Re: Flex, and self referencing rules
     In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
     Date: Sat, 19 Feb 2000 18:33:16 PST
     From: Vern Paxson <vern>

     > However, I do not use unput anywhere. I do use self-referencing
     > rules like this:
     >
     > UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})

     You can't do this - flex is *not* a parser like yacc (which does indeed
     allow recursion), it is a scanner that's confined to regular expressions.

             Vern


File: flex.info,  Node: unnamed-faq-101,  Next: What is the difference between YYLEX_PARAM and YY_DECL?,  Prev: unnamed-faq-100,  Up: FAQ

unnamed-faq-101
===============

     To: slg3@lehigh.edu (SAMUEL L. GULDEN)
     Subject: Re: Flex problem
     In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
     Date: Thu, 02 Mar 2000 23:00:46 PST
     From: Vern Paxson <vern>

     If this is exactly your program:

     > digit [0-9]
     > digits {digit}+
     > whitespace [ \t\n]+
     >
     > %%
     > "[" { printf("open_brac\n");}
     > "]" { printf("close_brac\n");}
     > "+" { printf("addop\n");}
     > "*" { printf("multop\n");}
     > {digits} { printf("NUMBER = %s\n", yytext);}
     > whitespace ;

     then the problem is that the last rule needs to be "{whitespace}" !

             Vern


File: flex.info,  Node: What is the difference between YYLEX_PARAM and YY_DECL?,  Next: Why do I get "conflicting types for yylex" error?,  Prev: unnamed-faq-101,  Up: FAQ

What is the difference between YYLEX_PARAM and YY_DECL?
=======================================================

YYLEX_PARAM is not a flex symbol.  It is for Bison.  It tells Bison to
pass extra parameters when it calls yylex() from the parser.

   YY_DECL is the Flex declaration of yylex.  The default is similar to
this:

     #define YY_DECL int yylex (void)


File: flex.info,  Node: Why do I get "conflicting types for yylex" error?,  Next: How do I access the values set in a Flex action from within a Bison action?,  Prev: What is the difference between YYLEX_PARAM and YY_DECL?,  Up: FAQ

Why do I get "conflicting types for yylex" error?
=================================================

This is a compiler error regarding a generated Bison parser, not a Flex
scanner.  It means you need a prototype of yylex() at the top of the
Bison file.  Be sure the prototype matches YY_DECL.


File: flex.info,  Node: How do I access the values set in a Flex action from within a Bison action?,  Prev: Why do I get "conflicting types for yylex" error?,  Up: FAQ

How do I access the values set in a Flex action from within a Bison action?
===========================================================================

With $1, $2, $3, etc.  These are called "Semantic Values" in the Bison
manual.  See *note (bison)Top::.


File: flex.info,  Node: Appendices,  Next: Indices,  Prev: FAQ,  Up: Top

Appendix A Appendices
*********************

* Menu:

* Makefiles and Flex::
* Bison Bridge::
* M4 Dependency::
* Common Patterns::


File: flex.info,  Node: Makefiles and Flex,  Next: Bison Bridge,  Prev: Appendices,  Up: Appendices

A.1 Makefiles and Flex
======================

In this appendix, we provide tips for writing Makefiles to build your
scanners.

   In a traditional build environment, we say that the '.c' files are
the sources, and the '.o' files are the intermediate files.  When using
'flex', however, the '.l' files are the sources, and the generated '.c'
files (along with the '.o' files) are the intermediate files.  This
requires you to carefully plan your Makefile.

   Modern 'make' programs understand that 'foo.l' is intended to
generate 'lex.yy.c' or 'foo.c', and will behave accordingly(1)(2).  The
following Makefile does not explicitly instruct 'make' how to build
'scan.c' from 'scan.l'.  Instead, it relies on the implicit rules of the
'make' program to build the intermediate file, 'scan.c':

         # Basic Makefile -- relies on implicit rules
         # Creates "myprogram" from "scan.l" and "myprogram.c"
         #
         LEX=flex
         myprogram: scan.o myprogram.o
         scan.o: scan.l

   For simple cases, the above may be sufficient.  For other cases, you
may have to explicitly instruct 'make' how to build your scanner.  The
following is an example of a Makefile containing explicit rules:

         # Basic Makefile -- provides explicit rules
         # Creates "myprogram" from "scan.l" and "myprogram.c"
         #
         LEX=flex
         myprogram: scan.o myprogram.o
                 $(CC) -o $@  $(LDFLAGS) $^

         myprogram.o: myprogram.c
                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

         scan.o: scan.c
                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

         scan.c: scan.l
                 $(LEX) $(LFLAGS) -o $@ $^

         clean:
                 $(RM) *.o scan.c

   Notice in the above example that 'scan.c' is in the 'clean' target.
This is because we consider the file 'scan.c' to be an intermediate
file.

   Finally, we provide a realistic example of a 'flex' scanner used with
a 'bison' parser(3).  There is a tricky problem we have to deal with.
Since a 'flex' scanner will typically include a header file (e.g.,
'y.tab.h') generated by the parser, we need to be sure that the header
file is generated BEFORE the scanner is compiled.  We handle this case
in the following example:

         # Makefile example -- scanner and parser.
         # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
         #
         LEX     = flex
         YACC    = bison -y
         YFLAGS  = -d
         objects = scan.o parse.o myprogram.o

         myprogram: $(objects)
         scan.o: scan.l parse.c
         parse.o: parse.y
         myprogram.o: myprogram.c

   In the above example, notice the line

         scan.o: scan.l parse.c

which lists the file 'parse.c' (the generated parser) as a
dependency of 'scan.o'.  We want to ensure that the parser is created
before the scanner is compiled, and the above line seems to do the
trick.  Feel free to experiment with your specific implementation of
'make'.

   For more details on writing Makefiles, see *note (make)Top::.

   ---------- Footnotes ----------

   (1) GNU 'make' and GNU 'automake' are two such programs that provide
implicit rules for flex-generated scanners.

   (2) GNU 'automake' may generate code to execute flex in
lex-compatible mode, or to stdout.
If this is not what you want, then
you should provide an explicit rule in your Makefile.am.

   (3) This example also applies to yacc parsers.


File: flex.info,  Node: Bison Bridge,  Next: M4 Dependency,  Prev: Makefiles and Flex,  Up: Appendices

A.2 C Scanners with Bison Parsers
=================================

This section describes the 'flex' features useful when integrating
'flex' with 'GNU bison'(1).  Skip this section if you are not using
'bison' with your scanner.  Here we discuss only the 'flex' half of the
'flex' and 'bison' pair.  We do not discuss 'bison' in any detail.  For
more information about generating 'bison' parsers, see *note
(bison)Top::.

   A 'bison'-compatible scanner is generated by declaring '%option
bison-bridge' or by supplying '--bison-bridge' when invoking 'flex' from
the command line.  This instructs 'flex' that the macro 'yylval' may be
used.  The data type for 'yylval', 'YYSTYPE', is typically defined in a
header file, included in section 1 of the 'flex' input file.  For a list
of functions and macros available, *Note bison-functions::.

   The declaration of yylex becomes,

           int yylex ( YYSTYPE * lvalp, yyscan_t scanner );

   If '%option bison-locations' is specified, then the declaration
becomes,

           int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );

   Note that the macros 'yylval' and 'yylloc' evaluate to pointers.
Support for 'yylloc' is optional in 'bison', so it is optional in 'flex'
as well.  The following is an example of a 'flex' scanner that is
compatible with 'bison'.

         /* Scanner for "C" assignment statements... sort of. */
         %{
         #include <stdlib.h>  /* atoi */
         #include <string.h>  /* strdup */
         #include "y.tab.h"  /* Generated by bison. */
         %}

         %option bison-bridge bison-locations
         %%

         [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
         [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
         "="|";"       { return yytext[0];}
         .  {}
         %%

   As you can see, there really is no magic here.  We just use 'yylval'
as we would any other variable.  The data type of 'yylval' is generated
by 'bison', and included in the file 'y.tab.h'.  Here is the
corresponding 'bison' parser:

         /* Parser to convert "C" assignments to lisp. */
         %{
         #include <stdio.h>  /* printf */
         /* Pass the argument to yyparse through to yylex. */
         #define YYPARSE_PARAM scanner
         #define YYLEX_PARAM   scanner
         %}
         %locations
         %pure_parser
         %union {
             int num;
             char* str;
         }
         %token <str> STRING
         %token <num> NUMBER
         %%
         assignment:
             STRING '=' NUMBER ';' {
                 printf( "(setf %s %d)", $1, $3 );
            }
         ;

   ---------- Footnotes ----------

   (1) The features described here are purely optional, and are by no
means the only way to use flex with bison.  We merely provide some glue
to ease development of your parser-scanner pair.


File: flex.info,  Node: M4 Dependency,  Next: Common Patterns,  Prev: Bison Bridge,  Up: Appendices

A.3 M4 Dependency
=================

The macro processor 'm4'(1) must be installed wherever flex is
installed.  'flex' invokes 'm4', found by searching the directories in
the 'PATH' environment variable.  Any code you place in section 1 or in
the actions will be sent through m4.  Please follow these rules to
protect your code from unwanted 'm4' processing.

   * Do not use symbols that begin with 'm4_', such as 'm4_define' or
     'm4_include', since those are reserved for 'm4' macro names.
     If for some reason you need m4_ as a prefix, use a preprocessor
     #define to get your symbol past m4 unmangled.

   * Do not use the strings '[[' or ']]' anywhere in your code.  The
     former is not valid in C, except within comments and strings, but
     the latter is valid in code such as 'x[y[z]]'.  The solution is
     simple.  To get the literal string '"]]"', use '"]""]"'.  To get
     the array notation 'x[y[z]]', use 'x[y[z] ]'.  Flex will attempt to
     detect these sequences in user code, and escape them.  However,
     it's best to avoid this complexity where possible, by removing such
     sequences from your code.

   'm4' is only required at the time you run 'flex'.  The generated
scanner is ordinary C or C++, and does _not_ require 'm4'.

   ---------- Footnotes ----------

   (1) The use of m4 is subject to change in future revisions of flex.
It is not part of the public API of flex.  Do not depend on it.


File: flex.info,  Node: Common Patterns,  Prev: M4 Dependency,  Up: Appendices

A.4 Common Patterns
===================

This appendix provides examples of common regular expressions you might
use in your scanner.

* Menu:

* Numbers::
* Identifiers::
* Quoted Constructs::
* Addresses::


File: flex.info,  Node: Numbers,  Next: Identifiers,  Up: Common Patterns

A.4.1 Numbers
-------------

C99 decimal constant
     '([[:digit:]]{-}[0])[[:digit:]]*'

C99 hexadecimal constant
     '0[xX][[:xdigit:]]+'

C99 octal constant
     '0[01234567]*'

C99 floating point constant
          {dseq}      ([[:digit:]]+)
          {dseq_opt}  ([[:digit:]]*)
          {frac}      (({dseq_opt}"."{dseq})|{dseq}".")
          {exp}       ([eE][+-]?{dseq})
          {exp_opt}   ({exp}?)
          {fsuff}     [flFL]
          {fsuff_opt} ({fsuff}?)
          {hpref}     (0[xX])
          {hdseq}     ([[:xdigit:]]+)
          {hdseq_opt} ([[:xdigit:]]*)
          {hfrac}     (({hdseq_opt}"."{hdseq})|({hdseq}"."))
          {bexp}      ([pP][+-]?{dseq})
          {dfc}       (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
          {hfc}       (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))

          {c99_floating_point_constant}  ({dfc}|{hfc})

     See C99 section 6.4.4.2 for the gory details.
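
     As a rough cross-check of the three integer patterns above, the
     following C sketch hand-codes the same tests.  It is not how a
     'flex' scanner works internally: a generated scanner takes the
     longest match at the current input position, whereas this helper
     tests a whole NUL-terminated string.  The names
     'classify_int_constant' and 'enum int_kind' are invented for this
     illustration, and integer suffixes such as 'u' or 'L' are not
     handled, just as they are not handled by the patterns.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type; not part of flex or of C99. */
enum int_kind { INT_NONE, INT_DECIMAL, INT_HEX, INT_OCTAL };

/* Classify a whole string against the three integer patterns above. */
enum int_kind classify_int_constant(const char *s)
{
    size_t i;

    if (s == NULL || s[0] == '\0')
        return INT_NONE;

    if (s[0] != '0') {
        /* ([[:digit:]]{-}[0])[[:digit:]]* : a nonzero digit, then digits */
        for (i = 0; s[i] != '\0'; i++)
            if (!isdigit((unsigned char)s[i]))
                return INT_NONE;
        return INT_DECIMAL;
    }

    if (s[1] == 'x' || s[1] == 'X') {
        /* 0[xX][[:xdigit:]]+ : at least one hex digit is required */
        if (s[2] == '\0')
            return INT_NONE;
        for (i = 2; s[i] != '\0'; i++)
            if (!isxdigit((unsigned char)s[i]))
                return INT_NONE;
        return INT_HEX;
    }

    /* 0[01234567]* : a leading zero followed by octal digits only */
    for (i = 1; s[i] != '\0'; i++)
        if (s[i] < '0' || s[i] > '7')
            return INT_NONE;
    return INT_OCTAL;
}
```

     With this helper, a lone "0" is classified as octal, which mirrors
     the '0[01234567]*' pattern, while "089" and "0x" are rejected.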


File: flex.info,  Node: Identifiers,  Next: Quoted Constructs,  Prev: Numbers,  Up: Common Patterns

A.4.2 Identifiers
-----------------

C99 Identifier
          ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
          nondigit    [_[:alpha:]]
          c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*

     Technically, the above pattern does not encompass all possible C99
     identifiers, since C99 allows for "implementation-defined"
     characters.  In practice, C compilers follow the above pattern,
     with the addition of the '$' character.

UTF-8 Encoded Unicode Code Point
          [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})


File: flex.info,  Node: Quoted Constructs,  Next: Addresses,  Prev: Identifiers,  Up: Common Patterns

A.4.3 Quoted Constructs
-----------------------

C99 String Literal
     'L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([01234567]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'

C99 Comment
     '("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'

     Note that in C99, a '//'-style comment may be split across lines,
     and, contrary to popular belief, does not include the trailing '\n'
     character.

     A better way to scan '/* */' comments is by line, rather than
     matching possibly huge comments all at once.  This will allow you
     to scan comments of unlimited length, as long as line breaks appear
     at sane intervals.  This is also more efficient when used with
     automatic line number processing.  *Note option-yylineno::.

          <INITIAL>{
              "/*"      BEGIN(COMMENT);
          }
          <COMMENT>{
              "*/"      BEGIN(0);
              [^*\n]+   ;
              "*"       ;
              \n        ;
          }


File: flex.info,  Node: Addresses,  Prev: Quoted Constructs,  Up: Common Patterns

A.4.4 Addresses
---------------

IPv4 Address
          dec-octet     [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
          IPv4address   {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}

IPv6 Address
          h16           [0-9A-Fa-f]{1,4}
          ls32          {h16}:{h16}|{IPv4address}
          IPv6address   ({h16}:){6}{ls32}|
                        ::({h16}:){5}{ls32}|
                        ({h16})?::({h16}:){4}{ls32}|
                        (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
                        (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
                        (({h16}:){0,3}{h16})?::{h16}:{ls32}|
                        (({h16}:){0,4}{h16})?::{ls32}|
                        (({h16}:){0,5}{h16})?::{h16}|
                        (({h16}:){0,6}{h16})?::

     See RFC 2373 (http://www.ietf.org/rfc/rfc2373.txt) for details.
     Note that you have to fold the definition of 'IPv6address' into one
     line and that it also matches the "unspecified address" "::".

URI
     '(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'

     This pattern is nearly useless, since it allows just about any
     character to appear in a URI, including spaces and control
     characters.  See RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) for
     details.


File: flex.info,  Node: Indices,  Prev: Appendices,  Up: Top

Indices
*******

* Menu:

* Concept Index::
* Index of Functions and Macros::
* Index of Variables::
* Index of Data Types::
* Index of Hooks::
* Index of Scanner Options::


File: flex.info,  Node: Concept Index,  Next: Index of Functions and Macros,  Prev: Indices,  Up: Indices

Concept Index
=============

[index]
* Menu:

* $ as normal character in patterns:  Patterns.  (line 275)
* %array, advantages of:  Matching.  (line 43)
* %array, use of:  Matching.  (line 29)
* %array, with C++:  Matching.  (line 65)
* %option noyywrap:  Generated Scanner.  (line 93)
* %pointer, and unput():  Actions.  (line 162)
* %pointer, use of:  Matching.
        (line 29)
* %top:  Definitions Section.  (line 44)
* %{ and %}, in Definitions Section:  Definitions Section.  (line 40)
* %{ and %}, in Rules Section:  Actions.  (line 26)
* <<EOF>>, use of:  EOF.  (line 33)
* [] in patterns:  Patterns.  (line 15)
* ^ as non-special character in patterns:  Patterns.  (line 275)
* |, in actions:  Actions.  (line 33)
* |, use of:  Actions.  (line 83)
* accessor functions, use of:  Accessor Methods.  (line 18)
* actions:  Actions.  (line 6)
* actions, embedded C strings:  Actions.  (line 26)
* actions, redefining YY_BREAK:  Misc Macros.  (line 49)
* actions, use of { and }:  Actions.  (line 26)
* aliases, how to define:  Definitions Section.  (line 10)
* arguments, command-line:  Scanner Options.  (line 6)
* array, default size for yytext:  User Values.  (line 13)
* backing up, eliminating:  Performance.  (line 54)
* backing up, eliminating by adding error rules:  Performance.  (line 104)
* backing up, eliminating with catch-all rule:  Performance.  (line 118)
* backing up, example of eliminating:  Performance.  (line 49)
* BEGIN:  Actions.  (line 57)
* BEGIN, explanation:  Start Conditions.  (line 84)
* beginning of line, in patterns:  Patterns.  (line 127)
* bison, bridging with flex:  Bison Bridge.  (line 6)
* bison, parser:  Bison Bridge.  (line 53)
* bison, scanner to be called from bison:  Bison Bridge.  (line 34)
* BOL, checking the BOL flag:  Misc Macros.  (line 46)
* BOL, in patterns:  Patterns.
        (line 127)
* BOL, setting it:  Misc Macros.  (line 40)
* braces in patterns:  Patterns.  (line 42)
* bugs, reporting:  Reporting Bugs.  (line 6)
* C code in flex input:  Definitions Section.  (line 40)
* C++:  Cxx.  (line 9)
* C++ and %array:  User Values.  (line 23)
* C++ I/O, customizing:  How do I use my own I/O classes in a C++ scanner?.  (line 9)
* C++ scanners, including multiple scanners:  Cxx.  (line 197)
* C++ scanners, use of:  Cxx.  (line 128)
* c++, experimental form of scanner class:  Cxx.  (line 6)
* C++, multiple different scanners:  Cxx.  (line 192)
* C-strings, in actions:  Actions.  (line 26)
* case-insensitive, effect on character classes:  Patterns.  (line 216)
* character classes in patterns:  Patterns.  (line 186)
* character classes in patterns, syntax of:  Patterns.  (line 15)
* character classes, equivalence of:  Patterns.  (line 205)
* clearing an input buffer:  Multiple Input Buffers.  (line 66)
* command-line options:  Scanner Options.  (line 6)
* comments in flex input:  Definitions Section.  (line 37)
* comments in the input:  Comments in the Input.  (line 24)
* comments, discarding:  Actions.  (line 176)
* comments, example of scanning C comments:  Start Conditions.  (line 140)
* comments, in actions:  Actions.  (line 26)
* comments, in rules section:  Comments in the Input.  (line 11)
* comments, syntax of:  Comments in the Input.
        (line 6)
* comments, valid uses of:  Comments in the Input.  (line 24)
* compressing whitespace:  Actions.  (line 22)
* concatenation, in patterns:  Patterns.  (line 111)
* copyright of flex:  Copyright.  (line 6)
* counting characters and lines:  Simple Examples.  (line 23)
* customizing I/O in C++ scanners:  How do I use my own I/O classes in a C++ scanner?.  (line 9)
* default rule:  Simple Examples.  (line 15)
* default rule <1>:  Matching.  (line 20)
* defining pattern aliases:  Definitions Section.  (line 21)
* Definitions, in flex input:  Definitions Section.  (line 6)
* deleting lines from input:  Actions.  (line 13)
* discarding C comments:  Actions.  (line 176)
* distributing flex:  Copyright.  (line 6)
* ECHO:  Actions.  (line 54)
* ECHO, and yyout:  Generated Scanner.  (line 101)
* embedding C code in flex input:  Definitions Section.  (line 40)
* end of file, in patterns:  Patterns.  (line 150)
* end of line, in negated character classes:  Patterns.  (line 237)
* end of line, in patterns:  Patterns.  (line 131)
* end-of-file, and yyrestart():  Generated Scanner.  (line 42)
* EOF and yyrestart():  Generated Scanner.  (line 42)
* EOF in patterns, syntax of:  Patterns.  (line 150)
* EOF, example using multiple input buffers:  Multiple Input Buffers.  (line 81)
* EOF, explanation:  EOF.  (line 6)
* EOF, pushing back:  Actions.
        (line 170)
* EOL, in negated character classes:  Patterns.  (line 237)
* EOL, in patterns:  Patterns.  (line 131)
* error messages, end of buffer missed:  Lex and Posix.  (line 50)
* error reporting, diagnostic messages:  Diagnostics.  (line 6)
* error reporting, in C++:  Cxx.  (line 112)
* error rules, to eliminate backing up:  Performance.  (line 102)
* escape sequences in patterns, syntax of:  Patterns.  (line 57)
* exiting with yyterminate():  Actions.  (line 212)
* experimental form of c++ scanner class:  Cxx.  (line 6)
* extended scope of start conditions:  Start Conditions.  (line 270)
* file format:  Format.  (line 6)
* file format, serialized tables:  Tables File Format.  (line 6)
* flushing an input buffer:  Multiple Input Buffers.  (line 66)
* flushing the internal buffer:  Actions.  (line 206)
* format of flex input:  Format.  (line 6)
* format of input file:  Format.  (line 9)
* freeing tables:  Loading and Unloading Serialized Tables.  (line 6)
* getting current start state with YY_START:  Start Conditions.  (line 189)
* halting with yyterminate():  Actions.  (line 212)
* handling include files with multiple input buffers:  Multiple Input Buffers.  (line 87)
* handling include files with multiple input buffers <1>:  Multiple Input Buffers.  (line 122)
* header files, with C++:  Cxx.  (line 197)
* include files, with C++:  Cxx.  (line 197)
* input file, Definitions section:  Definitions Section.
    (line 6)
* input file, Rules Section: Rules Section. (line 6)
* input file, user code Section: User Code Section. (line 6)
* input(): Actions. (line 173)
* input(), and C++: Actions. (line 202)
* input, format of: Format. (line 6)
* input, matching: Matching. (line 6)
* keywords, for performance: Performance. (line 200)
* lex (traditional) and POSIX: Lex and Posix. (line 6)
* LexerInput, overriding: How do I use my own I/O classes in a C++ scanner?. (line 9)
* LexerOutput, overriding: How do I use my own I/O classes in a C++ scanner?. (line 9)
* limitations of flex: Limitations. (line 6)
* literal text in patterns, syntax of: Patterns. (line 54)
* loading tables at runtime: Loading and Unloading Serialized Tables. (line 6)
* m4: M4 Dependency. (line 6)
* Makefile, example of implicit rules: Makefiles and Flex. (line 21)
* Makefile, explicit example: Makefiles and Flex. (line 33)
* Makefile, syntax: Makefiles and Flex. (line 6)
* matching C-style double-quoted strings: Start Conditions. (line 203)
* matching, and trailing context: Matching. (line 6)
* matching, length of: Matching. (line 6)
* matching, multiple matches: Matching. (line 6)
* member functions, C++: Cxx. (line 9)
* memory management: Memory Management. (line 6)
* memory, allocating input buffers: Multiple Input Buffers.
    (line 19)
* memory, considerations for reentrant scanners: Init and Destroy Functions. (line 6)
* memory, deleting input buffers: Multiple Input Buffers. (line 46)
* memory, for start condition stacks: Start Conditions. (line 301)
* memory, serialized tables: Serialized Tables. (line 6)
* memory, serialized tables <1>: Loading and Unloading Serialized Tables. (line 6)
* methods, c++: Cxx. (line 9)
* minimal scanner: Matching. (line 24)
* multiple input streams: Multiple Input Buffers. (line 6)
* name definitions, not POSIX: Lex and Posix. (line 75)
* negating ranges in patterns: Patterns. (line 23)
* newline, matching in patterns: Patterns. (line 135)
* non-POSIX features of flex: Lex and Posix. (line 142)
* noyywrap, %option: Generated Scanner. (line 93)
* NULL character in patterns, syntax of: Patterns. (line 62)
* octal characters in patterns: Patterns. (line 65)
* options, command-line: Scanner Options. (line 6)
* overriding LexerInput: How do I use my own I/O classes in a C++ scanner?. (line 9)
* overriding LexerOutput: How do I use my own I/O classes in a C++ scanner?. (line 9)
* overriding the memory routines: Overriding The Default Memory Management. (line 38)
* Pascal-like language: Simple Examples. (line 49)
* pattern aliases, defining: Definitions Section. (line 21)
* pattern aliases, expansion of: Patterns.
    (line 51)
* pattern aliases, how to define: Definitions Section. (line 10)
* pattern aliases, use of: Definitions Section. (line 28)
* patterns and actions on different lines: Lex and Posix. (line 101)
* patterns, character class equivalence: Patterns. (line 205)
* patterns, common: Common Patterns. (line 6)
* patterns, end of line: Patterns. (line 300)
* patterns, grouping and precedence: Patterns. (line 167)
* patterns, in rules section: Patterns. (line 6)
* patterns, invalid trailing context: Patterns. (line 285)
* patterns, matching: Matching. (line 6)
* patterns, precedence of operators: Patterns. (line 161)
* patterns, repetitions with grouping: Patterns. (line 184)
* patterns, special characters treated as non-special: Patterns. (line 293)
* patterns, syntax: Patterns. (line 9)
* patterns, syntax <1>: Patterns. (line 9)
* patterns, tuning for performance: Performance. (line 49)
* patterns, valid character classes: Patterns. (line 192)
* performance optimization, matching longer tokens: Performance. (line 167)
* performance optimization, recognizing keywords: Performance. (line 205)
* performance, backing up: Performance. (line 49)
* performance, considerations: Performance. (line 6)
* performance, using keywords: Performance. (line 200)
* popping an input buffer: Multiple Input Buffers. (line 60)
* POSIX and lex: Lex and Posix.
    (line 6)
* POSIX compliance: Lex and Posix. (line 142)
* POSIX, character classes in patterns, syntax of: Patterns. (line 15)
* preprocessor macros, for use in actions: Actions. (line 50)
* pushing an input buffer: Multiple Input Buffers. (line 52)
* pushing back characters with unput: Actions. (line 143)
* pushing back characters with unput(): Actions. (line 147)
* pushing back characters with yyless: Actions. (line 131)
* pushing back EOF: Actions. (line 170)
* ranges in patterns: Patterns. (line 19)
* ranges in patterns, negating: Patterns. (line 23)
* recognizing C comments: Start Conditions. (line 143)
* reentrant scanners, multiple interleaved scanners: Reentrant Uses. (line 10)
* reentrant scanners, recursive invocation: Reentrant Uses. (line 30)
* reentrant, accessing flex variables: Global Replacement. (line 6)
* reentrant, accessor functions: Accessor Methods. (line 6)
* reentrant, API explanation: Reentrant Overview. (line 6)
* reentrant, calling functions: Extra Reentrant Argument. (line 6)
* reentrant, example of: Reentrant Example. (line 6)
* reentrant, explanation: Reentrant. (line 6)
* reentrant, extra data: Extra Data. (line 6)
* reentrant, initialization: Init and Destroy Functions. (line 6)
* regular expressions, in patterns: Patterns. (line 6)
* REJECT: Actions. (line 61)
* REJECT, calling multiple times: Actions.
    (line 83)
* REJECT, performance costs: Performance. (line 12)
* reporting bugs: Reporting Bugs. (line 6)
* restarting the scanner: Lex and Posix. (line 54)
* RETURN, within actions: Generated Scanner. (line 57)
* rules, default: Simple Examples. (line 15)
* rules, in flex input: Rules Section. (line 6)
* scanner, definition of: Introduction. (line 6)
* sections of flex input: Format. (line 6)
* serialization: Serialized Tables. (line 6)
* serialization of tables: Creating Serialized Tables. (line 6)
* serialized tables, multiple scanners: Creating Serialized Tables. (line 26)
* stack, input buffer pop: Multiple Input Buffers. (line 60)
* stack, input buffer push: Multiple Input Buffers. (line 52)
* stacks, routines for manipulating: Start Conditions. (line 286)
* start condition, applying to multiple patterns: Start Conditions. (line 258)
* start conditions: Start Conditions. (line 6)
* start conditions, behavior of default rule: Start Conditions. (line 82)
* start conditions, exclusive: Start Conditions. (line 53)
* start conditions, for different interpretations of same input: Start Conditions. (line 112)
* start conditions, in patterns: Patterns. (line 140)
* start conditions, inclusive: Start Conditions. (line 44)
* start conditions, inclusive vs. exclusive: Start Conditions.
    (line 24)
* start conditions, integer values: Start Conditions. (line 163)
* start conditions, multiple: Start Conditions. (line 17)
* start conditions, special wildcard condition: Start Conditions. (line 68)
* start conditions, use of a stack: Start Conditions. (line 286)
* start conditions, use of wildcard condition (<*>): Start Conditions. (line 72)
* start conditions, using BEGIN: Start Conditions. (line 95)
* stdin, default for yyin: Generated Scanner. (line 37)
* stdout, as default for yyout: Generated Scanner. (line 101)
* strings, scanning strings instead of files: Multiple Input Buffers. (line 175)
* tables, creating serialized: Creating Serialized Tables. (line 6)
* tables, file format: Tables File Format. (line 6)
* tables, freeing: Loading and Unloading Serialized Tables. (line 6)
* tables, loading and unloading: Loading and Unloading Serialized Tables. (line 6)
* terminating with yyterminate(): Actions. (line 212)
* token: Matching. (line 14)
* trailing context, in patterns: Patterns. (line 118)
* trailing context, limits of: Patterns. (line 275)
* trailing context, matching: Matching. (line 6)
* trailing context, performance costs: Performance. (line 12)
* trailing context, variable length: Performance. (line 141)
* unput(): Actions. (line 143)
* unput(), and %pointer: Actions.
    (line 162)
* unput(), pushing back characters: Actions. (line 147)
* user code, in flex input: User Code Section. (line 6)
* username expansion: Simple Examples. (line 8)
* using integer values of start condition names: Start Conditions. (line 163)
* verbatim text in patterns, syntax of: Patterns. (line 54)
* warning, dangerous trailing context: Limitations. (line 20)
* warning, rule cannot be matched: Diagnostics. (line 14)
* warnings, diagnostic messages: Diagnostics. (line 6)
* whitespace, compressing: Actions. (line 22)
* yacc interface: Yacc. (line 17)
* yacc, interface: Yacc. (line 6)
* yyalloc, overriding: Overriding The Default Memory Management. (line 6)
* yyfree, overriding: Overriding The Default Memory Management. (line 6)
* yyin: Generated Scanner. (line 37)
* yyinput(): Actions. (line 202)
* yyleng: Matching. (line 14)
* yyleng, modification of: Actions. (line 47)
* yyless(): Actions. (line 125)
* yyless(), pushing back characters: Actions. (line 131)
* yylex(), in generated scanner: Generated Scanner. (line 6)
* yylex(), overriding: Generated Scanner. (line 16)
* yylex, overriding the prototype of: Generated Scanner. (line 20)
* yylineno, in a reentrant scanner: Reentrant Functions. (line 36)
* yylineno, performance costs: Performance. (line 12)
* yymore(): Actions. (line 104)
* yymore() to append token to previous token: Actions.
    (line 110)
* yymore(), mega-kludge: Actions. (line 110)
* yymore, and yyleng: Actions. (line 47)
* yymore, performance penalty of: Actions. (line 119)
* yyout: Generated Scanner. (line 101)
* yyrealloc, overriding: Overriding The Default Memory Management. (line 6)
* yyrestart(): Generated Scanner. (line 42)
* yyterminate(): Actions. (line 212)
* yytext: Matching. (line 14)
* yytext, default array size: User Values. (line 13)
* yytext, memory considerations: A Note About yytext And Memory. (line 6)
* yytext, modification of: Actions. (line 42)
* yytext, two types of: Matching. (line 29)
* yywrap(): Generated Scanner. (line 85)
* yywrap, default for: Generated Scanner. (line 93)
* YY_CURRENT_BUFFER, and multiple buffers: Multiple Input Buffers. (line 78)
* YY_EXTRA_TYPE, defining your own type: Extra Data. (line 33)
* YY_FLUSH_BUFFER: Actions. (line 206)
* YY_INPUT: Generated Scanner. (line 61)
* YY_INPUT, overriding: Generated Scanner. (line 71)
* YY_START, example: Start Conditions. (line 185)
* YY_USER_ACTION to track each time a rule is matched: Misc Macros. (line 14)