This is flex.info, produced by makeinfo version 6.1 from flex.texi.

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no.  DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex).      Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)

flex
****

This manual describes 'flex', a tool for generating programs that
perform pattern-matching on text.  The manual includes both tutorial and
reference sections.

   This edition of 'The flex Manual' documents 'flex' version 2.6.4.  It
was last updated on 6 May 2017.

   This manual was written by Vern Paxson, Will Estes and John Millaway.

* Menu:

* Copyright::
* Reporting Bugs::
* Introduction::
* Simple Examples::
* Format::
* Patterns::
* Matching::
* Actions::
* Generated Scanner::
* Start Conditions::
* Multiple Input Buffers::
* EOF::
* Misc Macros::
* User Values::
* Yacc::
* Scanner Options::
* Performance::
* Cxx::
* Reentrant::
* Lex and Posix::
* Memory Management::
* Serialized Tables::
* Diagnostics::
* Limitations::
* Bibliography::
* FAQ::
* Appendices::
* Indices::

 -- The Detailed Node Listing --

Format of the Input File

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::

Scanner Options

* Options for Specifying Filenames::
* Options Affecting Scanner Behavior::
* Code-Level And API Options::
* Options for Scanner Speed and Size::
* Debugging Options::
* Miscellaneous Options::

Reentrant C Scanners

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::

The Reentrant API in Detail

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::

Memory Management

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::

Serialized Tables

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::

FAQ

* When was flex born?::
* How do I expand backslash-escape sequences in C-style quoted strings?::
* Why do flex scanners call fileno if it is not ANSI compatible?::
* Does flex support recursive pattern definitions?::
* How do I skip huge chunks of input (tens of megabytes) while using flex?::
* Flex is not matching my patterns in the same order that I defined them.::
* My actions are executing out of order or sometimes not at all.::
* How can I have multiple input sources feed into the same scanner at the same time?::
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
* If I have a simple grammar cant I just parse it with flex?::
* Why doesn't yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
* The period isn't working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
* How can I use more than 8192 rules?::
* How do I abandon a file in the middle of a scan and switch to a new file?::
* How do I execute code only during initialization (only before the first scan)?::
* How do I execute code at termination?::
* Where else can I find help?::
* Can I include comments in the "rules" section of the file?::
* I get an error about undefined yywrap().::
* How can I change the matching pattern at run time?::
* How can I expand macros in the input?::
* How can I build a two-pass scanner?::
* How do I match any string not matched in the preceding rules?::
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
* Why doesn't flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* How do I use my own I/O classes in a C++ scanner?::
* How do I skip as many chars as possible?::
* deleteme00::
* Are certain equivalent patterns faster than others?::
* Is backing up a big deal?::
* Can I fake multi-byte character support?::
* deleteme01::
* Can you discuss some flex internals?::
* unput() messes up yy_at_bol::
* The | operator is not doing what I want::
* Why can't flex understand this variable trailing context pattern?::
* The ^ operator isn't working::
* Trailing context is getting confused with trailing optional patterns::
* Is flex GNU or not?::
* ERASEME53::
* I need to scan if-then-else blocks and while loops::
* ERASEME55::
* ERASEME56::
* ERASEME57::
* Is there a repository for flex scanners?::
* How can I conditionally compile or preprocess my flex input file?::
* Where can I find grammars for lex and yacc?::
* I get an end-of-buffer message for each character scanned.::
* unnamed-faq-62::
* unnamed-faq-63::
* unnamed-faq-64::
* unnamed-faq-65::
* unnamed-faq-66::
* unnamed-faq-67::
* unnamed-faq-68::
* unnamed-faq-69::
* unnamed-faq-70::
* unnamed-faq-71::
* unnamed-faq-72::
* unnamed-faq-73::
* unnamed-faq-74::
* unnamed-faq-75::
* unnamed-faq-76::
* unnamed-faq-77::
* unnamed-faq-78::
* unnamed-faq-79::
* unnamed-faq-80::
* unnamed-faq-81::
* unnamed-faq-82::
* unnamed-faq-83::
* unnamed-faq-84::
* unnamed-faq-85::
* unnamed-faq-86::
* unnamed-faq-87::
* unnamed-faq-88::
* unnamed-faq-90::
* unnamed-faq-91::
* unnamed-faq-92::
* unnamed-faq-93::
* unnamed-faq-94::
* unnamed-faq-95::
* unnamed-faq-96::
* unnamed-faq-97::
* unnamed-faq-98::
* unnamed-faq-99::
* unnamed-faq-100::
* unnamed-faq-101::
* What is the difference between YYLEX_PARAM and YY_DECL?::
* Why do I get "conflicting types for yylex" error?::
* How do I access the values set in a Flex action from within a Bison action?::

Appendices

* Makefiles and Flex::
* Bison Bridge::
* M4 Dependency::
* Common Patterns::

Indices

* Concept Index::
* Index of Functions and Macros::
* Index of Variables::
* Index of Data Types::
* Index of Hooks::
* Index of Scanner Options::



File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top

1 Copyright
***********

The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2012 The Flex
Project.

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no.  DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.

   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.


File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top

2 Reporting Bugs
****************

If you find a bug in 'flex', please report it using GitHub's issue
tracking facility at <https://github.com/westes/flex/issues/>.


File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top

3 Introduction
**************

'flex' is a tool for generating "scanners".  A scanner is a program
which recognizes lexical patterns in text.  The 'flex' program reads the
given input files, or its standard input if no file names are given, for
a description of a scanner to generate.  The description is in the form
of pairs of regular expressions and C code, called "rules".  'flex'
generates as output a C source file, 'lex.yy.c' by default, which
defines a routine 'yylex()'.  This file can be compiled and linked with
the flex runtime library to produce an executable.  When the executable
is run, it analyzes its input for occurrences of the regular
expressions.  Whenever it finds one, it executes the corresponding C
code.


File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top

4 Some Simple Examples
**********************

First some simple examples to get the flavor of how one uses 'flex'.

   The following 'flex' input specifies a scanner which, when it
encounters the string 'username', will replace it with the user's login
name:

         %%
         username    printf( "%s", getlogin() );

   By default, any text not matched by a 'flex' scanner is copied to the
output, so the net effect of this scanner is to copy its input file to
its output with each occurrence of 'username' expanded.  In this input,
there is just one rule.  'username' is the "pattern" and the 'printf' is
the "action".  The '%%' symbol marks the beginning of the rules.

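   To make the copy-and-replace behavior concrete, here is a rough
hand-written C sketch of what such a scanner does.  It is not the code
'flex' generates.  The function name 'expand_username' and its
parameters are hypothetical; the login name is passed in by the caller
rather than obtained from 'getlogin()', and the input is assumed to be a
NUL-terminated string instead of a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing each occurrence of
 * the literal string "username" with `login`.  Text that does not match
 * is copied unchanged, mirroring the scanner's default behavior.
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small. */
int expand_username(const char *in, const char *login,
                    char *out, size_t outsz)
{
    static const char pat[] = "username";
    const size_t patlen = sizeof pat - 1;
    size_t loginlen, used = 0;

    assert(in != NULL && login != NULL && out != NULL);
    loginlen = strlen(login);

    while (*in != '\0') {
        if (strncmp(in, pat, patlen) == 0) {
            /* The one rule: emit the login name instead of the match. */
            if (used + loginlen >= outsz)
                return -1;
            memcpy(out + used, login, loginlen);
            used += loginlen;
            in += patlen;
        } else {
            /* Default action: copy one unmatched character through. */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *in++;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

   For instance, calling it on 'hi username' with the login 'vern'
leaves 'hi vern' in the output buffer.
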
   Here's another simple example:

                 int num_lines = 0, num_chars = 0;

         %%
         \n      ++num_lines; ++num_chars;
         .       ++num_chars;

         %%

         int main()
                 {
                 yylex();
                 printf( "# of lines = %d, # of chars = %d\n",
                         num_lines, num_chars );
                 }

   This scanner counts the number of characters and the number of lines
in its input.  It produces no output other than the final report on the
character and line counts.  The first line declares two globals,
'num_lines' and 'num_chars', which are accessible both inside 'yylex()'
and in the 'main()' routine declared after the second '%%'.  There are
two rules, one which matches a newline ('\n') and increments both the
line count and the character count, and one which matches any character
other than a newline (indicated by the '.' regular expression).

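   The effect of the two rules can be sketched in plain C as a single
loop over the input.  This is only an illustration of the counting
logic, not what 'flex' emits; the names 'struct counts' and
'count_lines_chars' are hypothetical, and the input is assumed to be a
NUL-terminated string, so unlike a real scanner it cannot count NUL
bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type holding the two tallies. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Tally what the two rules would tally for a NUL-terminated string. */
struct counts count_lines_chars(const char *text)
{
    struct counts c = { 0, 0 };

    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        if (*text == '\n') {
            /* Rule 1: a newline bumps both counters. */
            ++c.num_lines;
            ++c.num_chars;
        } else {
            /* Rule 2: any other character bumps only the char count. */
            ++c.num_chars;
        }
    }
    return c;
}
```

   Because every character falls under exactly one of the two cases,
the character count always equals the length of the input.
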
   A somewhat more complicated example:

         /* scanner for a toy Pascal-like language */

         %{
         /* need this for the call to atof() below */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^{}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             }

   This is the beginning of a simple scanner for a language like
Pascal.  It identifies different types of "tokens" and reports on what
it has seen.

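   One point worth illustrating is how a word such as 'begin' ends up
reported as a keyword even though it also fits the 'ID' pattern: 'flex'
prefers the longest match and, between matches of equal length, the rule
listed first (*note Matching::).  The C sketch below mimics that
decision for a single, already-isolated word.  The names 'word_kind' and
'classify_word' are hypothetical, and the function does not tokenize
running text the way the scanner does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum word_kind { WORD_KEYWORD, WORD_IDENTIFIER, WORD_OTHER };

/* Classify one whole word the way the keyword rule and the {ID} rule
 * of the toy scanner would, assuming the word is the entire match. */
enum word_kind classify_word(const char *w)
{
    static const char *const keywords[] = {
        "if", "then", "begin", "end", "procedure", "function"
    };
    size_t i;

    assert(w != NULL);

    /* The keyword rule comes first, so it wins an equal-length tie. */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; ++i)
        if (strcmp(w, keywords[i]) == 0)
            return WORD_KEYWORD;

    /* ID  [a-z][a-z0-9]*  : one lowercase letter, then letters/digits. */
    if (!(*w >= 'a' && *w <= 'z'))
        return WORD_OTHER;
    for (++w; *w != '\0'; ++w)
        if (!((*w >= 'a' && *w <= 'z') || (*w >= '0' && *w <= '9')))
            return WORD_OTHER;
    return WORD_IDENTIFIER;
}
```

   Note that 'beginning' is classified as an identifier: the 'ID' rule
matches all nine characters, which is longer than the five-character
keyword match, so rule order never comes into play.
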
   The details of this example will be explained in the following
sections.


File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top

5 Format of the Input File
**************************

The 'flex' input file consists of three sections, separated by a line
containing only '%%'.

         definitions
         %%
         rules
         %%
         user code

* Menu:

* Definitions Section::
* Rules Section::
* User Code Section::
* Comments in the Input::


File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format

5.1 Format of the Definitions Section
=====================================

The "definitions section" contains declarations of simple "name"
definitions to simplify the scanner specification, and declarations of
"start conditions", which are explained in a later section.

   Name definitions have the form:

         name definition

   The 'name' is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash).  The
definition is taken to begin at the first non-whitespace character
following the name and continuing to the end of the line.  The
definition can subsequently be referred to using '{name}', which will
expand to '(definition)'.  For example,

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

   defines 'DIGIT' to be a regular expression which matches a single
digit, and 'ID' to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.  A subsequent reference to

         {DIGIT}+"."{DIGIT}*

   is identical to

         ([0-9])+"."([0-9])*

   and matches one-or-more digits followed by a '.' followed by
zero-or-more digits.

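   The naming rule and the '{name}' to '(definition)' substitution
described above can be sketched in C as follows.  This is a simplified
illustration, not the algorithm 'flex' itself uses: the names
'name_def', 'is_valid_name' and 'expand_names' are hypothetical, and the
sketch ignores quoting and character classes, so a brace that appears
inside a quoted string would be treated like any other.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one "name definition" line. */
struct name_def {
    const char *name;
    const char *definition;
};

/* A name is a letter or '_' followed by letters, digits, '_' or '-'. */
int is_valid_name(const char *s)
{
    assert(s != NULL);
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;
    for (++s; *s != '\0'; ++s)
        if (!(isalnum((unsigned char)*s) || *s == '_' || *s == '-'))
            return 0;
    return 1;
}

/* Return the definition for name[0..len), or NULL if it is unknown. */
static const char *find_definition(const struct name_def *defs,
                                   size_t ndefs,
                                   const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < ndefs; ++i)
        if (strlen(defs[i].name) == len &&
            memcmp(defs[i].name, name, len) == 0)
            return defs[i].definition;
    return NULL;
}

/* Copy `pattern` to `out`, replacing each {name} that is found in
 * `defs` with (definition).  Anything else, including repetition
 * braces such as {2,5}, is copied unchanged.
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small. */
int expand_names(const char *pattern,
                 const struct name_def *defs, size_t ndefs,
                 char *out, size_t outsz)
{
    size_t used = 0;

    assert(pattern != NULL && defs != NULL && out != NULL);
    while (*pattern != '\0') {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const char *def = close != NULL
            ? find_definition(defs, ndefs, pattern + 1,
                              (size_t)(close - pattern - 1))
            : NULL;

        if (def != NULL) {
            size_t deflen = strlen(def);

            /* Room for '(' + definition + ')' plus the final NUL. */
            if (used + deflen + 2 >= outsz)
                return -1;
            out[used++] = '(';
            memcpy(out + used, def, deflen);
            used += deflen;
            out[used++] = ')';
            pattern = close + 1;
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *pattern++;
        }
    }
    if (used >= outsz)
        return -1;
    out[used] = '\0';
    return 0;
}
```

   The parentheses added around each definition are what keep a
following operator such as '+' or '*' applied to the whole definition
rather than to its last element only.
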
   An unindented comment (i.e., a line beginning with '/*') is copied
verbatim to the output up to the next '*/'.

   Any _indented_ text or text enclosed in '%{' and '%}' is also copied
verbatim to the output (with the %{ and %} symbols removed).  The %{ and
%} symbols must appear unindented on lines by themselves.

   A '%top' block is similar to a '%{' ...  '%}' block, except that the
code in a '%top' block is relocated to the _top_ of the generated file,
before any flex definitions (1).  The '%top' block is useful when you
want certain preprocessor macros to be defined or certain files to be
included before the generated code.  The single characters, '{' and '}'
are used to delimit the '%top' block, as shown in the example below:

         %top{
             /* This code goes at the "top" of the generated file. */
             #include <stdint.h>
             #include <inttypes.h>
         }

   Multiple '%top' blocks are allowed, and their order is preserved.

   ---------- Footnotes ----------

   (1) Actually, 'yyIN_HEADER' is defined before the '%top' block.


File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format

5.2 Format of the Rules Section
===============================

The "rules" section of the 'flex' input contains a series of rules of
the form:

         pattern   action

   where the pattern must be unindented and the action must begin on the
same line.  *Note Patterns::, for a further description of patterns and
actions.

   In the rules section, any indented or %{ %} enclosed text appearing
before the first rule may be used to declare variables which are local
to the scanning routine and (after the declarations) code which is to be
executed whenever the scanning routine is entered.  Other indented or %{
%} text in the rules section is still copied to the output, but its
meaning is not well-defined and it may well cause compile-time errors
(this feature is present for POSIX compliance.  *Note Lex and Posix::,
for other such features).

   Any _indented_ text or text enclosed in '%{' and '%}' is copied
verbatim to the output (with the %{ and %} symbols removed).  The %{ and
%} symbols must appear unindented on lines by themselves.


File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format

5.3 Format of the User Code Section
===================================

The user code section is simply copied to 'lex.yy.c' verbatim.  It is
used for companion routines which call or are called by the scanner.
The presence of this section is optional; if it is missing, the second
'%%' in the input file may be skipped, too.


File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format

5.4 Comments in the Input
=========================

Flex supports C-style comments, that is, anything between '/*' and '*/'
is considered a comment.  Whenever flex encounters a comment, it copies
the entire comment verbatim to the generated source code.  Comments may
appear just about anywhere, but with the following exceptions:

   * Comments may not appear in the Rules Section wherever flex is
     expecting a regular expression.  This means comments may not appear
     at the beginning of a line, or immediately following a list of
     scanner states.
   * Comments may not appear on an '%option' line in the Definitions
     Section.

   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
'/*'.  This rule will work anywhere in the input file.

   All the comments in the following example are valid:

     %{
     /* code block */
     %}

     /* Definitions Section */
     %x STATE_X

     %%
         /* Rules Section */
     ruleA   /* after regex */ { /* code block */ } /* after code block */
             /* Rules Section (indented) */
     <STATE_X>{
     ruleC   ECHO;
     ruleD   ECHO;
     %{
     /* code block */
     %}
     }
     %%
     /* User Code Section */



File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top

6 Patterns
**********

The patterns in the input (see *note Rules Section::) are written using
an extended set of regular expressions.  These are:

'x'
     match the character 'x'

'.'
     any character (byte) except newline

'[xyz]'
     a "character class"; in this case, the pattern matches either an
     'x', a 'y', or a 'z'

'[abj-oZ]'
     a "character class" with a range in it; matches an 'a', a 'b', any
     letter from 'j' through 'o', or a 'Z'

'[^A-Z]'
     a "negated character class", i.e., any character but those in the
     class.  In this case, any character EXCEPT an uppercase letter.

'[^A-Z\n]'
     any character EXCEPT an uppercase letter or a newline

'[a-z]{-}[aeiou]'
     the lowercase consonants

'r*'
     zero or more r's, where r is any regular expression

'r+'
     one or more r's

'r?'
     zero or one r's (that is, "an optional r")

'r{2,5}'
     anywhere from two to five r's

'r{2,}'
     two or more r's

'r{4}'
     exactly 4 r's

'{name}'
     the expansion of the 'name' definition (*note Format::).

'"[xyz]\"foo"'
     the literal string: '[xyz]"foo'

'\X'
     if X is 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C
     interpretation of '\x'.  Otherwise, a literal 'X' (used to escape
     operators such as '*')

'\0'
     a NUL character (ASCII code 0)

'\123'
     the character with octal value 123
6703c3a7b76Schristos
67130da1778Schristos'\x2a'
6723c3a7b76Schristos     the character with hexadecimal value 2a
6733c3a7b76Schristos
67430da1778Schristos'(r)'
67530da1778Schristos     match an 'r'; parentheses are used to override precedence (see
6763c3a7b76Schristos     below)
6773c3a7b76Schristos
67830da1778Schristos'(?r-s:pattern)'
67930da1778Schristos     apply option 'r' and omit option 's' while interpreting pattern.
68030da1778Schristos     Options may be zero or more of the characters 'i', 's', or 'x'.
6813c3a7b76Schristos
68230da1778Schristos     'i' means case-insensitive.  '-i' means case-sensitive.
6833c3a7b76Schristos
68430da1778Schristos     's' alters the meaning of the '.' syntax to match any single byte
68530da1778Schristos     whatsoever.  '-s' alters the meaning of '.' to match any byte
68630da1778Schristos     except '\n'.
6873c3a7b76Schristos
68830da1778Schristos     'x' ignores comments and whitespace in patterns.  Whitespace is
68930da1778Schristos     ignored unless it is backslash-escaped, contained within '""'s, or
6903c3a7b76Schristos     appears inside a character class.
6913c3a7b76Schristos
6923c3a7b76Schristos     The following are all valid:
6933c3a7b76Schristos
6943c3a7b76Schristos     (?:foo)         same as  (foo)
6953c3a7b76Schristos     (?i:ab7)        same as  ([aA][bB]7)
6963c3a7b76Schristos     (?-i:ab)        same as  (ab)
6973c3a7b76Schristos     (?s:.)          same as  [\x00-\xFF]
6983c3a7b76Schristos     (?-s:.)         same as  [^\n]
6993c3a7b76Schristos     (?ix-s: a . b)  same as  ([Aa][^\n][bB])
7003c3a7b76Schristos     (?x:a  b)       same as  ("ab")
7013c3a7b76Schristos     (?x:a\ b)       same as  ("a b")
7023c3a7b76Schristos     (?x:a" "b)      same as  ("a b")
7033c3a7b76Schristos     (?x:a[ ]b)      same as  ("a b")
7043c3a7b76Schristos     (?x:a
7053c3a7b76Schristos         /* comment */
7063c3a7b76Schristos         b
7073c3a7b76Schristos         c)          same as  (abc)
7083c3a7b76Schristos
70930da1778Schristos'(?# comment )'
     omit everything within '()'.  The first ')' character encountered
     ends the comment.  It is not possible for the comment to contain
     a ')' character.  The comment may span lines.
7133c3a7b76Schristos
71430da1778Schristos'rs'
71530da1778Schristos     the regular expression 'r' followed by the regular expression 's';
7163c3a7b76Schristos     called "concatenation"
7173c3a7b76Schristos
71830da1778Schristos'r|s'
71930da1778Schristos     either an 'r' or an 's'
7203c3a7b76Schristos
72130da1778Schristos'r/s'
72230da1778Schristos     an 'r' but only if it is followed by an 's'.  The text matched by
72330da1778Schristos     's' is included when determining whether this rule is the longest
7243c3a7b76Schristos     match, but is then returned to the input before the action is
72530da1778Schristos     executed.  So the action only sees the text matched by 'r'.  This
7263c3a7b76Schristos     type of pattern is called "trailing context".  (There are some
72730da1778Schristos     combinations of 'r/s' that flex cannot match correctly.  *Note
7283c3a7b76Schristos     Limitations::, regarding dangerous trailing context.)
7293c3a7b76Schristos
73030da1778Schristos'^r'
73130da1778Schristos     an 'r', but only at the beginning of a line (i.e., when just
7323c3a7b76Schristos     starting to scan, or right after a newline has been scanned).
7333c3a7b76Schristos
73430da1778Schristos'r$'
73530da1778Schristos     an 'r', but only at the end of a line (i.e., just before a
73630da1778Schristos     newline).  Equivalent to 'r/\n'.
7373c3a7b76Schristos
73830da1778Schristos     Note that 'flex''s notion of "newline" is exactly whatever the C
73930da1778Schristos     compiler used to compile 'flex' interprets '\n' as; in particular,
74030da1778Schristos     on some DOS systems you must either filter out '\r's in the input
74130da1778Schristos     yourself, or explicitly use 'r/\r\n' for 'r$'.
7423c3a7b76Schristos
74330da1778Schristos'<s>r'
74430da1778Schristos     an 'r', but only in start condition 's' (see *note Start
7453c3a7b76Schristos     Conditions:: for discussion of start conditions).
7463c3a7b76Schristos
74730da1778Schristos'<s1,s2,s3>r'
74830da1778Schristos     same, but in any of start conditions 's1', 's2', or 's3'.
7493c3a7b76Schristos
75030da1778Schristos'<*>r'
75130da1778Schristos     an 'r' in any start condition, even an exclusive one.
7523c3a7b76Schristos
75330da1778Schristos'<<EOF>>'
7543c3a7b76Schristos     an end-of-file.
7553c3a7b76Schristos
75630da1778Schristos'<s1,s2><<EOF>>'
75730da1778Schristos     an end-of-file when in start condition 's1' or 's2'
7583c3a7b76Schristos
   Note that inside of a character class, all regular expression
operators lose their special meaning except escape ('\') and the
character class operators, '-', ']', and, at the beginning of the
class, '^'.
7633c3a7b76Schristos
7643c3a7b76Schristos   The regular expressions listed above are grouped according to
7653c3a7b76Schristosprecedence, from highest precedence at the top to lowest at the bottom.
7663c3a7b76SchristosThose grouped together have equal precedence (see special note on the
76730da1778Schristosprecedence of the repeat operator, '{}', under the documentation for the
76830da1778Schristos'--posix' POSIX compliance option).  For example,
7693c3a7b76Schristos
7703c3a7b76Schristos         foo|bar*
7713c3a7b76Schristos
7723c3a7b76Schristos   is the same as
7733c3a7b76Schristos
7743c3a7b76Schristos         (foo)|(ba(r*))
7753c3a7b76Schristos
77630da1778Schristos   since the '*' operator has higher precedence than concatenation, and
77730da1778Schristosconcatenation higher than alternation ('|').  This pattern therefore
77830da1778Schristosmatches _either_ the string 'foo' _or_ the string 'ba' followed by
77930da1778Schristoszero-or-more 'r''s.  To match 'foo' or zero-or-more repetitions of the
78030da1778Schristosstring 'bar', use:
7813c3a7b76Schristos
7823c3a7b76Schristos         foo|(bar)*
7833c3a7b76Schristos
78430da1778Schristos   And to match a sequence of zero or more repetitions of 'foo' and
78530da1778Schristos'bar':
7863c3a7b76Schristos
7873c3a7b76Schristos         (foo|bar)*
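
   The grouping rules above can be checked mechanically.  The following
is a minimal sketch and not part of flex: it relies on the POSIX
'regcomp()' and 'regexec()' functions with extended regular
expressions, which rank repetition, concatenation and alternation in
the same relative order for these three patterns.  The helper name
'full_match' is an assumption made for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if PATTERN matches all of TEXT, 0 if it does not, and -1 if
   the pattern is too long or cannot be compiled.  POSIX extended
   regular expressions are used here as a stand-in for flex patterns. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Wrap the pattern so that it must cover the whole string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || n >= (int)sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

   With this helper, 'foo|bar*' accepts 'barrr' but rejects 'barbar',
while 'foo|(bar)*' accepts 'barbar' and '(foo|bar)*' accepts
'foobarfoo'.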
7883c3a7b76Schristos
   In addition to characters and ranges of characters, character classes
can also contain "character class expressions".  These are expressions
enclosed inside '[:' and ':]' delimiters, which themselves must appear
between the '[' and ']' of the character class.  Other elements may
occur inside the character class, too.  The valid expressions are:
7943c3a7b76Schristos
7953c3a7b76Schristos         [:alnum:] [:alpha:] [:blank:]
7963c3a7b76Schristos         [:cntrl:] [:digit:] [:graph:]
7973c3a7b76Schristos         [:lower:] [:print:] [:punct:]
7983c3a7b76Schristos         [:space:] [:upper:] [:xdigit:]
7993c3a7b76Schristos
   These expressions all designate a set of characters equivalent to the
corresponding standard C 'isXXX' function.  For example, '[:alnum:]'
designates those characters for which 'isalnum()' returns true, i.e.,
any alphabetic or numeric character.  Some systems don't provide
'isblank()', so flex defines '[:blank:]' as a blank or a tab.
8053c3a7b76Schristos
8063c3a7b76Schristos   For example, the following character classes are all equivalent:
8073c3a7b76Schristos
8083c3a7b76Schristos         [[:alnum:]]
8093c3a7b76Schristos         [[:alpha:][:digit:]]
8103c3a7b76Schristos         [[:alpha:][0-9]]
8113c3a7b76Schristos         [a-zA-Z0-9]
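
   The equivalence of the four classes above can be spot-checked from C
with the corresponding 'isXXX' functions.  This is only a sketch: it
assumes an ASCII character set and the default "C" locale, and the
function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Hand-written equivalent of the class [a-zA-Z0-9] (assumes ASCII). */
static int in_explicit_class(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Returns 1 if isalnum(), isalpha()-or-isdigit(), and the explicit
   ranges all give the same answer for character C. */
int spellings_agree(int c)
{
    int alnum, alpha_or_digit;

    /* Only 7-bit characters are examined in this sketch. */
    assert(c >= 0 && c <= 127);

    alnum = isalnum(c) != 0;
    alpha_or_digit = isalpha(c) != 0 || isdigit(c) != 0;
    return alnum == alpha_or_digit && alnum == in_explicit_class(c);
}

/* Returns 1 if the three spellings agree on every 7-bit character. */
int all_spellings_agree(void)
{
    int c;

    for (c = 0; c <= 127; ++c)
        if (!spellings_agree(c))
            return 0;
    return 1;
}
```

   Running the same check after a call to 'setlocale()' may give a
different answer, since the 'isXXX' functions depend on the locale.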
8123c3a7b76Schristos
8133c3a7b76Schristos   A word of caution.  Character classes are expanded immediately when
81430da1778Schristosseen in the 'flex' input.  This means the character classes are
81530da1778Schristossensitive to the locale in which 'flex' is executed, and the resulting
8163c3a7b76Schristosscanner will not be sensitive to the runtime locale.  This may or may
8173c3a7b76Schristosnot be desirable.
8183c3a7b76Schristos
81930da1778Schristos   * If your scanner is case-insensitive (the '-i' flag), then
82030da1778Schristos     '[:upper:]' and '[:lower:]' are equivalent to '[:alpha:]'.
8213c3a7b76Schristos
   * Character classes with ranges, such as '[A-z]', should be used with
     caution in a case-insensitive scanner if the range spans both
     uppercase and lowercase characters.  Flex does not know if you want
     to fold all upper and lowercase characters together, or if you want
     the literal numeric range specified (with no case folding).  When
     in doubt, flex will assume that you meant the literal numeric
     range, and will issue a warning.  The exception to this rule is a
     character range such as '[a-z]' or '[S-W]' where it is obvious that
     you want case-folding to occur.  Here are some examples with the
     '-i' flag enabled:
8323c3a7b76Schristos
8333c3a7b76Schristos     Range        Result      Literal Range        Alternate Range
83430da1778Schristos     '[a-t]'      ok          '[a-tA-T]'
83530da1778Schristos     '[A-T]'      ok          '[a-tA-T]'
83630da1778Schristos     '[A-t]'      ambiguous   '[A-Z\[\\\]_`a-t]'   '[a-tA-T]'
83730da1778Schristos     '[_-{]'      ambiguous   '[_`a-z{]'           '[_`a-zA-Z{]'
83830da1778Schristos     '[@-C]'      ambiguous   '[@ABC]'             '[@A-Z\[\\\]_`abc]'
8393c3a7b76Schristos
84030da1778Schristos   * A negated character class such as the example '[^A-Z]' above _will_
84130da1778Schristos     match a newline unless '\n' (or an equivalent escape sequence) is
84230da1778Schristos     one of the characters explicitly present in the negated character
84330da1778Schristos     class (e.g., '[^A-Z\n]').  This is unlike how many other regular
84430da1778Schristos     expression tools treat negated character classes, but unfortunately
84530da1778Schristos     the inconsistency is historically entrenched.  Matching newlines
84630da1778Schristos     means that a pattern like '[^"]*' can match the entire input unless
84730da1778Schristos     there's another quote in the input.
8483c3a7b76Schristos
8493c3a7b76Schristos     Flex allows negation of character class expressions by prepending
85030da1778Schristos     '^' to the POSIX character class name.
8513c3a7b76Schristos
8523c3a7b76Schristos              [:^alnum:] [:^alpha:] [:^blank:]
8533c3a7b76Schristos              [:^cntrl:] [:^digit:] [:^graph:]
8543c3a7b76Schristos              [:^lower:] [:^print:] [:^punct:]
8553c3a7b76Schristos              [:^space:] [:^upper:] [:^xdigit:]
8563c3a7b76Schristos
85730da1778Schristos     Flex will issue a warning if the expressions '[:^upper:]' and
85830da1778Schristos     '[:^lower:]' appear in a case-insensitive scanner, since their
8593c3a7b76Schristos     meaning is unclear.  The current behavior is to skip them entirely,
8603c3a7b76Schristos     but this may change without notice in future revisions of flex.
8613c3a7b76Schristos
   * The '{-}' operator computes the difference of two character
86430da1778Schristos     classes.  For example, '[a-c]{-}[b-z]' represents all the
86530da1778Schristos     characters in the class '[a-c]' that are not in the class '[b-z]'
86630da1778Schristos     (which in this case, is just the single character 'a').  The '{-}'
86730da1778Schristos     operator is left associative, so '[abc]{-}[b]{-}[c]' is the same as
86830da1778Schristos     '[a]'.  Be careful not to accidentally create an empty set, which
86930da1778Schristos     will never match.
8703c3a7b76Schristos
   * The '{+}' operator computes the union of two character classes.
87330da1778Schristos     For example, '[a-z]{+}[0-9]' is the same as '[a-z0-9]'.  This
8743c3a7b76Schristos     operator is useful when preceded by the result of a difference
87530da1778Schristos     operation, as in, '[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
87630da1778Schristos     equivalent to '[A-Zq]' in the "C" locale.
8773c3a7b76Schristos
   * A rule can have at most one instance of trailing context (the '/'
     operator or the '$' operator).  The start condition, '^', and
     '<<EOF>>' patterns can only occur at the beginning of a pattern
     and, like '/' and '$', cannot be grouped inside parentheses.  A
     '^' which does not occur at the beginning of a rule or a '$' which
     does not occur at the end of a rule loses its special properties
     and is treated as a normal character.
8853c3a7b76Schristos
8863c3a7b76Schristos   * The following are invalid:
8873c3a7b76Schristos
8883c3a7b76Schristos              foo/bar$
8893c3a7b76Schristos              <sc1>foo<sc2>bar
8903c3a7b76Schristos
89130da1778Schristos     Note that the first of these can be written 'foo/bar\n'.
8923c3a7b76Schristos
89330da1778Schristos   * The following will result in '$' or '^' being treated as a normal
8943c3a7b76Schristos     character:
8953c3a7b76Schristos
8963c3a7b76Schristos              foo|(bar$)
8973c3a7b76Schristos              foo|^bar
8983c3a7b76Schristos
89930da1778Schristos     If the desired meaning is a 'foo' or a 'bar'-followed-by-a-newline,
90030da1778Schristos     the following could be used (the special '|' action is explained
90130da1778Schristos     below, *note Actions::):
9023c3a7b76Schristos
9033c3a7b76Schristos              foo      |
9043c3a7b76Schristos              bar$     /* action goes here */
9053c3a7b76Schristos
90630da1778Schristos     A similar trick will work for matching a 'foo' or a
90730da1778Schristos     'bar'-at-the-beginning-of-a-line.
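
   The '{-}' and '{+}' operators described in the notes above behave
like ordinary set difference and set union.  The following sketch
models a character class as one membership flag per byte value; it is
an illustration only and says nothing about how flex stores classes
internally.  The type 'charset' and the 'cs_' functions are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One membership flag per byte value. */
typedef struct { unsigned char member[256]; } charset;

/* Build the class [lo-hi] (inclusive). */
charset cs_range(unsigned char lo, unsigned char hi)
{
    charset s;
    int c;

    assert(lo <= hi);
    memset(&s, 0, sizeof s);
    for (c = lo; c <= hi; ++c)
        s.member[c] = 1;
    return s;
}

/* Models a{-}b: everything in A that is not in B. */
charset cs_diff(charset a, charset b)
{
    int c;

    for (c = 0; c < 256; ++c)
        if (b.member[c])
            a.member[c] = 0;
    return a;
}

/* Models a{+}b: everything in A or in B. */
charset cs_union(charset a, charset b)
{
    int c;

    for (c = 0; c < 256; ++c)
        if (b.member[c])
            a.member[c] = 1;
    return a;
}

/* Write the members (NUL excluded) to OUT in ascending order.  OUT must
   have room for 256 bytes.  Returns the number of members written. */
size_t cs_to_string(charset s, char *out)
{
    size_t n = 0;
    int c;

    for (c = 1; c < 256; ++c)
        if (s.member[c])
            out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

   Because '{-}' is left associative, '[abc]{-}[b]{-}[c]' corresponds to
'cs_diff(cs_diff(abc, b), c)', which leaves only 'a'.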
9083c3a7b76Schristos
9093c3a7b76Schristos
9103c3a7b76SchristosFile: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
9113c3a7b76Schristos
9123c3a7b76Schristos7 How the Input Is Matched
9133c3a7b76Schristos**************************
9143c3a7b76Schristos
9153c3a7b76SchristosWhen the generated scanner is run, it analyzes its input looking for
9163c3a7b76Schristosstrings which match any of its patterns.  If it finds more than one
9173c3a7b76Schristosmatch, it takes the one matching the most text (for trailing context
9183c3a7b76Schristosrules, this includes the length of the trailing part, even though it
9193c3a7b76Schristoswill then be returned to the input).  If it finds two or more matches of
92030da1778Schristosthe same length, the rule listed first in the 'flex' input file is
9213c3a7b76Schristoschosen.
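
   The selection rule just described (longest match first, then the
earliest rule on a tie) can be sketched in a few lines of C.  The
sketch is deliberately simplified: every "rule" is a literal string
rather than a regular expression, and 'pick_rule' is a hypothetical
helper, not a function provided by flex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the rule that would fire at the start of INPUT,
   or -1 if no rule matches.  The matched length is stored in *LEN. */
int pick_rule(const char *const rules[], size_t nrules,
              const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    assert(len != NULL);
    for (i = 0; i < nrules; ++i) {
        size_t n = strlen(rules[i]);

        /* A strictly longer match wins; on a tie the earlier rule,
           which was found first, is kept. */
        if (n > best_len && strncmp(input, rules[i], n) == 0) {
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

   A return value of -1 corresponds to the case in which the scanner
falls back to its default rule.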
9223c3a7b76Schristos
9233c3a7b76Schristos   Once the match is determined, the text corresponding to the match
9243c3a7b76Schristos(called the "token") is made available in the global character pointer
92530da1778Schristos'yytext', and its length in the global integer 'yyleng'.  The "action"
92630da1778Schristoscorresponding to the matched pattern is then executed (*note Actions::),
92730da1778Schristosand then the remaining input is scanned for another match.
9283c3a7b76Schristos
9293c3a7b76Schristos   If no match is found, then the "default rule" is executed: the next
9303c3a7b76Schristoscharacter in the input is considered matched and copied to the standard
93130da1778Schristosoutput.  Thus, the simplest valid 'flex' input is:
9323c3a7b76Schristos
9333c3a7b76Schristos         %%
9343c3a7b76Schristos
93530da1778Schristos   which generates a scanner that simply copies its input (one character
93630da1778Schristosat a time) to its output.
9373c3a7b76Schristos
93830da1778Schristos   Note that 'yytext' can be defined in two different ways: either as a
9393c3a7b76Schristoscharacter _pointer_ or as a character _array_.  You can control which
94030da1778Schristosdefinition 'flex' uses by including one of the special directives
94130da1778Schristos'%pointer' or '%array' in the first (definitions) section of your flex
94230da1778Schristosinput.  The default is '%pointer', unless you use the '-l' lex
94330da1778Schristoscompatibility option, in which case 'yytext' will be an array.  The
94430da1778Schristosadvantage of using '%pointer' is substantially faster scanning and no
9453c3a7b76Schristosbuffer overflow when matching very large tokens (unless you run out of
dynamic memory).  The disadvantage is that you are restricted in how
your actions can modify 'yytext' (*note Actions::), and calls to the
'unput()' function destroy the present contents of 'yytext', which can
be a considerable porting headache when moving between different 'lex'
versions.
9513c3a7b76Schristos
95230da1778Schristos   The advantage of '%array' is that you can then modify 'yytext' to
95330da1778Schristosyour heart's content, and calls to 'unput()' do not destroy 'yytext'
95430da1778Schristos(*note Actions::).  Furthermore, existing 'lex' programs sometimes
95530da1778Schristosaccess 'yytext' externally using declarations of the form:
9563c3a7b76Schristos
9573c3a7b76Schristos         extern char yytext[];
9583c3a7b76Schristos
95930da1778Schristos   This definition is erroneous when used with '%pointer', but correct
96030da1778Schristosfor '%array'.
9613c3a7b76Schristos
96230da1778Schristos   The '%array' declaration defines 'yytext' to be an array of 'YYLMAX'
9633c3a7b76Schristoscharacters, which defaults to a fairly large value.  You can change the
size by simply #define'ing 'YYLMAX' to a different value in the first
section of your 'flex' input.  As mentioned above, with '%pointer',
'yytext' grows dynamically to accommodate large tokens.  While this means
96730da1778Schristosyour '%pointer' scanner can accommodate very large tokens (such as
9683c3a7b76Schristosmatching entire blocks of comments), bear in mind that each time the
96930da1778Schristosscanner must resize 'yytext' it also must rescan the entire token from
97030da1778Schristosthe beginning, so matching such tokens can prove slow.  'yytext'
97130da1778Schristospresently does _not_ dynamically grow if a call to 'unput()' results in
9723c3a7b76Schristostoo much text being pushed back; instead, a run-time error results.
9733c3a7b76Schristos
97430da1778Schristos   Also note that you cannot use '%array' with C++ scanner classes
9753c3a7b76Schristos(*note Cxx::).
9763c3a7b76Schristos
9773c3a7b76Schristos
9783c3a7b76SchristosFile: flex.info,  Node: Actions,  Next: Generated Scanner,  Prev: Matching,  Up: Top
9793c3a7b76Schristos
9803c3a7b76Schristos8 Actions
9813c3a7b76Schristos*********
9823c3a7b76Schristos
9833c3a7b76SchristosEach pattern in a rule has a corresponding "action", which can be any
9843c3a7b76Schristosarbitrary C statement.  The pattern ends at the first non-escaped
9853c3a7b76Schristoswhitespace character; the remainder of the line is its action.  If the
9863c3a7b76Schristosaction is empty, then when the pattern is matched the input token is
9873c3a7b76Schristossimply discarded.  For example, here is the specification for a program
98830da1778Schristoswhich deletes all occurrences of 'zap me' from its input:
9893c3a7b76Schristos
9903c3a7b76Schristos         %%
9913c3a7b76Schristos         "zap me"
9923c3a7b76Schristos
9933c3a7b76Schristos   This example will copy all other characters in the input to the
9943c3a7b76Schristosoutput since they will be matched by the default rule.
9953c3a7b76Schristos
9963c3a7b76Schristos   Here is a program which compresses multiple blanks and tabs down to a
9973c3a7b76Schristossingle blank, and throws away whitespace found at the end of a line:
9983c3a7b76Schristos
9993c3a7b76Schristos         %%
10003c3a7b76Schristos         [ \t]+        putchar( ' ' );
10013c3a7b76Schristos         [ \t]+$       /* ignore this token */
10023c3a7b76Schristos
100330da1778Schristos   If the action contains a '{', then the action spans till the
100430da1778Schristosbalancing '}' is found, and the action may cross multiple lines.  'flex'
100530da1778Schristosknows about C strings and comments and won't be fooled by braces found
100630da1778Schristoswithin them, but also allows actions to begin with '%{' and will
100730da1778Schristosconsider the action to be all the text up to the next '%}' (regardless
10083c3a7b76Schristosof ordinary braces inside the action).
10093c3a7b76Schristos
101030da1778Schristos   An action consisting solely of a vertical bar ('|') means "same as
10113c3a7b76Schristosthe action for the next rule".  See below for an illustration.
10123c3a7b76Schristos
101330da1778Schristos   Actions can include arbitrary C code, including 'return' statements
101430da1778Schristosto return a value to whatever routine called 'yylex()'.  Each time
101530da1778Schristos'yylex()' is called it continues processing tokens from where it last
10163c3a7b76Schristosleft off until it either reaches the end of the file or executes a
10173c3a7b76Schristosreturn.
10183c3a7b76Schristos
   Actions are free to modify 'yytext' except for lengthening it (adding
characters to its end; these will overwrite later characters in the input
stream).  This however does not apply when using '%array' (*note
Matching::).  In that case, 'yytext' may be freely modified in any way.
10233c3a7b76Schristos
102430da1778Schristos   Actions are free to modify 'yyleng' except they should not do so if
102530da1778Schristosthe action also includes use of 'yymore()' (see below).
10263c3a7b76Schristos
102730da1778Schristos   There are a number of special directives which can be included within
102830da1778Schristosan action:
10293c3a7b76Schristos
103030da1778Schristos'ECHO'
10313c3a7b76Schristos     copies yytext to the scanner's output.
10323c3a7b76Schristos
103330da1778Schristos'BEGIN'
10343c3a7b76Schristos     followed by the name of a start condition places the scanner in the
10353c3a7b76Schristos     corresponding start condition (see below).
10363c3a7b76Schristos
103730da1778Schristos'REJECT'
10383c3a7b76Schristos     directs the scanner to proceed on to the "second best" rule which
10393c3a7b76Schristos     matched the input (or a prefix of the input).  The rule is chosen
104030da1778Schristos     as described above in *note Matching::, and 'yytext' and 'yyleng'
10413c3a7b76Schristos     set up appropriately.  It may either be one which matched as much
104230da1778Schristos     text as the originally chosen rule but came later in the 'flex'
10433c3a7b76Schristos     input file, or one which matched less text.  For example, the
10443c3a7b76Schristos     following will both count the words in the input and call the
104530da1778Schristos     routine 'special()' whenever 'frob' is seen:
10463c3a7b76Schristos
                      int word_count = 0;
              %%

              frob        special(); REJECT;
              [^ \t\n]+   ++word_count;
10523c3a7b76Schristos
105330da1778Schristos     Without the 'REJECT', any occurrences of 'frob' in the input would
10543c3a7b76Schristos     not be counted as words, since the scanner normally executes only
105530da1778Schristos     one action per token.  Multiple uses of 'REJECT' are allowed, each
10563c3a7b76Schristos     one finding the next best choice to the currently active rule.  For
105730da1778Schristos     example, when the following scanner scans the token 'abcd', it will
105830da1778Schristos     write 'abcdabcaba' to the output:
10593c3a7b76Schristos
              %%
              a        |
              ab       |
              abc      |
              abcd     ECHO; REJECT;
              .|\n     /* eat up any unmatched character */
10663c3a7b76Schristos
10673c3a7b76Schristos     The first three rules share the fourth's action since they use the
106830da1778Schristos     special '|' action.
10693c3a7b76Schristos
107030da1778Schristos     'REJECT' is a particularly expensive feature in terms of scanner
10713c3a7b76Schristos     performance; if it is used in _any_ of the scanner's actions it
10723c3a7b76Schristos     will slow down _all_ of the scanner's matching.  Furthermore,
107330da1778Schristos     'REJECT' cannot be used with the '-Cf' or '-CF' options (*note
10743c3a7b76Schristos     Scanner Options::).
10753c3a7b76Schristos
107630da1778Schristos     Note also that unlike the other special actions, 'REJECT' is a
10773c3a7b76Schristos     _branch_.  Code immediately following it in the action will _not_
10783c3a7b76Schristos     be executed.
10793c3a7b76Schristos
108030da1778Schristos'yymore()'
10813c3a7b76Schristos     tells the scanner that the next time it matches a rule, the
10823c3a7b76Schristos     corresponding token should be _appended_ onto the current value of
108330da1778Schristos     'yytext' rather than replacing it.  For example, given the input
108430da1778Schristos     'mega-kludge' the following will write 'mega-mega-kludge' to the
10853c3a7b76Schristos     output:
10863c3a7b76Schristos
              %%
              mega-    ECHO; yymore();
              kludge   ECHO;
10903c3a7b76Schristos
109130da1778Schristos     First 'mega-' is matched and echoed to the output.  Then 'kludge'
109230da1778Schristos     is matched, but the previous 'mega-' is still hanging around at the
109330da1778Schristos     beginning of 'yytext' so the 'ECHO' for the 'kludge' rule will
109430da1778Schristos     actually write 'mega-kludge'.
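
   The 'REJECT' example above, which writes 'abcdabcaba' for the token
'abcd', can be traced by hand with a small C model.  This is a sketch
of the documented behavior for that one five-rule scanner only; the
function 'trace_reject' is hypothetical and does not reflect how the
generated scanner implements 'REJECT'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models the scanner with the rules a, ab, abc, abcd (ECHO; REJECT;)
   followed by .|\n (empty action).  Writes what the scanner would
   print for INPUT into OUT (at most CAP bytes, NUL included) and
   returns the number of characters written. */
size_t trace_reject(const char *input, char *out, size_t cap)
{
    static const char word[] = "abcd";
    size_t pos = 0, used = 0;

    assert(cap > 0);
    while (input[pos] != '\0') {
        size_t len;

        /* Candidates in best-first order: abcd, abc, ab, a. */
        for (len = 4; len >= 1; --len) {
            if (strncmp(input + pos, word, len) == 0 &&
                used + len < cap) {
                memcpy(out + used, input + pos, len);   /* ECHO */
                used += len;
            }
            /* REJECT: move on to the next-best candidate. */
        }

        /* The last candidate is .|\n, which matches one character,
           does nothing, and is not rejected, so the character is
           consumed. */
        ++pos;
    }
    out[used] = '\0';
    return used;
}
```

   Each pass of the inner loop stands for one "next best" choice; only
the final rule, which does not use 'REJECT', actually consumes input.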
10953c3a7b76Schristos
109630da1778Schristos   Two notes regarding use of 'yymore()'.  First, 'yymore()' depends on
109730da1778Schristosthe value of 'yyleng' correctly reflecting the size of the current
109830da1778Schristostoken, so you must not modify 'yyleng' if you are using 'yymore()'.
109930da1778SchristosSecond, the presence of 'yymore()' in the scanner's action entails a
11003c3a7b76Schristosminor performance penalty in the scanner's matching speed.
11013c3a7b76Schristos
110230da1778Schristos   'yyless(n)' returns all but the first 'n' characters of the current
11033c3a7b76Schristostoken back to the input stream, where they will be rescanned when the
110430da1778Schristosscanner looks for the next match.  'yytext' and 'yyleng' are adjusted
110530da1778Schristosappropriately (e.g., 'yyleng' will now be equal to 'n').  For example,
110630da1778Schristoson the input 'foobar' the following will write out 'foobarbar':
11073c3a7b76Schristos
11083c3a7b76Schristos         %%
11093c3a7b76Schristos         foobar    ECHO; yyless(3);
11103c3a7b76Schristos         [a-z]+    ECHO;
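
   The effect of 'yyless(3)' in the two-rule scanner above can be
modeled in plain C.  The sketch below is an illustration under stated
assumptions: 'trace_yyless' is a hypothetical function, the input is an
in-memory string, and unmatched characters are echoed as the default
rule would echo them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Models the rules
       foobar    ECHO; yyless(3);
       [a-z]+    ECHO;
   plus the default rule.  Writes the scanner's output for INPUT into
   OUT (at most CAP bytes, NUL included) and returns its length. */
size_t trace_yyless(const char *input, char *out, size_t cap)
{
    size_t pos = 0, used = 0;

    assert(cap > 0);
    while (input[pos] != '\0') {
        size_t lit = strncmp(input + pos, "foobar", 6) == 0 ? 6 : 0;
        size_t run = 0, len, consumed;

        while (islower((unsigned char)input[pos + run]))
            ++run;

        if (lit > 0 && lit >= run) {
            len = lit;          /* first rule wins the tie */
            consumed = 3;       /* yyless(3): keep 3, return the rest */
        } else if (run > 0) {
            len = run;          /* [a-z]+ is the longer match */
            consumed = run;
        } else {
            len = 1;            /* default rule: echo one character */
            consumed = 1;
        }

        if (used + len >= cap)
            break;              /* output buffer full */
        memcpy(out + used, input + pos, len);   /* ECHO */
        used += len;
        pos += consumed;
    }
    out[used] = '\0';
    return used;
}
```

   For the input 'foobar' the model echoes 'foobar', resumes scanning
at 'bar', and echoes that as well, giving 'foobarbar'.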
11113c3a7b76Schristos
111230da1778Schristos   An argument of 0 to 'yyless()' will cause the entire current input
11133c3a7b76Schristosstring to be scanned again.  Unless you've changed how the scanner will
111430da1778Schristossubsequently process its input (using 'BEGIN', for example), this will
11153c3a7b76Schristosresult in an endless loop.
11163c3a7b76Schristos
111730da1778Schristos   Note that 'yyless()' is a macro and can only be used in the flex
11183c3a7b76Schristosinput file, not from other source files.
11193c3a7b76Schristos
112030da1778Schristos   'unput(c)' puts the character 'c' back onto the input stream.  It
11213c3a7b76Schristoswill be the next character scanned.  The following action will take the
11223c3a7b76Schristoscurrent token and cause it to be rescanned enclosed in parentheses.
11233c3a7b76Schristos
         {
         /* Needs <string.h> (strdup) and <stdlib.h> (free) in the
            definitions section. */
         int i;
         /* Copy yytext because unput() trashes yytext */
         char *yycopy = strdup( yytext );
         unput( ')' );
         for ( i = yyleng - 1; i >= 0; --i )
             unput( yycopy[i] );
         unput( '(' );
         free( yycopy );
         }
11343c3a7b76Schristos
113530da1778Schristos   Note that since each 'unput()' puts the given character back at the
11363c3a7b76Schristos_beginning_ of the input stream, pushing back strings must be done
11373c3a7b76Schristosback-to-front.
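
   Why the push-back must run back-to-front can be seen with a small
stand-alone model of a push-back buffer.  This is a sketch only: the
'pushback' type and the 'pb_' functions are hypothetical and are not
the scanner's real buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters are stored right-aligned; FRONT indexes the next
   character that would be read. */
typedef struct {
    char   data[64];
    size_t front;
} pushback;

void pb_init(pushback *pb)
{
    pb->front = sizeof pb->data - 1;
    pb->data[pb->front] = '\0';
}

/* Like unput(c): C becomes the next character to be read. */
void pb_unput(pushback *pb, char c)
{
    assert(pb->front > 0);      /* room left in the model buffer */
    pb->data[--pb->front] = c;
}

/* Push back TOKEN enclosed in parentheses, last character first. */
void pb_unput_parenthesized(pushback *pb, const char *token)
{
    size_t i = strlen(token);

    pb_unput(pb, ')');
    while (i > 0)
        pb_unput(pb, token[--i]);
    pb_unput(pb, '(');
}

/* The text that would be read next, in reading order. */
const char *pb_pending(const pushback *pb)
{
    return pb->data + pb->front;
}
```

   Pushing the characters 'a' and then 'b' leaves 'ba' pending, which
is why the helper walks the token from its last character to its
first.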
11383c3a7b76Schristos
113930da1778Schristos   An important potential problem when using 'unput()' is that if you
114030da1778Schristosare using '%pointer' (the default), a call to 'unput()' _destroys_ the
114130da1778Schristoscontents of 'yytext', starting with its rightmost character and
11423c3a7b76Schristosdevouring one character to the left with each call.  If you need the
114330da1778Schristosvalue of 'yytext' preserved after a call to 'unput()' (as in the above
114430da1778Schristosexample), you must either first copy it elsewhere, or build your scanner
114530da1778Schristosusing '%array' instead (*note Matching::).
11463c3a7b76Schristos
114730da1778Schristos   Finally, note that you cannot put back 'EOF' to attempt to mark the
11483c3a7b76Schristosinput stream with an end-of-file.
11493c3a7b76Schristos
115030da1778Schristos   'input()' reads the next character from the input stream.  For
11513c3a7b76Schristosexample, the following is one way to eat up C comments:
11523c3a7b76Schristos
         %%
         "/*"        {
                     int c;

                     for ( ; ; )
                         {
                         while ( (c = input()) != '*' &&
                                 c != EOF )
                             ;    /* eat up text of comment */

                         if ( c == '*' )
                             {
                             while ( (c = input()) == '*' )
                                 ;
                             if ( c == '/' )
                                 break;    /* found the end */
                             }

                         if ( c == EOF )
                             {
                             /* error() stands for a user-supplied
                                reporting routine */
                             error( "EOF in comment" );
                             break;
                             }
                         }
                     }
11783c3a7b76Schristos
   (Note that if the scanner is compiled using 'C++', then 'input()' is
instead referred to as 'yyinput()', in order to avoid a name clash with
the 'C++' stream by the name of 'input'.)
11823c3a7b76Schristos
   'YY_FLUSH_BUFFER;' flushes the scanner's internal buffer so that the
next time the scanner attempts to match a token, it will first refill
the buffer using 'YY_INPUT()' (*note Generated Scanner::).  This action
is a special case of the more general 'yy_flush_buffer()' function,
described below (*note Multiple Input Buffers::).
11883c3a7b76Schristos
118930da1778Schristos   'yyterminate()' can be used in lieu of a return statement in an
11903c3a7b76Schristosaction.  It terminates the scanner and returns a 0 to the scanner's
119130da1778Schristoscaller, indicating "all done".  By default, 'yyterminate()' is also
11923c3a7b76Schristoscalled when an end-of-file is encountered.  It is a macro and may be
11933c3a7b76Schristosredefined.
11943c3a7b76Schristos
11953c3a7b76Schristos
11963c3a7b76SchristosFile: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top
11973c3a7b76Schristos
11983c3a7b76Schristos9 The Generated Scanner
11993c3a7b76Schristos***********************
12003c3a7b76Schristos
120130da1778SchristosThe output of 'flex' is the file 'lex.yy.c', which contains the scanning
120230da1778Schristosroutine 'yylex()', a number of tables used by it for matching tokens,
120330da1778Schristosand a number of auxiliary routines and macros.  By default, 'yylex()' is
120430da1778Schristosdeclared as follows:
12053c3a7b76Schristos
12063c3a7b76Schristos         int yylex()
12073c3a7b76Schristos             {
12083c3a7b76Schristos             ... various definitions and the actions in here ...
12093c3a7b76Schristos             }
12103c3a7b76Schristos
12113c3a7b76Schristos   (If your environment supports function prototypes, then it will be
121230da1778Schristos'int yylex( void )'.)  This definition may be changed by defining the
121330da1778Schristos'YY_DECL' macro.  For example, you could use:
12143c3a7b76Schristos
12153c3a7b76Schristos         #define YY_DECL float lexscan( a, b ) float a, b;
12163c3a7b76Schristos
121730da1778Schristos   to give the scanning routine the name 'lexscan', returning a float,
12183c3a7b76Schristosand taking two floats as arguments.  Note that if you give arguments to
12193c3a7b76Schristosthe scanning routine using a K&R-style/non-prototyped function
12203c3a7b76Schristosdeclaration, you must terminate the definition with a semi-colon (;).
12213c3a7b76Schristos
122256bd8546Schristos   'flex' generates 'C99' function definitions by default.  Flex used to
122356bd8546Schristoshave the ability to generate obsolete, er, 'traditional', function
122456bd8546Schristosdefinitions.  This was to support bootstrapping gcc on old systems.
12253c3a7b76SchristosUnfortunately, traditional definitions prevent us from using any
12263c3a7b76Schristosstandard data types smaller than int (such as short, char, or bool) as
function arguments.  Furthermore, supporting traditional definitions
added extra complexity to the skeleton file.  For this reason, current
122956bd8546Schristosversions of 'flex' generate standard C99 code only, leaving K&R-style
123056bd8546Schristosfunctions to the historians.
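
   With prototype-style definitions, 'YY_DECL' can carry a full
parameter list and needs no trailing semicolon.  The following sketch
restates the 'lexscan' example in that style.  The function body shown
is a stand-in (an assumption made so that the fragment compiles on its
own), since in a real scanner 'flex' supplies the body.

```c
/* Prototype-style replacement for the K&R form; no trailing semicolon. */
#define YY_DECL float lexscan( float a, float b )

/* Code that calls the scanner can reuse the macro as a declaration. */
YY_DECL;

/* Stand-in body so that this sketch compiles alone.  In a real scanner
 * the generated file places the matching loop and your actions here. */
YY_DECL
    {
    return a + b;
    }
```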
12313c3a7b76Schristos
123230da1778Schristos   Whenever 'yylex()' is called, it scans tokens from the global input
123330da1778Schristosfile 'yyin' (which defaults to stdin).  It continues until it either
123430da1778Schristosreaches an end-of-file (at which point it returns the value 0) or one of
123530da1778Schristosits actions executes a 'return' statement.
12363c3a7b76Schristos
12373c3a7b76Schristos   If the scanner reaches an end-of-file, subsequent calls are undefined
123830da1778Schristosunless either 'yyin' is pointed at a new input file (in which case
123930da1778Schristosscanning continues from that file), or 'yyrestart()' is called.
124030da1778Schristos'yyrestart()' takes one argument, a 'FILE *' pointer (which can be NULL,
124130da1778Schristosif you've set up 'YY_INPUT' to scan from a source other than 'yyin'),
124230da1778Schristosand initializes 'yyin' for scanning from that file.  Essentially there
124330da1778Schristosis no difference between just assigning 'yyin' to a new input file or
124430da1778Schristosusing 'yyrestart()' to do so; the latter is available for compatibility
124530da1778Schristoswith previous versions of 'flex', and because it can be used to switch
124630da1778Schristosinput files in the middle of scanning.  It can also be used to throw
124730da1778Schristosaway the current input buffer, by calling it with an argument of 'yyin';
124830da1778Schristosbut it would be better to use 'YY_FLUSH_BUFFER' (*note Actions::).  Note
124930da1778Schristosthat 'yyrestart()' does _not_ reset the start condition to 'INITIAL'
125030da1778Schristos(*note Start Conditions::).
12513c3a7b76Schristos
125230da1778Schristos   If 'yylex()' stops scanning due to executing a 'return' statement in
12533c3a7b76Schristosone of the actions, the scanner may then be called again and it will
12543c3a7b76Schristosresume scanning where it left off.
12553c3a7b76Schristos
12563c3a7b76Schristos   By default (and for purposes of efficiency), the scanner uses
125730da1778Schristosblock-reads rather than simple 'getc()' calls to read characters from
125830da1778Schristos'yyin'.  The nature of how it gets its input can be controlled by
125930da1778Schristosdefining the 'YY_INPUT' macro.  The calling sequence for 'YY_INPUT()' is
126030da1778Schristos'YY_INPUT(buf,result,max_size)'.  Its action is to place up to
126130da1778Schristos'max_size' characters in the character array 'buf' and return in the
126230da1778Schristosinteger variable 'result' either the number of characters read or the
126330da1778Schristosconstant 'YY_NULL' (0 on Unix systems) to indicate 'EOF'.  The default
126430da1778Schristos'YY_INPUT' reads from the global file-pointer 'yyin'.
12653c3a7b76Schristos
126630da1778Schristos   Here is a sample definition of 'YY_INPUT' (in the definitions section
126730da1778Schristosof the input file):
12683c3a7b76Schristos
12693c3a7b76Schristos         %{
12703c3a7b76Schristos         #define YY_INPUT(buf,result,max_size) \
12713c3a7b76Schristos             { \
12723c3a7b76Schristos             int c = getchar(); \
12733c3a7b76Schristos             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
12743c3a7b76Schristos             }
12753c3a7b76Schristos         %}
12763c3a7b76Schristos
12773c3a7b76Schristos   This definition will change the input processing to occur one
12783c3a7b76Schristoscharacter at a time.
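
   A block-oriented 'YY_INPUT' follows the same pattern.  The sketch
below, which is an illustration rather than part of 'flex', fills 'buf'
from a string held in memory; 'src_text', 'src_pos' and 'src_read()' are
hypothetical names.  (For the facilities 'flex' itself provides for
in-memory input, *note Scanning Strings::.)

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input source: a NUL-terminated string and a read offset. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Copy up to max_size characters into buf.  Returns the number copied,
 * or 0 (the value of YY_NULL on Unix systems) once the text is used up. */
static int src_read( char *buf, size_t max_size )
    {
    size_t left = strlen( src_text + src_pos );
    size_t n = left < max_size ? left : max_size;

    memcpy( buf, src_text + src_pos, n );
    src_pos += n;
    return (int) n;
    }

/* In a scanner this definition would sit between %{ and %} in the
 * definitions section, as in the example above. */
#define YY_INPUT(buf,result,max_size) \
    { result = src_read( (buf), (size_t) (max_size) ); }
```

   Because each call hands over as many characters as fit, the scanner
refills its buffer far less often than with the one-character version.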
12793c3a7b76Schristos
   When the scanner receives an end-of-file indication from 'YY_INPUT',
it then checks the 'yywrap()' function.  If 'yywrap()' returns false
12823c3a7b76Schristos(zero), then it is assumed that the function has gone ahead and set up
128330da1778Schristos'yyin' to point to another input file, and scanning continues.  If it
128430da1778Schristosreturns true (non-zero), then the scanner terminates, returning 0 to its
128530da1778Schristoscaller.  Note that in either case, the start condition remains
128630da1778Schristosunchanged; it does _not_ revert to 'INITIAL'.
12873c3a7b76Schristos
128830da1778Schristos   If you do not supply your own version of 'yywrap()', then you must
128930da1778Schristoseither use '%option noyywrap' (in which case the scanner behaves as
129030da1778Schristosthough 'yywrap()' returned 1), or you must link with '-lfl' to obtain
12913c3a7b76Schristosthe default version of the routine, which always returns 1.
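
   One possible 'yywrap()' steps through a list of file names.  In the
sketch below, 'pending_files' and 'open_next_input()' are hypothetical
names; the helper takes the address of the file pointer so that it can
be compiled and tried without a generated scanner.

```c
#include <stdio.h>

/* Hypothetical NULL-terminated list of files still to be scanned. */
static const char **pending_files = NULL;

/* Open the next readable file into *in, skipping names that cannot be
 * opened.  Returns 0 if a file was opened (scanning should continue)
 * or 1 if the list is exhausted (the scanner should terminate). */
static int open_next_input( FILE **in )
    {
    while ( pending_files && *pending_files )
        {
        FILE *f = fopen( *pending_files++, "r" );

        if ( f )
            {
            if ( *in && *in != stdin )
                fclose( *in );
            *in = f;
            return 0;
            }
        }
    return 1;
    }

/* In the user code section of a scanner this could be wrapped as:
 *
 *     int yywrap( void ) { return open_next_input( &yyin ); }
 */
```

   The return values match the convention described above: zero means
that 'yyin' now points at further input, non-zero means "all done".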
12923c3a7b76Schristos
12933c3a7b76Schristos   For scanning from in-memory buffers (e.g., scanning strings), see
1294dded093eSchristos*note Scanning Strings::.  *Note Multiple Input Buffers::.
12953c3a7b76Schristos
129630da1778Schristos   The scanner writes its 'ECHO' output to the 'yyout' global (default,
129730da1778Schristos'stdout'), which may be redefined by the user simply by assigning it to
129830da1778Schristossome other 'FILE' pointer.
12993c3a7b76Schristos
13003c3a7b76Schristos
13013c3a7b76SchristosFile: flex.info,  Node: Start Conditions,  Next: Multiple Input Buffers,  Prev: Generated Scanner,  Up: Top
13023c3a7b76Schristos
13033c3a7b76Schristos10 Start Conditions
13043c3a7b76Schristos*******************
13053c3a7b76Schristos
130630da1778Schristos'flex' provides a mechanism for conditionally activating rules.  Any
130730da1778Schristosrule whose pattern is prefixed with '<sc>' will only be active when the
130830da1778Schristosscanner is in the "start condition" named 'sc'.  For example,
13093c3a7b76Schristos
13103c3a7b76Schristos         <STRING>[^"]*        { /* eat up the string body ... */
13113c3a7b76Schristos                     ...
13123c3a7b76Schristos                     }
13133c3a7b76Schristos
131430da1778Schristos   will be active only when the scanner is in the 'STRING' start
13153c3a7b76Schristoscondition, and
13163c3a7b76Schristos
13173c3a7b76Schristos         <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
13183c3a7b76Schristos                     ...
13193c3a7b76Schristos                     }
13203c3a7b76Schristos
13213c3a7b76Schristos   will be active only when the current start condition is either
132230da1778Schristos'INITIAL', 'STRING', or 'QUOTE'.
13233c3a7b76Schristos
13243c3a7b76Schristos   Start conditions are declared in the definitions (first) section of
132530da1778Schristosthe input using unindented lines beginning with either '%s' or '%x'
13263c3a7b76Schristosfollowed by a list of names.  The former declares "inclusive" start
13273c3a7b76Schristosconditions, the latter "exclusive" start conditions.  A start condition
132830da1778Schristosis activated using the 'BEGIN' action.  Until the next 'BEGIN' action is
132930da1778Schristosexecuted, rules with the given start condition will be active and rules
133030da1778Schristoswith other start conditions will be inactive.  If the start condition is
133130da1778Schristosinclusive, then rules with no start conditions at all will also be
133230da1778Schristosactive.  If it is exclusive, then _only_ rules qualified with the start
133330da1778Schristoscondition will be active.  A set of rules contingent on the same
133430da1778Schristosexclusive start condition describe a scanner which is independent of any
133530da1778Schristosof the other rules in the 'flex' input.  Because of this, exclusive
133630da1778Schristosstart conditions make it easy to specify "mini-scanners" which scan
133730da1778Schristosportions of the input that are syntactically different from the rest
133830da1778Schristos(e.g., comments).
13393c3a7b76Schristos
13403c3a7b76Schristos   If the distinction between inclusive and exclusive start conditions
13413c3a7b76Schristosis still a little vague, here's a simple example illustrating the
13423c3a7b76Schristosconnection between the two.  The set of rules:
13433c3a7b76Schristos
13443c3a7b76Schristos         %s example
13453c3a7b76Schristos         %%
13463c3a7b76Schristos
13473c3a7b76Schristos         <example>foo   do_something();
13483c3a7b76Schristos
13493c3a7b76Schristos         bar            something_else();
13503c3a7b76Schristos
13513c3a7b76Schristos   is equivalent to
13523c3a7b76Schristos
13533c3a7b76Schristos         %x example
13543c3a7b76Schristos         %%
13553c3a7b76Schristos
13563c3a7b76Schristos         <example>foo   do_something();
13573c3a7b76Schristos
13583c3a7b76Schristos         <INITIAL,example>bar    something_else();
13593c3a7b76Schristos
136030da1778Schristos   Without the '<INITIAL,example>' qualifier, the 'bar' pattern in the
13613c3a7b76Schristossecond example wouldn't be active (i.e., couldn't match) when in start
136230da1778Schristoscondition 'example'.  If we just used '<example>' to qualify 'bar',
136330da1778Schristosthough, then it would only be active in 'example' and not in 'INITIAL',
13643c3a7b76Schristoswhile in the first example it's active in both, because in the first
example the 'example' start condition is an inclusive ('%s') start
13663c3a7b76Schristoscondition.
13673c3a7b76Schristos
136830da1778Schristos   Also note that the special start-condition specifier '<*>' matches
13693c3a7b76Schristosevery start condition.  Thus, the above example could also have been
13703c3a7b76Schristoswritten:
13713c3a7b76Schristos
13723c3a7b76Schristos         %x example
13733c3a7b76Schristos         %%
13743c3a7b76Schristos
13753c3a7b76Schristos         <example>foo   do_something();
13763c3a7b76Schristos
13773c3a7b76Schristos         <*>bar    something_else();
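
   The activation rules can be summarized as a predicate.  The following
C model is an illustration only, not code taken from 'flex': it
represents the start conditions named in a rule's prefix, and those
declared with '%x', as bit sets, and all of its names are hypothetical.

```c
/* Start conditions are small integers; INITIAL is 0 and is inclusive. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1 };

#define SC_BIT(sc)  (1u << (sc))
#define SC_ALL      (~0u)          /* models the <*> specifier */

/* rule_scs:  bit set of the conditions listed in the rule's <...>
 *            prefix, or 0 for a rule with no prefix.
 * exclusive: bit set of the conditions declared with %x.
 * current:   the start condition the scanner is in.
 * Returns 1 if the rule can match, 0 otherwise. */
static int rule_is_active( unsigned rule_scs, unsigned exclusive,
                           int current )
    {
    if ( rule_scs == 0 )
        /* Unprefixed rules are active unless the condition is exclusive. */
        return ( exclusive & SC_BIT( current ) ) == 0;

    return ( rule_scs & SC_BIT( current ) ) != 0;
    }
```

   With 'example' declared exclusive, the unprefixed 'bar' rule gives
'rule_is_active( 0, SC_BIT( SC_EXAMPLE ), SC_EXAMPLE ) == 0', while a
rule prefixed with '<*>' (modeled by 'SC_ALL') is active everywhere.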
13783c3a7b76Schristos
137930da1778Schristos   The default rule (to 'ECHO' any unmatched character) remains active
13803c3a7b76Schristosin start conditions.  It is equivalent to:
13813c3a7b76Schristos
13823c3a7b76Schristos         <*>.|\n     ECHO;
13833c3a7b76Schristos
138430da1778Schristos   'BEGIN(0)' returns to the original state where only the rules with no
138530da1778Schristosstart conditions are active.  This state can also be referred to as the
138630da1778Schristosstart-condition 'INITIAL', so 'BEGIN(INITIAL)' is equivalent to
138730da1778Schristos'BEGIN(0)'.  (The parentheses around the start condition name are not
13883c3a7b76Schristosrequired but are considered good style.)
13893c3a7b76Schristos
139030da1778Schristos   'BEGIN' actions can also be given as indented code at the beginning
13913c3a7b76Schristosof the rules section.  For example, the following will cause the scanner
139230da1778Schristosto enter the 'SPECIAL' start condition whenever 'yylex()' is called and
139330da1778Schristosthe global variable 'enter_special' is true:
13943c3a7b76Schristos
13953c3a7b76Schristos                 int enter_special;
13963c3a7b76Schristos
13973c3a7b76Schristos         %x SPECIAL
13983c3a7b76Schristos         %%
13993c3a7b76Schristos                 if ( enter_special )
14003c3a7b76Schristos                     BEGIN(SPECIAL);
14013c3a7b76Schristos
14023c3a7b76Schristos         <SPECIAL>blahblahblah
14033c3a7b76Schristos         ...more rules follow...
14043c3a7b76Schristos
14053c3a7b76Schristos   To illustrate the uses of start conditions, here is a scanner which
140630da1778Schristosprovides two different interpretations of a string like '123.456'.  By
140730da1778Schristosdefault it will treat it as three tokens, the integer '123', a dot
140830da1778Schristos('.'), and the integer '456'.  But if the string is preceded earlier in
140930da1778Schristosthe line by the string 'expect-floats' it will treat it as a single
141030da1778Schristostoken, the floating-point number '123.456':
14113c3a7b76Schristos
14123c3a7b76Schristos         %{
14133c3a7b76Schristos         #include <math.h>
14143c3a7b76Schristos         %}
14153c3a7b76Schristos         %s expect
14163c3a7b76Schristos
14173c3a7b76Schristos         %%
14183c3a7b76Schristos         expect-floats        BEGIN(expect);
14193c3a7b76Schristos
         <expect>[0-9]+"."[0-9]+    {
14213c3a7b76Schristos                     printf( "found a float, = %f\n",
14223c3a7b76Schristos                             atof( yytext ) );
14233c3a7b76Schristos                     }
14243c3a7b76Schristos         <expect>\n           {
14253c3a7b76Schristos                     /* that's the end of the line, so
                      * we need another "expect-floats"
14273c3a7b76Schristos                      * before we'll recognize any more
14283c3a7b76Schristos                      * numbers
14293c3a7b76Schristos                      */
14303c3a7b76Schristos                     BEGIN(INITIAL);
14313c3a7b76Schristos                     }
14323c3a7b76Schristos
14333c3a7b76Schristos         [0-9]+      {
14343c3a7b76Schristos                     printf( "found an integer, = %d\n",
14353c3a7b76Schristos                             atoi( yytext ) );
14363c3a7b76Schristos                     }
14373c3a7b76Schristos
14383c3a7b76Schristos         "."         printf( "found a dot\n" );
14393c3a7b76Schristos
14403c3a7b76Schristos   Here is a scanner which recognizes (and discards) C comments while
14413c3a7b76Schristosmaintaining a count of the current input line.
14423c3a7b76Schristos
14433c3a7b76Schristos         %x comment
14443c3a7b76Schristos         %%
14453c3a7b76Schristos                 int line_num = 1;
14463c3a7b76Schristos
14473c3a7b76Schristos         "/*"         BEGIN(comment);
14483c3a7b76Schristos
14493c3a7b76Schristos         <comment>[^*\n]*        /* eat anything that's not a '*' */
14503c3a7b76Schristos         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
14513c3a7b76Schristos         <comment>\n             ++line_num;
14523c3a7b76Schristos         <comment>"*"+"/"        BEGIN(INITIAL);
14533c3a7b76Schristos
   This scanner goes to a bit of trouble to match as much text as
possible with each rule.  In general, when attempting to write a
high-speed scanner, try to match as much as possible in each rule, as
it's a big win.
14583c3a7b76Schristos
   Note that start condition names are really integer values and can be
146030da1778Schristosstored as such.  Thus, the above could be extended in the following
14613c3a7b76Schristosfashion:
14623c3a7b76Schristos
14633c3a7b76Schristos         %x comment foo
14643c3a7b76Schristos         %%
14653c3a7b76Schristos                 int line_num = 1;
14663c3a7b76Schristos                 int comment_caller;
14673c3a7b76Schristos
14683c3a7b76Schristos         "/*"         {
14693c3a7b76Schristos                      comment_caller = INITIAL;
14703c3a7b76Schristos                      BEGIN(comment);
14713c3a7b76Schristos                      }
14723c3a7b76Schristos
14733c3a7b76Schristos         ...
14743c3a7b76Schristos
14753c3a7b76Schristos         <foo>"/*"    {
14763c3a7b76Schristos                      comment_caller = foo;
14773c3a7b76Schristos                      BEGIN(comment);
14783c3a7b76Schristos                      }
14793c3a7b76Schristos
14803c3a7b76Schristos         <comment>[^*\n]*        /* eat anything that's not a '*' */
14813c3a7b76Schristos         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
14823c3a7b76Schristos         <comment>\n             ++line_num;
14833c3a7b76Schristos         <comment>"*"+"/"        BEGIN(comment_caller);
14843c3a7b76Schristos
14853c3a7b76Schristos   Furthermore, you can access the current start condition using the
148630da1778Schristosinteger-valued 'YY_START' macro.  For example, the above assignments to
148730da1778Schristos'comment_caller' could instead be written
14883c3a7b76Schristos
14893c3a7b76Schristos         comment_caller = YY_START;
14903c3a7b76Schristos
149130da1778Schristos   Flex provides 'YYSTATE' as an alias for 'YY_START' (since that is
149230da1778Schristoswhat's used by AT&T 'lex').
14933c3a7b76Schristos
14943c3a7b76Schristos   For historical reasons, start conditions do not have their own
14953c3a7b76Schristosname-space within the generated scanner.  The start condition names are
14963c3a7b76Schristosunmodified in the generated scanner and generated header.  *Note
14973c3a7b76Schristosoption-header::.  *Note option-prefix::.
14983c3a7b76Schristos
14993c3a7b76Schristos   Finally, here's an example of how to match C-style quoted strings
15003c3a7b76Schristosusing exclusive start conditions, including expanded escape sequences
15013c3a7b76Schristos(but not including checking for a string that's too long):
15023c3a7b76Schristos
15033c3a7b76Schristos         %x str
15043c3a7b76Schristos
15053c3a7b76Schristos         %%
15063c3a7b76Schristos                 char string_buf[MAX_STR_CONST];
15073c3a7b76Schristos                 char *string_buf_ptr;
15083c3a7b76Schristos
15093c3a7b76Schristos
15103c3a7b76Schristos         \"      string_buf_ptr = string_buf; BEGIN(str);
15113c3a7b76Schristos
15123c3a7b76Schristos         <str>\"        { /* saw closing quote - all done */
15133c3a7b76Schristos                 BEGIN(INITIAL);
15143c3a7b76Schristos                 *string_buf_ptr = '\0';
15153c3a7b76Schristos                 /* return string constant token type and
15163c3a7b76Schristos                  * value to parser
15173c3a7b76Schristos                  */
15183c3a7b76Schristos                 }
15193c3a7b76Schristos
15203c3a7b76Schristos         <str>\n        {
15213c3a7b76Schristos                 /* error - unterminated string constant */
15223c3a7b76Schristos                 /* generate error message */
15233c3a7b76Schristos                 }
15243c3a7b76Schristos
         <str>\\[0-7]{1,3} {
                 /* octal escape sequence */
                 unsigned int result;

                 (void) sscanf( yytext + 1, "%o", &result );

                 if ( result > 0xff )
                         {
                         /* error, constant is out-of-bounds */
                         }
                 else
                         *string_buf_ptr++ = result;
                 }
15363c3a7b76Schristos
15373c3a7b76Schristos         <str>\\[0-9]+ {
15383c3a7b76Schristos                 /* generate error - bad escape sequence; something
15393c3a7b76Schristos                  * like '\48' or '\0777777'
15403c3a7b76Schristos                  */
15413c3a7b76Schristos                 }
15423c3a7b76Schristos
15433c3a7b76Schristos         <str>\\n  *string_buf_ptr++ = '\n';
15443c3a7b76Schristos         <str>\\t  *string_buf_ptr++ = '\t';
15453c3a7b76Schristos         <str>\\r  *string_buf_ptr++ = '\r';
15463c3a7b76Schristos         <str>\\b  *string_buf_ptr++ = '\b';
15473c3a7b76Schristos         <str>\\f  *string_buf_ptr++ = '\f';
15483c3a7b76Schristos
15493c3a7b76Schristos         <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
15503c3a7b76Schristos
15513c3a7b76Schristos         <str>[^\\\n\"]+        {
15523c3a7b76Schristos                 char *yptr = yytext;
15533c3a7b76Schristos
15543c3a7b76Schristos                 while ( *yptr )
15553c3a7b76Schristos                         *string_buf_ptr++ = *yptr++;
15563c3a7b76Schristos                 }
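
   The octal-escape rule can also be written as a helper that reports
failure to its caller.  In this sketch 'decode_octal_escape()' is a
hypothetical name; it reads into an 'unsigned int', which is the type
the '%o' conversion expects, and rejects values that do not fit in one
byte.

```c
#include <stdio.h>

/* Decode an octal escape such as "\101"; text points at the backslash.
 * Returns 0 and stores the value in *out on success, or -1 if no octal
 * digits follow the backslash or the value exceeds 0xff.  *out is left
 * untouched on failure. */
static int decode_octal_escape( const char *text, unsigned char *out )
    {
    unsigned int value;

    if ( sscanf( text + 1, "%o", &value ) != 1 || value > 0xff )
        return -1;

    *out = (unsigned char) value;
    return 0;
    }
```

   Inside the '<str>\\[0-7]{1,3}' action, such a helper would be called
with 'yytext' and its result used to decide between storing the
character and reporting an error.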
15573c3a7b76Schristos
15583c3a7b76Schristos   Often, such as in some of the examples above, you wind up writing a
15593c3a7b76Schristoswhole bunch of rules all preceded by the same start condition(s).  Flex
15603c3a7b76Schristosmakes this a little easier and cleaner by introducing a notion of start
15613c3a7b76Schristoscondition "scope".  A start condition scope is begun with:
15623c3a7b76Schristos
15633c3a7b76Schristos         <SCs>{
15643c3a7b76Schristos
156530da1778Schristos   where '<SCs>' is a list of one or more start conditions.  Inside the
156630da1778Schristosstart condition scope, every rule automatically has the prefix '<SCs>'
156730da1778Schristosapplied to it, until a '}' which matches the initial '{'.  So, for
15683c3a7b76Schristosexample,
15693c3a7b76Schristos
15703c3a7b76Schristos         <ESC>{
15713c3a7b76Schristos             "\\n"   return '\n';
15723c3a7b76Schristos             "\\r"   return '\r';
15733c3a7b76Schristos             "\\f"   return '\f';
15743c3a7b76Schristos             "\\0"   return '\0';
15753c3a7b76Schristos         }
15763c3a7b76Schristos
15773c3a7b76Schristos   is equivalent to:
15783c3a7b76Schristos
15793c3a7b76Schristos         <ESC>"\\n"  return '\n';
15803c3a7b76Schristos         <ESC>"\\r"  return '\r';
15813c3a7b76Schristos         <ESC>"\\f"  return '\f';
15823c3a7b76Schristos         <ESC>"\\0"  return '\0';
15833c3a7b76Schristos
15843c3a7b76Schristos   Start condition scopes may be nested.
15853c3a7b76Schristos
158630da1778Schristos   The following routines are available for manipulating stacks of start
158730da1778Schristosconditions:
15883c3a7b76Schristos
158930da1778Schristos -- Function: void yy_push_state ( int 'new_state' )
15903c3a7b76Schristos     pushes the current start condition onto the top of the start
159130da1778Schristos     condition stack and switches to 'new_state' as though you had used
159230da1778Schristos     'BEGIN new_state' (recall that start condition names are also
15933c3a7b76Schristos     integers).
15943c3a7b76Schristos
15953c3a7b76Schristos -- Function: void yy_pop_state ()
159630da1778Schristos     pops the top of the stack and switches to it via 'BEGIN'.
15973c3a7b76Schristos
15983c3a7b76Schristos -- Function: int yy_top_state ()
15993c3a7b76Schristos     returns the top of the stack without altering the stack's contents.
16003c3a7b76Schristos
16013c3a7b76Schristos   The start condition stack grows dynamically and so has no built-in
16023c3a7b76Schristossize limitation.  If memory is exhausted, program execution aborts.
16033c3a7b76Schristos
160430da1778Schristos   To use start condition stacks, your scanner must include a '%option
16053c3a7b76Schristosstack' directive (*note Scanner Options::).
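
   The behavior of the three routines can be modeled in a few lines of
C.  This is a sketch of the semantics described above, not the code
'flex' generates: 'sc_current', 'sc_push()', 'sc_pop()' and 'sc_top()'
are hypothetical names, and returning -1 from 'sc_top()' on an empty
stack is a choice made for the model.

```c
#include <stdlib.h>

/* Model state; in a real scanner flex keeps the equivalent itself. */
static int sc_current = 0;      /* 0 plays the role of INITIAL */
static int *sc_stack = NULL;
static size_t sc_depth = 0, sc_cap = 0;

/* Like yy_push_state(): save the current condition, then switch. */
static void sc_push( int new_state )
    {
    if ( sc_depth == sc_cap )
        {
        size_t cap = sc_cap ? sc_cap * 2 : 8;
        int *p = realloc( sc_stack, cap * sizeof *p );

        if ( ! p )
            abort();            /* memory exhausted: give up */
        sc_stack = p;
        sc_cap = cap;
        }
    sc_stack[sc_depth++] = sc_current;
    sc_current = new_state;
    }

/* Like yy_pop_state(): switch back to the most recently saved condition. */
static void sc_pop( void )
    {
    if ( sc_depth == 0 )
        abort();                /* model choice: underflow is fatal */
    sc_current = sc_stack[--sc_depth];
    }

/* Like yy_top_state(): inspect the saved condition without popping. */
static int sc_top( void )
    {
    return sc_depth ? sc_stack[sc_depth - 1] : -1;
    }
```

   Pushing saves the condition that was current _before_ the switch, so
a matching pop always returns the scanner to where it came from.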
16063c3a7b76Schristos
16073c3a7b76Schristos
16083c3a7b76SchristosFile: flex.info,  Node: Multiple Input Buffers,  Next: EOF,  Prev: Start Conditions,  Up: Top
16093c3a7b76Schristos
16103c3a7b76Schristos11 Multiple Input Buffers
16113c3a7b76Schristos*************************
16123c3a7b76Schristos
16133c3a7b76SchristosSome scanners (such as those which support "include" files) require
161430da1778Schristosreading from several input streams.  As 'flex' scanners do a large
16153c3a7b76Schristosamount of buffering, one cannot control where the next input will be
161630da1778Schristosread from by simply writing a 'YY_INPUT()' which is sensitive to the
161730da1778Schristosscanning context.  'YY_INPUT()' is only called when the scanner reaches
16183c3a7b76Schristosthe end of its buffer, which may be a long time after scanning a
161930da1778Schristosstatement such as an 'include' statement which requires switching the
16203c3a7b76Schristosinput source.
16213c3a7b76Schristos
162230da1778Schristos   To negotiate these sorts of problems, 'flex' provides a mechanism for
162330da1778Schristoscreating and switching between multiple input buffers.  An input buffer
162430da1778Schristosis created by using:
16253c3a7b76Schristos
16263c3a7b76Schristos -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
16273c3a7b76Schristos
162830da1778Schristos   which takes a 'FILE' pointer and a size and creates a buffer
162930da1778Schristosassociated with the given file and large enough to hold 'size'
163030da1778Schristoscharacters (when in doubt, use 'YY_BUF_SIZE' for the size).  It returns
163130da1778Schristosa 'YY_BUFFER_STATE' handle, which may then be passed to other routines
163230da1778Schristos(see below).  The 'YY_BUFFER_STATE' type is a pointer to an opaque
163330da1778Schristos'struct yy_buffer_state' structure, so you may safely initialize
163430da1778Schristos'YY_BUFFER_STATE' variables to '((YY_BUFFER_STATE) 0)' if you wish, and
16353c3a7b76Schristosalso refer to the opaque structure in order to correctly declare input
16363c3a7b76Schristosbuffers in source files other than that of your scanner.  Note that the
163730da1778Schristos'FILE' pointer in the call to 'yy_create_buffer' is only used as the
163830da1778Schristosvalue of 'yyin' seen by 'YY_INPUT'.  If you redefine 'YY_INPUT()' so it
163930da1778Schristosno longer uses 'yyin', then you can safely pass a NULL 'FILE' pointer to
164030da1778Schristos'yy_create_buffer'.  You select a particular buffer to scan from using:
16413c3a7b76Schristos
16423c3a7b76Schristos -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
16433c3a7b76Schristos
16443c3a7b76Schristos   The above function switches the scanner's input buffer so subsequent
164530da1778Schristostokens will come from 'new_buffer'.  Note that 'yy_switch_to_buffer()'
164630da1778Schristosmay be used by 'yywrap()' to set things up for continued scanning,
164730da1778Schristosinstead of opening a new file and pointing 'yyin' at it.  If you are
16483c3a7b76Schristoslooking for a stack of input buffers, then you want to use
164930da1778Schristos'yypush_buffer_state()' instead of this function.  Note also that
165030da1778Schristosswitching input sources via either 'yy_switch_to_buffer()' or 'yywrap()'
165130da1778Schristosdoes _not_ change the start condition.
16523c3a7b76Schristos
16533c3a7b76Schristos -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
16543c3a7b76Schristos
   is used to reclaim the storage associated with a buffer.  ('buffer'
can be NULL, in which case the routine does nothing.)  To clear the
current contents of a buffer without deleting it, use
'yy_flush_buffer()', described below.
16583c3a7b76Schristos
16593c3a7b76Schristos -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
16603c3a7b76Schristos
16613c3a7b76Schristos   This function pushes the new buffer state onto an internal stack.
16623c3a7b76SchristosThe pushed state becomes the new current state.  The stack is maintained
16633c3a7b76Schristosby flex and will grow as required.  This function is intended to be used
166430da1778Schristosinstead of 'yy_switch_to_buffer', when you want to change states, but
16653c3a7b76Schristospreserve the current state for later use.
16663c3a7b76Schristos
16673c3a7b76Schristos -- Function: void yypop_buffer_state ( )
16683c3a7b76Schristos
16693c3a7b76Schristos   This function removes the current state from the top of the stack,
167030da1778Schristosand deletes it by calling 'yy_delete_buffer'.  The next state on the
16713c3a7b76Schristosstack, if any, becomes the new current state.
16723c3a7b76Schristos
16733c3a7b76Schristos -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
16743c3a7b76Schristos
16753c3a7b76Schristos   This function discards the buffer's contents, so the next time the
16763c3a7b76Schristosscanner attempts to match a token from the buffer, it will first fill
167730da1778Schristosthe buffer anew using 'YY_INPUT()'.
16783c3a7b76Schristos
16793c3a7b76Schristos -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
16803c3a7b76Schristos
168130da1778Schristos   is an alias for 'yy_create_buffer()', provided for compatibility with
168230da1778Schristosthe C++ use of 'new' and 'delete' for creating and destroying dynamic
168330da1778Schristosobjects.
16843c3a7b76Schristos
   The 'YY_CURRENT_BUFFER' macro returns a 'YY_BUFFER_STATE' handle to
the current buffer.  It should not be used as an lvalue.
16873c3a7b76Schristos
16883c3a7b76Schristos   Here are two examples of using these features for writing a scanner
168930da1778Schristoswhich expands include files (the '<<EOF>>' feature is discussed below).
16903c3a7b76Schristos
16913c3a7b76Schristos   This first example uses yypush_buffer_state and yypop_buffer_state.
16923c3a7b76SchristosFlex maintains the stack internally.
16933c3a7b76Schristos
16943c3a7b76Schristos         /* the "incl" state is used for picking up the name
16953c3a7b76Schristos          * of an include file
16963c3a7b76Schristos          */
16973c3a7b76Schristos         %x incl
16983c3a7b76Schristos         %%
16993c3a7b76Schristos         include             BEGIN(incl);
17003c3a7b76Schristos
17013c3a7b76Schristos         [a-z]+              ECHO;
17023c3a7b76Schristos         [^a-z\n]*\n?        ECHO;
17033c3a7b76Schristos
17043c3a7b76Schristos         <incl>[ \t]*      /* eat the whitespace */
17053c3a7b76Schristos         <incl>[^ \t\n]+   { /* got the include file name */
17063c3a7b76Schristos                 yyin = fopen( yytext, "r" );
17073c3a7b76Schristos
17083c3a7b76Schristos                 if ( ! yyin )
17093c3a7b76Schristos                     error( ... );
17103c3a7b76Schristos
                 yypush_buffer_state(yy_create_buffer( yyin, YY_BUF_SIZE ));
17123c3a7b76Schristos
17133c3a7b76Schristos                 BEGIN(INITIAL);
17143c3a7b76Schristos                 }
17153c3a7b76Schristos
17163c3a7b76Schristos         <<EOF>> {
                 yypop_buffer_state();
17183c3a7b76Schristos
17193c3a7b76Schristos                 if ( !YY_CURRENT_BUFFER )
17203c3a7b76Schristos                     {
17213c3a7b76Schristos                     yyterminate();
17223c3a7b76Schristos                     }
17233c3a7b76Schristos                 }
17243c3a7b76Schristos
17253c3a7b76Schristos   The second example, below, does the same thing as the previous
172630da1778Schristosexample did, but manages its own input buffer stack manually (instead of
172730da1778Schristosletting flex do it).
17283c3a7b76Schristos
17293c3a7b76Schristos         /* the "incl" state is used for picking up the name
17303c3a7b76Schristos          * of an include file
17313c3a7b76Schristos          */
17323c3a7b76Schristos         %x incl
17333c3a7b76Schristos
17343c3a7b76Schristos         %{
17353c3a7b76Schristos         #define MAX_INCLUDE_DEPTH 10
17363c3a7b76Schristos         YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
17373c3a7b76Schristos         int include_stack_ptr = 0;
17383c3a7b76Schristos         %}
17393c3a7b76Schristos
17403c3a7b76Schristos         %%
17413c3a7b76Schristos         include             BEGIN(incl);
17423c3a7b76Schristos
17433c3a7b76Schristos         [a-z]+              ECHO;
17443c3a7b76Schristos         [^a-z\n]*\n?        ECHO;
17453c3a7b76Schristos
17463c3a7b76Schristos         <incl>[ \t]*      /* eat the whitespace */
17473c3a7b76Schristos         <incl>[^ \t\n]+   { /* got the include file name */
17483c3a7b76Schristos                 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
17493c3a7b76Schristos                     {
                     fprintf( stderr, "Includes nested too deeply\n" );
17513c3a7b76Schristos                     exit( 1 );
17523c3a7b76Schristos                     }
17533c3a7b76Schristos
17543c3a7b76Schristos                 include_stack[include_stack_ptr++] =
17553c3a7b76Schristos                     YY_CURRENT_BUFFER;
17563c3a7b76Schristos
17573c3a7b76Schristos                 yyin = fopen( yytext, "r" );
17583c3a7b76Schristos
17593c3a7b76Schristos                 if ( ! yyin )
17603c3a7b76Schristos                     error( ... );
17613c3a7b76Schristos
17623c3a7b76Schristos                 yy_switch_to_buffer(
17633c3a7b76Schristos                     yy_create_buffer( yyin, YY_BUF_SIZE ) );
17643c3a7b76Schristos
17653c3a7b76Schristos                 BEGIN(INITIAL);
17663c3a7b76Schristos                 }
17673c3a7b76Schristos
17683c3a7b76Schristos         <<EOF>> {
                 if ( --include_stack_ptr < 0 )
17703c3a7b76Schristos                     {
17713c3a7b76Schristos                     yyterminate();
17723c3a7b76Schristos                     }
17733c3a7b76Schristos
17743c3a7b76Schristos                 else
17753c3a7b76Schristos                     {
17763c3a7b76Schristos                     yy_delete_buffer( YY_CURRENT_BUFFER );
17773c3a7b76Schristos                     yy_switch_to_buffer(
17783c3a7b76Schristos                          include_stack[include_stack_ptr] );
17793c3a7b76Schristos                     }
17803c3a7b76Schristos                 }
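
   The stack bookkeeping in the second example can be exercised on its
own, without a scanner.  The following is a minimal sketch under stated
assumptions: 'buffer_handle', 'enter_include()' and 'leave_include()'
are hypothetical names standing in for 'YY_BUFFER_STATE' and for the
code in the two actions above; they are not part of flex.

```c
#include <stddef.h>

#define MAX_INCLUDE_DEPTH 10

/* Hypothetical stand-in for YY_BUFFER_STATE: any opaque handle will do. */
typedef const char *buffer_handle;

static buffer_handle include_stack[MAX_INCLUDE_DEPTH];
static int include_stack_ptr = 0;

/* Save the buffer that is being suspended.  Returns 0 on success,
 * or -1 if includes are nested too deeply. */
int enter_include(buffer_handle current)
{
    if (include_stack_ptr >= MAX_INCLUDE_DEPTH)
        return -1;
    include_stack[include_stack_ptr++] = current;
    return 0;
}

/* Called at end-of-file.  Returns the buffer to resume, or NULL when
 * the outermost input has ended and the scanner should terminate. */
buffer_handle leave_include(void)
{
    if (--include_stack_ptr < 0) {
        include_stack_ptr = 0;      /* leave the stack reusable */
        return NULL;
    }
    return include_stack[include_stack_ptr];
}
```

   Each saved handle is returned in last-in, first-out order, and NULL
is returned only once the stack is empty, which is the point at which
the scanner action would call 'yyterminate()'.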
17813c3a7b76Schristos
17823c3a7b76Schristos   The following routines are available for setting up input buffers for
17833c3a7b76Schristosscanning in-memory strings instead of files.  All of them create a new
17843c3a7b76Schristosinput buffer for scanning the string, and return a corresponding
178530da1778Schristos'YY_BUFFER_STATE' handle (which you should delete with
178630da1778Schristos'yy_delete_buffer()' when done with it).  They also switch to the new
178730da1778Schristosbuffer using 'yy_switch_to_buffer()', so the next call to 'yylex()' will
178830da1778Schristosstart scanning the string.
17893c3a7b76Schristos
17903c3a7b76Schristos -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
17913c3a7b76Schristos     scans a NUL-terminated string.
17923c3a7b76Schristos
179330da1778Schristos -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int len
179430da1778Schristos          )
179530da1778Schristos     scans 'len' bytes (including possibly 'NUL's) starting at location
179630da1778Schristos     'bytes'.
17973c3a7b76Schristos
17983c3a7b76Schristos   Note that both of these functions create and scan a _copy_ of the
179930da1778Schristosstring or bytes.  (This may be desirable, since 'yylex()' modifies the
18003c3a7b76Schristoscontents of the buffer it is scanning.)  You can avoid the copy by
18013c3a7b76Schristosusing:
18023c3a7b76Schristos
18033c3a7b76Schristos -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
18043c3a7b76Schristos          size)
180530da1778Schristos     which scans in place the buffer starting at 'base', consisting of
180630da1778Schristos     'size' bytes, the last two bytes of which _must_ be
180730da1778Schristos     'YY_END_OF_BUFFER_CHAR' (ASCII NUL). These last two bytes are not
     scanned; thus, scanning consists of 'base[0]' through
     'base[size-3]', inclusive.
18103c3a7b76Schristos
181130da1778Schristos   If you fail to set up 'base' in this manner (i.e., forget the final
181230da1778Schristostwo 'YY_END_OF_BUFFER_CHAR' bytes), then 'yy_scan_buffer()' returns a
18133c3a7b76SchristosNULL pointer instead of creating a new input buffer.
18143c3a7b76Schristos
18153c3a7b76Schristos -- Data type: yy_size_t
18163c3a7b76Schristos     is an integral type to which you can cast an integer expression
18173c3a7b76Schristos     reflecting the size of the buffer.
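
   Preparing a buffer in the layout 'yy_scan_buffer()' expects might
look like the sketch below.  'make_scan_buffer()' is a hypothetical
helper, not a flex routine; it only builds the memory block and leaves
the call to 'yy_scan_buffer()' to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of text into a freshly allocated block that ends with
 * the two NUL bytes required by yy_scan_buffer().  The total size,
 * including those two bytes, is stored in *size_out.  Returns NULL if
 * allocation fails; the caller frees the block. */
char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    char *base = malloc(len + 2);
    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = '\0';       /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';   /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

   The resulting pointer and size would then be passed to
'yy_scan_buffer()'.  Because the scanner works in place, the block must
stay allocated until the buffer has been deleted with
'yy_delete_buffer()'.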
18183c3a7b76Schristos
18193c3a7b76Schristos
18203c3a7b76SchristosFile: flex.info,  Node: EOF,  Next: Misc Macros,  Prev: Multiple Input Buffers,  Up: Top
18213c3a7b76Schristos
18223c3a7b76Schristos12 End-of-File Rules
18233c3a7b76Schristos********************
18243c3a7b76Schristos
182530da1778SchristosThe special rule '<<EOF>>' indicates actions which are to be taken when
182630da1778Schristosan end-of-file is encountered and 'yywrap()' returns non-zero (i.e.,
182730da1778Schristosindicates no further files to process).  The action must finish by doing
182830da1778Schristosone of the following things:
18293c3a7b76Schristos
   * assigning 'yyin' to a new input file (in previous versions of
     'flex', after doing the assignment you had to call the special
     action 'YY_NEW_FILE'; this is no longer necessary);

   * executing a 'return' statement;

   * executing the special 'yyterminate()' action; or

   * switching to a new buffer using 'yy_switch_to_buffer()' as shown
     in the example above.
18403c3a7b76Schristos
18413c3a7b76Schristos   <<EOF>> rules may not be used with other patterns; they may only be
18423c3a7b76Schristosqualified with a list of start conditions.  If an unqualified <<EOF>>
184330da1778Schristosrule is given, it applies to _all_ start conditions which do not already
184430da1778Schristoshave <<EOF>> actions.  To specify an <<EOF>> rule for only the initial
184530da1778Schristosstart condition, use:
18463c3a7b76Schristos
18473c3a7b76Schristos         <INITIAL><<EOF>>
18483c3a7b76Schristos
18493c3a7b76Schristos   These rules are useful for catching things like unclosed comments.
18503c3a7b76SchristosAn example:
18513c3a7b76Schristos
18523c3a7b76Schristos         %x quote
18533c3a7b76Schristos         %%
18543c3a7b76Schristos
18553c3a7b76Schristos         ...other rules for dealing with quotes...
18563c3a7b76Schristos
18573c3a7b76Schristos         <quote><<EOF>>   {
18583c3a7b76Schristos                  error( "unterminated quote" );
18593c3a7b76Schristos                  yyterminate();
18603c3a7b76Schristos                  }
         <<EOF>>  {
                  if ( *++filelist )
                      yyin = fopen( *filelist, "r" );
                  else
                      yyterminate();
18663c3a7b76Schristos                  }
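
   The '*++filelist' test above assumes that 'filelist' points into a
NULL-terminated list of names, in the style of 'argv', and currently
points at the file that has just been exhausted.  That stepping logic is
sketched below in isolation; 'next_input_name()' is a hypothetical
helper introduced only for illustration.

```c
#include <stddef.h>

/* Advance an argv-style, NULL-terminated list and return the next
 * name, or NULL when the list is exhausted.  Unlike a bare
 * `*++filelist', this never steps past the terminating NULL. */
const char *next_input_name(const char *const **filelist)
{
    if (**filelist == NULL)     /* already at the terminator */
        return NULL;
    return *++(*filelist);
}
```

   In an '<<EOF>>' action the returned name, when it is not NULL, would
be opened with 'fopen()' and the result assigned to 'yyin'.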
18673c3a7b76Schristos
18683c3a7b76Schristos
18693c3a7b76SchristosFile: flex.info,  Node: Misc Macros,  Next: User Values,  Prev: EOF,  Up: Top
18703c3a7b76Schristos
18713c3a7b76Schristos13 Miscellaneous Macros
18723c3a7b76Schristos***********************
18733c3a7b76Schristos
187430da1778SchristosThe macro 'YY_USER_ACTION' can be defined to provide an action which is
18753c3a7b76Schristosalways executed prior to the matched rule's action.  For example, it
18763c3a7b76Schristoscould be #define'd to call a routine to convert yytext to lower-case.
187730da1778SchristosWhen 'YY_USER_ACTION' is invoked, the variable 'yy_act' gives the number
187830da1778Schristosof the matched rule (rules are numbered starting with 1).  Suppose you
187930da1778Schristoswant to profile how often each of your rules is matched.  The following
188030da1778Schristoswould do the trick:
18813c3a7b76Schristos
18823c3a7b76Schristos         #define YY_USER_ACTION ++ctr[yy_act]
18833c3a7b76Schristos
188430da1778Schristos   where 'ctr' is an array to hold the counts for the different rules.
188530da1778SchristosNote that the macro 'YY_NUM_RULES' gives the total number of rules
(including the default rule), even if you use '-s', so a correct
188730da1778Schristosdeclaration for 'ctr' is:
18883c3a7b76Schristos
18893c3a7b76Schristos         int ctr[YY_NUM_RULES];
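
   The counting scheme can be tried out without a generated scanner.
In the sketch below, 'NUM_RULES' and 'RECORD_MATCH' are hypothetical
stand-ins for 'YY_NUM_RULES' and for what 'YY_USER_ACTION' would expand
to, and 'report_rule_counts()' is an illustrative helper.  The array is
sized one larger than the rule count; that is a defensive assumption of
this sketch, made so that every rule number from 1 upward has a slot.

```c
#include <stdio.h>

/* Hypothetical stand-in for YY_NUM_RULES in a generated scanner. */
#define NUM_RULES 3

/* One slot per rule number; slot 0 is unused because rules start at 1. */
static int ctr[NUM_RULES + 1];

/* Stand-in for the expansion of YY_USER_ACTION. */
#define RECORD_MATCH(act) (++ctr[(act)])

/* Print one line for each rule that matched at least once and return
 * the total number of matches recorded. */
int report_rule_counts(FILE *out)
{
    int total = 0;
    for (int rule = 1; rule <= NUM_RULES; rule++) {
        if (ctr[rule] > 0)
            fprintf(out, "rule %d: %d match(es)\n", rule, ctr[rule]);
        total += ctr[rule];
    }
    return total;
}
```

   A scanner would typically call such a report routine once, after
'yylex()' has returned for the last time.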
18903c3a7b76Schristos
189130da1778Schristos   The macro 'YY_USER_INIT' may be defined to provide an action which is
189230da1778Schristosalways executed before the first scan (and before the scanner's internal
189330da1778Schristosinitializations are done).  For example, it could be used to call a
189430da1778Schristosroutine to read in a data table or open a logging file.
18953c3a7b76Schristos
189630da1778Schristos   The macro 'yy_set_interactive(is_interactive)' can be used to control
189730da1778Schristoswhether the current buffer is considered "interactive".  An interactive
189830da1778Schristosbuffer is processed more slowly, but must be used when the scanner's
189930da1778Schristosinput source is indeed interactive to avoid problems due to waiting to
190030da1778Schristosfill buffers (see the discussion of the '-I' flag in *note Scanner
190130da1778SchristosOptions::).  A non-zero value in the macro invocation marks the buffer
190230da1778Schristosas interactive, a zero value as non-interactive.  Note that use of this
190330da1778Schristosmacro overrides '%option always-interactive' or '%option
190430da1778Schristosnever-interactive' (*note Scanner Options::).  'yy_set_interactive()'
19053c3a7b76Schristosmust be invoked prior to beginning to scan the buffer that is (or is
19063c3a7b76Schristosnot) to be considered interactive.
19073c3a7b76Schristos
190830da1778Schristos   The macro 'yy_set_bol(at_bol)' can be used to control whether the
19093c3a7b76Schristoscurrent buffer's scanning context for the next token match is done as
19103c3a7b76Schristosthough at the beginning of a line.  A non-zero macro argument makes
191130da1778Schristosrules anchored with '^' active, while a zero argument makes '^' rules
19123c3a7b76Schristosinactive.
19133c3a7b76Schristos
191430da1778Schristos   The macro 'YY_AT_BOL()' returns true if the next token scanned from
191530da1778Schristosthe current buffer will have '^' rules active, false otherwise.
19163c3a7b76Schristos
19173c3a7b76Schristos   In the generated scanner, the actions are all gathered in one large
191830da1778Schristosswitch statement and separated using 'YY_BREAK', which may be redefined.
191930da1778SchristosBy default, it is simply a 'break', to separate each rule's action from
192030da1778Schristosthe following rule's.  Redefining 'YY_BREAK' allows, for example, C++
192130da1778Schristosusers to #define YY_BREAK to do nothing (while being very careful that
every rule ends with a 'break' or a 'return'!)  to avoid
unreachable-statement warnings: when a rule's action ends with 'return',
the 'YY_BREAK' that follows it can never be reached.
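
   The role of 'YY_BREAK' can be seen in a small model of the generated
switch statement.  'run_action()' below is a hypothetical function
written for illustration; only the 'YY_BREAK' name and its default
meaning are taken from the description above.

```c
/* Default meaning: separate each action from the next one. */
#define YY_BREAK break;

/* Model of the generated switch: each case is one rule's action,
 * and YY_BREAK is placed after every action. */
int run_action(int act, int *echoed)
{
    switch (act) {
    case 1:                 /* an action that falls out of the switch */
        ++*echoed;
        YY_BREAK
    case 2:                 /* an action that ends with return */
        return 42;
        YY_BREAK            /* never reached: the source of the warnings */
    default:
        YY_BREAK
    }
    return 0;
}
```

   If 'YY_BREAK' were defined to do nothing here, the action for rule 1
would fall through into the action for rule 2, which is why every
action must then end with its own 'break' or 'return'.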
19253c3a7b76Schristos
19263c3a7b76Schristos
19273c3a7b76SchristosFile: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top
19283c3a7b76Schristos
19293c3a7b76Schristos14 Values Available To the User
19303c3a7b76Schristos*******************************
19313c3a7b76Schristos
19323c3a7b76SchristosThis chapter summarizes the various values available to the user in the
19333c3a7b76Schristosrule actions.
19343c3a7b76Schristos
193530da1778Schristos'char *yytext'
19363c3a7b76Schristos     holds the text of the current token.  It may be modified but not
19373c3a7b76Schristos     lengthened (you cannot append characters to the end).
19383c3a7b76Schristos
193930da1778Schristos     If the special directive '%array' appears in the first section of
194030da1778Schristos     the scanner description, then 'yytext' is instead declared 'char
194130da1778Schristos     yytext[YYLMAX]', where 'YYLMAX' is a macro definition that you can
19423c3a7b76Schristos     redefine in the first section if you don't like the default value
194330da1778Schristos     (generally 8KB). Using '%array' results in somewhat slower
194430da1778Schristos     scanners, but the value of 'yytext' becomes immune to calls to
194530da1778Schristos     'unput()', which potentially destroy its value when 'yytext' is a
194630da1778Schristos     character pointer.  The opposite of '%array' is '%pointer', which
19473c3a7b76Schristos     is the default.
19483c3a7b76Schristos
194930da1778Schristos     You cannot use '%array' when generating C++ scanner classes (the
195030da1778Schristos     '-+' flag).
19513c3a7b76Schristos
195230da1778Schristos'int yyleng'
19533c3a7b76Schristos     holds the length of the current token.
19543c3a7b76Schristos
195530da1778Schristos'FILE *yyin'
195630da1778Schristos     is the file which by default 'flex' reads from.  It may be
19573c3a7b76Schristos     redefined but doing so only makes sense before scanning begins or
19583c3a7b76Schristos     after an EOF has been encountered.  Changing it in the midst of
195930da1778Schristos     scanning will have unexpected results since 'flex' buffers its
196030da1778Schristos     input; use 'yyrestart()' instead.  Once scanning terminates because
     an end-of-file has been seen, you can point 'yyin' at the new
196230da1778Schristos     input file and then call the scanner again to continue scanning.
19633c3a7b76Schristos
196430da1778Schristos'void yyrestart( FILE *new_file )'
196530da1778Schristos     may be called to point 'yyin' at the new input file.  The
19663c3a7b76Schristos     switch-over to the new file is immediate (any previously
196730da1778Schristos     buffered-up input is lost).  Note that calling 'yyrestart()' with
196830da1778Schristos     'yyin' as an argument thus throws away the current input buffer and
196930da1778Schristos     continues scanning the same input file.
19703c3a7b76Schristos
197130da1778Schristos'FILE *yyout'
197230da1778Schristos     is the file to which 'ECHO' actions are done.  It can be reassigned
19733c3a7b76Schristos     by the user.
19743c3a7b76Schristos
197530da1778Schristos'YY_CURRENT_BUFFER'
197630da1778Schristos     returns a 'YY_BUFFER_STATE' handle to the current buffer.
19773c3a7b76Schristos
197830da1778Schristos'YY_START'
19793c3a7b76Schristos     returns an integer value corresponding to the current start
198030da1778Schristos     condition.  You can subsequently use this value with 'BEGIN' to
19813c3a7b76Schristos     return to that start condition.
19823c3a7b76Schristos
19833c3a7b76Schristos
19843c3a7b76SchristosFile: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top
19853c3a7b76Schristos
19863c3a7b76Schristos15 Interfacing with Yacc
19873c3a7b76Schristos************************
19883c3a7b76Schristos
198930da1778SchristosOne of the main uses of 'flex' is as a companion to the 'yacc'
199030da1778Schristosparser-generator.  'yacc' parsers expect to call a routine named
199130da1778Schristos'yylex()' to find the next input token.  The routine is supposed to
19923c3a7b76Schristosreturn the type of the next token as well as putting any associated
199330da1778Schristosvalue in the global 'yylval'.  To use 'flex' with 'yacc', one specifies
199430da1778Schristosthe '-d' option to 'yacc' to instruct it to generate the file 'y.tab.h'
containing definitions of all the '%token's appearing in the 'yacc'
199630da1778Schristosinput.  This file is then included in the 'flex' scanner.  For example,
199730da1778Schristosif one of the tokens is 'TOK_NUMBER', part of the scanner might look
19983c3a7b76Schristoslike:
19993c3a7b76Schristos
20003c3a7b76Schristos         %{
20013c3a7b76Schristos         #include "y.tab.h"
20023c3a7b76Schristos         %}
20033c3a7b76Schristos
20043c3a7b76Schristos         %%
20053c3a7b76Schristos
20063c3a7b76Schristos         [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
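
   The contract between parser and scanner (return the token type, leave
its value in a global) can be illustrated with a hand-written stand-in.
Everything named 'model_...' below is hypothetical, and the numeric
value given to 'TOK_NUMBER' is an assumption; in a real build it comes
from 'y.tab.h'.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical value; yacc normally assigns it in y.tab.h. */
enum { TOK_NUMBER = 258 };

static int model_yylval;                /* stand-in for the global yylval */
static const char *model_input = "";    /* text still to be scanned */

/* Stand-in for yylex(): returns the type of the next token, or 0 at
 * end of input, and leaves the token's value in model_yylval. */
int model_yylex(void)
{
    char *end;

    /* Skip everything that cannot start a number. */
    while (*model_input != '\0' && !isdigit((unsigned char)*model_input))
        model_input++;
    if (*model_input == '\0')
        return 0;

    model_yylval = (int)strtol(model_input, &end, 10);
    model_input = end;
    return TOK_NUMBER;
}
```

   A parser would call such a routine repeatedly, reading the value
after each call that returns 'TOK_NUMBER', until it returns 0.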
20073c3a7b76Schristos
20083c3a7b76Schristos
20093c3a7b76SchristosFile: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top
20103c3a7b76Schristos
20113c3a7b76Schristos16 Scanner Options
20123c3a7b76Schristos******************
20133c3a7b76Schristos
201430da1778SchristosThe various 'flex' options are categorized by function in the following
menu.  If you want to look up a particular option by name, *Note Index of
20163c3a7b76SchristosScanner Options::.
20173c3a7b76Schristos
20183c3a7b76Schristos* Menu:
20193c3a7b76Schristos
20203c3a7b76Schristos* Options for Specifying Filenames::
20213c3a7b76Schristos* Options Affecting Scanner Behavior::
20223c3a7b76Schristos* Code-Level And API Options::
20233c3a7b76Schristos* Options for Scanner Speed and Size::
20243c3a7b76Schristos* Debugging Options::
20253c3a7b76Schristos* Miscellaneous Options::
20263c3a7b76Schristos
20273c3a7b76Schristos   Even though there are many scanner options, a typical scanner might
20283c3a7b76Schristosonly specify the following options:
20293c3a7b76Schristos
20303c3a7b76Schristos     %option   8bit reentrant bison-bridge
20313c3a7b76Schristos     %option   warn nodefault
20323c3a7b76Schristos     %option   yylineno
20333c3a7b76Schristos     %option   outfile="scanner.c" header-file="scanner.h"
20343c3a7b76Schristos
20353c3a7b76Schristos   The first line specifies the general type of scanner we want.  The
20363c3a7b76Schristossecond line specifies that we are being careful.  The third line asks
20373c3a7b76Schristosflex to track line numbers.  The last line tells flex what to name the
20383c3a7b76Schristosfiles.  (The options can be specified in any order.  We just divided
20393c3a7b76Schristosthem.)
20403c3a7b76Schristos
204130da1778Schristos   'flex' also provides a mechanism for controlling options within the
20423c3a7b76Schristosscanner specification itself, rather than from the flex command-line.
204330da1778SchristosThis is done by including '%option' directives in the first section of
20443c3a7b76Schristosthe scanner specification.  You can specify multiple options with a
204530da1778Schristossingle '%option' directive, and multiple directives in the first section
204630da1778Schristosof your flex input file.
20473c3a7b76Schristos
20483c3a7b76Schristos   Most options are given simply as names, optionally preceded by the
204930da1778Schristosword 'no' (with no intervening whitespace) to negate their meaning.  The
205030da1778Schristosnames are the same as their long-option equivalents (but without the
205130da1778Schristosleading '--' ).
20523c3a7b76Schristos
205330da1778Schristos   'flex' scans your rule actions to determine whether you use the
'REJECT' or 'yymore()' features.  The 'reject' and 'yymore' options are
available to override its decision as to whether you use these features,
either by setting them (e.g., '%option reject') to indicate the feature
is indeed used, or unsetting them to indicate it actually is not used
(e.g., '%option noyymore').
20593c3a7b76Schristos
20603c3a7b76Schristos   A number of options are available for lint purists who want to
20613c3a7b76Schristossuppress the appearance of unneeded routines in the generated scanner.
206230da1778SchristosEach of the following, if unset (e.g., '%option nounput'), results in
20633c3a7b76Schristosthe corresponding routine not appearing in the generated scanner:
20643c3a7b76Schristos
20653c3a7b76Schristos         input, unput
20663c3a7b76Schristos         yy_push_state, yy_pop_state, yy_top_state
20673c3a7b76Schristos         yy_scan_buffer, yy_scan_bytes, yy_scan_string
20683c3a7b76Schristos
20693c3a7b76Schristos         yyget_extra, yyset_extra, yyget_leng, yyget_text,
20703c3a7b76Schristos         yyget_lineno, yyset_lineno, yyget_in, yyset_in,
20713c3a7b76Schristos         yyget_out, yyset_out, yyget_lval, yyset_lval,
20723c3a7b76Schristos         yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
20733c3a7b76Schristos
207430da1778Schristos   (though 'yy_push_state()' and friends won't appear anyway unless you
use '%option stack').
20763c3a7b76Schristos
20773c3a7b76Schristos
20783c3a7b76SchristosFile: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options
20793c3a7b76Schristos
20803c3a7b76Schristos16.1 Options for Specifying Filenames
20813c3a7b76Schristos=====================================
20823c3a7b76Schristos
208330da1778Schristos'--header-file=FILE, '%option header-file="FILE"''
208430da1778Schristos     instructs flex to write a C header to 'FILE'.  This file contains
20853c3a7b76Schristos     function prototypes, extern variables, and types used by the
20863c3a7b76Schristos     scanner.  Only the external API is exported by the header file.
20873c3a7b76Schristos     Many macros that are usable from within scanner actions are not
20883c3a7b76Schristos     exported to the header file.  This is due to namespace problems and
20893c3a7b76Schristos     the goal of a clean external API.
20903c3a7b76Schristos
209130da1778Schristos     While in the header, the macro 'yyIN_HEADER' is defined, where 'yy'
20923c3a7b76Schristos     is substituted with the appropriate prefix.
20933c3a7b76Schristos
209430da1778Schristos     The '--header-file' option is not compatible with the '--c++'
20953c3a7b76Schristos     option, since the C++ scanner provides its own header in
209630da1778Schristos     'yyFlexLexer.h'.
20973c3a7b76Schristos
209830da1778Schristos'-oFILE, --outfile=FILE, '%option outfile="FILE"''
209930da1778Schristos     directs flex to write the scanner to the file 'FILE' instead of
210030da1778Schristos     'lex.yy.c'.  If you combine '--outfile' with the '--stdout' option,
210130da1778Schristos     then the scanner is written to 'stdout' but its '#line' directives
     (see the '-L' option below) refer to the file 'FILE'.
21033c3a7b76Schristos
210430da1778Schristos'-t, --stdout, '%option stdout''
210530da1778Schristos     instructs 'flex' to write the scanner it generates to standard
210630da1778Schristos     output instead of 'lex.yy.c'.
21073c3a7b76Schristos
210830da1778Schristos'-SFILE, --skel=FILE'
210930da1778Schristos     overrides the default skeleton file from which 'flex' constructs
21103c3a7b76Schristos     its scanners.  You'll never need this option unless you are doing
211130da1778Schristos     'flex' maintenance or development.
21123c3a7b76Schristos
211330da1778Schristos'--tables-file=FILE'
     Write serialized scanner DFA tables to FILE.  The generated scanner
21153c3a7b76Schristos     will not contain the tables, and requires them to be loaded at
21163c3a7b76Schristos     runtime.  *Note serialization::.
21173c3a7b76Schristos
211830da1778Schristos'--tables-verify'
21193c3a7b76Schristos     This option is for flex development.  We document it here in case
21203c3a7b76Schristos     you stumble upon it by accident or in case you suspect some
21213c3a7b76Schristos     inconsistency in the serialized tables.  Flex will serialize the
     scanner DFA tables but will also generate the in-code tables as it
21233c3a7b76Schristos     normally does.  At runtime, the scanner will verify that the
21243c3a7b76Schristos     serialized tables match the in-code tables, instead of loading
21253c3a7b76Schristos     them.
21263c3a7b76Schristos
21273c3a7b76Schristos
21283c3a7b76SchristosFile: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options
21293c3a7b76Schristos
21303c3a7b76Schristos16.2 Options Affecting Scanner Behavior
21313c3a7b76Schristos=======================================
21323c3a7b76Schristos
213330da1778Schristos'-i, --case-insensitive, '%option case-insensitive''
213430da1778Schristos     instructs 'flex' to generate a "case-insensitive" scanner.  The
213530da1778Schristos     case of letters given in the 'flex' input patterns will be ignored,
21363c3a7b76Schristos     and tokens in the input will be matched regardless of case.  The
213730da1778Schristos     matched text given in 'yytext' will have the preserved case (i.e.,
2138dded093eSchristos     it will not be folded).  For tricky behavior, see *note case and
21393c3a7b76Schristos     character ranges::.
21403c3a7b76Schristos
214130da1778Schristos'-l, --lex-compat, '%option lex-compat''
214230da1778Schristos     turns on maximum compatibility with the original AT&T 'lex'
21433c3a7b76Schristos     implementation.  Note that this does not mean _full_ compatibility.
21443c3a7b76Schristos     Use of this option costs a considerable amount of performance, and
214530da1778Schristos     it cannot be used with the '--c++', '--full', '--fast', '-Cf', or
214630da1778Schristos     '-CF' options.  For details on the compatibilities it provides, see
2147dded093eSchristos     *note Lex and Posix::.  This option also results in the name
214830da1778Schristos     'YY_FLEX_LEX_COMPAT' being '#define''d in the generated scanner.
21493c3a7b76Schristos
215030da1778Schristos'-B, --batch, '%option batch''
215130da1778Schristos     instructs 'flex' to generate a "batch" scanner, the opposite of
215230da1778Schristos     _interactive_ scanners generated by '--interactive' (see below).
215330da1778Schristos     In general, you use '-B' when you are _certain_ that your scanner
21543c3a7b76Schristos     will never be used interactively, and you want to squeeze a
21553c3a7b76Schristos     _little_ more performance out of it.  If your goal is instead to
215630da1778Schristos     squeeze out a _lot_ more performance, you should be using the '-Cf'
215730da1778Schristos     or '-CF' options, which turn on '--batch' automatically anyway.
21583c3a7b76Schristos
215930da1778Schristos'-I, --interactive, '%option interactive''
216030da1778Schristos     instructs 'flex' to generate an interactive scanner.  An
21613c3a7b76Schristos     interactive scanner is one that only looks ahead to decide what
21623c3a7b76Schristos     token has been matched if it absolutely must.  It turns out that
21633c3a7b76Schristos     always looking one extra character ahead, even if the scanner has
21643c3a7b76Schristos     already seen enough text to disambiguate the current token, is a
21653c3a7b76Schristos     bit faster than only looking ahead when necessary.  But scanners
21663c3a7b76Schristos     that always look ahead give dreadful interactive performance; for
21673c3a7b76Schristos     example, when a user types a newline, it is not recognized as a
21683c3a7b76Schristos     newline token until they enter _another_ token, which often means
21693c3a7b76Schristos     typing in another whole line.
21703c3a7b76Schristos
217130da1778Schristos     'flex' scanners default to 'interactive' unless you use the '-Cf'
217230da1778Schristos     or '-CF' table-compression options (*note Performance::).  That's
21733c3a7b76Schristos     because if you're looking for high-performance you should be using
217430da1778Schristos     one of these options, so if you didn't, 'flex' assumes you'd rather
217530da1778Schristos     trade off a bit of run-time performance for intuitive interactive
217630da1778Schristos     behavior.  Note also that you _cannot_ use '--interactive' in
217730da1778Schristos     conjunction with '-Cf' or '-CF'.  Thus, this option is not really
217830da1778Schristos     needed; it is on by default for all those cases in which it is
217930da1778Schristos     allowed.
21803c3a7b76Schristos
     You can force a scanner to _not_ be interactive by using '--batch'.
21823c3a7b76Schristos
218330da1778Schristos'-7, --7bit, '%option 7bit''
218430da1778Schristos     instructs 'flex' to generate a 7-bit scanner, i.e., one which can
21853c3a7b76Schristos     only recognize 7-bit characters in its input.  The advantage of
218630da1778Schristos     using '--7bit' is that the scanner's tables can be up to half the
     size of those generated using the '--8bit' option.  The disadvantage is
21883c3a7b76Schristos     that such scanners often hang or crash if their input contains an
21893c3a7b76Schristos     8-bit character.
21903c3a7b76Schristos
21913c3a7b76Schristos     Note, however, that unless you generate your scanner using the
219230da1778Schristos     '-Cf' or '-CF' table compression options, use of '--7bit' will save
219330da1778Schristos     only a small amount of table space, and make your scanner
219430da1778Schristos     considerably less portable.  'Flex''s default behavior is to
     generate an 8-bit scanner unless you use the '-Cf' or '-CF' options, in
219630da1778Schristos     which case 'flex' defaults to generating 7-bit scanners unless your
219730da1778Schristos     site was always configured to generate 8-bit scanners (as will
21983c3a7b76Schristos     often be the case with non-USA sites).  You can tell whether flex
21993c3a7b76Schristos     generated a 7-bit or an 8-bit scanner by inspecting the flag
220030da1778Schristos     summary in the '--verbose' output as described above.
22013c3a7b76Schristos
220230da1778Schristos     Note that if you use '-Cfe' or '-CFe' 'flex' still defaults to
22033c3a7b76Schristos     generating an 8-bit scanner, since usually with these compression
22043c3a7b76Schristos     options full 8-bit tables are not much more expensive than 7-bit
22053c3a7b76Schristos     tables.
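
     Because a 7-bit scanner may misbehave on bytes above 127, one
     defensive measure is to check the input before scanning it.  The
     C sketch below is only an illustration; 'is_7bit_clean' is a
     hypothetical helper name and is not part of 'flex'.

```c
#include <stddef.h>

/* Hypothetical helper (not part of flex): returns 1 when every byte
   in buf[0..len-1] fits in 7 bits, and 0 as soon as a byte above
   0x7F is found. */
int is_7bit_clean(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] > 0x7F)
            return 0;   /* an 8-bit character: unsafe for a 7-bit scanner */
    }
    return 1;
}
```

     A caller could run such a check on each buffer and refuse the
     input, or fall back to an 8-bit scanner, when it returns 0.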
22063c3a7b76Schristos
220730da1778Schristos'-8, --8bit, '%option 8bit''
220830da1778Schristos     instructs 'flex' to generate an 8-bit scanner, i.e., one which can
22093c3a7b76Schristos     recognize 8-bit characters.  This flag is only needed for scanners
221030da1778Schristos     generated using '-Cf' or '-CF', as otherwise flex defaults to
22113c3a7b76Schristos     generating an 8-bit scanner anyway.
22123c3a7b76Schristos
221330da1778Schristos     See the discussion of '--7bit' above for 'flex''s default behavior
22143c3a7b76Schristos     and the tradeoffs between 7-bit and 8-bit scanners.
22153c3a7b76Schristos
221630da1778Schristos'--default, '%option default''
22173c3a7b76Schristos     generate the default rule.
22183c3a7b76Schristos
221930da1778Schristos'--always-interactive, '%option always-interactive''
22203c3a7b76Schristos     instructs flex to generate a scanner which always considers its
22213c3a7b76Schristos     input _interactive_.  Normally, on each new input file the scanner
222230da1778Schristos     calls 'isatty()' in an attempt to determine whether the scanner's
22233c3a7b76Schristos     input source is interactive and thus should be read a character at
22243c3a7b76Schristos     a time.  When this option is used, however, then no such call is
22253c3a7b76Schristos     made.
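
     The kind of test that 'isatty()' makes possible can be sketched in
     C as follows.  This is not the code 'flex' generates; the function
     names 'fd_is_interactive' and 'preferred_read_size' are
     assumptions chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: nonzero when fd refers to a terminal. */
int fd_is_interactive(int fd)
{
    return isatty(fd) != 0;
}

/* Hypothetical helper: an interactive source is read one character at
   a time so that each keystroke can be acted on promptly; any other
   source is read in full buffers. */
size_t preferred_read_size(int fd, size_t buffer_size)
{
    return fd_is_interactive(fd) ? 1 : buffer_size;
}
```

     With '--always-interactive' the scanner behaves as if the first
     function always returned nonzero, and with '--never-interactive'
     as if it always returned zero.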
22263c3a7b76Schristos
'--never-interactive, '%option never-interactive''
22283c3a7b76Schristos     instructs flex to generate a scanner which never considers its
222930da1778Schristos     input interactive.  This is the opposite of 'always-interactive'.
22303c3a7b76Schristos
223130da1778Schristos'-X, --posix, '%option posix''
22323c3a7b76Schristos     turns on maximum compatibility with the POSIX 1003.2-1992
223330da1778Schristos     definition of 'lex'.  Since 'flex' was originally designed to
223430da1778Schristos     implement the POSIX definition of 'lex' this generally involves
22353c3a7b76Schristos     very few changes in behavior.  At the current writing the known
223630da1778Schristos     differences between 'flex' and the POSIX standard are:
22373c3a7b76Schristos
223830da1778Schristos        * In POSIX and AT&T 'lex', the repeat operator, '{}', has lower
223930da1778Schristos          precedence than concatenation (thus 'ab{3}' yields 'ababab').
22403c3a7b76Schristos          Most POSIX utilities use an Extended Regular Expression (ERE)
22413c3a7b76Schristos          precedence that has the precedence of the repeat operator
224230da1778Schristos          higher than concatenation (which causes 'ab{3}' to yield
224330da1778Schristos          'abbb').  By default, 'flex' places the precedence of the
22443c3a7b76Schristos          repeat operator higher than concatenation which matches the
22453c3a7b76Schristos          ERE processing of other POSIX utilities.  When either
          '--posix' or '-l' is specified, 'flex' will use the
224730da1778Schristos          traditional AT&T and POSIX-compliant precedence for the repeat
224830da1778Schristos          operator where concatenation has higher precedence than the
224930da1778Schristos          repeat operator.
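
     The two readings of a pattern such as 'ab{3}' can be made concrete
     with a small C sketch.  'expand_repeat' is a hypothetical function
     written for this illustration; it expands a literal followed by a
     repeat count under either precedence rule and has nothing to do
     with how 'flex' parses patterns internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: expands "<literal>{count}" into out.
   repeat_binds_tighter != 0: only the last character is repeated
                              (the flex default, ERE style).
   repeat_binds_tighter == 0: the whole literal is repeated
                              (the '--posix' / '-l' behavior).
   Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int expand_repeat(const char *literal, int count, int repeat_binds_tighter,
                  char *out, size_t out_size)
{
    size_t len = strlen(literal);
    size_t need, pos = 0;
    int i;

    if (len == 0 || count < 1)
        return -1;
    need = repeat_binds_tighter ? len - 1 + (size_t)count
                                : len * (size_t)count;
    if (need + 1 > out_size)
        return -1;

    if (repeat_binds_tighter) {
        memcpy(out, literal, len - 1);          /* everything but the last char */
        pos = len - 1;
        for (i = 0; i < count; i++)
            out[pos++] = literal[len - 1];      /* repeat the last char */
    } else {
        for (i = 0; i < count; i++) {
            memcpy(out + pos, literal, len);    /* repeat the whole literal */
            pos += len;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

     Called with the literal 'ab' and a count of 3, the first mode
     produces 'abbb' and the second produces 'ababab', matching the
     two interpretations described above.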
22503c3a7b76Schristos
225130da1778Schristos'--stack, '%option stack''
22523c3a7b76Schristos     enables the use of start condition stacks (*note Start
22533c3a7b76Schristos     Conditions::).
22543c3a7b76Schristos
225530da1778Schristos'--stdinit, '%option stdinit''
225630da1778Schristos     if set (i.e., %option stdinit) initializes 'yyin' and 'yyout' to
225730da1778Schristos     'stdin' and 'stdout', instead of the default of 'NULL'.  Some
225830da1778Schristos     existing 'lex' programs depend on this behavior, even though it is
225930da1778Schristos     not compliant with ANSI C, which does not require 'stdin' and
226030da1778Schristos     'stdout' to be compile-time constant.  In a reentrant scanner,
226130da1778Schristos     however, this is not a problem since initialization is performed in
226230da1778Schristos     'yylex_init' at runtime.
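
     The portability concern is that a file-scope initializer such as
     'FILE *in = stdin;' requires 'stdin' to be a constant expression.
     A common workaround is to start with 'NULL' and fill in the
     stream on first use.  The sketch below shows that idea with
     stand-in names ('scan_in', 'scan_input'); they are assumptions
     for illustration, not 'flex' identifiers.

```c
#include <stdio.h>

/* Stand-in for the scanner's input stream.  It starts out NULL, as
   'yyin' does by default, because NULL is always a valid static
   initializer while stdin need not be. */
static FILE *scan_in = NULL;

/* Hypothetical accessor: assigns stdin at run time, the first time the
   stream is needed, unless the caller has already set scan_in. */
FILE *scan_input(void)
{
    if (scan_in == NULL)
        scan_in = stdin;
    return scan_in;
}
```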
22633c3a7b76Schristos
226430da1778Schristos'--yylineno, '%option yylineno''
226530da1778Schristos     directs 'flex' to generate a scanner that maintains the number of
22663c3a7b76Schristos     the current line read from its input in the global variable
226730da1778Schristos     'yylineno'.  This option is implied by '%option lex-compat'.  In a
226830da1778Schristos     reentrant C scanner, the macro 'yylineno' is accessible regardless
     of the value of '%option yylineno'; however, its value is not
227030da1778Schristos     modified by 'flex' unless '%option yylineno' is enabled.
22713c3a7b76Schristos
227230da1778Schristos'--yywrap, '%option yywrap''
     if unset (i.e., '--noyywrap'), makes the scanner not call
227430da1778Schristos     'yywrap()' upon an end-of-file, but simply assume that there are no
227530da1778Schristos     more files to scan (until the user points 'yyin' at a new file and
227630da1778Schristos     calls 'yylex()' again).
22773c3a7b76Schristos
22783c3a7b76Schristos
22793c3a7b76SchristosFile: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options
22803c3a7b76Schristos
22813c3a7b76Schristos16.3 Code-Level And API Options
22823c3a7b76Schristos===============================
22833c3a7b76Schristos
228430da1778Schristos'--ansi-definitions, '%option ansi-definitions''
228556bd8546Schristos     Deprecated, ignored
22863c3a7b76Schristos
228730da1778Schristos'--ansi-prototypes, '%option ansi-prototypes''
228856bd8546Schristos     Deprecated, ignored
22893c3a7b76Schristos
229030da1778Schristos'--bison-bridge, '%option bison-bridge''
22913c3a7b76Schristos     instructs flex to generate a C scanner that is meant to be called
229230da1778Schristos     by a 'GNU bison' parser.  The scanner has minor API changes for
229330da1778Schristos     'bison' compatibility.  In particular, the declaration of 'yylex'
229430da1778Schristos     is modified to take an additional parameter, 'yylval'.  *Note Bison
229530da1778Schristos     Bridge::.
22963c3a7b76Schristos
229730da1778Schristos'--bison-locations, '%option bison-locations''
     instructs flex that 'GNU bison' '%locations' are being used.  This
229930da1778Schristos     means 'yylex' will be passed an additional parameter, 'yylloc'.
230030da1778Schristos     This option implies '%option bison-bridge'.  *Note Bison Bridge::.
23013c3a7b76Schristos
230230da1778Schristos'-L, --noline, '%option noline''
230330da1778Schristos     instructs 'flex' not to generate '#line' directives.  Without this
230430da1778Schristos     option, 'flex' peppers the generated scanner with '#line'
23053c3a7b76Schristos     directives so error messages in the actions will be correctly
230630da1778Schristos     located with respect to either the original 'flex' input file (if
230730da1778Schristos     the errors are due to code in the input file), or 'lex.yy.c' (if
230830da1778Schristos     the errors are 'flex''s fault - you should report these sorts of
2309dded093eSchristos     errors to the email address given in *note Reporting Bugs::).
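
     The effect of a '#line' directive can be seen in a few lines of
     plain C.  In this sketch the file name 'scanner.l' and the line
     number 53 are made up for illustration; the compiler simply
     believes whatever the directive tells it.

```c
/* After this directive the compiler numbers the next source line 53
   and reports the file as "scanner.l" (both values are hypothetical). */
#line 53 "scanner.l"
int action_line(void)
{
    return __LINE__;   /* third line after the directive: line 55 */
}
```

     A diagnostic raised inside 'action_line' would therefore point at
     'scanner.l', which is the behavior that '--noline' turns off.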
23103c3a7b76Schristos
231130da1778Schristos'-R, --reentrant, '%option reentrant''
23123c3a7b76Schristos     instructs flex to generate a reentrant C scanner.  The generated
23133c3a7b76Schristos     scanner may safely be used in a multi-threaded environment.  The
23143c3a7b76Schristos     API for a reentrant scanner is different than for a non-reentrant
     scanner (*note Reentrant::).  Because of the API difference between
231630da1778Schristos     reentrant and non-reentrant 'flex' scanners, non-reentrant flex
23173c3a7b76Schristos     code must be modified before it is suitable for use with this
231830da1778Schristos     option.  This option is not compatible with the '--c++' option.
23193c3a7b76Schristos
232030da1778Schristos     The option '--reentrant' does not affect the performance of the
23213c3a7b76Schristos     scanner.
23223c3a7b76Schristos
232330da1778Schristos'-+, --c++, '%option c++''
23243c3a7b76Schristos     specifies that you want flex to generate a C++ scanner class.
23253c3a7b76Schristos     *Note Cxx::, for details.
23263c3a7b76Schristos
232730da1778Schristos'--array, '%option array''
     specifies that you want 'yytext' to be an array instead of a
     'char *'.
23293c3a7b76Schristos
233030da1778Schristos'--pointer, '%option pointer''
     specifies that 'yytext' should be a 'char *', not an array.  This
     is the default.
23333c3a7b76Schristos
233430da1778Schristos'-PPREFIX, --prefix=PREFIX, '%option prefix="PREFIX"''
233530da1778Schristos     changes the default 'yy' prefix used by 'flex' for all
23363c3a7b76Schristos     globally-visible variable and function names to instead be
233730da1778Schristos     'PREFIX'.  For example, '--prefix=foo' changes the name of 'yytext'
233830da1778Schristos     to 'footext'.  It also changes the name of the default output file
233930da1778Schristos     from 'lex.yy.c' to 'lex.foo.c'.  Here is a partial list of the
234030da1778Schristos     names affected:
23413c3a7b76Schristos
23423c3a7b76Schristos              yy_create_buffer
23433c3a7b76Schristos              yy_delete_buffer
23443c3a7b76Schristos              yy_flex_debug
23453c3a7b76Schristos              yy_init_buffer
23463c3a7b76Schristos              yy_flush_buffer
23473c3a7b76Schristos              yy_load_buffer_state
23483c3a7b76Schristos              yy_switch_to_buffer
23493c3a7b76Schristos              yyin
23503c3a7b76Schristos              yyleng
23513c3a7b76Schristos              yylex
23523c3a7b76Schristos              yylineno
23533c3a7b76Schristos              yyout
23543c3a7b76Schristos              yyrestart
23553c3a7b76Schristos              yytext
23563c3a7b76Schristos              yywrap
23573c3a7b76Schristos              yyalloc
23583c3a7b76Schristos              yyrealloc
23593c3a7b76Schristos              yyfree
23603c3a7b76Schristos
236130da1778Schristos     (If you are using a C++ scanner, then only 'yywrap' and
236230da1778Schristos     'yyFlexLexer' are affected.)  Within your scanner itself, you can
23633c3a7b76Schristos     still refer to the global variables and functions using either
23643c3a7b76Schristos     version of their name; but externally, they have the modified name.
23653c3a7b76Schristos
236630da1778Schristos     This option lets you easily link together multiple 'flex' programs
23673c3a7b76Schristos     into the same executable.  Note, though, that using this option
236830da1778Schristos     also renames 'yywrap()', so you now _must_ either provide your own
23693c3a7b76Schristos     (appropriately-named) version of the routine for your scanner, or
237030da1778Schristos     use '%option noyywrap', as linking with '-lfl' no longer provides
23713c3a7b76Schristos     one for you by default.
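
     One way to picture why both names work inside the scanner while
     only the new name is visible to the linker is macro renaming.  The
     C sketch below imitates that technique with the 'foo' prefix from
     the example above; it is a simplified illustration and not a copy
     of the code 'flex' emits.

```c
/* Simplified illustration of prefix renaming (assumed prefix: foo).
   From here on, every use of the yy name is rewritten by the
   preprocessor to the foo name. */
#define yylex  foolex
#define yytext footext

const char *yytext = "";   /* the linker sees a variable named footext */

int yylex(void)            /* the linker sees a function named foolex */
{
    yytext = "token";      /* inside this file either spelling works */
    return 1;
}
```

     Code in other files has no such macros in effect, so it must call
     'foolex()' and read 'footext' directly.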
23723c3a7b76Schristos
237330da1778Schristos'--main, '%option main''
237430da1778Schristos     directs flex to provide a default 'main()' program for the scanner,
     which simply calls 'yylex()'.  This option implies 'noyywrap' (see
     above).
23773c3a7b76Schristos
237830da1778Schristos'--nounistd, '%option nounistd''
237930da1778Schristos     suppresses inclusion of the non-ANSI header file 'unistd.h'.  This
238030da1778Schristos     option is meant to target environments in which 'unistd.h' does not
238130da1778Schristos     exist.  Be aware that certain options may cause flex to generate
     code that relies on functions normally found in 'unistd.h' (e.g.,
     'isatty()', 'read()').  If you wish to use these functions, you
238430da1778Schristos     will have to inform your compiler where to find them.  *Note
238530da1778Schristos     option-always-interactive::.  *Note option-read::.
23863c3a7b76Schristos
238730da1778Schristos'--yyclass=NAME, '%option yyclass="NAME"''
238830da1778Schristos     only applies when generating a C++ scanner (the '--c++' option).
238930da1778Schristos     It informs 'flex' that you have derived 'NAME' as a subclass of
239030da1778Schristos     'yyFlexLexer', so 'flex' will place your actions in the member
     function 'NAME::yylex()' instead of 'yyFlexLexer::yylex()'.  It also
     generates a 'yyFlexLexer::yylex()' member function that emits a
     run-time error (by invoking 'yyFlexLexer::LexerError()') if called.
239430da1778Schristos     *Note Cxx::.
23953c3a7b76Schristos
23963c3a7b76Schristos
23973c3a7b76SchristosFile: flex.info,  Node: Options for Scanner Speed and Size,  Next: Debugging Options,  Prev: Code-Level And API Options,  Up: Scanner Options
23983c3a7b76Schristos
23993c3a7b76Schristos16.4 Options for Scanner Speed and Size
24003c3a7b76Schristos=======================================
24013c3a7b76Schristos
240230da1778Schristos'-C[aefFmr]'
24033c3a7b76Schristos     controls the degree of table compression and, more generally,
24043c3a7b76Schristos     trade-offs between small scanners and fast scanners.
24053c3a7b76Schristos
240630da1778Schristos     '-C'
240730da1778Schristos          A lone '-C' specifies that the scanner tables should be
24083c3a7b76Schristos          compressed but neither equivalence classes nor
24093c3a7b76Schristos          meta-equivalence classes should be used.
24103c3a7b76Schristos
241130da1778Schristos     '-Ca, --align, '%option align''
24123c3a7b76Schristos          ("align") instructs flex to trade off larger tables in the
24133c3a7b76Schristos          generated scanner for faster performance because the elements
24143c3a7b76Schristos          of the tables are better aligned for memory access and
24153c3a7b76Schristos          computation.  On some RISC architectures, fetching and
24163c3a7b76Schristos          manipulating longwords is more efficient than with
24173c3a7b76Schristos          smaller-sized units such as shortwords.  This option can
24183c3a7b76Schristos          quadruple the size of the tables used by your scanner.
24193c3a7b76Schristos
242030da1778Schristos     '-Ce, --ecs, '%option ecs''
242130da1778Schristos          directs 'flex' to construct "equivalence classes", i.e., sets
24223c3a7b76Schristos          of characters which have identical lexical properties (for
242330da1778Schristos          example, if the only appearance of digits in the 'flex' input
24243c3a7b76Schristos          is in the character class "[0-9]" then the digits '0', '1',
24253c3a7b76Schristos          ..., '9' will all be put in the same equivalence class).
24263c3a7b76Schristos          Equivalence classes usually give dramatic reductions in the
24273c3a7b76Schristos          final table/object file sizes (typically a factor of 2-5) and
24283c3a7b76Schristos          are pretty cheap performance-wise (one array look-up per
24293c3a7b76Schristos          character scanned).
24303c3a7b76Schristos
243130da1778Schristos     '-Cf'
24323c3a7b76Schristos          specifies that the "full" scanner tables should be generated -
          'flex' should not compress the tables by taking advantage of
24343c3a7b76Schristos          similar transition functions for different states.
24353c3a7b76Schristos
243630da1778Schristos     '-CF'
24373c3a7b76Schristos          specifies that the alternate fast scanner representation
          (described below under the '--fast' flag) should be used.
243930da1778Schristos          This option cannot be used with '--c++'.
24403c3a7b76Schristos
244130da1778Schristos     '-Cm, --meta-ecs, '%option meta-ecs''
244230da1778Schristos          directs 'flex' to construct "meta-equivalence classes", which
24433c3a7b76Schristos          are sets of equivalence classes (or characters, if equivalence
24443c3a7b76Schristos          classes are not being used) that are commonly used together.
24453c3a7b76Schristos          Meta-equivalence classes are often a big win when using
244630da1778Schristos          compressed tables, but they have a moderate performance impact
244730da1778Schristos          (one or two 'if' tests and one array look-up per character
244830da1778Schristos          scanned).
24493c3a7b76Schristos
245030da1778Schristos     '-Cr, --read, '%option read''
24513c3a7b76Schristos          causes the generated scanner to _bypass_ use of the standard
245230da1778Schristos          I/O library ('stdio') for input.  Instead of calling 'fread()'
245330da1778Schristos          or 'getc()', the scanner will use the 'read()' system call,
245430da1778Schristos          resulting in a performance gain which varies from system to
245530da1778Schristos          system, but in general is probably negligible unless you are
245630da1778Schristos          also using '-Cf' or '-CF'.  Using '-Cr' can cause strange
245730da1778Schristos          behavior if, for example, you read from 'yyin' using 'stdio'
245830da1778Schristos          prior to calling the scanner (because the scanner will miss
245930da1778Schristos          whatever text your previous reads left in the 'stdio' input
246030da1778Schristos          buffer).  '-Cr' has no effect if you define 'YY_INPUT()'
246130da1778Schristos          (*note Generated Scanner::).
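
     The '-Cr' hazard described above, where text already pulled into
     the 'stdio' buffer is invisible to 'read()', can be reproduced
     without 'flex'.  The following C sketch is an assumption-laden
     demonstration: 'bytes_left_for_read' is a hypothetical function,
     and the exact count it returns depends on how the C library
     buffers input.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demonstration: puts two lines into a pipe, reads the
   first line through stdio, then asks read() for whatever is left on
   the same descriptor.  Returns the byte count read() reports, or -1
   on error.  When stdio has buffered both lines, read() sees nothing
   and the second line is missed. */
long bytes_left_for_read(void)
{
    int fds[2];
    char line[32];
    char rest[32];
    FILE *fp;
    long n;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello\nworld\n", 12) != 12) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                    /* so that read() can report end of input */

    fp = fdopen(fds[0], "r");
    if (fp == NULL) {
        close(fds[0]);
        return -1;
    }
    if (fgets(line, sizeof line, fp) == NULL) {   /* stdio read: "hello\n" */
        fclose(fp);
        return -1;
    }
    n = (long)read(fds[0], rest, sizeof rest);    /* direct read, bypassing stdio */
    fclose(fp);
    return n;
}
```

     If 'stdio' read ahead, the function returns fewer than the six
     bytes of the second line, which is the kind of lost text a '-Cr'
     scanner would never see.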
24623c3a7b76Schristos
246330da1778Schristos     The options '-Cf' or '-CF' and '-Cm' do not make sense together -
24643c3a7b76Schristos     there is no opportunity for meta-equivalence classes if the table
24653c3a7b76Schristos     is not being compressed.  Otherwise the options may be freely
24663c3a7b76Schristos     mixed, and are cumulative.
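
     The idea behind the equivalence classes built by '-Ce' can be
     sketched in C: characters that belong to exactly the same
     character classes are given the same class number, so the
     transition tables need one column per class instead of one per
     character.  This is a simplified model written for illustration
     ('build_equivalence_classes' is a hypothetical name), and it
     ignores details that the real algorithm in 'flex' must handle.

```c
#define NCHARS 256

/* Simplified model of equivalence classes (not flex's algorithm).
   member[s][c] is nonzero when character c belongs to character
   class s.  On return, ec[c] holds the class number of character c.
   Returns the number of classes, or -1 if nsets is out of range. */
int build_equivalence_classes(unsigned char member[][NCHARS], int nsets,
                              int ec[NCHARS])
{
    unsigned long signature[NCHARS];
    int nclasses = 0;
    int c, s, prev;

    if (nsets < 0 || nsets > 32)      /* one signature bit per set */
        return -1;

    /* The signature of a character records which sets contain it. */
    for (c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (s = 0; s < nsets; s++) {
            if (member[s][c])
                signature[c] |= 1UL << s;
        }
    }

    /* Characters with equal signatures share a class number. */
    for (c = 0; c < NCHARS; c++) {
        ec[c] = -1;
        for (prev = 0; prev < c; prev++) {
            if (signature[prev] == signature[c]) {
                ec[c] = ec[prev];
                break;
            }
        }
        if (ec[c] < 0)
            ec[c] = nclasses++;
    }
    return nclasses;
}
```

     Given only the sets '[0-9]' and '[a-z]', this model yields three
     classes (digits, lower-case letters, everything else), so a table
     row would shrink from 256 entries to 3.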
24673c3a7b76Schristos
246830da1778Schristos     The default setting is '-Cem', which specifies that 'flex' should
24693c3a7b76Schristos     generate equivalence classes and meta-equivalence classes.  This
24703c3a7b76Schristos     setting provides the highest degree of table compression.  You can
24713c3a7b76Schristos     trade off faster-executing scanners at the cost of larger tables
24723c3a7b76Schristos     with the following generally being true:
24733c3a7b76Schristos
24743c3a7b76Schristos              slowest & smallest
24753c3a7b76Schristos                    -Cem
24763c3a7b76Schristos                    -Cm
24773c3a7b76Schristos                    -Ce
24783c3a7b76Schristos                    -C
24793c3a7b76Schristos                    -C{f,F}e
24803c3a7b76Schristos                    -C{f,F}
24813c3a7b76Schristos                    -C{f,F}a
24823c3a7b76Schristos              fastest & largest
24833c3a7b76Schristos
24843c3a7b76Schristos     Note that scanners with the smallest tables are usually generated
24853c3a7b76Schristos     and compiled the quickest, so during development you will usually
24863c3a7b76Schristos     want to use the default, maximal compression.
24873c3a7b76Schristos
248830da1778Schristos     '-Cfe' is often a good compromise between speed and size for
24893c3a7b76Schristos     production scanners.
24903c3a7b76Schristos
249130da1778Schristos'-f, --full, '%option full''
     specifies that the _full_ scanner tables should be generated.  No
     table compression is done and 'stdio' is bypassed.  The result is
     large but fast.  This option is equivalent to '-Cfr'.
24953c3a7b76Schristos
249630da1778Schristos'-F, --fast, '%option fast''
24973c3a7b76Schristos     specifies that the _fast_ scanner table representation should be
249830da1778Schristos     used (and 'stdio' bypassed).  This representation is about as fast
249930da1778Schristos     as the full table representation '--full', and for some sets of
25003c3a7b76Schristos     patterns will be considerably smaller (and for others, larger).  In
25013c3a7b76Schristos     general, if the pattern set contains both _keywords_ and a
25023c3a7b76Schristos     catch-all, _identifier_ rule, such as in the set:
25033c3a7b76Schristos
25043c3a7b76Schristos              "case"    return TOK_CASE;
25053c3a7b76Schristos              "switch"  return TOK_SWITCH;
25063c3a7b76Schristos              ...
25073c3a7b76Schristos              "default" return TOK_DEFAULT;
25083c3a7b76Schristos              [a-z]+    return TOK_ID;
25093c3a7b76Schristos
25103c3a7b76Schristos     then you're better off using the full table representation.  If
251130da1778Schristos     only the _identifier_ rule is present and you then use a hash table
251230da1778Schristos     or some such to detect the keywords, you're better off using
251330da1778Schristos     '--fast'.
25143c3a7b76Schristos
251530da1778Schristos     This option is equivalent to '-CFr'.  It cannot be used with
251630da1778Schristos     '--c++'.
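
     When only the identifier rule is kept, the keyword check moves
     into its action.  The C sketch below uses a sorted table and
     'bsearch()' in place of the hash table mentioned above, simply to
     stay short.  The token values, 'TOK_ID' included, and the function
     name 'classify_identifier' are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would take these from its
   parser's header. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name, because bsearch() relies on the order. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct keyword *)elem)->name);
}

/* Returns the keyword's token code, or TOK_ID for any other text. */
int classify_identifier(const char *text)
{
    const struct keyword *kw = bsearch(text, keywords,
                                       sizeof keywords / sizeof keywords[0],
                                       sizeof keywords[0], compare_keyword);
    return kw != NULL ? kw->token : TOK_ID;
}
```

     The single rule '[a-z]+' could then use an action along the lines
     of 'return classify_identifier(yytext);'.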
25173c3a7b76Schristos
25183c3a7b76Schristos
25193c3a7b76SchristosFile: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options
25203c3a7b76Schristos
25213c3a7b76Schristos16.5 Debugging Options
25223c3a7b76Schristos======================
25233c3a7b76Schristos
252430da1778Schristos'-b, --backup, '%option backup''
252530da1778Schristos     Generate backing-up information to 'lex.backup'.  This is a list of
25263c3a7b76Schristos     scanner states which require backing up and the input characters on
25273c3a7b76Schristos     which they do so.  By adding rules one can remove backing-up
252830da1778Schristos     states.  If _all_ backing-up states are eliminated and '-Cf' or
252930da1778Schristos     '-CF' is used, the generated scanner will run faster (see the
253030da1778Schristos     '--perf-report' flag).  Only users who wish to squeeze every last
25313c3a7b76Schristos     cycle out of their scanners need worry about this option.  (*note
25323c3a7b76Schristos     Performance::).
25333c3a7b76Schristos
253430da1778Schristos'-d, --debug, '%option debug''
25353c3a7b76Schristos     makes the generated scanner run in "debug" mode.  Whenever a
253630da1778Schristos     pattern is recognized and the global variable 'yy_flex_debug' is
253730da1778Schristos     non-zero (which is the default), the scanner will write to 'stderr'
253830da1778Schristos     a line of the form:
25393c3a7b76Schristos
25403c3a7b76Schristos              -accepting rule at line 53 ("the matched text")
25413c3a7b76Schristos
25423c3a7b76Schristos     The line number refers to the location of the rule in the file
25433c3a7b76Schristos     defining the scanner (i.e., the file that was fed to flex).
25443c3a7b76Schristos     Messages are also generated when the scanner backs up, accepts the
25453c3a7b76Schristos     default rule, reaches the end of its input buffer (or encounters a
25463c3a7b76Schristos     NUL; at this point, the two look the same as far as the scanner's
25473c3a7b76Schristos     concerned), or reaches an end-of-file.
25483c3a7b76Schristos
254930da1778Schristos'-p, --perf-report, '%option perf-report''
255030da1778Schristos     generates a performance report to 'stderr'.  The report consists of
255130da1778Schristos     comments regarding features of the 'flex' input file which will
25523c3a7b76Schristos     cause a serious loss of performance in the resulting scanner.  If
25533c3a7b76Schristos     you give the flag twice, you will also get comments regarding
25543c3a7b76Schristos     features that lead to minor performance losses.
25553c3a7b76Schristos
     Note that the use of 'REJECT' and variable trailing context (*note
     Limitations::) entails a substantial performance penalty; use of
     'yymore()', the '^' operator, and the '--interactive' flag entails
     minor performance penalties.
25603c3a7b76Schristos
256130da1778Schristos'-s, --nodefault, '%option nodefault''
25623c3a7b76Schristos     causes the _default rule_ (that unmatched scanner input is echoed
     to 'stdout') to be suppressed.  If the scanner encounters input
25643c3a7b76Schristos     that does not match any of its rules, it aborts with an error.
25653c3a7b76Schristos     This option is useful for finding holes in a scanner's rule set.
25663c3a7b76Schristos
256730da1778Schristos'-T, --trace, '%option trace''
256830da1778Schristos     makes 'flex' run in "trace" mode.  It will generate a lot of
256930da1778Schristos     messages to 'stderr' concerning the form of the input and the
25703c3a7b76Schristos     resultant non-deterministic and deterministic finite automata.
257130da1778Schristos     This option is mostly for use in maintaining 'flex'.
25723c3a7b76Schristos
257330da1778Schristos'-w, --nowarn, '%option nowarn''
25743c3a7b76Schristos     suppresses warning messages.
25753c3a7b76Schristos
257630da1778Schristos'-v, --verbose, '%option verbose''
257730da1778Schristos     specifies that 'flex' should write to 'stderr' a summary of
25783c3a7b76Schristos     statistics regarding the scanner it generates.  Most of the
257930da1778Schristos     statistics are meaningless to the casual 'flex' user, but the first
258030da1778Schristos     line identifies the version of 'flex' (same as reported by
258130da1778Schristos     '--version'), and the next line the flags used when generating the
25823c3a7b76Schristos     scanner, including those that are on by default.
25833c3a7b76Schristos
258430da1778Schristos'--warn, '%option warn''
25853c3a7b76Schristos     warn about certain things.  In particular, if the default rule can
     be matched but no default rule has been given, flex will warn
25873c3a7b76Schristos     you.  We recommend using this option always.
25883c3a7b76Schristos
25893c3a7b76Schristos
25903c3a7b76SchristosFile: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options
25913c3a7b76Schristos
25923c3a7b76Schristos16.6 Miscellaneous Options
25933c3a7b76Schristos==========================
25943c3a7b76Schristos
259530da1778Schristos'-c'
25963c3a7b76Schristos     A do-nothing option included for POSIX compliance.
25973c3a7b76Schristos
259830da1778Schristos'-h, -?, --help'
259930da1778Schristos     generates a "help" summary of 'flex''s options to 'stdout' and then
260030da1778Schristos     exits.
26013c3a7b76Schristos
260230da1778Schristos'-n'
26033c3a7b76Schristos     Another do-nothing option included for POSIX compliance.
26043c3a7b76Schristos
260530da1778Schristos'-V, --version'
260630da1778Schristos     prints the version number to 'stdout' and exits.
26073c3a7b76Schristos
26083c3a7b76Schristos
26093c3a7b76SchristosFile: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top
26103c3a7b76Schristos
26113c3a7b76Schristos17 Performance Considerations
26123c3a7b76Schristos*****************************
26133c3a7b76Schristos
261430da1778SchristosThe main design goal of 'flex' is that it generate high-performance
26153c3a7b76Schristosscanners.  It has been optimized for dealing well with large sets of
26163c3a7b76Schristosrules.  Aside from the effects on scanner speed of the table compression
261730da1778Schristos'-C' options outlined above, there are a number of options/actions which
261830da1778Schristosdegrade performance.  These are, from most expensive to least:
26193c3a7b76Schristos
26203c3a7b76Schristos         REJECT
26213c3a7b76Schristos         arbitrary trailing context
26223c3a7b76Schristos
26233c3a7b76Schristos         pattern sets that require backing up
26243c3a7b76Schristos         %option yylineno
26253c3a7b76Schristos         %array
26263c3a7b76Schristos
26273c3a7b76Schristos         %option interactive
26283c3a7b76Schristos         %option always-interactive
26293c3a7b76Schristos
2630dded093eSchristos         ^ beginning-of-line operator
26313c3a7b76Schristos         yymore()
26323c3a7b76Schristos
   with the first two being quite expensive and the last two being
263430da1778Schristosquite cheap.  Note also that 'unput()' is implemented as a routine call
263530da1778Schristosthat potentially does quite a bit of work, while 'yyless()' is a
26363c3a7b76Schristosquite-cheap macro.  So if you are just putting back some excess text you
263730da1778Schristosscanned, use 'yyless()'.
26383c3a7b76Schristos
263930da1778Schristos   'REJECT' should be avoided at all costs when performance is
26403c3a7b76Schristosimportant.  It is a particularly expensive option.
26413c3a7b76Schristos
264230da1778Schristos   There is one case when '%option yylineno' can be expensive.  That is
26433c3a7b76Schristoswhen your patterns match long tokens that could _possibly_ contain a
newline character.  There is no performance penalty for rules that
cannot possibly match newlines, since flex does not need to check them for
264630da1778Schristosnewlines.  In general, you should avoid rules such as '[^f]+', which
26473c3a7b76Schristosmatch very long tokens, including newlines, and may possibly match your
264830da1778Schristosentire file!  A better approach is to separate '[^f]+' into two rules:
26493c3a7b76Schristos
26503c3a7b76Schristos     %option yylineno
26513c3a7b76Schristos     %%
26523c3a7b76Schristos         [^f\n]+
26533c3a7b76Schristos         \n+
26543c3a7b76Schristos
26553c3a7b76Schristos   The above scanner does not incur a performance penalty.
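
   The cost being avoided is a per-character pass over each matched
token.  A rough C sketch of that bookkeeping follows; the function
'update_line_number' and its 'rule_can_match_newline' flag are
assumptions made for illustration, not the code 'flex' generates.

```c
#include <stddef.h>

/* Hypothetical sketch of line counting after a match.  Tokens from a
   rule that can never contain a newline are skipped outright; any
   other token is walked character by character, which is expensive
   when the token is very long. */
void update_line_number(int rule_can_match_newline,
                        const char *text, size_t len, int *lineno)
{
    size_t i;

    if (!rule_can_match_newline)
        return;                 /* e.g. [^f\n]+ : nothing to count */

    for (i = 0; i < len; i++) {
        if (text[i] == '\n')
            ++*lineno;
    }
}
```

   With the two-rule split shown above, only the short '\n+' tokens
take the slow path.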
26563c3a7b76Schristos
26573c3a7b76Schristos   Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner.  In principle, one begins by
265930da1778Schristosusing the '-b' flag to generate a 'lex.backup' file.  For example, on
26603c3a7b76Schristosthe input:
26613c3a7b76Schristos
26623c3a7b76Schristos         %%
26633c3a7b76Schristos         foo        return TOK_KEYWORD;
26643c3a7b76Schristos         foobar     return TOK_KEYWORD;
26653c3a7b76Schristos
26663c3a7b76Schristos   the file looks like:
26673c3a7b76Schristos
26683c3a7b76Schristos         State #6 is non-accepting -
26693c3a7b76Schristos          associated rule line numbers:
26703c3a7b76Schristos                2       3
26713c3a7b76Schristos          out-transitions: [ o ]
26723c3a7b76Schristos          jam-transitions: EOF [ \001-n  p-\177 ]
26733c3a7b76Schristos
26743c3a7b76Schristos         State #8 is non-accepting -
26753c3a7b76Schristos          associated rule line numbers:
26763c3a7b76Schristos                3
26773c3a7b76Schristos          out-transitions: [ a ]
26783c3a7b76Schristos          jam-transitions: EOF [ \001-`  b-\177 ]
26793c3a7b76Schristos
26803c3a7b76Schristos         State #9 is non-accepting -
26813c3a7b76Schristos          associated rule line numbers:
26823c3a7b76Schristos                3
26833c3a7b76Schristos          out-transitions: [ r ]
26843c3a7b76Schristos          jam-transitions: EOF [ \001-q  s-\177 ]
26853c3a7b76Schristos
26863c3a7b76Schristos         Compressed tables always back up.
26873c3a7b76Schristos
26883c3a7b76Schristos   The first few lines tell us that there's a scanner state in which it
268930da1778Schristoscan make a transition on an 'o' but not on any other character, and that
269030da1778Schristosin that state the currently scanned text does not match any rule.  The
269130da1778Schristosstate occurs when trying to match the rules found at lines 2 and 3 in
269230da1778Schristosthe input file.  If the scanner is in that state and then reads
26933c3a7b76Schristossomething other than an 'o', it will have to back up to find a rule
which is matched.  With a bit of head-scratching one can see that this
269530da1778Schristosmust be the state it's in when it has seen 'fo'.  When this has
269630da1778Schristoshappened, if anything other than another 'o' is seen, the scanner will
269730da1778Schristoshave to back up to simply match the 'f' (by the default rule).
26983c3a7b76Schristos
26993c3a7b76Schristos   The comment regarding State #8 indicates there's a problem when
270030da1778Schristos'foob' has been scanned.  Indeed, on any character other than an 'a',
27013c3a7b76Schristosthe scanner will have to back up to accept "foo".  Similarly, the
270230da1778Schristoscomment for State #9 concerns when 'fooba' has been scanned and an 'r'
27033c3a7b76Schristosdoes not follow.
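
   The backing up described for these three states can be imitated by
hand.  The following C fragment is only a sketch, not code that 'flex'
generates: the function name 'match_keyword' and its 'backed_up'
parameter are invented for illustration.  It scans as far as the rules
'foo' and 'foobar' allow, remembers the last position at which a rule
matched, and reports how many characters had to be given back.

```c
#include <stddef.h>

/* Hypothetical toy matcher for the two rules `foo` and `foobar`.
 * Returns the length of the longest rule matched at `s` (0 if none)
 * and stores in *backed_up the number of characters that were scanned
 * beyond the last accepting position and must be returned to the input. */
size_t match_keyword(const char *s, size_t *backed_up)
{
    static const char longest[] = "foobar";
    size_t scanned = 0;      /* characters consumed by the scanning loop */
    size_t last_accept = 0;  /* length at the most recent accepting state */

    /* A NUL in `s` never equals a character of `longest`, so the loop
     * cannot read past the end of the input string. */
    while (scanned < sizeof longest - 1 && s[scanned] == longest[scanned]) {
        ++scanned;
        if (scanned == 3 || scanned == 6)  /* "foo" and "foobar" accept */
            last_accept = scanned;
    }

    *backed_up = scanned - last_accept;
    return last_accept;
}
```

   For the input 'foobaz' the sketch scans 'fooba', returns 3 (the
length of 'foo') and reports two characters backed up.  For 'fox' it
returns 0 after scanning 'fo'; a real scanner would then fall back to
the default rule and match the single 'f'.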
27043c3a7b76Schristos
27053c3a7b76Schristos   The final comment reminds us that there's no point going to all the
270630da1778Schristostrouble of removing backing up from the rules unless we're using '-Cf'
270730da1778Schristosor '-CF', since there's no performance gain doing so with compressed
27083c3a7b76Schristosscanners.
27093c3a7b76Schristos
27103c3a7b76Schristos   The way to remove the backing up is to add "error" rules:
27113c3a7b76Schristos
27123c3a7b76Schristos         %%
27133c3a7b76Schristos         foo         return TOK_KEYWORD;
27143c3a7b76Schristos         foobar      return TOK_KEYWORD;
27153c3a7b76Schristos
27163c3a7b76Schristos         fooba       |
27173c3a7b76Schristos         foob        |
27183c3a7b76Schristos         fo          {
27193c3a7b76Schristos                     /* false alarm, not really a keyword */
27203c3a7b76Schristos                     return TOK_ID;
27213c3a7b76Schristos                     }
27223c3a7b76Schristos
27233c3a7b76Schristos   Eliminating backing up among a list of keywords can also be done
27243c3a7b76Schristosusing a "catch-all" rule:
27253c3a7b76Schristos
27263c3a7b76Schristos         %%
27273c3a7b76Schristos         foo         return TOK_KEYWORD;
27283c3a7b76Schristos         foobar      return TOK_KEYWORD;
27293c3a7b76Schristos
27303c3a7b76Schristos         [a-z]+      return TOK_ID;
27313c3a7b76Schristos
27323c3a7b76Schristos   This is usually the best solution when appropriate.
27333c3a7b76Schristos
   Backing up messages tend to cascade.  With a complicated set of rules
it's not uncommon to get hundreds of messages.  If one can decipher
them, though, it often only takes a dozen or so rules to eliminate the
backing up.  It is easy, however, to make a mistake and have an error
rule accidentally match a valid token.  (A possible future 'flex'
feature will be to automatically add rules to eliminate backing up.)
27403c3a7b76Schristos
27413c3a7b76Schristos   It's important to keep in mind that you gain the benefits of
274230da1778Schristoseliminating backing up only if you eliminate _every_ instance of backing
274330da1778Schristosup.  Leaving just one means you gain nothing.
27443c3a7b76Schristos
27453c3a7b76Schristos   _Variable_ trailing context (where both the leading and trailing
27463c3a7b76Schristosparts do not have a fixed length) entails almost the same performance
274730da1778Schristosloss as 'REJECT' (i.e., substantial).  So when possible a rule like:
27483c3a7b76Schristos
27493c3a7b76Schristos         %%
27503c3a7b76Schristos         mouse|rat/(cat|dog)   run();
27513c3a7b76Schristos
27523c3a7b76Schristos   is better written:
27533c3a7b76Schristos
27543c3a7b76Schristos         %%
27553c3a7b76Schristos         mouse/cat|dog         run();
27563c3a7b76Schristos         rat/cat|dog           run();
27573c3a7b76Schristos
27583c3a7b76Schristos   or as
27593c3a7b76Schristos
27603c3a7b76Schristos         %%
27613c3a7b76Schristos         mouse|rat/cat         run();
27623c3a7b76Schristos         mouse|rat/dog         run();
27633c3a7b76Schristos
276430da1778Schristos   Note that here the special '|' action does _not_ provide any savings,
276530da1778Schristosand can even make things worse (*note Limitations::).
27663c3a7b76Schristos
27673c3a7b76Schristos   Another area where the user can increase a scanner's performance (and
27683c3a7b76Schristosone that's easier to implement) arises from the fact that the longer the
27693c3a7b76Schristostokens matched, the faster the scanner will run.  This is because with
27703c3a7b76Schristoslong tokens the processing of most input characters takes place in the
27713c3a7b76Schristos(short) inner scanning loop, and does not often have to go through the
277230da1778Schristosadditional work of setting up the scanning environment (e.g., 'yytext')
27733c3a7b76Schristosfor the action.  Recall the scanner for C comments:
27743c3a7b76Schristos
27753c3a7b76Schristos         %x comment
27763c3a7b76Schristos         %%
27773c3a7b76Schristos                 int line_num = 1;
27783c3a7b76Schristos
27793c3a7b76Schristos         "/*"         BEGIN(comment);
27803c3a7b76Schristos
27813c3a7b76Schristos         <comment>[^*\n]*
27823c3a7b76Schristos         <comment>"*"+[^*/\n]*
27833c3a7b76Schristos         <comment>\n             ++line_num;
27843c3a7b76Schristos         <comment>"*"+"/"        BEGIN(INITIAL);
27853c3a7b76Schristos
27863c3a7b76Schristos   This could be sped up by writing it as:
27873c3a7b76Schristos
27883c3a7b76Schristos         %x comment
27893c3a7b76Schristos         %%
27903c3a7b76Schristos                 int line_num = 1;
27913c3a7b76Schristos
27923c3a7b76Schristos         "/*"         BEGIN(comment);
27933c3a7b76Schristos
27943c3a7b76Schristos         <comment>[^*\n]*
27953c3a7b76Schristos         <comment>[^*\n]*\n      ++line_num;
27963c3a7b76Schristos         <comment>"*"+[^*/\n]*
27973c3a7b76Schristos         <comment>"*"+[^*/\n]*\n ++line_num;
27983c3a7b76Schristos         <comment>"*"+"/"        BEGIN(INITIAL);
27993c3a7b76Schristos
28003c3a7b76Schristos   Now instead of each newline requiring the processing of another
28013c3a7b76Schristosaction, recognizing the newlines is distributed over the other rules to
28023c3a7b76Schristoskeep the matched text as long as possible.  Note that _adding_ rules
28033c3a7b76Schristosdoes _not_ slow down the scanner!  The speed of the scanner is
28043c3a7b76Schristosindependent of the number of rules or (modulo the considerations given
28053c3a7b76Schristosat the beginning of this section) how complicated the rules are with
280630da1778Schristosregard to operators such as '*' and '|'.
28073c3a7b76Schristos
28083c3a7b76Schristos   A final example in speeding up a scanner: suppose you want to scan
28093c3a7b76Schristosthrough a file containing identifiers and keywords, one per line and
28103c3a7b76Schristoswith no other extraneous characters, and recognize all the keywords.  A
28113c3a7b76Schristosnatural first approach is:
28123c3a7b76Schristos
28133c3a7b76Schristos         %%
28143c3a7b76Schristos         asm      |
28153c3a7b76Schristos         auto     |
28163c3a7b76Schristos         break    |
28173c3a7b76Schristos         ... etc ...
28183c3a7b76Schristos         volatile |
28193c3a7b76Schristos         while    /* it's a keyword */
28203c3a7b76Schristos
28213c3a7b76Schristos         .|\n     /* it's not a keyword */
28223c3a7b76Schristos
   To eliminate the backing up, introduce a catch-all rule:
28243c3a7b76Schristos
28253c3a7b76Schristos         %%
28263c3a7b76Schristos         asm      |
28273c3a7b76Schristos         auto     |
28283c3a7b76Schristos         break    |
28293c3a7b76Schristos         ... etc ...
28303c3a7b76Schristos         volatile |
28313c3a7b76Schristos         while    /* it's a keyword */
28323c3a7b76Schristos
28333c3a7b76Schristos         [a-z]+   |
28343c3a7b76Schristos         .|\n     /* it's not a keyword */
28353c3a7b76Schristos
28363c3a7b76Schristos   Now, if it's guaranteed that there's exactly one word per line, then
28373c3a7b76Schristoswe can reduce the total number of matches by a half by merging in the
28383c3a7b76Schristosrecognition of newlines with that of the other tokens:
28393c3a7b76Schristos
28403c3a7b76Schristos         %%
28413c3a7b76Schristos         asm\n    |
28423c3a7b76Schristos         auto\n   |
28433c3a7b76Schristos         break\n  |
28443c3a7b76Schristos         ... etc ...
28453c3a7b76Schristos         volatile\n |
28463c3a7b76Schristos         while\n  /* it's a keyword */
28473c3a7b76Schristos
28483c3a7b76Schristos         [a-z]+\n |
28493c3a7b76Schristos         .|\n     /* it's not a keyword */
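
   The saving can be checked with a small C sketch.  It is an
illustration only: the two function names are hypothetical, and the
code assumes, as the text does, input consisting solely of lower-case
words and newlines.  The first function counts matches when words and
newlines are separate tokens, the second when each word is matched
together with its newline.

```c
#include <stddef.h>

/* Hypothetical helper: number of rule matches when a word and the
 * newline that follows it are recognized by separate rules. */
size_t matches_separate(const char *text)
{
    size_t n = 0;
    while (*text) {
        if (*text == '\n') {
            ++text;                          /* the newline rule */
        } else {
            while (*text && *text != '\n')   /* a keyword or [a-z]+ */
                ++text;
        }
        ++n;                                 /* one action per match */
    }
    return n;
}

/* Hypothetical helper: number of rule matches when each rule also
 * consumes the trailing newline, as in `auto\n` or `[a-z]+\n`. */
size_t matches_merged(const char *text)
{
    size_t n = 0;
    while (*text) {
        while (*text && *text != '\n')       /* the word, if any */
            ++text;
        if (*text == '\n')                   /* its newline, same match */
            ++text;
        ++n;
    }
    return n;
}
```

   On the three-line input 'asm\nfoo\nwhile\n' the first function
counts six matches and the second counts three.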
28503c3a7b76Schristos
28513c3a7b76Schristos   One has to be careful here, as we have now reintroduced backing up
28523c3a7b76Schristosinto the scanner.  In particular, while _we_ know that there will never
28533c3a7b76Schristosbe any characters in the input stream other than letters or newlines,
285430da1778Schristos'flex' can't figure this out, and it will plan for possibly needing to
285530da1778Schristosback up when it has scanned a token like 'auto' and then the next
28563c3a7b76Schristoscharacter is something other than a newline or a letter.  Previously it
285730da1778Schristoswould then just match the 'auto' rule and be done, but now it has no
'auto' rule, only an 'auto\n' rule.  To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one without a newline:
28633c3a7b76Schristos
28643c3a7b76Schristos         %%
28653c3a7b76Schristos         asm\n    |
28663c3a7b76Schristos         auto\n   |
28673c3a7b76Schristos         break\n  |
28683c3a7b76Schristos         ... etc ...
28693c3a7b76Schristos         volatile\n |
28703c3a7b76Schristos         while\n  /* it's a keyword */
28713c3a7b76Schristos
28723c3a7b76Schristos         [a-z]+\n |
28733c3a7b76Schristos         [a-z]+   |
28743c3a7b76Schristos         .|\n     /* it's not a keyword */
28753c3a7b76Schristos
287630da1778Schristos   Compiled with '-Cf', this is about as fast as one can get a 'flex'
28773c3a7b76Schristosscanner to go for this particular problem.
28783c3a7b76Schristos
287930da1778Schristos   A final note: 'flex' is slow when matching 'NUL's, particularly when
288030da1778Schristosa token contains multiple 'NUL's.  It's best to write rules which match
28813c3a7b76Schristos_short_ amounts of text if it's anticipated that the text will often
288230da1778Schristosinclude 'NUL's.
28833c3a7b76Schristos
2884dded093eSchristos   Another final note regarding performance: as mentioned in *note
288530da1778SchristosMatching::, dynamically resizing 'yytext' to accommodate huge tokens is
28863c3a7b76Schristosa slow process because it presently requires that the (huge) token be
28873c3a7b76Schristosrescanned from the beginning.  Thus if performance is vital, you should
28883c3a7b76Schristosattempt to match "large" quantities of text but not "huge" quantities,
28893c3a7b76Schristoswhere the cutoff between the two is at about 8K characters per token.
28903c3a7b76Schristos
28913c3a7b76Schristos
28923c3a7b76SchristosFile: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top
28933c3a7b76Schristos
28943c3a7b76Schristos18 Generating C++ Scanners
28953c3a7b76Schristos**************************
28963c3a7b76Schristos
28973c3a7b76Schristos*IMPORTANT*: the present form of the scanning class is _experimental_
28983c3a7b76Schristosand may change considerably between major releases.
28993c3a7b76Schristos
290030da1778Schristos   'flex' provides two different ways to generate scanners for use with
290130da1778SchristosC++.  The first way is to simply compile a scanner generated by 'flex'
29023c3a7b76Schristosusing a C++ compiler instead of a C compiler.  You should not encounter
29033c3a7b76Schristosany compilation errors (*note Reporting Bugs::).  You can then use C++
29043c3a7b76Schristoscode in your rule actions instead of C code.  Note that the default
290530da1778Schristosinput source for your scanner remains 'yyin', and default echoing is
290630da1778Schristosstill done to 'yyout'.  Both of these remain 'FILE *' variables and not
29073c3a7b76SchristosC++ _streams_.
29083c3a7b76Schristos
290930da1778Schristos   You can also use 'flex' to generate a C++ scanner class, using the
'-+' option (or, equivalently, '%option c++'), which is automatically
291130da1778Schristosspecified if the name of the 'flex' executable ends in a '+', such as
291230da1778Schristos'flex++'.  When using this option, 'flex' defaults to generating the
291330da1778Schristosscanner to the file 'lex.yy.cc' instead of 'lex.yy.c'.  The generated
291430da1778Schristosscanner includes the header file 'FlexLexer.h', which defines the
29153c3a7b76Schristosinterface to two C++ classes.
29163c3a7b76Schristos
291730da1778Schristos   The first class in 'FlexLexer.h', 'FlexLexer', provides an abstract
291830da1778Schristosbase class defining the general scanner class interface.  It provides
291930da1778Schristosthe following member functions:
29203c3a7b76Schristos
292130da1778Schristos'const char* YYText()'
292230da1778Schristos     returns the text of the most recently matched token, the equivalent
292330da1778Schristos     of 'yytext'.
29243c3a7b76Schristos
292530da1778Schristos'int YYLeng()'
29263c3a7b76Schristos     returns the length of the most recently matched token, the
292730da1778Schristos     equivalent of 'yyleng'.
29283c3a7b76Schristos
292930da1778Schristos'int lineno() const'
     returns the current input line number (see '%option yylineno'), or
293130da1778Schristos     '1' if '%option yylineno' was not used.
29323c3a7b76Schristos
293330da1778Schristos'void set_debug( int flag )'
29343c3a7b76Schristos     sets the debugging flag for the scanner, equivalent to assigning to
293530da1778Schristos     'yy_flex_debug' (*note Scanner Options::).  Note that you must
293630da1778Schristos     build the scanner using '%option debug' to include debugging
29373c3a7b76Schristos     information in it.
29383c3a7b76Schristos
293930da1778Schristos'int debug() const'
29403c3a7b76Schristos     returns the current setting of the debugging flag.
29413c3a7b76Schristos
29423c3a7b76Schristos   Also provided are member functions equivalent to
'yy_switch_to_buffer()', 'yy_create_buffer()' (though the first argument
is an 'istream&' object reference and not a 'FILE*'),
'yy_flush_buffer()', 'yy_delete_buffer()', and 'yyrestart()' (again, the
first argument is an 'istream&' object reference).
29473c3a7b76Schristos
294830da1778Schristos   The second class defined in 'FlexLexer.h' is 'yyFlexLexer', which is
294930da1778Schristosderived from 'FlexLexer'.  It defines the following additional member
29503c3a7b76Schristosfunctions:
29513c3a7b76Schristos
295230da1778Schristos'yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
295330da1778Schristos'yyFlexLexer( istream& arg_yyin, ostream& arg_yyout )'
295430da1778Schristos     constructs a 'yyFlexLexer' object using the given streams for input
295530da1778Schristos     and output.  If not specified, the streams default to 'cin' and
295630da1778Schristos     'cout', respectively.  'yyFlexLexer' does not take ownership of its
295730da1778Schristos     stream arguments.  It's up to the user to ensure the streams
295830da1778Schristos     pointed to remain alive at least as long as the 'yyFlexLexer'
295930da1778Schristos     instance.
29603c3a7b76Schristos
296130da1778Schristos'virtual int yylex()'
     performs the same role as 'yylex()' does for ordinary 'flex'
29633c3a7b76Schristos     scanners: it scans the input stream, consuming tokens, until a
296430da1778Schristos     rule's action returns a value.  If you derive a subclass 'S' from
296530da1778Schristos     'yyFlexLexer' and want to access the member functions and variables
296630da1778Schristos     of 'S' inside 'yylex()', then you need to use '%option yyclass="S"'
296730da1778Schristos     to inform 'flex' that you will be using that subclass instead of
296830da1778Schristos     'yyFlexLexer'.  In this case, rather than generating
296930da1778Schristos     'yyFlexLexer::yylex()', 'flex' generates 'S::yylex()' (and also
297030da1778Schristos     generates a dummy 'yyFlexLexer::yylex()' that calls
297130da1778Schristos     'yyFlexLexer::LexerError()' if called).
29723c3a7b76Schristos
297330da1778Schristos'virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
297430da1778Schristos'virtual void switch_streams(istream& new_in, ostream& new_out)'
297530da1778Schristos     reassigns 'yyin' to 'new_in' (if non-null) and 'yyout' to 'new_out'
297630da1778Schristos     (if non-null), deleting the previous input buffer if 'yyin' is
297730da1778Schristos     reassigned.
29783c3a7b76Schristos
297930da1778Schristos'int yylex( istream* new_in, ostream* new_out = 0 )'
298030da1778Schristos'int yylex( istream& new_in, ostream& new_out )'
298130da1778Schristos     first switches the input streams via 'switch_streams( new_in,
298230da1778Schristos     new_out )' and then returns the value of 'yylex()'.
29833c3a7b76Schristos
298430da1778Schristos   In addition, 'yyFlexLexer' defines the following protected virtual
29853c3a7b76Schristosfunctions which you can redefine in derived classes to tailor the
29863c3a7b76Schristosscanner:
29873c3a7b76Schristos
298830da1778Schristos'virtual int LexerInput( char* buf, int max_size )'
298930da1778Schristos     reads up to 'max_size' characters into 'buf' and returns the number
299030da1778Schristos     of characters read.  To indicate end-of-input, return 0 characters.
299130da1778Schristos     Note that 'interactive' scanners (see the '-B' and '-I' flags in
299230da1778Schristos     *note Scanner Options::) define the macro 'YY_INTERACTIVE'.  If you
299330da1778Schristos     redefine 'LexerInput()' and need to take different actions
299430da1778Schristos     depending on whether or not the scanner might be scanning an
299530da1778Schristos     interactive input source, you can test for the presence of this
299630da1778Schristos     name via '#ifdef' statements.
29973c3a7b76Schristos
299830da1778Schristos'virtual void LexerOutput( const char* buf, int size )'
299930da1778Schristos     writes out 'size' characters from the buffer 'buf', which, while
300030da1778Schristos     'NUL'-terminated, may also contain internal 'NUL's if the scanner's
300130da1778Schristos     rules can match text with 'NUL's in them.
30023c3a7b76Schristos
300330da1778Schristos'virtual void LexerError( const char* msg )'
30043c3a7b76Schristos     reports a fatal error message.  The default version of this
300530da1778Schristos     function writes the message to the stream 'cerr' and exits.
30063c3a7b76Schristos
300730da1778Schristos   Note that a 'yyFlexLexer' object contains its _entire_ scanning
30083c3a7b76Schristosstate.  Thus you can use such objects to create reentrant scanners, but
3009dded093eSchristossee also *note Reentrant::.  You can instantiate multiple instances of
301030da1778Schristosthe same 'yyFlexLexer' class, and you can also combine multiple C++
301130da1778Schristosscanner classes together in the same program using the '-P' option
30123c3a7b76Schristosdiscussed above.
30133c3a7b76Schristos
301430da1778Schristos   Finally, note that the '%array' feature is not available to C++
301530da1778Schristosscanner classes; you must use '%pointer' (the default).
30163c3a7b76Schristos
30173c3a7b76Schristos   Here is an example of a simple C++ scanner:
30183c3a7b76Schristos
30193c3a7b76Schristos          // An example of using the flex C++ scanner class.
30203c3a7b76Schristos
30213c3a7b76Schristos         %{
3022dded093eSchristos         #include <iostream>
3023dded093eSchristos         using namespace std;
30243c3a7b76Schristos         int mylineno = 0;
30253c3a7b76Schristos         %}
30263c3a7b76Schristos
302730da1778Schristos         %option noyywrap c++
3028dded093eSchristos
30293c3a7b76Schristos         string  \"[^\n"]+\"
30303c3a7b76Schristos
30313c3a7b76Schristos         ws      [ \t]+
30323c3a7b76Schristos
30333c3a7b76Schristos         alpha   [A-Za-z]
30343c3a7b76Schristos         dig     [0-9]
30353c3a7b76Schristos         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
30363c3a7b76Schristos         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
30373c3a7b76Schristos         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
30383c3a7b76Schristos         number  {num1}|{num2}
30393c3a7b76Schristos
30403c3a7b76Schristos         %%
30413c3a7b76Schristos
30423c3a7b76Schristos         {ws}    /* skip blanks and tabs */
30433c3a7b76Schristos
30443c3a7b76Schristos         "/*"    {
30453c3a7b76Schristos                 int c;
30463c3a7b76Schristos
30473c3a7b76Schristos                 while((c = yyinput()) != 0)
30483c3a7b76Schristos                     {
30493c3a7b76Schristos                     if(c == '\n')
30503c3a7b76Schristos                         ++mylineno;
30513c3a7b76Schristos
3052dded093eSchristos                     else if(c == '*')
30533c3a7b76Schristos                         {
30543c3a7b76Schristos                         if((c = yyinput()) == '/')
30553c3a7b76Schristos                             break;
30563c3a7b76Schristos                         else
30573c3a7b76Schristos                             unput(c);
30583c3a7b76Schristos                         }
30593c3a7b76Schristos                     }
30603c3a7b76Schristos                 }
30613c3a7b76Schristos
3062dded093eSchristos         {number}  cout << "number " << YYText() << '\n';
30633c3a7b76Schristos
30643c3a7b76Schristos         \n        mylineno++;
30653c3a7b76Schristos
3066dded093eSchristos         {name}    cout << "name " << YYText() << '\n';
30673c3a7b76Schristos
3068dded093eSchristos         {string}  cout << "string " << YYText() << '\n';
30693c3a7b76Schristos
30703c3a7b76Schristos         %%
30713c3a7b76Schristos
         // This include is required if main() is in another source file.
         //#include <FlexLexer.h>
307430da1778Schristos
30753c3a7b76Schristos         int main( int /* argc */, char** /* argv */ )
30763c3a7b76Schristos         {
3077dded093eSchristos             FlexLexer* lexer = new yyFlexLexer;
30783c3a7b76Schristos             while(lexer->yylex() != 0)
30793c3a7b76Schristos                 ;
30803c3a7b76Schristos             return 0;
30813c3a7b76Schristos         }
30823c3a7b76Schristos
30833c3a7b76Schristos   If you want to create multiple (different) lexer classes, you use the
308430da1778Schristos'-P' flag (or the 'prefix=' option) to rename each 'yyFlexLexer' to some
308530da1778Schristosother 'xxFlexLexer'.  You then can include '<FlexLexer.h>' in your other
308630da1778Schristossources once per lexer class, first renaming 'yyFlexLexer' as follows:
30873c3a7b76Schristos
30883c3a7b76Schristos         #undef yyFlexLexer
30893c3a7b76Schristos         #define yyFlexLexer xxFlexLexer
30903c3a7b76Schristos         #include <FlexLexer.h>
30913c3a7b76Schristos
30923c3a7b76Schristos         #undef yyFlexLexer
30933c3a7b76Schristos         #define yyFlexLexer zzFlexLexer
30943c3a7b76Schristos         #include <FlexLexer.h>
30953c3a7b76Schristos
309630da1778Schristos   if, for example, you used '%option prefix="xx"' for one of your
309730da1778Schristosscanners and '%option prefix="zz"' for the other.
30983c3a7b76Schristos
30993c3a7b76Schristos
31003c3a7b76SchristosFile: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top
31013c3a7b76Schristos
31023c3a7b76Schristos19 Reentrant C Scanners
31033c3a7b76Schristos***********************
31043c3a7b76Schristos
310530da1778Schristos'flex' has the ability to generate a reentrant C scanner.  This is
accomplished by specifying '%option reentrant' ('-R').  The generated
scanner is both portable and safe to use in one or more separate
31083c3a7b76Schristosthreads of control.  The most common use for reentrant scanners is from
310930da1778Schristoswithin multi-threaded applications.  Any thread may create and execute a
311030da1778Schristosreentrant 'flex' scanner without the need for synchronization with other
311130da1778Schristosthreads.
31123c3a7b76Schristos
31133c3a7b76Schristos* Menu:
31143c3a7b76Schristos
31153c3a7b76Schristos* Reentrant Uses::
31163c3a7b76Schristos* Reentrant Overview::
31173c3a7b76Schristos* Reentrant Example::
31183c3a7b76Schristos* Reentrant Detail::
31193c3a7b76Schristos* Reentrant Functions::
31203c3a7b76Schristos
31213c3a7b76Schristos
31223c3a7b76SchristosFile: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant
31233c3a7b76Schristos
31243c3a7b76Schristos19.1 Uses for Reentrant Scanners
31253c3a7b76Schristos================================
31263c3a7b76Schristos
31273c3a7b76SchristosHowever, there are other uses for a reentrant scanner.  For example, you
312830da1778Schristoscould scan two or more files simultaneously to implement a 'diff' at the
312930da1778Schristostoken level (i.e., instead of at the character level):
31303c3a7b76Schristos
31313c3a7b76Schristos         /* Example of maintaining more than one active scanner. */
31323c3a7b76Schristos
         /* Declared outside the loop so the 'while' test can see them. */
         int tok1, tok2;

         do {
             tok1 = yylex( scanner_1 );
             tok2 = yylex( scanner_2 );

             if( tok1 != tok2 ) {
                 printf("Files are different.\n");
                 break;
             }

         } while ( tok1 && tok2 );
31433c3a7b76Schristos
31443c3a7b76Schristos   Another use for a reentrant scanner is recursion.  (Note that a
31453c3a7b76Schristosrecursive scanner can also be created using a non-reentrant scanner and
31463c3a7b76Schristosbuffer states.  *Note Multiple Input Buffers::.)
31473c3a7b76Schristos
314830da1778Schristos   The following crude scanner supports the 'eval' command by invoking
31493c3a7b76Schristosanother instance of itself.
31503c3a7b76Schristos
31513c3a7b76Schristos         /* Example of recursive invocation. */
31523c3a7b76Schristos
31533c3a7b76Schristos         %option reentrant
31543c3a7b76Schristos
31553c3a7b76Schristos         %%
31563c3a7b76Schristos         "eval(".+")"  {
31573c3a7b76Schristos                           yyscan_t scanner;
31583c3a7b76Schristos                           YY_BUFFER_STATE buf;
31593c3a7b76Schristos
31603c3a7b76Schristos                           yylex_init( &scanner );
31613c3a7b76Schristos                           yytext[yyleng-1] = ' ';
31623c3a7b76Schristos
31633c3a7b76Schristos                           buf = yy_scan_string( yytext + 5, scanner );
31643c3a7b76Schristos                           yylex( scanner );
31653c3a7b76Schristos
31663c3a7b76Schristos                           yy_delete_buffer(buf,scanner);
31673c3a7b76Schristos                           yylex_destroy( scanner );
31683c3a7b76Schristos                      }
31693c3a7b76Schristos         ...
31703c3a7b76Schristos         %%
31713c3a7b76Schristos
31723c3a7b76Schristos
31733c3a7b76SchristosFile: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant
31743c3a7b76Schristos
31753c3a7b76Schristos19.2 An Overview of the Reentrant API
31763c3a7b76Schristos=====================================
31773c3a7b76Schristos
The API for reentrant scanners differs from that for non-reentrant
scanners.  Here is a quick overview of the API:
31803c3a7b76Schristos
   * '%option reentrant' must be specified.
31823c3a7b76Schristos
318330da1778Schristos   * All functions take one additional argument: 'yyscanner'
31843c3a7b76Schristos
31853c3a7b76Schristos   * All global variables are replaced by their macro equivalents.  (We
31863c3a7b76Schristos     tell you this because it may be important to you during debugging.)
31873c3a7b76Schristos
318830da1778Schristos   * 'yylex_init' and 'yylex_destroy' must be called before and after
318930da1778Schristos     'yylex', respectively.
31903c3a7b76Schristos
31913c3a7b76Schristos   * Accessor methods (get/set functions) provide access to common
319230da1778Schristos     'flex' variables.
31933c3a7b76Schristos
319430da1778Schristos   * User-specific data can be stored in 'yyextra'.
31953c3a7b76Schristos
31963c3a7b76Schristos
31973c3a7b76SchristosFile: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant
31983c3a7b76Schristos
31993c3a7b76Schristos19.3 Reentrant Example
32003c3a7b76Schristos======================
32013c3a7b76Schristos
First, an example of a reentrant scanner:

         /* This scanner prints "//" comments. */
32043c3a7b76Schristos
32053c3a7b76Schristos         %option reentrant stack noyywrap
32063c3a7b76Schristos         %x COMMENT
32073c3a7b76Schristos
32083c3a7b76Schristos         %%
32093c3a7b76Schristos
32103c3a7b76Schristos         "//"                 yy_push_state( COMMENT, yyscanner);
32113c3a7b76Schristos         .|\n
32123c3a7b76Schristos
32133c3a7b76Schristos         <COMMENT>\n          yy_pop_state( yyscanner );
32143c3a7b76Schristos         <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);
32153c3a7b76Schristos
32163c3a7b76Schristos         %%
32173c3a7b76Schristos
32183c3a7b76Schristos         int main ( int argc, char * argv[] )
32193c3a7b76Schristos         {
32203c3a7b76Schristos             yyscan_t scanner;
32213c3a7b76Schristos
32223c3a7b76Schristos             yylex_init ( &scanner );
32233c3a7b76Schristos             yylex ( scanner );
32243c3a7b76Schristos             yylex_destroy ( scanner );
             return 0;
         }
32273c3a7b76Schristos
32283c3a7b76Schristos
32293c3a7b76SchristosFile: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant
32303c3a7b76Schristos
32313c3a7b76Schristos19.4 The Reentrant API in Detail
32323c3a7b76Schristos================================
32333c3a7b76Schristos
32343c3a7b76SchristosHere are the things you need to do or know to use the reentrant C API of
323530da1778Schristos'flex'.
32363c3a7b76Schristos
32373c3a7b76Schristos* Menu:
32383c3a7b76Schristos
32393c3a7b76Schristos* Specify Reentrant::
32403c3a7b76Schristos* Extra Reentrant Argument::
32413c3a7b76Schristos* Global Replacement::
32423c3a7b76Schristos* Init and Destroy Functions::
32433c3a7b76Schristos* Accessor Methods::
32443c3a7b76Schristos* Extra Data::
32453c3a7b76Schristos* About yyscan_t::
32463c3a7b76Schristos
32473c3a7b76Schristos
32483c3a7b76SchristosFile: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail
32493c3a7b76Schristos
32503c3a7b76Schristos19.4.1 Declaring a Scanner As Reentrant
32513c3a7b76Schristos---------------------------------------
32523c3a7b76Schristos
'%option reentrant' ('--reentrant') must be specified.
32543c3a7b76Schristos
325530da1778Schristos   Notice that '%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, 'flex'
32573c3a7b76Schristoswould have happily generated a non-reentrant scanner without
325830da1778Schristoscomplaining.  You may explicitly specify '%option noreentrant', if you
32593c3a7b76Schristosdo _not_ want a reentrant scanner, although it is not necessary.  The
32603c3a7b76Schristosdefault is to generate a non-reentrant scanner.
32613c3a7b76Schristos
32623c3a7b76Schristos
32633c3a7b76SchristosFile: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail
32643c3a7b76Schristos
32653c3a7b76Schristos19.4.2 The Extra Argument
32663c3a7b76Schristos-------------------------
32673c3a7b76Schristos
326830da1778SchristosAll functions take one additional argument: 'yyscanner'.
32693c3a7b76Schristos
   Notice that the calls to 'yy_push_state' and 'yy_pop_state' both have
an argument, 'yyscanner', that is not present in a non-reentrant
327230da1778Schristosscanner.  Here are the declarations of 'yy_push_state' and
327330da1778Schristos'yy_pop_state' in the reentrant scanner:
32743c3a7b76Schristos
32753c3a7b76Schristos         static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
32763c3a7b76Schristos         static void yy_pop_state  ( yyscan_t yyscanner  ) ;
32773c3a7b76Schristos
327830da1778Schristos   Notice that the argument 'yyscanner' appears in the declaration of
327930da1778Schristosboth functions.  In fact, all 'flex' functions in a reentrant scanner
32803c3a7b76Schristoshave this additional argument.  It is always the last argument in the
328130da1778Schristosargument list, it is always of type 'yyscan_t' (which is typedef'd to
328230da1778Schristos'void *') and it is always named 'yyscanner'.  As you may have guessed,
328330da1778Schristos'yyscanner' is a pointer to an opaque data structure encapsulating the
32843c3a7b76Schristoscurrent state of the scanner.  For a list of function declarations, see
3285dded093eSchristos*note Reentrant Functions::.  Note that preprocessor macros, such as
328630da1778Schristos'BEGIN', 'ECHO', and 'REJECT', do not take this additional argument.
32873c3a7b76Schristos
32883c3a7b76Schristos
32893c3a7b76SchristosFile: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail
32903c3a7b76Schristos
32913c3a7b76Schristos19.4.3 Global Variables Replaced By Macros
32923c3a7b76Schristos------------------------------------------
32933c3a7b76Schristos
32943c3a7b76SchristosAll global variables in traditional flex have been replaced by macro
32953c3a7b76Schristosequivalents.
32963c3a7b76Schristos
329730da1778Schristos   Note that in the above example, 'yyout' and 'yytext' are not plain
329830da1778Schristosvariables.  These are macros that will expand to their equivalent
329930da1778Schristoslvalue.  All of the familiar 'flex' globals have been replaced by their
330030da1778Schristosmacro equivalents.  In particular, 'yytext', 'yyleng', 'yylineno',
330130da1778Schristos'yyin', 'yyout', 'yyextra', 'yylval', and 'yylloc' are macros.  You may
330230da1778Schristossafely use these macros in actions as if they were plain variables.  We
330330da1778Schristosonly tell you this so you don't expect to link to these variables
33043c3a7b76Schristosexternally.  Currently, each macro expands to a member of an internal
33053c3a7b76Schristosstruct, e.g.,
33063c3a7b76Schristos
33073c3a7b76Schristos     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)
33083c3a7b76Schristos
   One important thing to remember about 'yytext' and friends is that,
because 'yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions.  You must use an accessor method, e.g., 'yyget_text', to
accomplish this (*note Accessor Methods::).
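
   The macro arrangement described above can be modelled in a few lines
of plain C.  This is only a sketch, not the code that 'flex' generates:
'toy_guts', 'toy_scan_t', 'toy_text', 'toy_leng', 'toy_action' and
'toy_get_text' are hypothetical names chosen for illustration.  It shows
why such a macro works in any function that has a handle named
'scanner' in scope, and why code elsewhere needs an accessor instead.

```c
#include <string.h>

/* Hypothetical stand-in for the scanner's internal per-instance struct. */
struct toy_guts {
    char *text_r;   /* current match */
    int   leng_r;   /* length of the current match */
};

/* Opaque handle, playing the role that yyscan_t plays in the manual. */
typedef void *toy_scan_t;

/* These look like variables, but each expands to a struct member.
   They only compile where a handle named `scanner` is in scope. */
#define toy_text (((struct toy_guts *)scanner)->text_r)
#define toy_leng (((struct toy_guts *)scanner)->leng_r)

/* Plays the role of an action: the macros are usable as lvalues. */
void toy_action(toy_scan_t scanner, char *match)
{
    toy_text = match;
    toy_leng = (int)strlen(match);
}

/* Accessor for code that has only the handle (the macros are
   assumed to be private to the scanner source file). */
char *toy_get_text(toy_scan_t scanner)
{
    return ((struct toy_guts *)scanner)->text_r;
}
```

   Because nothing here is a global symbol, two handles can hold two
different matches at the same time, and there is no variable for
another object file to link against.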
33143c3a7b76Schristos
33153c3a7b76Schristos
33163c3a7b76SchristosFile: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail
33173c3a7b76Schristos
33183c3a7b76Schristos19.4.4 Init and Destroy Functions
33193c3a7b76Schristos---------------------------------
33203c3a7b76Schristos
332130da1778Schristos'yylex_init' and 'yylex_destroy' must be called before and after
332230da1778Schristos'yylex', respectively.
33233c3a7b76Schristos
33243c3a7b76Schristos         int yylex_init ( yyscan_t * ptr_yy_globals ) ;
33253c3a7b76Schristos         int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
33263c3a7b76Schristos         int yylex ( yyscan_t yyscanner ) ;
33273c3a7b76Schristos         int yylex_destroy ( yyscan_t yyscanner ) ;
33283c3a7b76Schristos
332930da1778Schristos   The function 'yylex_init' must be called before calling any other
333030da1778Schristosfunction.  The argument to 'yylex_init' is the address of an
333130da1778Schristosuninitialized pointer to be filled in by 'yylex_init', overwriting any
333230da1778Schristosprevious contents.  The function 'yylex_init_extra' may be used instead,
333330da1778Schristostaking as its first argument a variable of type 'YY_EXTRA_TYPE'.  See
33343c3a7b76Schristosthe section on yyextra, below, for more details.
33353c3a7b76Schristos
333630da1778Schristos   The value stored in 'ptr_yy_globals' should thereafter be passed to
333730da1778Schristos'yylex' and 'yylex_destroy'.  Flex does not save the argument passed to
333830da1778Schristos'yylex_init', so it is safe to pass the address of a local pointer to
333930da1778Schristos'yylex_init' so long as it remains in scope for the duration of all
334030da1778Schristoscalls to the scanner, up to and including the call to 'yylex_destroy'.
33413c3a7b76Schristos
334230da1778Schristos   The function 'yylex' should be familiar to you by now.  The reentrant
33433c3a7b76Schristosversion takes one argument, which is the value returned (via an
334430da1778Schristosargument) by 'yylex_init'.  Otherwise, it behaves the same as the
334530da1778Schristosnon-reentrant version of 'yylex'.
33463c3a7b76Schristos
   Both 'yylex_init' and 'yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case 'errno' is set to one of the
following values:
33503c3a7b76Schristos
33513c3a7b76Schristos   * ENOMEM Memory allocation error.  *Note memory-management::.
33523c3a7b76Schristos   * EINVAL Invalid argument.
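
   The return-value and 'errno' convention described above can be
sketched with a stand-alone model.  The names 'toy_init', 'toy_destroy',
'toy_open', 'toy_guts' and 'toy_scan_t' are hypothetical, and the bodies
are an assumption about how an init function honoring this contract
might look; they are not taken from a generated scanner.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void *toy_scan_t;               /* opaque handle */
struct toy_guts { int lineno_r; };      /* hypothetical internal state */

/* Returns 0 on success; non-zero on failure with errno set. */
int toy_init(toy_scan_t *ptr)
{
    if (ptr == NULL) {
        errno = EINVAL;                 /* invalid argument */
        return 1;
    }
    *ptr = calloc(1, sizeof(struct toy_guts));
    if (*ptr == NULL) {
        errno = ENOMEM;                 /* memory allocation error */
        return 1;
    }
    return 0;
}

/* Releases the state; the handle must not be used afterwards. */
int toy_destroy(toy_scan_t scanner)
{
    free(scanner);
    return 0;
}

/* Caller-side pattern: test the return value, then consult errno. */
int toy_open(toy_scan_t *ptr)
{
    if (toy_init(ptr) != 0) {
        fprintf(stderr, "scanner init failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

   The same caller-side shape applies to the real functions: check the
result of 'yylex_init' before passing the handle to 'yylex', and call
'yylex_destroy' only on a handle that was successfully initialized.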
33533c3a7b76Schristos
335430da1778Schristos   The function 'yylex_destroy' should be called to free resources used
335530da1778Schristosby the scanner.  After 'yylex_destroy' is called, the contents of
335630da1778Schristos'yyscanner' should not be used.  Of course, there is no need to destroy
335730da1778Schristosa scanner if you plan to reuse it.  A 'flex' scanner (both reentrant and
335830da1778Schristosnon-reentrant) may be restarted by calling 'yyrestart'.
33593c3a7b76Schristos
33603c3a7b76Schristos   Below is an example of a program that creates a scanner, uses it,
33613c3a7b76Schristosthen destroys it when done:
33623c3a7b76Schristos
33633c3a7b76Schristos         int main ()
33643c3a7b76Schristos         {
33653c3a7b76Schristos             yyscan_t scanner;
33663c3a7b76Schristos             int tok;
33673c3a7b76Schristos
33683c3a7b76Schristos             yylex_init(&scanner);
33693c3a7b76Schristos
3370dded093eSchristos             while ((tok=yylex(scanner)) > 0)
33713c3a7b76Schristos                 printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));
33723c3a7b76Schristos
33733c3a7b76Schristos             yylex_destroy(scanner);
33743c3a7b76Schristos             return 0;
33753c3a7b76Schristos         }
33763c3a7b76Schristos
33773c3a7b76Schristos
33783c3a7b76SchristosFile: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail
33793c3a7b76Schristos
33803c3a7b76Schristos19.4.5 Accessing Variables with Reentrant Scanners
33813c3a7b76Schristos--------------------------------------------------
33823c3a7b76Schristos
338330da1778SchristosAccessor methods (get/set functions) provide access to common 'flex'
33843c3a7b76Schristosvariables.
33853c3a7b76Schristos
33863c3a7b76Schristos   Many scanners that you build will be part of a larger project.
338730da1778SchristosPortions of your project will need access to 'flex' values, such as
338830da1778Schristos'yytext'.  In a non-reentrant scanner, these values are global, so there
338930da1778Schristosis no problem accessing them.  However, in a reentrant scanner, there
339030da1778Schristosare no global 'flex' values.  You can not access them directly.
339130da1778SchristosInstead, you must access 'flex' values using accessor methods (get/set
339230da1778Schristosfunctions).  Each accessor method is named 'yyget_NAME' or 'yyset_NAME',
339330da1778Schristoswhere 'NAME' is the name of the 'flex' variable you want.  For example:
33943c3a7b76Schristos
         /* Overwrite the last character of yytext with a NUL byte. */
33963c3a7b76Schristos         void chop ( yyscan_t scanner )
33973c3a7b76Schristos         {
33983c3a7b76Schristos             int len = yyget_leng( scanner );
33993c3a7b76Schristos             yyget_text( scanner )[len - 1] = '\0';
34003c3a7b76Schristos         }
34013c3a7b76Schristos
34023c3a7b76Schristos   The above code may be called from within an action like this:
34033c3a7b76Schristos
34043c3a7b76Schristos         %%
34053c3a7b76Schristos         .+\n    { chop( yyscanner );}
34063c3a7b76Schristos
340730da1778Schristos   You may find that '%option header-file' is particularly useful for
34083c3a7b76Schristosgenerating prototypes of all the accessor functions.  *Note
34093c3a7b76Schristosoption-header::.
34103c3a7b76Schristos
34113c3a7b76Schristos
34123c3a7b76SchristosFile: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail
34133c3a7b76Schristos
34143c3a7b76Schristos19.4.6 Extra Data
34153c3a7b76Schristos-----------------
34163c3a7b76Schristos
341730da1778SchristosUser-specific data can be stored in 'yyextra'.
34183c3a7b76Schristos
34193c3a7b76Schristos   In a reentrant scanner, it is unwise to use global variables to
34203c3a7b76Schristoscommunicate with or maintain state between different pieces of your
34213c3a7b76Schristosprogram.  However, you may need access to external data or invoke
34223c3a7b76Schristosexternal functions from within the scanner actions.  Likewise, you may
34233c3a7b76Schristosneed to pass information to your scanner (e.g., open file descriptors,
34243c3a7b76Schristosor database connections).  In a non-reentrant scanner, the only way to
342530da1778Schristosdo this would be through the use of global variables.  'Flex' allows you
342630da1778Schristosto store arbitrary, "extra" data in a scanner.  This data is accessible
342730da1778Schristosthrough the accessor methods 'yyget_extra' and 'yyset_extra' from
342830da1778Schristosoutside the scanner, and through the shortcut macro 'yyextra' from
34293c3a7b76Schristoswithin the scanner itself.  They are defined as follows:
34303c3a7b76Schristos
34313c3a7b76Schristos         #define YY_EXTRA_TYPE  void*
34323c3a7b76Schristos         YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
34333c3a7b76Schristos         void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);
34343c3a7b76Schristos
343530da1778Schristos   In addition, an extra form of 'yylex_init' is provided,
343630da1778Schristos'yylex_init_extra'.  This function is provided so that the yyextra value
343730da1778Schristoscan be accessed from within the very first yyalloc, used to allocate the
343830da1778Schristosscanner itself.
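
   The ordering problem that 'yylex_init_extra' addresses can be
illustrated with a small model.  Everything below is hypothetical
('toy_guts', 'toy_alloc', 'toy_init_extra', 'alloc_stats'); it assumes an
allocator that reads the extra value through the scanner handle, and
shows one way an init function could make that value visible before the
scanner itself exists.

```c
#include <stdlib.h>
#include <string.h>

typedef void *toy_scan_t;
struct toy_guts { void *extra_r; };     /* hypothetical internal state */

/* Hypothetical user data: a running total of bytes requested. */
struct alloc_stats { size_t bytes; };

/* Allocator that consults the extra data of the handle it is given. */
void *toy_alloc(size_t n, toy_scan_t scanner)
{
    struct alloc_stats *st = ((struct toy_guts *)scanner)->extra_r;

    if (st != NULL)
        st->bytes += n;
    return malloc(n);
}

/* Init that lets the very first allocation see the extra data by
   routing it through a temporary, stack-allocated state. */
int toy_init_extra(void *extra, toy_scan_t *out)
{
    struct toy_guts dummy;

    if (out == NULL)
        return 1;
    dummy.extra_r = extra;
    *out = toy_alloc(sizeof(struct toy_guts), &dummy);
    if (*out == NULL)
        return 1;
    memset(*out, 0, sizeof(struct toy_guts));
    ((struct toy_guts *)*out)->extra_r = extra;
    return 0;
}
```

   With a plain init followed by a separate "set extra" call, the
allocation of the scanner state would happen before the allocator had
any user data to consult.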
34393c3a7b76Schristos
344030da1778Schristos   By default, 'YY_EXTRA_TYPE' is defined as type 'void *'.  You may
344130da1778Schristosredefine this type using '%option extra-type="your_type"' in the
34423c3a7b76Schristosscanner:
34433c3a7b76Schristos
34443c3a7b76Schristos         /* An example of overriding YY_EXTRA_TYPE. */
34453c3a7b76Schristos         %{
34463c3a7b76Schristos         #include <sys/stat.h>
34473c3a7b76Schristos         #include <unistd.h>
34483c3a7b76Schristos         %}
34493c3a7b76Schristos         %option reentrant
34503c3a7b76Schristos         %option extra-type="struct stat *"
34513c3a7b76Schristos         %%
34523c3a7b76Schristos
         __filesize__     printf( "%ld", (long) yyextra->st_size  );
         __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
34553c3a7b76Schristos         %%
34563c3a7b76Schristos         void scan_file( char* filename )
34573c3a7b76Schristos         {
34583c3a7b76Schristos             yyscan_t scanner;
34593c3a7b76Schristos             struct stat buf;
34603c3a7b76Schristos             FILE *in;
34613c3a7b76Schristos
34623c3a7b76Schristos             in = fopen( filename, "r" );
34633c3a7b76Schristos             stat( filename, &buf );
34643c3a7b76Schristos
             yylex_init_extra( &buf, &scanner );
34663c3a7b76Schristos             yyset_in( in, scanner );
34673c3a7b76Schristos             yylex( scanner );
34683c3a7b76Schristos             yylex_destroy( scanner );
34693c3a7b76Schristos
34703c3a7b76Schristos             fclose( in );
         }
34723c3a7b76Schristos
34733c3a7b76Schristos
34743c3a7b76SchristosFile: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail
34753c3a7b76Schristos
34763c3a7b76Schristos19.4.7 About yyscan_t
34773c3a7b76Schristos---------------------
34783c3a7b76Schristos
347930da1778Schristos'yyscan_t' is defined as:
34803c3a7b76Schristos
34813c3a7b76Schristos          typedef void* yyscan_t;
34823c3a7b76Schristos
348330da1778Schristos   It is initialized by 'yylex_init()' to point to an internal
34843c3a7b76Schristosstructure.  You should never access this value directly.  In particular,
348530da1778Schristosyou should never attempt to free it (use 'yylex_destroy()' instead.)
34863c3a7b76Schristos
34873c3a7b76Schristos
34883c3a7b76SchristosFile: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant
34893c3a7b76Schristos
34903c3a7b76Schristos19.5 Functions and Macros Available in Reentrant C Scanners
34913c3a7b76Schristos===========================================================
34923c3a7b76Schristos
The following functions are available in a reentrant scanner:
34943c3a7b76Schristos
34953c3a7b76Schristos         char *yyget_text ( yyscan_t scanner );
34963c3a7b76Schristos         int yyget_leng ( yyscan_t scanner );
34973c3a7b76Schristos         FILE *yyget_in ( yyscan_t scanner );
34983c3a7b76Schristos         FILE *yyget_out ( yyscan_t scanner );
34993c3a7b76Schristos         int yyget_lineno ( yyscan_t scanner );
35003c3a7b76Schristos         YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
35013c3a7b76Schristos         int  yyget_debug ( yyscan_t scanner );
35023c3a7b76Schristos
35033c3a7b76Schristos         void yyset_debug ( int flag, yyscan_t scanner );
35043c3a7b76Schristos         void yyset_in  ( FILE * in_str , yyscan_t scanner );
35053c3a7b76Schristos         void yyset_out  ( FILE * out_str , yyscan_t scanner );
35063c3a7b76Schristos         void yyset_lineno ( int line_number , yyscan_t scanner );
35073c3a7b76Schristos         void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );
35083c3a7b76Schristos
35093c3a7b76Schristos   There are no "set" functions for yytext and yyleng.  This is
35103c3a7b76Schristosintentional.
35113c3a7b76Schristos
   The following macro shortcuts are available in actions in a reentrant
35133c3a7b76Schristosscanner:
35143c3a7b76Schristos
35153c3a7b76Schristos         yytext
35163c3a7b76Schristos         yyleng
35173c3a7b76Schristos         yyin
35183c3a7b76Schristos         yyout
35193c3a7b76Schristos         yylineno
35203c3a7b76Schristos         yyextra
35213c3a7b76Schristos         yy_flex_debug
35223c3a7b76Schristos
35233c3a7b76Schristos   In a reentrant C scanner, support for yylineno is always present
35243c3a7b76Schristos(i.e., you may access yylineno), but the value is never modified by
352530da1778Schristos'flex' unless '%option yylineno' is enabled.  This is to allow the user
352630da1778Schristosto maintain the line count independently of 'flex'.
35273c3a7b76Schristos
352830da1778Schristos   The following functions and macros are made available when '%option
352930da1778Schristosbison-bridge' ('--bison-bridge') is specified:
35303c3a7b76Schristos
35313c3a7b76Schristos         YYSTYPE * yyget_lval ( yyscan_t scanner );
35323c3a7b76Schristos         void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
35333c3a7b76Schristos         yylval
35343c3a7b76Schristos
353530da1778Schristos   The following functions and macros are made available when '%option
353630da1778Schristosbison-locations' ('--bison-locations') is specified:
35373c3a7b76Schristos
35383c3a7b76Schristos         YYLTYPE *yyget_lloc ( yyscan_t scanner );
35393c3a7b76Schristos         void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
35403c3a7b76Schristos         yylloc
35413c3a7b76Schristos
   Support for yylval assumes that 'YYSTYPE' is a valid type.  Support
for yylloc assumes that 'YYLTYPE' is a valid type.  Typically, these
354430da1778Schristostypes are generated by 'bison', and are included in section 1 of the
354530da1778Schristos'flex' input.
35463c3a7b76Schristos
35473c3a7b76Schristos
35483c3a7b76SchristosFile: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top
35493c3a7b76Schristos
35503c3a7b76Schristos20 Incompatibilities with Lex and Posix
35513c3a7b76Schristos***************************************
35523c3a7b76Schristos
355330da1778Schristos'flex' is a rewrite of the AT&T Unix _lex_ tool (the two implementations
355430da1778Schristosdo not share any code, though), with some extensions and
35553c3a7b76Schristosincompatibilities, both of which are of concern to those who wish to
355630da1778Schristoswrite scanners acceptable to both implementations.  'flex' is fully
355730da1778Schristoscompliant with the POSIX 'lex' specification, except that when using
355830da1778Schristos'%pointer' (the default), a call to 'unput()' destroys the contents of
355930da1778Schristos'yytext', which is counter to the POSIX specification.  In this section
356030da1778Schristoswe discuss all of the known areas of incompatibility between 'flex',
356130da1778SchristosAT&T 'lex', and the POSIX specification.  'flex''s '-l' option turns on
356230da1778Schristosmaximum compatibility with the original AT&T 'lex' implementation, at
35633c3a7b76Schristosthe cost of a major loss in the generated scanner's performance.  We
356430da1778Schristosnote below which incompatibilities can be overcome using the '-l'
356530da1778Schristosoption.  'flex' is fully compatible with 'lex' with the following
35663c3a7b76Schristosexceptions:
35673c3a7b76Schristos
356830da1778Schristos   * The undocumented 'lex' scanner internal variable 'yylineno' is not
356930da1778Schristos     supported unless '-l' or '%option yylineno' is used.
35703c3a7b76Schristos
357130da1778Schristos   * 'yylineno' should be maintained on a per-buffer basis, rather than
35723c3a7b76Schristos     a per-scanner (single global variable) basis.
35733c3a7b76Schristos
357430da1778Schristos   * 'yylineno' is not part of the POSIX specification.
35753c3a7b76Schristos
357630da1778Schristos   * The 'input()' routine is not redefinable, though it may be called
35773c3a7b76Schristos     to read characters following whatever has been matched by a rule.
357830da1778Schristos     If 'input()' encounters an end-of-file the normal 'yywrap()'
357930da1778Schristos     processing is done.  A "real" end-of-file is returned by 'input()'
358030da1778Schristos     as 'EOF'.
35813c3a7b76Schristos
358230da1778Schristos   * Input is instead controlled by defining the 'YY_INPUT()' macro.
35833c3a7b76Schristos
358430da1778Schristos   * The 'flex' restriction that 'input()' cannot be redefined is in
35853c3a7b76Schristos     accordance with the POSIX specification, which simply does not
35863c3a7b76Schristos     specify any way of controlling the scanner's input other than by
358730da1778Schristos     making an initial assignment to 'yyin'.
35883c3a7b76Schristos
358930da1778Schristos   * The 'unput()' routine is not redefinable.  This restriction is in
35903c3a7b76Schristos     accordance with POSIX.
35913c3a7b76Schristos
359230da1778Schristos   * 'flex' scanners are not as reentrant as 'lex' scanners.  In
35933c3a7b76Schristos     particular, if you have an interactive scanner and an interrupt
35943c3a7b76Schristos     handler which long-jumps out of the scanner, and the scanner is
35953c3a7b76Schristos     subsequently called again, you may get the following message:
35963c3a7b76Schristos
3597dded093eSchristos              fatal flex scanner internal error--end of buffer missed
35983c3a7b76Schristos
35993c3a7b76Schristos     To reenter the scanner, first use:
36003c3a7b76Schristos
36013c3a7b76Schristos              yyrestart( yyin );
36023c3a7b76Schristos
36033c3a7b76Schristos     Note that this call will throw away any buffered input; usually
36043c3a7b76Schristos     this isn't a problem with an interactive scanner.  *Note
360530da1778Schristos     Reentrant::, for 'flex''s reentrant API.
36063c3a7b76Schristos
360730da1778Schristos   * Also note that 'flex' C++ scanner classes _are_ reentrant, so if
360830da1778Schristos     using C++ is an option for you, you should use them instead.  *Note
360930da1778Schristos     Cxx::, and *note Reentrant:: for details.
36103c3a7b76Schristos
   * 'output()' is not supported.  Output from the 'ECHO' macro is done
     to the file-pointer 'yyout' (default 'stdout').
36133c3a7b76Schristos
361430da1778Schristos   * 'output()' is not part of the POSIX specification.
36153c3a7b76Schristos
361630da1778Schristos   * 'lex' does not support exclusive start conditions (%x), though they
36173c3a7b76Schristos     are in the POSIX specification.
36183c3a7b76Schristos
361930da1778Schristos   * When definitions are expanded, 'flex' encloses them in parentheses.
362030da1778Schristos     With 'lex', the following:
36213c3a7b76Schristos
36223c3a7b76Schristos              NAME    [A-Z][A-Z0-9]*
36233c3a7b76Schristos              %%
36243c3a7b76Schristos              foo{NAME}?      printf( "Found it\n" );
36253c3a7b76Schristos              %%
36263c3a7b76Schristos
362730da1778Schristos     will not match the string 'foo' because when the macro is expanded
362830da1778Schristos     the rule is equivalent to 'foo[A-Z][A-Z0-9]*?' and the precedence
362930da1778Schristos     is such that the '?' is associated with '[A-Z0-9]*'.  With 'flex',
363030da1778Schristos     the rule will be expanded to 'foo([A-Z][A-Z0-9]*)?' and so the
363130da1778Schristos     string 'foo' will match.
36323c3a7b76Schristos
363330da1778Schristos   * Note that if the definition begins with '^' or ends with '$' then
36343c3a7b76Schristos     it is _not_ expanded with parentheses, to allow these operators to
36353c3a7b76Schristos     appear in definitions without losing their special meanings.  But
363630da1778Schristos     the '<s>', '/', and '<<EOF>>' operators cannot be used in a 'flex'
36373c3a7b76Schristos     definition.
36383c3a7b76Schristos
363930da1778Schristos   * Using '-l' results in the 'lex' behavior of no parentheses around
36403c3a7b76Schristos     the definition.
36413c3a7b76Schristos
36423c3a7b76Schristos   * The POSIX specification is that the definition be enclosed in
36433c3a7b76Schristos     parentheses.
36443c3a7b76Schristos
364530da1778Schristos   * Some implementations of 'lex' allow a rule's action to begin on a
36463c3a7b76Schristos     separate line, if the rule's pattern has trailing whitespace:
36473c3a7b76Schristos
36483c3a7b76Schristos              %%
36493c3a7b76Schristos              foo|bar<space here>
36503c3a7b76Schristos                { foobar_action();}
36513c3a7b76Schristos
365230da1778Schristos     'flex' does not support this feature.
36533c3a7b76Schristos
365430da1778Schristos   * The 'lex' '%r' (generate a Ratfor scanner) option is not supported.
365530da1778Schristos     It is not part of the POSIX specification.
36563c3a7b76Schristos
365730da1778Schristos   * After a call to 'unput()', _yytext_ is undefined until the next
365830da1778Schristos     token is matched, unless the scanner was built using '%array'.
365930da1778Schristos     This is not the case with 'lex' or the POSIX specification.  The
366030da1778Schristos     '-l' option does away with this incompatibility.
36613c3a7b76Schristos
366230da1778Schristos   * The precedence of the '{,}' (numeric range) operator is different.
366330da1778Schristos     The AT&T and POSIX specifications of 'lex' interpret 'abc{1,3}' as
     "match one, two, or three occurrences of 'abc'", whereas 'flex'
366530da1778Schristos     interprets it as "match 'ab' followed by one, two, or three
366630da1778Schristos     occurrences of 'c'".  The '-l' and '--posix' options do away with
36673c3a7b76Schristos     this incompatibility.
36683c3a7b76Schristos
366930da1778Schristos   * The precedence of the '^' operator is different.  'lex' interprets
367030da1778Schristos     '^foo|bar' as "match either 'foo' at the beginning of a line, or
367130da1778Schristos     'bar' anywhere", whereas 'flex' interprets it as "match either
367230da1778Schristos     'foo' or 'bar' if they come at the beginning of a line".  The
36733c3a7b76Schristos     latter is in agreement with the POSIX specification.
36743c3a7b76Schristos
   * The special table-size declarations such as '%a' supported by 'lex'
     are not required by 'flex' scanners.  'flex' ignores them.

367730da1778Schristos   * The name 'FLEX_SCANNER' is '#define''d so scanners may be written
367830da1778Schristos     for use with either 'flex' or 'lex'.  Scanners also include
367930da1778Schristos     'YY_FLEX_MAJOR_VERSION', 'YY_FLEX_MINOR_VERSION' and
368030da1778Schristos     'YY_FLEX_SUBMINOR_VERSION' indicating which version of 'flex'
36813c3a7b76Schristos     generated the scanner.  For example, for the 2.5.22 release, these
36823c3a7b76Schristos     defines would be 2, 5 and 22 respectively.  If the version of
368330da1778Schristos     'flex' being used is a beta version, then the symbol 'FLEX_BETA' is
368430da1778Schristos     defined.
36853c3a7b76Schristos
368630da1778Schristos   * The symbols '[[' and ']]' in the code sections of the input may
36873c3a7b76Schristos     conflict with the m4 delimiters.  *Note M4 Dependency::.
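
   The two precedence differences listed above (for '{,}' and for '^')
can be made concrete by writing each reading with explicit grouping and
testing it against sample input.  The sketch below uses the POSIX
'regcomp'/'regexec' interface from '<regex.h>' rather than 'flex' or
'lex' themselves, and 'ere_matches' is a hypothetical helper name.  The
grouping is spelled out in every pattern so that the result does not
depend on the regex library's own precedence rules.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text` matches the POSIX extended regular expression
   `pattern`, 0 if it does not, and -1 if the pattern does not compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

   The reading "one to three occurrences of 'abc'" corresponds to
'^(abc){1,3}$', which accepts 'abcabc' but not 'abccc'; the reading
"'ab' followed by one to three 'c'" corresponds to '^ab(c){1,3}$', which
accepts 'abccc' but not 'abcabc'.  Likewise '(^foo)|(bar)' finds a match
in 'xbar', whereas '^(foo|bar)' does not.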
36883c3a7b76Schristos
368930da1778Schristos   The following 'flex' features are not included in 'lex' or the POSIX
36903c3a7b76Schristosspecification:
36913c3a7b76Schristos
36923c3a7b76Schristos   * C++ scanners
36933c3a7b76Schristos   * %option
36943c3a7b76Schristos   * start condition scopes
36953c3a7b76Schristos   * start condition stacks
36963c3a7b76Schristos   * interactive/non-interactive scanners
36973c3a7b76Schristos   * yy_scan_string() and friends
36983c3a7b76Schristos   * yyterminate()
36993c3a7b76Schristos   * yy_set_interactive()
37003c3a7b76Schristos   * yy_set_bol()
   * YY_AT_BOL()
   * <<EOF>>
37023c3a7b76Schristos   * <*>
37033c3a7b76Schristos   * YY_DECL
37043c3a7b76Schristos   * YY_START
37053c3a7b76Schristos   * YY_USER_ACTION
37063c3a7b76Schristos   * YY_USER_INIT
37073c3a7b76Schristos   * #line directives
37083c3a7b76Schristos   * %{}'s around actions
37093c3a7b76Schristos   * reentrant C API
37103c3a7b76Schristos   * multiple actions on a line
371130da1778Schristos   * almost all of the 'flex' command-line options
37123c3a7b76Schristos
371330da1778Schristos   The feature "multiple actions on a line" refers to the fact that with
371430da1778Schristos'flex' you can put multiple actions on the same line, separated with
371530da1778Schristossemi-colons, while with 'lex', the following:
37163c3a7b76Schristos
37173c3a7b76Schristos         foo    handle_foo(); ++num_foos_seen;
37183c3a7b76Schristos
37193c3a7b76Schristos   is (rather surprisingly) truncated to
37203c3a7b76Schristos
37213c3a7b76Schristos         foo    handle_foo();
37223c3a7b76Schristos
372330da1778Schristos   'flex' does not truncate the action.  Actions that are not enclosed
37243c3a7b76Schristosin braces are simply terminated at the end of the line.
37253c3a7b76Schristos
37263c3a7b76Schristos
37273c3a7b76SchristosFile: flex.info,  Node: Memory Management,  Next: Serialized Tables,  Prev: Lex and Posix,  Up: Top
37283c3a7b76Schristos
37293c3a7b76Schristos21 Memory Management
37303c3a7b76Schristos********************
37313c3a7b76Schristos
37323c3a7b76SchristosThis chapter describes how flex handles dynamic memory, and how you can
37333c3a7b76Schristosoverride the default behavior.
37343c3a7b76Schristos
37353c3a7b76Schristos* Menu:
37363c3a7b76Schristos
37373c3a7b76Schristos* The Default Memory Management::
37383c3a7b76Schristos* Overriding The Default Memory Management::
37393c3a7b76Schristos* A Note About yytext And Memory::
37403c3a7b76Schristos
37413c3a7b76Schristos
37423c3a7b76SchristosFile: flex.info,  Node: The Default Memory Management,  Next: Overriding The Default Memory Management,  Prev: Memory Management,  Up: Memory Management
37433c3a7b76Schristos
37443c3a7b76Schristos21.1 The Default Memory Management
37453c3a7b76Schristos==================================
37463c3a7b76Schristos
374730da1778SchristosFlex allocates dynamic memory during initialization, and once in a while
374830da1778Schristosfrom within a call to yylex().  Initialization takes place during the
374930da1778Schristosfirst call to yylex().  Thereafter, flex may reallocate more memory if
it needs to enlarge a buffer.  As of version 2.5.9, flex will clean up
all memory when you call 'yylex_destroy'.  *Note faq-memory-leak::.
37523c3a7b76Schristos
   Flex allocates dynamic memory for five purposes, listed below (1)
37543c3a7b76Schristos
37553c3a7b76Schristos16kB for the input buffer.
37563c3a7b76Schristos     Flex allocates memory for the character buffer used to perform
37573c3a7b76Schristos     pattern matching.  Flex must read ahead from the input stream and
375830da1778Schristos     store it in a large character buffer.  This buffer is typically the
375930da1778Schristos     largest chunk of dynamic memory flex consumes.  This buffer will
376030da1778Schristos     grow if necessary, doubling the size each time.  Flex frees this
376130da1778Schristos     memory when you call yylex_destroy().  The default size of this
376230da1778Schristos     buffer (16384 bytes) is almost always too large.  The ideal size
376330da1778Schristos     for this buffer is the length of the longest token expected, in
376430da1778Schristos     bytes, plus a little more.  Flex will allocate a few extra bytes
376530da1778Schristos     for housekeeping.  Currently, to override the size of the input
376630da1778Schristos     buffer you must '#define YY_BUF_SIZE' to whatever number of bytes
376730da1778Schristos     you want.  We don't plan to change this in the near future, but we
376830da1778Schristos     reserve the right to do so if we ever add a more robust memory
376930da1778Schristos     management API.
37703c3a7b76Schristos
64kB for the REJECT state.  This is allocated only if you use REJECT.
3772dded093eSchristos     The size is large enough to hold the same number of states as
37733c3a7b76Schristos     characters in the input buffer.  If you override the size of the
377430da1778Schristos     input buffer (via 'YY_BUF_SIZE'), then you automatically override
37753c3a7b76Schristos     the size of this buffer as well.
37763c3a7b76Schristos
37773c3a7b76Schristos100 bytes for the start condition stack.
37783c3a7b76Schristos     Flex allocates memory for the start condition stack.  This is the
37793c3a7b76Schristos     stack used for pushing start states, i.e., with yy_push_state().
37803c3a7b76Schristos     It will grow if necessary.  Since the states are simply integers,
37813c3a7b76Schristos     this stack doesn't consume much memory.  This stack is not present
378230da1778Schristos     if '%option stack' is not specified.  You will rarely need to tune
37833c3a7b76Schristos     this buffer.  The ideal size for this stack is the maximum depth
37843c3a7b76Schristos     expected.  The memory for this stack is automatically destroyed
37853c3a7b76Schristos     when you call yylex_destroy().  *Note option-stack::.
37863c3a7b76Schristos
37873c3a7b76Schristos40 bytes for each YY_BUFFER_STATE.
37883c3a7b76Schristos     Flex allocates memory for each YY_BUFFER_STATE. The buffer state
378930da1778Schristos     itself is about 40 bytes, plus an additional large character buffer
379030da1778Schristos     (described above.)  The initial buffer state is created during
379130da1778Schristos     initialization, and with each call to yy_create_buffer().  You
379230da1778Schristos     can't tune the size of this, but you can tune the character buffer
379330da1778Schristos     as described above.  Any buffer state that you explicitly create by
379430da1778Schristos     calling yy_create_buffer() is _NOT_ destroyed automatically.  You
379530da1778Schristos     must call yy_delete_buffer() to free the memory.  The exception to
379630da1778Schristos     this rule is that flex will delete the current buffer automatically
379730da1778Schristos     when you call yylex_destroy().  If you delete the current buffer,
379830da1778Schristos     be sure to set it to NULL. That way, flex will not try to delete
379930da1778Schristos     the buffer a second time (possibly crashing your program!)  At the
380030da1778Schristos     time of this writing, flex does not provide a growable stack for
380130da1778Schristos     the buffer states.  You have to manage that yourself.  *Note
380230da1778Schristos     Multiple Input Buffers::.
38033c3a7b76Schristos
38043c3a7b76Schristos84 bytes for the reentrant scanner guts
38053c3a7b76Schristos     Flex allocates about 84 bytes for the reentrant scanner structure
38063c3a7b76Schristos     when you call yylex_init().  It is destroyed when the user calls
38073c3a7b76Schristos     yylex_destroy().
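
   The doubling behaviour described for the input buffer can be made
concrete with a little arithmetic.  The following is a minimal sketch,
not flex's own code: 'grown_buffer_size' is a hypothetical helper that
reports the size a buffer reaches once it has doubled often enough to
hold a token of a given length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of flex): starting from `initial`
   bytes, double the size until it can hold `needed` bytes, and
   return the resulting size. */
size_t grown_buffer_size(size_t initial, size_t needed)
{
    assert(initial > 0);
    size_t size = initial;
    while (size < needed) {
        assert(size <= SIZE_MAX / 2);   /* guard against overflow */
        size *= 2;
    }
    return size;
}
```

   Under these assumptions, the default of 16384 bytes would grow to
32768 bytes for a 20000-byte token, while a buffer that starts at 256
bytes would grow to 1024 bytes for a 1000-byte token.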

   ---------- Footnotes ----------

   (1) The quantities given here are approximate, and may vary due to
host architecture, compiler configuration, or due to future enhancements
to flex.


File: flex.info,  Node: Overriding The Default Memory Management,  Next: A Note About yytext And Memory,  Prev: The Default Memory Management,  Up: Memory Management

21.2 Overriding The Default Memory Management
=============================================

Flex calls the functions 'yyalloc', 'yyrealloc', and 'yyfree' when it
needs to allocate or free memory.  By default, these functions are
wrappers around the standard C functions, 'malloc', 'realloc', and
'free', respectively.  You can override the default implementations by
telling flex that you will provide your own implementations.

   To override the default implementations, you must do two things:

  1. Suppress the default implementations by specifying one or more of
     the following options:

        * '%option noyyalloc'
        * '%option noyyrealloc'
        * '%option noyyfree'

  2. Provide your own implementation of the following functions: (1)

          // For a non-reentrant scanner
          void * yyalloc (size_t bytes);
          void * yyrealloc (void * ptr, size_t bytes);
          void   yyfree (void * ptr);

          // For a reentrant scanner
          void * yyalloc (size_t bytes, void * yyscanner);
          void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
          void   yyfree (void * ptr, void * yyscanner);

   In the following example, we will override all three memory routines.
We assume that there is a custom allocator with garbage collection.  In
order to make this example interesting, we will use a reentrant scanner,
passing a pointer to the custom allocator through 'yyextra'.

     %{
     #include "some_allocator.h"
     %}

     /* Suppress the default implementations. */
     %option noyyalloc noyyrealloc noyyfree
     %option reentrant

     /* Initialize the allocator. */
     %{
     #define YY_EXTRA_TYPE  struct allocator*
     #define YY_USER_INIT  yyextra = allocator_create();
     %}

     %%
     .|\n   ;
     %%

     /* Provide our own implementations. */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyextra, bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         /* Pass the old block along so its contents can be kept. */
         return allocator_realloc (yyextra, ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }
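
   For comparison, the non-reentrant signatures can be sketched in plain
C.  The following is only an illustration under stated assumptions: it
wraps 'malloc', 'realloc', and 'free' and keeps a count of live blocks,
which can help when hunting for leaks.  The counter and the helper
'live_block_count' are hypothetical and not part of flex.  In a real
scanner, these definitions would go in the user code section of a
scanner built with '%option noyyalloc noyyrealloc noyyfree'.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping, not part of flex. */
static size_t live_blocks = 0;

void *yyalloc(size_t bytes)
{
    assert(bytes > 0);              /* sketch assumes non-empty requests */
    void *p = malloc(bytes);
    if (p != NULL)
        live_blocks++;
    return p;
}

void *yyrealloc(void *ptr, size_t bytes)
{
    assert(bytes > 0);              /* realloc(p, 0) is not portable */
    void *p = realloc(ptr, bytes);
    /* realloc(NULL, n) behaves like malloc(n): a new block is live. */
    if (ptr == NULL && p != NULL)
        live_blocks++;
    return p;
}

void yyfree(void *ptr)
{
    if (ptr != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
        free(ptr);
    }
}

/* Hypothetical accessor for the counter above. */
size_t live_block_count(void)
{
    return live_blocks;
}
```

   If 'live_block_count' is still non-zero after yylex_destroy() has
been called and all explicitly created buffers have been deleted, some
scanner memory has probably been leaked.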


   ---------- Footnotes ----------

   (1) It is not necessary to override all (or any) of the memory
management routines.  You may, for example, override 'yyrealloc', but
not 'yyfree' or 'yyalloc'.


File: flex.info,  Node: A Note About yytext And Memory,  Prev: Overriding The Default Memory Management,  Up: Memory Management

21.3 A Note About yytext And Memory
===================================

When flex finds a match, 'yytext' points to the first character of the
match in the input buffer.  The string itself is part of the input
buffer, and is _NOT_ allocated separately.  The value of yytext will be
overwritten the next time yylex() is called.  In short, the value of
yytext is only valid from within the matched rule's action.

   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead.  In order to preserve yytext,
you will have to copy it with strdup() or a similar function.  But this
introduces some headache because your parser is now responsible for
freeing the copy of yytext.  If you use a yacc or bison parser (commonly
used with flex), you will discover that the error recovery mechanisms
can cause memory to be leaked.

   To prevent memory leaks from strdup'd yytext, you will have to track
the memory somehow.  Our experience has shown that a garbage collection
mechanism or a pooled memory mechanism will save you a lot of grief when
writing parsers.
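
   One way to track copies of yytext is a simple pool that remembers
every copy and frees them all at once.  The following is a minimal
sketch of that idea, not a facility provided by flex: 'struct
token_pool', 'pool_copy', and 'pool_release' are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool that owns every token copy handed to the parser. */
struct token_pool {
    char **items;
    size_t count;
    size_t capacity;
};

/* Copy `len` bytes of `text` into the pool and NUL-terminate the copy.
   Returns the copy, or NULL if memory could not be allocated. */
char *pool_copy(struct token_pool *pool, const char *text, size_t len)
{
    assert(pool != NULL && text != NULL);
    if (pool->count == pool->capacity) {
        size_t new_cap = pool->capacity ? pool->capacity * 2 : 8;
        char **grown = realloc(pool->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        pool->items = grown;
        pool->capacity = new_cap;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    pool->items[pool->count++] = copy;
    return copy;
}

/* Free every copy at once, e.g., after the parse has finished or
   failed, so that error recovery cannot leak individual tokens. */
void pool_release(struct token_pool *pool)
{
    assert(pool != NULL);
    for (size_t i = 0; i < pool->count; i++)
        free(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->count = 0;
    pool->capacity = 0;
}
```

   In a rule action, such a pool might be used as 'pool_copy (&pool,
yytext, yyleng)', where 'pool' is a pool object that you define
yourself.  The parser then never frees individual tokens.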


File: flex.info,  Node: Serialized Tables,  Next: Diagnostics,  Prev: Memory Management,  Up: Top

22 Serialized Tables
********************

A 'flex' scanner has the ability to save the DFA tables to a file, and
load them at runtime when needed.  The motivation for this feature is to
reduce the runtime memory footprint.  Traditionally, these tables have
been compiled into the scanner as C arrays, and are sometimes quite
large.  Since the tables are compiled into the scanner, the memory used
by the tables can never be freed.  This is a waste of memory, especially
if an application uses several scanners, but none of them at the same
time.

   The serialization feature allows the tables to be loaded at runtime,
before scanning begins.  The tables may be discarded when scanning is
finished.

* Menu:

* Creating Serialized Tables::
* Loading and Unloading Serialized Tables::
* Tables File Format::


File: flex.info,  Node: Creating Serialized Tables,  Next: Loading and Unloading Serialized Tables,  Prev: Serialized Tables,  Up: Serialized Tables

22.1 Creating Serialized Tables
===============================

You may create a scanner with serialized tables by specifying:

         %option tables-file=FILE
     or
         --tables-file=FILE

   These options instruct flex to save the DFA tables to the file FILE.
The tables will _not_ be embedded in the generated scanner.  The scanner
will not function on its own.  The scanner will be dependent upon the
serialized tables.  You must load the tables from this file at runtime
before you can scan anything.

   If you do not specify a filename to '--tables-file', the tables will
be saved to 'lex.yy.tables', where 'yy' is the appropriate prefix.

   If your project uses several different scanners, you can concatenate
the serialized tables into one file, and flex will find the correct set
of tables, using the scanner prefix as part of the lookup key.  An
example follows:

     $ flex --tables-file --prefix=cpp cpp.l
     $ flex --tables-file --prefix=c   c.l
     $ cat lex.cpp.tables lex.c.tables  >  all.tables

   The above example created two scanners, 'cpp' and 'c'.  Since we did
not specify a filename, the tables were serialized to 'lex.cpp.tables'
and 'lex.c.tables', respectively.  Then, we concatenated the two files
together into 'all.tables', which we will distribute with our project.
At runtime, we will open the file and tell flex to load the tables from
it.  Flex will find the correct tables automatically.  (See the next
section.)
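
   The default file naming rule above is simple enough to reproduce in
a build helper.  The following is a small sketch of that rule only;
'default_tables_filename' is a hypothetical function, not something
flex provides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "lex.<prefix>.tables" into `out`.
   Returns 0 on success, or -1 if the name does not fit. */
int default_tables_filename(const char *prefix, char *out, size_t out_size)
{
    assert(prefix != NULL && out != NULL);
    assert(strlen(prefix) > 0);
    int n = snprintf(out, out_size, "lex.%s.tables", prefix);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

   With the prefixes from the example, this yields 'lex.cpp.tables' and
'lex.c.tables'; with the default prefix 'yy' it yields 'lex.yy.tables'.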


File: flex.info,  Node: Loading and Unloading Serialized Tables,  Next: Tables File Format,  Prev: Creating Serialized Tables,  Up: Serialized Tables

22.2 Loading and Unloading Serialized Tables
============================================

If you've built your scanner with '%option tables-file', then you must
load the scanner tables at runtime.  This can be accomplished with the
following function:

 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
     Locates scanner tables in the stream pointed to by FP and loads
     them.  Memory for the tables is allocated via 'yyalloc'.  You must
     call this function before the first call to 'yylex'.  The argument
     SCANNER only appears in the reentrant scanner.  This function
     returns '0' (zero) on success, or non-zero on error.

   The loaded tables are *not* automatically destroyed (unloaded) when
you call 'yylex_destroy'.  The reason is that you may create several
scanners of the same type (in a reentrant scanner), each of which needs
access to these tables.  To avoid a nasty memory leak, you must call the
following function:

 -- Function: int yytables_destroy ([yyscan_t SCANNER])
     Unloads the scanner tables.  The tables must be loaded again before
     you can scan any more data.  The argument SCANNER only appears in
     the reentrant scanner.  This function returns '0' (zero) on
     success, or non-zero on error.

   *The functions 'yytables_fload' and 'yytables_destroy' are not
thread-safe.*  You must ensure that these functions are called exactly
once (for each scanner type) in a threaded program, before any thread
calls 'yylex'.  After the tables are loaded, they are never written to,
and no thread protection is required thereafter - until you destroy
them.
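
   The load and unload sequence for a non-reentrant scanner might look
like the following sketch.  It is an illustration only: the two stub
functions merely stand in for the ones that the generated scanner
supplies, so that the sketch compiles by itself, and
'load_scanner_tables' is a hypothetical helper.  The sketch also assumes
that the stream may be closed once the tables have been loaded, since
the tables are copied into memory obtained from 'yyalloc'.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins so that this sketch links on its own.  In a real program,
   remove both stubs; the flex-generated scanner provides them. */
int yytables_fload(FILE *fp) { return fp != NULL ? 0 : 1; }
int yytables_destroy(void) { return 0; }

/* Hypothetical helper: open `path` and load the scanner tables from
   it.  Returns 0 on success and non-zero on any failure. */
int load_scanner_tables(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");   /* tables are binary data */
    if (fp == NULL)
        return -1;
    int rc = yytables_fload(fp);
    fclose(fp);
    return rc;
}
```

   A program would call 'load_scanner_tables ("all.tables")' once,
before any thread calls 'yylex', and call 'yytables_destroy ()' once
after all scanning has finished.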


File: flex.info,  Node: Tables File Format,  Prev: Loading and Unloading Serialized Tables,  Up: Serialized Tables

22.3 Tables File Format
=======================

This section defines the file format of serialized 'flex' tables.

   The tables format allows for one or more sets of tables to be
specified, where each set corresponds to a given scanner.  Scanners are
indexed by name, as described below.  The file format is as follows:

                      TABLE SET 1
                     +-------------------------------+
             Header  | uint32          th_magic;     |
                     | uint32          th_hsize;     |
                     | uint32          th_ssize;     |
                     | uint16          th_flags;     |
                     | char            th_version[]; |
                     | char            th_name[];    |
                     | uint8           th_pad64[];   |
                     +-------------------------------+
             Table 1 | uint16          td_id;        |
                     | uint16          td_flags;     |
                     | uint32          td_hilen;     |
                     | uint32          td_lolen;     |
                     | void            td_data[];    |
                     | uint8           td_pad64[];   |
                     +-------------------------------+
             Table 2 |                               |
                .    .                               .
                .    .                               .
                .    .                               .
                .    .                               .
             Table n |                               |
                     +-------------------------------+
                      TABLE SET 2
                           .
                           .
                           .
                      TABLE SET N

   The above diagram shows that a complete set of tables consists of a
header followed by multiple individual tables.  Furthermore, multiple
complete sets may be present in the same file, each set with its own
header and tables.  The sets are contiguous in the file.  The only way
to know if another set follows is to check the next four bytes for the
magic number (or check for EOF). The header and tables sections are
padded to 64-bit boundaries.  Below we describe each field in detail.
This format does not specify how the scanner will expand the given data,
e.g., data may be serialized as int8, but expanded to an int32 array at
runtime.  This is to reduce the size of the serialized data where
possible.  Remember, _all integer values are in network byte order_.
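
   To illustrate the layout, the following sketch reads integers in
network byte order and counts the table sets in a memory image of a
tables file.  It relies only on the layout described here (the magic
number at offset 0 and 'th_ssize' at offset 8); the names 'TBL_MAGIC',
'read_be32', and 'count_table_sets' are hypothetical and are not part
of flex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_MAGIC 0xF13C57B1u   /* value of th_magic; the name is ours */

/* Decode a 32-bit unsigned integer stored in network byte order. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Count the contiguous table sets in `buf`.  Each set starts with
   th_magic (offset 0), th_hsize (offset 4), and th_ssize (offset 8);
   th_ssize is the size of the whole set, so adding it to the current
   offset leads to the next set, if there is one. */
size_t count_table_sets(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t off = 0, sets = 0;
    while (len - off >= 12 && read_be32(buf + off) == TBL_MAGIC) {
        uint32_t ssize = read_be32(buf + off + 8);
        if (ssize < 12 || ssize > len - off)
            break;              /* truncated or corrupt set */
        off += ssize;
        sets++;
    }
    return sets;
}
```

   A loader that looks for one particular scanner would, at each step,
also compare 'th_name' against the name it wants before skipping ahead.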

Fields of a table header:

'th_magic'
     Magic number, always 0xF13C57B1.

'th_hsize'
     Size of this entire header, in bytes, including all fields plus any
     padding.

'th_ssize'
     Size of this entire set, in bytes, including the header, all
     tables, plus any padding.

'th_flags'
     Bit flags for this table set.  Currently unused.

'th_version[]'
     Flex version as a NUL-terminated string, e.g., '2.5.13a'.  This is
     the version of flex that was used to create the serialized tables.

'th_name[]'
     Contains the name of this table set.  The default is 'yytables',
     and is prefixed accordingly, e.g., 'footables'.  Must be
     NUL-terminated.

'th_pad64[]'
     Zero or more NUL bytes, padding the entire header to the next
     64-bit boundary as calculated from the beginning of the header.
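
   The header size follows directly from the field list.  As a worked
example, the sketch below computes the value that 'th_hsize' would hold
for a given version string and table set name.  The helper names
'pad64' and 'table_header_size' are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of padding bytes needed to reach the next 8-byte (64-bit)
   boundary from an offset of `n` bytes. */
size_t pad64(size_t n)
{
    return (8 - (n % 8)) % 8;
}

/* Size of a table header in bytes, including padding. */
size_t table_header_size(const char *version, const char *name)
{
    assert(version != NULL && name != NULL);
    size_t n = 4 + 4 + 4 + 2;       /* th_magic, th_hsize, th_ssize, th_flags */
    n += strlen(version) + 1;       /* th_version[] with its terminator */
    n += strlen(name) + 1;          /* th_name[] with its terminator */
    return n + pad64(n);            /* th_pad64[] */
}
```

   For version '2.5.13a' and name 'yytables', the fixed fields take 14
bytes and the two strings take 8 and 9 bytes, giving 31 bytes; one
padding byte brings the header to 32 bytes.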

Fields of a table:

'td_id'
     Specifies the table identifier.  Possible values are:
     'YYTD_ID_ACCEPT (0x01)'
          'yy_accept'
     'YYTD_ID_BASE (0x02)'
          'yy_base'
     'YYTD_ID_CHK (0x03)'
          'yy_chk'
     'YYTD_ID_DEF (0x04)'
          'yy_def'
     'YYTD_ID_EC (0x05)'
          'yy_ec'
     'YYTD_ID_META (0x06)'
          'yy_meta'
     'YYTD_ID_NUL_TRANS (0x07)'
          'yy_NUL_trans'
     'YYTD_ID_NXT (0x08)'
          'yy_nxt'.  This array may be two dimensional.  See the
          'td_hilen' field below.
     'YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
          'yy_rule_can_match_eol'
     'YYTD_ID_START_STATE_LIST (0x0A)'
          'yy_start_state_list'.  This array is handled specially
          because it is an array of pointers to structs.  See the
          'td_flags' field below.
     'YYTD_ID_TRANSITION (0x0B)'
          'yy_transition'.  This array is handled specially because it
          is an array of structs.  See the 'td_lolen' field below.
     'YYTD_ID_ACCLIST (0x0C)'
          'yy_acclist'

'td_flags'
     Bit flags describing how to interpret the data in 'td_data'.  The
     data arrays are one-dimensional by default, but may be two
     dimensional as specified in the 'td_hilen' field.

     'YYTD_DATA8 (0x01)'
          The data is serialized as an array of type int8.
     'YYTD_DATA16 (0x02)'
          The data is serialized as an array of type int16.
     'YYTD_DATA32 (0x04)'
          The data is serialized as an array of type int32.
     'YYTD_PTRANS (0x08)'
          The data is a list of indexes of entries in the expanded
          'yy_transition' array.  Each index should be expanded to a
          pointer to the corresponding entry in the 'yy_transition'
          array.  We count on the fact that the 'yy_transition' array
          has already been seen.
     'YYTD_STRUCT (0x10)'
          The data is a list of yy_trans_info structs, each of which
          consists of two integers.  There is no padding between struct
          elements or between structs.  The type of each member is
          determined by the 'YYTD_DATA*' bits.

'td_hilen'
     If 'td_hilen' is non-zero, then the data is a two-dimensional
     array.  Otherwise, the data is a one-dimensional array.  'td_hilen'
     contains the number of elements in the higher dimensional array,
     and 'td_lolen' contains the number of elements in the lowest
     dimension.

     Conceptually, 'td_data' is either 'sometype td_data[td_lolen]', or
     'sometype td_data[td_hilen][td_lolen]', where 'sometype' is
     specified by the 'td_flags' field.  It is possible for both
     'td_lolen' and 'td_hilen' to be zero, in which case 'td_data' is a
     zero length array, and no data is loaded, i.e., this table is
     simply skipped.  Flex does not currently generate tables of zero
     length.

'td_lolen'
     Specifies the number of elements in the lowest dimension array.  If
     this is a one-dimensional array, then it is simply the number of
     elements in this array.  The element size is determined by the
     'td_flags' field.

'td_data[]'
     The table data.  This array may be a one- or two-dimensional array,
     of type 'int8', 'int16', 'int32', 'struct yy_trans_info', or
     'struct yy_trans_info*', depending upon the values in the
     'td_flags', 'td_hilen', and 'td_lolen' fields.

'td_pad64[]'
     Zero or more NUL bytes, padding the entire table to the next
     64-bit boundary as calculated from the beginning of this table.
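
   The relationship between 'td_flags', 'td_hilen', and 'td_lolen' can
be summarized by computing the number of bytes occupied by 'td_data'.
The following is a sketch derived from the field descriptions above,
not flex's own loader.  It assumes that, for 'YYTD_STRUCT' data,
'td_lolen' counts structs (each holding two integers).  The function
names are hypothetical; the flag values are the ones listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed for the td_flags field. */
enum {
    YYTD_DATA8  = 0x01,
    YYTD_DATA16 = 0x02,
    YYTD_DATA32 = 0x04,
    YYTD_PTRANS = 0x08,
    YYTD_STRUCT = 0x10
};

/* Size in bytes of one serialized integer, or 0 if no width bit is set. */
size_t element_size(uint16_t td_flags)
{
    if (td_flags & YYTD_DATA8)
        return 1;
    if (td_flags & YYTD_DATA16)
        return 2;
    if (td_flags & YYTD_DATA32)
        return 4;
    return 0;
}

/* Number of bytes in td_data, excluding td_pad64. */
size_t table_data_bytes(uint16_t td_flags, uint32_t td_hilen,
                        uint32_t td_lolen)
{
    assert(element_size(td_flags) != 0);
    size_t count = td_lolen;
    if (td_hilen != 0)
        count *= td_hilen;          /* two-dimensional array */
    if (td_flags & YYTD_STRUCT)
        count *= 2;                 /* assumed: two integers per struct */
    return count * element_size(td_flags);
}
```

   For instance, under these assumptions a one-dimensional int16 table
with 10 elements occupies 20 bytes, and a 3 by 4 int32 table occupies
48 bytes.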


File: flex.info,  Node: Diagnostics,  Next: Limitations,  Prev: Serialized Tables,  Up: Top

23 Diagnostics
**************

The following is a list of 'flex' diagnostic messages:

   * 'warning, rule cannot be matched' indicates that the given rule
     cannot be matched because it follows other rules that will always
     match the same text as it.  For example, in the following 'foo'
     cannot be matched because it comes after an identifier "catch-all"
     rule:

              [a-z]+    got_identifier();
              foo       got_foo();

     Using 'REJECT' in a scanner suppresses this warning.

   * 'warning, -s option given but default rule can be matched' means
     that it is possible (perhaps only in a particular start condition)
     that the default rule (match any single character) is the only one
     that will match a particular input.  Since '-s' was given,
     presumably this is not intended.

   * 'reject_used_but_not_detected undefined' or
     'yymore_used_but_not_detected undefined'.  These errors can occur
     at compile time.  They indicate that the scanner uses 'REJECT' or
     'yymore()' but that 'flex' failed to notice the fact, meaning that
     'flex' scanned the first two sections looking for occurrences of
     these actions and failed to find any, but somehow you snuck some in
     (via a #include file, for example).  Use '%option reject' or
     '%option yymore' to indicate to 'flex' that you really do use these
     features.

   * 'flex scanner jammed'.  A scanner compiled with '-s' has
     encountered an input string which wasn't matched by any of its
     rules.  This error can also occur due to internal problems.

   * 'token too large, exceeds YYLMAX'.  Your scanner uses '%array' and
     one of its rules matched a string longer than the 'YYLMAX' constant
     (8K bytes by default).  You can increase the value by #define'ing
     'YYLMAX' in the definitions section of your 'flex' input.

   * 'scanner requires -8 flag to use the character 'x''.  Your scanner
     specification includes recognizing the 8-bit character ''x'' and
     you did not specify the -8 flag, and your scanner defaulted to
     7-bit because you used the '-Cf' or '-CF' table compression
     options.  See the discussion of the '-7' flag, *note Scanner
     Options::, for details.

   * 'flex scanner push-back overflow'.  You used 'unput()' to push back
     so much text that the scanner's buffer could not hold both the
     pushed-back text and the current token in 'yytext'.  Ideally the
     scanner should dynamically resize the buffer in this case, but at
     present it does not.

   * 'input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'.  The scanner was working on matching an extremely large
     token and needed to expand the input buffer.  This doesn't work
     with scanners that use 'REJECT'.

   * 'fatal flex scanner internal error--end of buffer missed'.  This
     can occur in a scanner which is reentered after a long-jump has
     jumped out (or over) the scanner's activation frame.  Before
     reentering the scanner, use:
              yyrestart( yyin );
     or, as noted above, switch to using the C++ scanner class.

   * 'too many start conditions in <> construct!'  You listed more start
     conditions in a <> construct than exist (so you must have listed at
     least one of them twice).


File: flex.info,  Node: Limitations,  Next: Bibliography,  Prev: Diagnostics,  Up: Top

24 Limitations
**************

Some trailing context patterns cannot be properly matched and generate
warning messages ('dangerous trailing context').  These are patterns
where the ending of the first part of the rule matches the beginning of
the second part, such as 'zx*/xy*', where the 'x*' matches the 'x' at
the beginning of the trailing context.  (Note that the POSIX draft
states that the text matched by such patterns is undefined.)  For some
trailing context rules, parts which are actually fixed-length are not
recognized as such, leading to the abovementioned performance loss.  In
particular, parts using '|' or '{n}' (such as 'foo{3}') are always
considered variable-length.  Combining trailing context with the special
'|' action can result in _fixed_ trailing context being turned into the
more expensive _variable_ trailing context.  This happens, for example,
in the following:

         %%
         abc      |
         xyz/def

   Use of 'unput()' invalidates yytext and yyleng, unless the '%array'
directive or the '-l' option has been used.

   Pattern-matching of 'NUL's is substantially slower than matching
other characters.

   Dynamic resizing of the input buffer is slow, as it entails
rescanning all the text matched so far by the current (generally huge)
token.

   Due to both buffering of input and read-ahead, you cannot intermix
calls to '<stdio.h>' routines, such as getchar(), with 'flex' rules and
expect it to work.  Call 'input()' instead.

   The total number of table entries listed by the '-v' flag excludes
the entries needed to determine what rule has been matched.  The number
of these excluded entries is equal to the number of DFA states if the
scanner does not use 'REJECT', and somewhat greater than the number of
states if it does.

   'REJECT' cannot be used with the '-f' or '-F' options.
42963c3a7b76Schristos
429730da1778Schristos   The 'flex' internal algorithms need documentation.
42983c3a7b76Schristos
42993c3a7b76Schristos
43003c3a7b76SchristosFile: flex.info,  Node: Bibliography,  Next: FAQ,  Prev: Limitations,  Up: Top
43013c3a7b76Schristos
43023c3a7b76Schristos25 Additional Reading
43033c3a7b76Schristos*********************
43043c3a7b76Schristos
43053c3a7b76SchristosYou may wish to read more about the following programs:
43063c3a7b76Schristos   * lex
43073c3a7b76Schristos   * yacc
43083c3a7b76Schristos   * sed
43093c3a7b76Schristos   * awk
43103c3a7b76Schristos
43113c3a7b76Schristos   The following books may contain material of interest:
43123c3a7b76Schristos
43133c3a7b76Schristos   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
43143c3a7b76SchristosAssociates.  Be sure to get the 2nd edition.
43153c3a7b76Schristos
43163c3a7b76Schristos   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
43173c3a7b76Schristos
43183c3a7b76Schristos   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
43193c3a7b76SchristosTechniques and Tools_, Addison-Wesley (1986).  Describes the
432030da1778Schristospattern-matching techniques used by 'flex' (deterministic finite
43213c3a7b76Schristosautomata).
43223c3a7b76Schristos
43233c3a7b76Schristos
43243c3a7b76SchristosFile: flex.info,  Node: FAQ,  Next: Appendices,  Prev: Bibliography,  Up: Top
43253c3a7b76Schristos
43263c3a7b76SchristosFAQ
43273c3a7b76Schristos***
43283c3a7b76Schristos
432930da1778SchristosFrom time to time, the 'flex' maintainer receives certain questions.
43303c3a7b76SchristosRather than repeat answers to well-understood problems, we publish them
43313c3a7b76Schristoshere.
43323c3a7b76Schristos
43333c3a7b76Schristos* Menu:
43343c3a7b76Schristos
43353c3a7b76Schristos* When was flex born?::
43363c3a7b76Schristos* How do I expand backslash-escape sequences in C-style quoted strings?::
43373c3a7b76Schristos* Why do flex scanners call fileno if it is not ANSI compatible?::
43383c3a7b76Schristos* Does flex support recursive pattern definitions?::
43393c3a7b76Schristos* How do I skip huge chunks of input (tens of megabytes) while using flex?::
43403c3a7b76Schristos* Flex is not matching my patterns in the same order that I defined them.::
43413c3a7b76Schristos* My actions are executing out of order or sometimes not at all.::
43423c3a7b76Schristos* How can I have multiple input sources feed into the same scanner at the same time?::
43433c3a7b76Schristos* Can I build nested parsers that work with the same input file?::
43443c3a7b76Schristos* How can I match text only at the end of a file?::
43453c3a7b76Schristos* How can I make REJECT cascade across start condition boundaries?::
43463c3a7b76Schristos* Why cant I use fast or full tables with interactive mode?::
43473c3a7b76Schristos* How much faster is -F or -f than -C?::
43483c3a7b76Schristos* If I have a simple grammar cant I just parse it with flex?::
43493c3a7b76Schristos* Why doesn't yyrestart() set the start state back to INITIAL?::
43503c3a7b76Schristos* How can I match C-style comments?::
43513c3a7b76Schristos* The period isn't working the way I expected.::
43523c3a7b76Schristos* Can I get the flex manual in another format?::
43533c3a7b76Schristos* Does there exist a "faster" NDFA->DFA algorithm?::
43543c3a7b76Schristos* How does flex compile the DFA so quickly?::
43553c3a7b76Schristos* How can I use more than 8192 rules?::
43563c3a7b76Schristos* How do I abandon a file in the middle of a scan and switch to a new file?::
43573c3a7b76Schristos* How do I execute code only during initialization (only before the first scan)?::
43583c3a7b76Schristos* How do I execute code at termination?::
43593c3a7b76Schristos* Where else can I find help?::
43603c3a7b76Schristos* Can I include comments in the "rules" section of the file?::
43613c3a7b76Schristos* I get an error about undefined yywrap().::
43623c3a7b76Schristos* How can I change the matching pattern at run time?::
43633c3a7b76Schristos* How can I expand macros in the input?::
43643c3a7b76Schristos* How can I build a two-pass scanner?::
43653c3a7b76Schristos* How do I match any string not matched in the preceding rules?::
43663c3a7b76Schristos* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
43673c3a7b76Schristos* Is there a way to make flex treat NULL like a regular character?::
43683c3a7b76Schristos* Whenever flex can not match the input it says "flex scanner jammed".::
43693c3a7b76Schristos* Why doesn't flex have non-greedy operators like perl does?::
43703c3a7b76Schristos* Memory leak - 16386 bytes allocated by malloc.::
43713c3a7b76Schristos* How do I track the byte offset for lseek()?::
43723c3a7b76Schristos* How do I use my own I/O classes in a C++ scanner?::
43733c3a7b76Schristos* How do I skip as many chars as possible?::
43743c3a7b76Schristos* deleteme00::
43753c3a7b76Schristos* Are certain equivalent patterns faster than others?::
43763c3a7b76Schristos* Is backing up a big deal?::
43773c3a7b76Schristos* Can I fake multi-byte character support?::
43783c3a7b76Schristos* deleteme01::
43793c3a7b76Schristos* Can you discuss some flex internals?::
43803c3a7b76Schristos* unput() messes up yy_at_bol::
43813c3a7b76Schristos* The | operator is not doing what I want::
43823c3a7b76Schristos* Why can't flex understand this variable trailing context pattern?::
43833c3a7b76Schristos* The ^ operator isn't working::
43843c3a7b76Schristos* Trailing context is getting confused with trailing optional patterns::
43853c3a7b76Schristos* Is flex GNU or not?::
43863c3a7b76Schristos* ERASEME53::
43873c3a7b76Schristos* I need to scan if-then-else blocks and while loops::
43883c3a7b76Schristos* ERASEME55::
43893c3a7b76Schristos* ERASEME56::
43903c3a7b76Schristos* ERASEME57::
43913c3a7b76Schristos* Is there a repository for flex scanners?::
43923c3a7b76Schristos* How can I conditionally compile or preprocess my flex input file?::
43933c3a7b76Schristos* Where can I find grammars for lex and yacc?::
43943c3a7b76Schristos* I get an end-of-buffer message for each character scanned.::
43953c3a7b76Schristos* unnamed-faq-62::
43963c3a7b76Schristos* unnamed-faq-63::
43973c3a7b76Schristos* unnamed-faq-64::
43983c3a7b76Schristos* unnamed-faq-65::
43993c3a7b76Schristos* unnamed-faq-66::
44003c3a7b76Schristos* unnamed-faq-67::
44013c3a7b76Schristos* unnamed-faq-68::
44023c3a7b76Schristos* unnamed-faq-69::
44033c3a7b76Schristos* unnamed-faq-70::
44043c3a7b76Schristos* unnamed-faq-71::
44053c3a7b76Schristos* unnamed-faq-72::
44063c3a7b76Schristos* unnamed-faq-73::
44073c3a7b76Schristos* unnamed-faq-74::
44083c3a7b76Schristos* unnamed-faq-75::
44093c3a7b76Schristos* unnamed-faq-76::
44103c3a7b76Schristos* unnamed-faq-77::
44113c3a7b76Schristos* unnamed-faq-78::
44123c3a7b76Schristos* unnamed-faq-79::
44133c3a7b76Schristos* unnamed-faq-80::
44143c3a7b76Schristos* unnamed-faq-81::
44153c3a7b76Schristos* unnamed-faq-82::
44163c3a7b76Schristos* unnamed-faq-83::
44173c3a7b76Schristos* unnamed-faq-84::
44183c3a7b76Schristos* unnamed-faq-85::
44193c3a7b76Schristos* unnamed-faq-86::
44203c3a7b76Schristos* unnamed-faq-87::
44213c3a7b76Schristos* unnamed-faq-88::
44223c3a7b76Schristos* unnamed-faq-90::
44233c3a7b76Schristos* unnamed-faq-91::
44243c3a7b76Schristos* unnamed-faq-92::
44253c3a7b76Schristos* unnamed-faq-93::
44263c3a7b76Schristos* unnamed-faq-94::
44273c3a7b76Schristos* unnamed-faq-95::
44283c3a7b76Schristos* unnamed-faq-96::
44293c3a7b76Schristos* unnamed-faq-97::
44303c3a7b76Schristos* unnamed-faq-98::
44313c3a7b76Schristos* unnamed-faq-99::
44323c3a7b76Schristos* unnamed-faq-100::
44333c3a7b76Schristos* unnamed-faq-101::
44343c3a7b76Schristos* What is the difference between YYLEX_PARAM and YY_DECL?::
44353c3a7b76Schristos* Why do I get "conflicting types for yylex" error?::
44363c3a7b76Schristos* How do I access the values set in a Flex action from within a Bison action?::
44373c3a7b76Schristos
44383c3a7b76Schristos
44393c3a7b76SchristosFile: flex.info,  Node: When was flex born?,  Next: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ
44403c3a7b76Schristos
44413c3a7b76SchristosWhen was flex born?
44423c3a7b76Schristos===================
44433c3a7b76Schristos
444430da1778SchristosVern Paxson took over the 'Software Tools' lex project from Jef
444530da1778SchristosPoskanzer in 1982.  At that point it was written in Ratfor.  Around 1987
444630da1778Schristosor so, Paxson translated it into C, and a legend was born :-).
44473c3a7b76Schristos
44483c3a7b76Schristos
44493c3a7b76SchristosFile: flex.info,  Node: How do I expand backslash-escape sequences in C-style quoted strings?,  Next: Why do flex scanners call fileno if it is not ANSI compatible?,  Prev: When was flex born?,  Up: FAQ
44503c3a7b76Schristos
44513c3a7b76SchristosHow do I expand backslash-escape sequences in C-style quoted strings?
44523c3a7b76Schristos=====================================================================
44533c3a7b76Schristos
44543c3a7b76SchristosA key point when scanning quoted strings is that you cannot (easily)
44553c3a7b76Schristoswrite a single rule that will precisely match the string if you allow
445630da1778Schristosthings like embedded escape sequences and newlines.  If you try to match
445730da1778Schristosstrings with a single rule then you'll wind up having to rescan the
445830da1778Schristosstring anyway to find any escape sequences.
44593c3a7b76Schristos
44603c3a7b76Schristos   Instead you can use exclusive start conditions and a set of rules,
446130da1778Schristosone for matching non-escaped text, one for matching a single escape, one
446230da1778Schristosfor matching an embedded newline, and one for recognizing the end of the
446330da1778Schristosstring.  Each of these rules is then faced with the question of where to
446430da1778Schristosput its intermediary results.  The best solution is for the rules to
446530da1778Schristosappend their local value of 'yytext' to the end of a "string literal"
446630da1778Schristosbuffer.  A rule like the escape-matcher will append to the buffer the
446730da1778Schristosmeaning of the escape sequence rather than the literal text in 'yytext'.
446830da1778SchristosIn this way, 'yytext' does not need to be modified at all.
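
   As a minimal sketch of such a buffer, assuming a fixed-size array and
two hypothetical helpers, 'strbuf_append()' and 'strbuf_append_escape()'
(none of these names come from 'flex'), the actions could call C code
along these lines:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size "string literal" buffer; a real scanner
 * would probably grow it on demand. */
#define STRBUF_MAX 1024
static char   strbuf[STRBUF_MAX];
static size_t strbuf_len;

/* Start a new literal; call this from the rule that sees the opening quote. */
static void strbuf_reset(void)
{
    strbuf_len = 0;
    strbuf[0] = '\0';
}

/* Append n bytes of text.  Returns 0 on success, -1 if they do not fit. */
static int strbuf_append(const char *text, size_t n)
{
    if (strbuf_len + n >= STRBUF_MAX)
        return -1;
    memcpy(strbuf + strbuf_len, text, n);
    strbuf_len += n;
    strbuf[strbuf_len] = '\0';
    return 0;
}

/* Append the meaning of a two-character escape such as "\n".
 * `escape` points at the backslash, as yytext would in the escape rule. */
static int strbuf_append_escape(const char *escape)
{
    char c;

    switch (escape[1]) {
    case 'n': c = '\n'; break;
    case 't': c = '\t'; break;
    case 'r': c = '\r'; break;
    default:  c = escape[1]; break;   /* \\, \" and anything unrecognized */
    }
    return strbuf_append(&c, 1);
}
```

   The rule for non-escaped text would then call something like
'strbuf_append(yytext, yyleng)', and the escape rule
'strbuf_append_escape(yytext)'.  Only a few escapes are decoded here;
octal and hexadecimal forms are left out.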
44693c3a7b76Schristos
44703c3a7b76Schristos
44713c3a7b76SchristosFile: flex.info,  Node: Why do flex scanners call fileno if it is not ANSI compatible?,  Next: Does flex support recursive pattern definitions?,  Prev: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ
44723c3a7b76Schristos
44733c3a7b76SchristosWhy do flex scanners call fileno if it is not ANSI compatible?
44743c3a7b76Schristos==============================================================
44753c3a7b76Schristos
447630da1778SchristosFlex scanners call 'fileno()' in order to get the file descriptor
447730da1778Schristoscorresponding to 'yyin'.  The file descriptor may be passed to
447830da1778Schristos'isatty()' or 'read()', depending upon which '%options' you specified.
447930da1778SchristosIf your system does not have 'fileno()' support, to get rid of the
448030da1778Schristos'read()' call, do not specify '%option read'.  To get rid of the
448130da1778Schristos'isatty()' call, you must specify one of '%option always-interactive' or
448230da1778Schristos'%option never-interactive'.
44833c3a7b76Schristos
44843c3a7b76Schristos
44853c3a7b76SchristosFile: flex.info,  Node: Does flex support recursive pattern definitions?,  Next: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Prev: Why do flex scanners call fileno if it is not ANSI compatible?,  Up: FAQ
44863c3a7b76Schristos
44873c3a7b76SchristosDoes flex support recursive pattern definitions?
44883c3a7b76Schristos================================================
44893c3a7b76Schristos
For example, a definition that refers to itself:
44913c3a7b76Schristos
44923c3a7b76Schristos     %%
44933c3a7b76Schristos     block   "{"({block}|{statement})*"}"
44943c3a7b76Schristos
44953c3a7b76Schristos   No.  You cannot have recursive definitions.  The pattern-matching
44963c3a7b76Schristospower of regular expressions in general (and therefore flex scanners,
44973c3a7b76Schristostoo) is limited.  In particular, regular expressions cannot "balance"
44983c3a7b76Schristosparentheses to an arbitrary degree.  For example, it's impossible to
44993c3a7b76Schristoswrite a regular expression that matches all strings containing the same
45003c3a7b76Schristosnumber of '{'s as '}'s.  For more powerful pattern matching, you need a
450130da1778Schristosparser, such as 'GNU bison'.
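
   If all you need is to check that braces balance, a counter kept in
ordinary C code is enough; that counter is exactly the unbounded memory
that a regular expression lacks.  A small sketch, in which
'braces_balanced()' is a hypothetical name:

```c
#include <stddef.h>

/* Returns 1 if every '{' in s has a matching later '}' and no '}'
 * appears without an open '{' before it; returns 0 otherwise. */
static int braces_balanced(const char *s)
{
    size_t depth = 0;   /* number of currently open braces */

    for (; *s != '\0'; s++) {
        if (*s == '{') {
            depth++;
        } else if (*s == '}') {
            if (depth == 0)
                return 0;       /* closing brace with nothing open */
            depth--;
        }
    }
    return depth == 0;
}
```

   A scanner can keep a similar counter in its actions, but anything
with real nested structure is better handed to a parser.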
45023c3a7b76Schristos
45033c3a7b76Schristos
45043c3a7b76SchristosFile: flex.info,  Node: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Next: Flex is not matching my patterns in the same order that I defined them.,  Prev: Does flex support recursive pattern definitions?,  Up: FAQ
45053c3a7b76Schristos
45063c3a7b76SchristosHow do I skip huge chunks of input (tens of megabytes) while using flex?
45073c3a7b76Schristos========================================================================
45083c3a7b76Schristos
Use 'fseek()' (or 'lseek()') to position 'yyin', then call 'yyrestart()'.
45103c3a7b76Schristos
45113c3a7b76Schristos
45123c3a7b76SchristosFile: flex.info,  Node: Flex is not matching my patterns in the same order that I defined them.,  Next: My actions are executing out of order or sometimes not at all.,  Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Up: FAQ
45133c3a7b76Schristos
45143c3a7b76SchristosFlex is not matching my patterns in the same order that I defined them.
45153c3a7b76Schristos=======================================================================
45163c3a7b76Schristos
'flex' does not try the rules one at a time in the order you wrote them.
It picks the rule that matches the most text (i.e., the longest possible
input string).  This is because 'flex' uses a matching technique
("deterministic finite automata") that does all of the matching
simultaneously, in parallel.  (Seems impossible, but it's actually a
fairly simple technique once you understand the principles.)
45233c3a7b76Schristos
45243c3a7b76Schristos   A side-effect of this parallel matching is that when the input
452530da1778Schristosmatches more than one rule, 'flex' scanners pick the rule that matched
45263c3a7b76Schristosthe _most_ text.  This is explained further in the manual, in the
45273c3a7b76Schristossection *Note Matching::.
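
   The selection rule can be modeled in a few lines of C.  This is only
an illustration with a hypothetical 'pick_rule()' function: a generated
scanner does not compute a length per rule, it finds the winner in a
single pass over its DFA.  The tie-break shown (the rule listed first
wins) is the one described in the Matching section.

```c
#include <stddef.h>

/* match_len[i] is how many characters rule i could match at the
 * current input position (0 means rule i does not match).
 * Returns the index of the winning rule, or -1 if no rule matches. */
static int pick_rule(const size_t *match_len, size_t nrules)
{
    int    best = -1;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < nrules; i++) {
        /* Strictly greater: on a tie the earlier rule keeps the win. */
        if (match_len[i] > best_len) {
            best_len = match_len[i];
            best = (int)i;
        }
    }
    return best;
}
```

   For instance, if a rule listed first matches 5 characters and a later
rule matches 12, the later rule wins despite its position in the file.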
45283c3a7b76Schristos
452930da1778Schristos   If you want 'flex' to choose a shorter match, then you can work
45303c3a7b76Schristosaround this behavior by expanding your short rule to match more text,
45313c3a7b76Schristosthen put back the extra:
45323c3a7b76Schristos
45333c3a7b76Schristos     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;
45343c3a7b76Schristos
45353c3a7b76Schristos   Another fix would be to make the second rule active only during the
453630da1778Schristos'<BLOCKIDSTATE>' start condition, and make that start condition
453730da1778Schristosexclusive by declaring it with '%x' instead of '%s'.
45383c3a7b76Schristos
45393c3a7b76Schristos   A final fix is to change the input language so that the ambiguity for
454030da1778Schristos'data_' is removed, by adding characters to it that don't match the
454130da1778Schristosidentifier rule, or by removing characters (such as '_') from the
454230da1778Schristosidentifier rule so it no longer matches 'data_'.  (Of course, you might
45433c3a7b76Schristosalso not have the option of changing the input language.)
45443c3a7b76Schristos
45453c3a7b76Schristos
45463c3a7b76SchristosFile: flex.info,  Node: My actions are executing out of order or sometimes not at all.,  Next: How can I have multiple input sources feed into the same scanner at the same time?,  Prev: Flex is not matching my patterns in the same order that I defined them.,  Up: FAQ
45473c3a7b76Schristos
45483c3a7b76SchristosMy actions are executing out of order or sometimes not at all.
45493c3a7b76Schristos==============================================================
45503c3a7b76Schristos
455130da1778SchristosMost likely, you have (in error) placed the opening '{' of the action
45523c3a7b76Schristosblock on a different line than the rule, e.g.,
45533c3a7b76Schristos
45543c3a7b76Schristos     ^(foo|bar)
45553c3a7b76Schristos     {  <<<--- WRONG!
45563c3a7b76Schristos
45573c3a7b76Schristos     }
45583c3a7b76Schristos
455930da1778Schristos   'flex' requires that the opening '{' of an action associated with a
456030da1778Schristosrule begin on the same line as does the rule.  You need instead to write
456130da1778Schristosyour rules as follows:
45623c3a7b76Schristos
45633c3a7b76Schristos     ^(foo|bar)   {  // CORRECT!
45643c3a7b76Schristos
45653c3a7b76Schristos     }
45663c3a7b76Schristos
45673c3a7b76Schristos
45683c3a7b76SchristosFile: flex.info,  Node: How can I have multiple input sources feed into the same scanner at the same time?,  Next: Can I build nested parsers that work with the same input file?,  Prev: My actions are executing out of order or sometimes not at all.,  Up: FAQ
45693c3a7b76Schristos
45703c3a7b76SchristosHow can I have multiple input sources feed into the same scanner at the same time?
45713c3a7b76Schristos==================================================================================
45723c3a7b76Schristos
45733c3a7b76SchristosIf ...
457430da1778Schristos   * your scanner is free of backtracking (verified using 'flex''s '-b'
45753c3a7b76Schristos     flag),
457630da1778Schristos   * AND you run your scanner interactively ('-I' option; default unless
457730da1778Schristos     using special table compression options),
457830da1778Schristos   * AND you feed it one character at a time by redefining 'YY_INPUT' to
457930da1778Schristos     do so,
45803c3a7b76Schristos
   then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking).  This means you
can safely use 'select()' at that point and only call 'yylex()' for
another token if 'select()' indicates there's data available.
45853c3a7b76Schristos
458630da1778Schristos   That is, move the 'select()' out from the input function to a point
458730da1778Schristoswhere it determines whether 'yylex()' gets called for the next token.
45883c3a7b76Schristos
45893c3a7b76Schristos   With this approach, you will still have problems if your input can
459030da1778Schristosarrive piecemeal; 'select()' could inform you that the beginning of a
459130da1778Schristostoken is available, you call 'yylex()' to get it, but it winds up
45923c3a7b76Schristosblocking waiting for the later characters in the token.
45933c3a7b76Schristos
   Here's another way: Move your input multiplexing inside of
'YY_INPUT'.  That is, whenever 'YY_INPUT' is called, it calls 'select()'
to see where input is available.  If input is available for the scanner,
it reads and returns the next byte.  If input is available from another
source, it calls whatever function is responsible for reading from that
source.  (If no input is available, it blocks until some input is
available.)  I've used this technique in an interpreter I wrote that
reads both keyboard input (using a 'flex' scanner) and IPC traffic from
sockets, and it works fine.
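
   The second approach might look roughly like the following on a POSIX
system.  This is a sketch under assumptions, not code from 'flex': the
names 'read_multiplexed()', 'other_source_fn' and 'drain_one_byte()' are
made up for illustration, and the sketch ignores interrupted calls and
end-of-file on the second source.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical handler type for the non-scanner source. */
typedef void (*other_source_fn)(int fd);

static int other_bytes_seen;    /* bookkeeping for the sample handler */

/* Sample handler: consume one byte of traffic from the other source. */
static void drain_one_byte(int fd)
{
    char c;

    if (read(fd, &c, 1) == 1)
        other_bytes_seen++;
}

/* Wait until scanner_fd is readable, servicing other_fd meanwhile.
 * Stores one byte in *out and returns 1; returns 0 at end of input
 * and -1 on error. */
static int read_multiplexed(int scanner_fd, int other_fd,
                            other_source_fn handle_other, char *out)
{
    for (;;) {
        fd_set readable;
        int maxfd = scanner_fd > other_fd ? scanner_fd : other_fd;

        FD_ZERO(&readable);
        FD_SET(scanner_fd, &readable);
        FD_SET(other_fd, &readable);

        /* NULL timeout: block until at least one source has input. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            return -1;

        if (FD_ISSET(other_fd, &readable))
            handle_other(other_fd);

        if (FD_ISSET(scanner_fd, &readable)) {
            ssize_t n = read(scanner_fd, out, 1);
            return n < 0 ? -1 : (int)n;
        }
    }
}
```

   A redefined 'YY_INPUT' would call such a function with the scanner's
descriptor (for example 'fileno(yyin)'), store the byte in its buffer,
and report either one character or end of input.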
46033c3a7b76Schristos
46043c3a7b76Schristos
46053c3a7b76SchristosFile: flex.info,  Node: Can I build nested parsers that work with the same input file?,  Next: How can I match text only at the end of a file?,  Prev: How can I have multiple input sources feed into the same scanner at the same time?,  Up: FAQ
46063c3a7b76Schristos
46073c3a7b76SchristosCan I build nested parsers that work with the same input file?
46083c3a7b76Schristos==============================================================
46093c3a7b76Schristos
46103c3a7b76SchristosThis is not going to work without some additional effort.  The reason is
461130da1778Schristosthat 'flex' block-buffers the input it reads from 'yyin'.  This means
461230da1778Schristosthat the "outermost" 'yylex()', when called, will automatically slurp up
461330da1778Schristosthe first 8K of input available on yyin, and subsequent calls to other
461430da1778Schristos'yylex()''s won't see that input.  You might be tempted to work around
461530da1778Schristosthis problem by redefining 'YY_INPUT' to only return a small amount of
461630da1778Schristostext, but it turns out that that approach is quite difficult.  Instead,
461730da1778Schristosthe best solution is to combine all of your scanners into one large
461830da1778Schristosscanner, using a different exclusive start condition for each.
46193c3a7b76Schristos
46203c3a7b76Schristos
46213c3a7b76SchristosFile: flex.info,  Node: How can I match text only at the end of a file?,  Next: How can I make REJECT cascade across start condition boundaries?,  Prev: Can I build nested parsers that work with the same input file?,  Up: FAQ
46223c3a7b76Schristos
46233c3a7b76SchristosHow can I match text only at the end of a file?
46243c3a7b76Schristos===============================================
46253c3a7b76Schristos
46263c3a7b76SchristosThere is no way to write a rule which is "match this text, but only if
46273c3a7b76Schristosit comes at the end of the file".  You can fake it, though, if you
46283c3a7b76Schristoshappen to have a character lying around that you don't allow in your
462930da1778Schristosinput.  Then you redefine 'YY_INPUT' to call your own routine which, if
463030da1778Schristosit sees an 'EOF', returns the magic character first (and remembers to
463130da1778Schristosreturn a real 'EOF' next time it's called).  Then you could write:
46323c3a7b76Schristos
46333c3a7b76Schristos     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */
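
   The input routine could be sketched as follows.  The function name
'getc_with_sentinel()' and the choice of '\001' for the magic character
are assumptions made for this example; pick any byte that can never
occur in your input.

```c
#include <stdio.h>

/* Hypothetical magic character: a byte that never occurs in valid input. */
#define EOF_CHAR '\001'

static int sentinel_sent;   /* nonzero once EOF_CHAR has been handed out */

/* Returns the next character of fp.  At the first end-of-file it
 * returns EOF_CHAR instead; every later call returns the real EOF. */
static int getc_with_sentinel(FILE *fp)
{
    int c;

    if (sentinel_sent)
        return EOF;

    c = getc(fp);
    if (c == EOF) {
        sentinel_sent = 1;
        return EOF_CHAR;
    }
    return c;
}
```

   'YY_INPUT' would then call 'getc_with_sentinel(yyin)' and store one
character per call, and the definitions section would define 'EOF_CHAR'
as the same byte so that the rule above can refer to it.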
46343c3a7b76Schristos
46353c3a7b76Schristos
46363c3a7b76SchristosFile: flex.info,  Node: How can I make REJECT cascade across start condition boundaries?,  Next: Why cant I use fast or full tables with interactive mode?,  Prev: How can I match text only at the end of a file?,  Up: FAQ
46373c3a7b76Schristos
46383c3a7b76SchristosHow can I make REJECT cascade across start condition boundaries?
46393c3a7b76Schristos================================================================
46403c3a7b76Schristos
464130da1778SchristosYou can do this as follows.  Suppose you have a start condition 'A', and
464230da1778Schristosafter exhausting all of the possible matches in '<A>', you want to try
464330da1778Schristosmatches in '<INITIAL>'.  Then you could use the following:
46443c3a7b76Schristos
46453c3a7b76Schristos     %x A
46463c3a7b76Schristos     %%
46473c3a7b76Schristos     <A>rule_that_is_long    ...; REJECT;
46483c3a7b76Schristos     <A>rule                 ...; REJECT; /* shorter rule */
46493c3a7b76Schristos     <A>etc.
46503c3a7b76Schristos     ...
     <A>.|\n  {
         /* Shortest and last rule in <A>, so
          * cascaded REJECTs will eventually
          * wind up matching this rule.  We want
          * to now switch to the initial state
          * and try matching from there instead.
          */
         yyless(0);    /* put back matched text */
         BEGIN(INITIAL);
     }
46613c3a7b76Schristos
46623c3a7b76Schristos
46633c3a7b76SchristosFile: flex.info,  Node: Why cant I use fast or full tables with interactive mode?,  Next: How much faster is -F or -f than -C?,  Prev: How can I make REJECT cascade across start condition boundaries?,  Up: FAQ
46643c3a7b76Schristos
46653c3a7b76SchristosWhy can't I use fast or full tables with interactive mode?
46663c3a7b76Schristos==========================================================
46673c3a7b76Schristos
One of the assumptions flex makes is that interactive applications are
inherently slow (they're waiting on a human after all).  The restriction
has to do with how the scanner detects that it must be finished scanning
a token.
46713c3a7b76SchristosFor interactive scanners, after scanning each character the current
46723c3a7b76Schristosstate is looked up in a table (essentially) to see whether there's a
46733c3a7b76Schristoschance of another input character possibly extending the length of the
46743c3a7b76Schristosmatch.  If not, the scanner halts.  For non-interactive scanners, the
46753c3a7b76Schristosend-of-token test is much simpler, basically a compare with 0, so no
46763c3a7b76Schristosmemory bus cycles.  Since the test occurs in the innermost scanning
46773c3a7b76Schristosloop, one would like to make it go as fast as possible.
46783c3a7b76Schristos
467930da1778Schristos   Still, it seems reasonable to allow the user to choose to trade off a
468030da1778Schristosbit of performance in this area to gain the corresponding flexibility.
468130da1778SchristosThere might be another reason, though, why fast scanners don't support
468230da1778Schristosthe interactive option.
46833c3a7b76Schristos
46843c3a7b76Schristos
46853c3a7b76SchristosFile: flex.info,  Node: How much faster is -F or -f than -C?,  Next: If I have a simple grammar cant I just parse it with flex?,  Prev: Why cant I use fast or full tables with interactive mode?,  Up: FAQ
46863c3a7b76Schristos
46873c3a7b76SchristosHow much faster is -F or -f than -C?
46883c3a7b76Schristos====================================
46893c3a7b76Schristos
46903c3a7b76SchristosMuch faster (factor of 2-3).
46913c3a7b76Schristos
46923c3a7b76Schristos
46933c3a7b76SchristosFile: flex.info,  Node: If I have a simple grammar cant I just parse it with flex?,  Next: Why doesn't yyrestart() set the start state back to INITIAL?,  Prev: How much faster is -F or -f than -C?,  Up: FAQ
46943c3a7b76Schristos
46953c3a7b76SchristosIf I have a simple grammar can't I just parse it with flex?
46963c3a7b76Schristos===========================================================
46973c3a7b76Schristos
46983c3a7b76SchristosIs your grammar recursive?  That's almost always a sign that you're
46993c3a7b76Schristosbetter off using a parser/scanner rather than just trying to use a
47003c3a7b76Schristosscanner alone.
47013c3a7b76Schristos
47023c3a7b76Schristos
47033c3a7b76SchristosFile: flex.info,  Node: Why doesn't yyrestart() set the start state back to INITIAL?,  Next: How can I match C-style comments?,  Prev: If I have a simple grammar cant I just parse it with flex?,  Up: FAQ
47043c3a7b76Schristos
47053c3a7b76SchristosWhy doesn't yyrestart() set the start state back to INITIAL?
47063c3a7b76Schristos============================================================
47073c3a7b76Schristos
47083c3a7b76SchristosThere are two reasons.  The first is that there might be programs that
470930da1778Schristosrely on the start state not changing across file changes.  The second is
471030da1778Schristosthat beginning with 'flex' version 2.4, use of 'yyrestart()' is no
47113c3a7b76Schristoslonger required, so fixing the problem there doesn't solve the more
47123c3a7b76Schristosgeneral problem.
47133c3a7b76Schristos
47143c3a7b76Schristos
47153c3a7b76SchristosFile: flex.info,  Node: How can I match C-style comments?,  Next: The period isn't working the way I expected.,  Prev: Why doesn't yyrestart() set the start state back to INITIAL?,  Up: FAQ
47163c3a7b76Schristos
47173c3a7b76SchristosHow can I match C-style comments?
47183c3a7b76Schristos=================================
47193c3a7b76Schristos
47203c3a7b76SchristosYou might be tempted to try something like this:
47213c3a7b76Schristos
47223c3a7b76Schristos     "/*".*"*/"       // WRONG!
47233c3a7b76Schristos
47243c3a7b76Schristos   or, worse, this:
47253c3a7b76Schristos
     "/*"(.|\n)*"*/"  // WRONG!
47273c3a7b76Schristos
47283c3a7b76Schristos   The above rules will eat too much input, and blow up on things like:
47293c3a7b76Schristos
47303c3a7b76Schristos     /* a comment */ do_my_thing( "oops */" );
47313c3a7b76Schristos
47323c3a7b76Schristos   Here is one way which allows you to track line information:
47333c3a7b76Schristos
     %x IN_COMMENT
     %%
     <INITIAL>{
     "/*"              BEGIN(IN_COMMENT);
     }
     <IN_COMMENT>{
     "*/"      BEGIN(INITIAL);
     [^*\n]+   // eat comment in chunks
     "*"       // eat the lone star
     \n        yylineno++;
     }
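
   To see what the two start conditions are doing, here is roughly the
same state machine written by hand in C.  It is an illustration only,
and 'strip_comments()' is a hypothetical name.  Like the rules above, it
knows nothing about string literals, so a '/*' inside a quoted string
would still be taken as the start of a comment.

```c
/* Copies src to dst with C comments removed; dst must be at least as
 * large as src.  Returns the number of newlines seen, whether inside
 * or outside comments. */
static int strip_comments(const char *src, char *dst)
{
    enum { INITIAL, IN_COMMENT } state = INITIAL;
    int newlines = 0;

    while (*src != '\0') {
        if (*src == '\n')
            newlines++;                 /* like the \n rule */

        if (state == INITIAL) {
            if (src[0] == '/' && src[1] == '*') {
                state = IN_COMMENT;     /* like BEGIN(IN_COMMENT) */
                src += 2;
                continue;
            }
            *dst++ = *src;              /* ordinary text is kept */
        } else if (src[0] == '*' && src[1] == '/') {
            state = INITIAL;            /* like BEGIN(INITIAL) */
            src += 2;
            continue;
        }
        src++;
    }
    *dst = '\0';
    return newlines;
}
```

   Because the closing '*/' is only looked for while inside a comment,
the text '"oops */"' in the earlier example passes through untouched.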
47433c3a7b76Schristos
47443c3a7b76Schristos
47453c3a7b76SchristosFile: flex.info,  Node: The period isn't working the way I expected.,  Next: Can I get the flex manual in another format?,  Prev: How can I match C-style comments?,  Up: FAQ
47463c3a7b76Schristos
47473c3a7b76SchristosThe '.' isn't working the way I expected.
47483c3a7b76Schristos=========================================
47493c3a7b76Schristos
475030da1778SchristosHere are some tips for using '.':
47513c3a7b76Schristos
47523c3a7b76Schristos   * A common mistake is to place the grouping parenthesis AFTER an
475330da1778Schristos     operator, when you really meant to place the parenthesis BEFORE the
475430da1778Schristos     operator, e.g., you probably want this '(foo|bar)+' and NOT this
475530da1778Schristos     '(foo|bar+)'.
47563c3a7b76Schristos
475730da1778Schristos     The first pattern matches the words 'foo' or 'bar' any number of
475830da1778Schristos     times, e.g., it matches the text 'barfoofoobarfoo'.  The second
475930da1778Schristos     pattern matches a single instance of 'foo' or a single instance of
476030da1778Schristos     'bar' followed by one or more 'r's, e.g., it matches the text
     'barrrr'.
   * A '.' inside '[]''s just means a literal '.' (period), and NOT "any
     character except newline".
   * Remember that '.' matches any character EXCEPT '\n' (and 'EOF').
     If you really want to match ANY character, including newlines, then
     use '(.|\n)'.  Beware that the regex '(.|\n)+' will match your
     entire input!
   * Finally, if you want to match a literal '.' (a period), then use
     '[.]' or '"."'.
47703c3a7b76Schristos
47713c3a7b76Schristos
47723c3a7b76SchristosFile: flex.info,  Node: Can I get the flex manual in another format?,  Next: Does there exist a "faster" NDFA->DFA algorithm?,  Prev: The period isn't working the way I expected.,  Up: FAQ
47733c3a7b76Schristos
47743c3a7b76SchristosCan I get the flex manual in another format?
47753c3a7b76Schristos============================================
47763c3a7b76Schristos
477730da1778SchristosThe 'flex' source distribution includes a texinfo manual.  You are free
477830da1778Schristosto convert that texinfo into whatever format you desire.  The 'texinfo'
47793c3a7b76Schristospackage includes tools for conversion to a number of formats.
47803c3a7b76Schristos
47813c3a7b76Schristos
47823c3a7b76SchristosFile: flex.info,  Node: Does there exist a "faster" NDFA->DFA algorithm?,  Next: How does flex compile the DFA so quickly?,  Prev: Can I get the flex manual in another format?,  Up: FAQ
47833c3a7b76Schristos
47843c3a7b76SchristosDoes there exist a "faster" NDFA->DFA algorithm?
47853c3a7b76Schristos================================================
47863c3a7b76Schristos
47873c3a7b76SchristosThere's no way around the potential exponential running time - it can
47883c3a7b76Schristostake you exponential time just to enumerate all of the DFA states.  In
47893c3a7b76Schristospractice, though, the running time is closer to linear, or sometimes
47903c3a7b76Schristosquadratic.
47913c3a7b76Schristos
47923c3a7b76Schristos
47933c3a7b76SchristosFile: flex.info,  Node: How does flex compile the DFA so quickly?,  Next: How can I use more than 8192 rules?,  Prev: Does there exist a "faster" NDFA->DFA algorithm?,  Up: FAQ
47943c3a7b76Schristos
47953c3a7b76SchristosHow does flex compile the DFA so quickly?
47963c3a7b76Schristos=========================================
47973c3a7b76Schristos
479830da1778SchristosThere are two big speed wins that 'flex' uses:
47993c3a7b76Schristos
48003c3a7b76Schristos  1. It analyzes the input rules to construct equivalence classes for
48013c3a7b76Schristos     those characters that always make the same transitions.  It then
48023c3a7b76Schristos     rewrites the NFA using equivalence classes for transitions instead
48033c3a7b76Schristos     of characters.  This cuts down the NFA->DFA computation time
48043c3a7b76Schristos     dramatically, to the point where, for uncompressed DFA tables, the
48053c3a7b76Schristos     DFA generation is often I/O bound in writing out the tables.
48063c3a7b76Schristos  2. It maintains hash values for previously computed DFA states, so
48073c3a7b76Schristos     testing whether a newly constructed DFA state is equivalent to a
48083c3a7b76Schristos     previously constructed state can be done very quickly, by first
48093c3a7b76Schristos     comparing hash values.
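
   As a rough illustration of the first point, the sketch below
partitions the 256 byte values into equivalence classes, given the
character classes that appear in a set of rules.  It is not flex's
actual implementation: the names 'build_ecs' and 'demo_ecs' are made up
for this example, and character classes are represented simply as
membership tables.

```c
#include <string.h>

#define NCHARS 256

/* Hypothetical helper: assigns each byte value an equivalence-class
 * number.  Two bytes share a class exactly when they belong to the
 * same subset of the given character classes, so no rule can tell
 * them apart.  in_class[k][c] is nonzero if byte c is in class k.
 * Returns the number of equivalence classes found. */
static int build_ecs(int nclasses, unsigned char in_class[][NCHARS],
                     int ec[NCHARS])
{
    int rep[NCHARS];   /* representative byte of each class so far */
    int nec = 0;

    for (int c = 0; c < NCHARS; c++) {
        int found = -1;
        for (int e = 0; e < nec && found < 0; e++) {
            int same = 1;
            for (int k = 0; k < nclasses; k++) {
                if (!in_class[k][c] != !in_class[k][rep[e]]) {
                    same = 0;
                    break;
                }
            }
            if (same)
                found = e;
        }
        if (found < 0) {
            rep[nec] = c;
            found = nec++;
        }
        ec[c] = found;
    }
    return nec;
}

/* Example input: the classes used by the patterns [a-z] and [0-9]. */
static int demo_ecs(int ec[NCHARS])
{
    static unsigned char cls[2][NCHARS];

    memset(cls, 0, sizeof cls);
    for (int c = 'a'; c <= 'z'; c++) cls[0][c] = 1;
    for (int c = '0'; c <= '9'; c++) cls[1][c] = 1;
    return build_ecs(2, cls, ec);
}
```

   With these two classes the 256 input bytes collapse into three
groups (letters, digits, and everything else), so the transition tables
need three columns instead of 256.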
48103c3a7b76Schristos
48113c3a7b76Schristos
48123c3a7b76SchristosFile: flex.info,  Node: How can I use more than 8192 rules?,  Next: How do I abandon a file in the middle of a scan and switch to a new file?,  Prev: How does flex compile the DFA so quickly?,  Up: FAQ
48133c3a7b76Schristos
48143c3a7b76SchristosHow can I use more than 8192 rules?
48153c3a7b76Schristos===================================
48163c3a7b76Schristos
481730da1778Schristos'Flex' is compiled with an upper limit of 8192 rules per scanner.  If
48183c3a7b76Schristosyou need more than 8192 rules in your scanner, you'll have to recompile
481930da1778Schristos'flex' with the following changes in 'flexdef.h':
48203c3a7b76Schristos
48213c3a7b76Schristos     <    #define YY_TRAILING_MASK 0x2000
48223c3a7b76Schristos     <    #define YY_TRAILING_HEAD_MASK 0x4000
48233c3a7b76Schristos     --
48243c3a7b76Schristos     >    #define YY_TRAILING_MASK 0x20000000
48253c3a7b76Schristos     >    #define YY_TRAILING_HEAD_MASK 0x40000000
48263c3a7b76Schristos
   This should work okay as long as your C compiler uses 32-bit
48283c3a7b76Schristosintegers.  But you might want to think about whether using such a huge
48293c3a7b76Schristosnumber of rules is the best way to solve your problem.
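
   The arithmetic behind the limit can be sketched as follows.  The
sketch assumes, as the mask values suggest, that rule numbers must stay
below the 'YY_TRAILING_MASK' bit so that the flag bits can be told apart
from the rule number.  The helper names are invented for this example
and do not come from 'flexdef.h'.

```c
#include <limits.h>

/* Hypothetical helper: largest rule number that leaves the flag bit
 * 'trailing_mask' (and everything above it) free. */
static long max_rule_for_mask(long trailing_mask)
{
    return trailing_mask - 1;
}

/* Nonzero if both flag bits fit in a signed int on this platform. */
static int masks_fit_in_int(long trailing_mask, long trailing_head_mask)
{
    return trailing_mask <= INT_MAX && trailing_head_mask <= INT_MAX;
}
```

   Under that assumption, the original mask 0x2000 is 8192, so rule
numbers top out at 8191.  The replacement mask 0x20000000 raises that to
536870911, and both new masks still fit in a 32-bit 'int'.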
48303c3a7b76Schristos
48313c3a7b76Schristos   The following may also be relevant:
48323c3a7b76Schristos
48333c3a7b76Schristos   With luck, you should be able to increase the definitions in
48343c3a7b76Schristosflexdef.h for:
48353c3a7b76Schristos
48363c3a7b76Schristos     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
48373c3a7b76Schristos     #define MAXIMUM_MNS 31999
48383c3a7b76Schristos     #define BAD_SUBSCRIPT -32767
48393c3a7b76Schristos
48403c3a7b76Schristos   recompile everything, and it'll all work.  Flex only has these
48413c3a7b76Schristos16-bit-like values built into it because a long time ago it was
48423c3a7b76Schristosdeveloped on a machine with 16-bit ints.  I've given this advice to
48433c3a7b76Schristosothers in the past but haven't heard back from them whether it worked
48443c3a7b76Schristosokay or not...
48453c3a7b76Schristos
48463c3a7b76Schristos
48473c3a7b76SchristosFile: flex.info,  Node: How do I abandon a file in the middle of a scan and switch to a new file?,  Next: How do I execute code only during initialization (only before the first scan)?,  Prev: How can I use more than 8192 rules?,  Up: FAQ
48483c3a7b76Schristos
48493c3a7b76SchristosHow do I abandon a file in the middle of a scan and switch to a new file?
48503c3a7b76Schristos=========================================================================
48513c3a7b76Schristos
485230da1778SchristosJust call 'yyrestart(newfile)'.  Be sure to reset the start state if you
want a "fresh" start, since 'yyrestart' does NOT reset the start state
485430da1778Schristosback to 'INITIAL'.
48553c3a7b76Schristos
48563c3a7b76Schristos
48573c3a7b76SchristosFile: flex.info,  Node: How do I execute code only during initialization (only before the first scan)?,  Next: How do I execute code at termination?,  Prev: How do I abandon a file in the middle of a scan and switch to a new file?,  Up: FAQ
48583c3a7b76Schristos
48593c3a7b76SchristosHow do I execute code only during initialization (only before the first scan)?
48603c3a7b76Schristos==============================================================================
48613c3a7b76Schristos
486230da1778SchristosYou can specify an initial action by defining the macro 'YY_USER_INIT'
486330da1778Schristos(though note that 'yyout' may not be available at the time this macro is
486430da1778Schristosexecuted).  Or you can add to the beginning of your rules section:
48653c3a7b76Schristos
     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }
48743c3a7b76Schristos
48753c3a7b76Schristos
48763c3a7b76SchristosFile: flex.info,  Node: How do I execute code at termination?,  Next: Where else can I find help?,  Prev: How do I execute code only during initialization (only before the first scan)?,  Up: FAQ
48773c3a7b76Schristos
48783c3a7b76SchristosHow do I execute code at termination?
48793c3a7b76Schristos=====================================
48803c3a7b76Schristos
488130da1778SchristosYou can specify an action for the '<<EOF>>' rule.
48823c3a7b76Schristos
48833c3a7b76Schristos
48843c3a7b76SchristosFile: flex.info,  Node: Where else can I find help?,  Next: Can I include comments in the "rules" section of the file?,  Prev: How do I execute code at termination?,  Up: FAQ
48853c3a7b76Schristos
48863c3a7b76SchristosWhere else can I find help?
48873c3a7b76Schristos===========================
48883c3a7b76Schristos
48893c3a7b76SchristosYou can find the flex homepage on the web at
489030da1778Schristos<http://flex.sourceforge.net/>.  See that page for details about flex
48913c3a7b76Schristosmailing lists as well.
48923c3a7b76Schristos
48933c3a7b76Schristos
48943c3a7b76SchristosFile: flex.info,  Node: Can I include comments in the "rules" section of the file?,  Next: I get an error about undefined yywrap().,  Prev: Where else can I find help?,  Up: FAQ
48953c3a7b76Schristos
48963c3a7b76SchristosCan I include comments in the "rules" section of the file?
48973c3a7b76Schristos==========================================================
48983c3a7b76Schristos
48993c3a7b76SchristosYes, just about anywhere you want to.  See the manual for the specific
49003c3a7b76Schristossyntax.
49013c3a7b76Schristos
49023c3a7b76Schristos
49033c3a7b76SchristosFile: flex.info,  Node: I get an error about undefined yywrap().,  Next: How can I change the matching pattern at run time?,  Prev: Can I include comments in the "rules" section of the file?,  Up: FAQ
49043c3a7b76Schristos
49053c3a7b76SchristosI get an error about undefined yywrap().
49063c3a7b76Schristos========================================
49073c3a7b76Schristos
490830da1778SchristosYou must supply a 'yywrap()' function of your own, or link to 'libfl.a'
49093c3a7b76Schristos(which provides one), or use
49103c3a7b76Schristos
49113c3a7b76Schristos     %option noyywrap
49123c3a7b76Schristos
491330da1778Schristos   in your source to say you don't want a 'yywrap()' function.
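
   If you supply your own function, a minimal 'yywrap()' can look like
the sketch below.  Returning a nonzero value tells the scanner that
there is no further input, so scanning stops at end-of-file.

```c
/* Called by the scanner at end-of-file.  Returning 1 means "no more
 * input"; return 0 only after pointing yyin at another file. */
int yywrap(void)
{
    return 1;
}
```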
49143c3a7b76Schristos
49153c3a7b76Schristos
49163c3a7b76SchristosFile: flex.info,  Node: How can I change the matching pattern at run time?,  Next: How can I expand macros in the input?,  Prev: I get an error about undefined yywrap().,  Up: FAQ
49173c3a7b76Schristos
49183c3a7b76SchristosHow can I change the matching pattern at run time?
49193c3a7b76Schristos==================================================
49203c3a7b76Schristos
You can't.  The patterns are compiled into static tables when flex
builds the scanner.
49233c3a7b76Schristos
49243c3a7b76Schristos
49253c3a7b76SchristosFile: flex.info,  Node: How can I expand macros in the input?,  Next: How can I build a two-pass scanner?,  Prev: How can I change the matching pattern at run time?,  Up: FAQ
49263c3a7b76Schristos
49273c3a7b76SchristosHow can I expand macros in the input?
49283c3a7b76Schristos=====================================
49293c3a7b76Schristos
493030da1778SchristosThe best way to approach this problem is at a higher level, e.g., in the
493130da1778Schristosparser.
49323c3a7b76Schristos
49333c3a7b76Schristos   However, you can do this using multiple input buffers.
49343c3a7b76Schristos
     %%
     macro/[a-z]+    {
         /* Saw the macro "macro" followed by extra stuff. */
         main_buffer = YY_CURRENT_BUFFER;
         expansion_buffer = yy_scan_string(expand(yytext));
         yy_switch_to_buffer(expansion_buffer);
     }

     <<EOF>>         {
         if ( expansion_buffer )
         {
             // We were doing an expansion, return to where
             // we were.
             yy_switch_to_buffer(main_buffer);
             yy_delete_buffer(expansion_buffer);
             expansion_buffer = 0;
         }
         else
             yyterminate();
     }
49553c3a7b76Schristos
   You will probably want a stack of expansion buffers to allow nested
macros, but hopefully the idea is clear from the above.
49583c3a7b76Schristos
49593c3a7b76Schristos
49603c3a7b76SchristosFile: flex.info,  Node: How can I build a two-pass scanner?,  Next: How do I match any string not matched in the preceding rules?,  Prev: How can I expand macros in the input?,  Up: FAQ
49613c3a7b76Schristos
49623c3a7b76SchristosHow can I build a two-pass scanner?
49633c3a7b76Schristos===================================
49643c3a7b76Schristos
49653c3a7b76SchristosOne way to do it is to filter the first pass to a temporary file, then
49663c3a7b76Schristosprocess the temporary file on the second pass.  You will probably see a
49673c3a7b76Schristosperformance hit, due to all the disk I/O.
49683c3a7b76Schristos
49693c3a7b76Schristos   When you need to look ahead far forward like this, it almost always
49703c3a7b76Schristosmeans that the right solution is to build a parse tree of the entire
497130da1778Schristosinput, then walk it after the parse in order to generate the output.  In
497230da1778Schristosa sense, this is a two-pass approach, once through the text and once
49733c3a7b76Schristosthrough the parse tree, but the performance hit for the latter is
49743c3a7b76Schristosusually an order of magnitude smaller, since everything is already
49753c3a7b76Schristosclassified, in binary format, and residing in memory.
49763c3a7b76Schristos
49773c3a7b76Schristos
49783c3a7b76SchristosFile: flex.info,  Node: How do I match any string not matched in the preceding rules?,  Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Prev: How can I build a two-pass scanner?,  Up: FAQ
49793c3a7b76Schristos
49803c3a7b76SchristosHow do I match any string not matched in the preceding rules?
49813c3a7b76Schristos=============================================================
49823c3a7b76Schristos
One way to assign precedence is to place the more specific rules first.
498430da1778SchristosIf two rules would match the same input (same sequence of characters)
498530da1778Schristosthen the first rule listed in the 'flex' input wins, e.g.,
49863c3a7b76Schristos
49873c3a7b76Schristos     %%
49883c3a7b76Schristos     foo[a-zA-Z_]+    return FOO_ID;
49893c3a7b76Schristos     bar[a-zA-Z_]+    return BAR_ID;
49903c3a7b76Schristos     [a-zA-Z_]+       return GENERIC_ID;
49913c3a7b76Schristos
499230da1778Schristos   Note that the rule '[a-zA-Z_]+' must come *after* the others.  It
49933c3a7b76Schristoswill match the same amount of text as the more specific rules, and in
499430da1778Schristosthat case the 'flex' scanner will pick the first rule listed in your
49953c3a7b76Schristosscanner as the one to match.
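
   The selection rule can be mimicked in plain C.  The sketch below is
not how the generated scanner works internally (it uses a DFA); it only
reproduces the outcome for the three rules above: the longest match
wins, and ties go to the rule listed first.  The token codes and
function names are made up for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { NO_MATCH, FOO_ID, BAR_ID, GENERIC_ID };

/* Length of the leading run of [a-zA-Z_] characters in s. */
static size_t ident_run(const char *s)
{
    size_t n = 0;

    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Match length of prefix[a-zA-Z_]+ at the start of s, or 0. */
static size_t prefixed_ident(const char *s, const char *prefix)
{
    size_t p = strlen(prefix);
    size_t rest;

    if (strncmp(s, prefix, p) != 0)
        return 0;
    rest = ident_run(s + p);
    return rest ? p + rest : 0;
}

/* Try the rules in file order; a later rule replaces an earlier one
 * only if it matches strictly more text. */
static int pick_rule(const char *s)
{
    static const int token[3] = { FOO_ID, BAR_ID, GENERIC_ID };
    size_t len[3];
    size_t best_len = 0;
    int best = NO_MATCH;

    len[0] = prefixed_ident(s, "foo");
    len[1] = prefixed_ident(s, "bar");
    len[2] = ident_run(s);

    for (int i = 0; i < 3; i++) {
        if (len[i] > best_len) {
            best_len = len[i];
            best = token[i];
        }
    }
    return best;
}
```

   Here 'pick_rule("foobar")' yields 'FOO_ID' even though the generic
rule matches the same six characters, while 'pick_rule("foo")' yields
'GENERIC_ID' because 'foo[a-zA-Z_]+' needs at least one more character.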
49963c3a7b76Schristos
49973c3a7b76Schristos
49983c3a7b76SchristosFile: flex.info,  Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Next: Is there a way to make flex treat NULL like a regular character?,  Prev: How do I match any string not matched in the preceding rules?,  Up: FAQ
49993c3a7b76Schristos
50003c3a7b76SchristosI am trying to port code from AT&T lex that uses yysptr and yysbuf.
50013c3a7b76Schristos===================================================================
50023c3a7b76Schristos
50033c3a7b76SchristosThose are internal variables pointing into the AT&T scanner's input
50043c3a7b76Schristosbuffer.  I imagine they're being manipulated in user versions of the
500530da1778Schristos'input()' and 'unput()' functions.  If so, what you need to do is
50063c3a7b76Schristosanalyze those functions to figure out what they're doing, and then
500730da1778Schristosreplace 'input()' with an appropriate definition of 'YY_INPUT'.  You
500830da1778Schristosshouldn't need to (and must not) replace 'flex''s 'unput()' function.
50093c3a7b76Schristos
50103c3a7b76Schristos
50113c3a7b76SchristosFile: flex.info,  Node: Is there a way to make flex treat NULL like a regular character?,  Next: Whenever flex can not match the input it says "flex scanner jammed".,  Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Up: FAQ
50123c3a7b76Schristos
50133c3a7b76SchristosIs there a way to make flex treat NULL like a regular character?
50143c3a7b76Schristos================================================================
50153c3a7b76Schristos
501630da1778SchristosYes, '\0' and '\x00' should both do the trick.  Perhaps you have an
5017*463ae347Schristosancient version of 'flex'.  The latest release is version 2.6.4.
50183c3a7b76Schristos
50193c3a7b76Schristos
50203c3a7b76SchristosFile: flex.info,  Node: Whenever flex can not match the input it says "flex scanner jammed".,  Next: Why doesn't flex have non-greedy operators like perl does?,  Prev: Is there a way to make flex treat NULL like a regular character?,  Up: FAQ
50213c3a7b76Schristos
50223c3a7b76SchristosWhenever flex can not match the input it says "flex scanner jammed".
50233c3a7b76Schristos====================================================================
50243c3a7b76Schristos
50253c3a7b76SchristosYou need to add a rule that matches the otherwise-unmatched text, e.g.,
50263c3a7b76Schristos
50273c3a7b76Schristos     %option yylineno
50283c3a7b76Schristos     %%
50293c3a7b76Schristos     [[a bunch of rules here]]
50303c3a7b76Schristos
50313c3a7b76Schristos     .	printf("bad input character '%s' at line %d\n", yytext, yylineno);
50323c3a7b76Schristos
503330da1778Schristos   See '%option default' for more information.
50343c3a7b76Schristos
50353c3a7b76Schristos
50363c3a7b76SchristosFile: flex.info,  Node: Why doesn't flex have non-greedy operators like perl does?,  Next: Memory leak - 16386 bytes allocated by malloc.,  Prev: Whenever flex can not match the input it says "flex scanner jammed".,  Up: FAQ
50373c3a7b76Schristos
50383c3a7b76SchristosWhy doesn't flex have non-greedy operators like perl does?
50393c3a7b76Schristos==========================================================
50403c3a7b76Schristos
50413c3a7b76SchristosA DFA can do a non-greedy match by stopping the first time it enters an
50423c3a7b76Schristosaccepting state, instead of consuming input until it determines that no
50433c3a7b76Schristosfurther matching is possible (a "jam" state).  This is actually easier
50443c3a7b76Schristosto implement than longest leftmost match (which flex does).
50453c3a7b76Schristos
50463c3a7b76Schristos   But it's also much less useful than longest leftmost match.  In
50473c3a7b76Schristosgeneral, when you find yourself wishing for non-greedy matching, that's
50483c3a7b76Schristosusually a sign that you're trying to make the scanner do some parsing.
That's generally the wrong approach, since a scanner lacks the power to
do a decent job.  It is better either to introduce a separate parser, or
to split the scanner into multiple scanners using (exclusive) start
conditions.
50533c3a7b76Schristos
505430da1778Schristos   You might have a separate start state once you've seen the 'BEGIN'.
505530da1778SchristosIn that state, you might then have a regex that will match 'END' (to
505630da1778Schristoskick you out of the state), and perhaps '(.|\n)' to get a single
50573c3a7b76Schristoscharacter within the chunk ...
50583c3a7b76Schristos
50593c3a7b76Schristos   This approach also has much better error-reporting properties.
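
   The start-state idea can be pictured as a small hand-written state
machine.  The C sketch below is only an analogy for what the scanner
would do with an exclusive start condition: once 'BEGIN' has been seen
it consumes one character at a time until 'END' appears, so the chunk
ends at the first 'END' rather than the last.  The function name and
buffer handling are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Copies the text between the first "BEGIN" and the next "END" into
 * out (at most cap-1 bytes, always NUL-terminated when cap > 0).
 * Returns 1 on success, 0 if "BEGIN" is missing or cap is 0, and -1
 * if "END" never arrives, which is where a real scanner would report
 * an unterminated chunk. */
static int extract_chunk(const char *in, char *out, size_t cap)
{
    enum { OUTSIDE, IN_CHUNK } state = OUTSIDE;
    size_t n = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    while (*in != '\0') {
        if (state == OUTSIDE) {
            if (strncmp(in, "BEGIN", 5) == 0) {
                state = IN_CHUNK;     /* like entering the start state */
                in += 5;
            } else {
                in++;
            }
        } else if (strncmp(in, "END", 3) == 0) {
            return 1;                 /* like returning to INITIAL */
        } else {
            if (n + 1 < cap) {        /* the (.|\n) rule: one char */
                out[n++] = *in;
                out[n] = '\0';
            }
            in++;
        }
    }
    return state == IN_CHUNK ? -1 : 0;
}
```

   The -1 return value corresponds to hitting end-of-file while still
inside the chunk, which is the point where a start-condition scanner can
issue a precise "unterminated chunk" message.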
50603c3a7b76Schristos
50613c3a7b76Schristos
50623c3a7b76SchristosFile: flex.info,  Node: Memory leak - 16386 bytes allocated by malloc.,  Next: How do I track the byte offset for lseek()?,  Prev: Why doesn't flex have non-greedy operators like perl does?,  Up: FAQ
50633c3a7b76Schristos
50643c3a7b76SchristosMemory leak - 16386 bytes allocated by malloc.
50653c3a7b76Schristos==============================================
50663c3a7b76Schristos
506730da1778SchristosUPDATED 2002-07-10: As of 'flex' version 2.5.9, this leak means that you
506830da1778Schristosdid not call 'yylex_destroy()'.  If you are using an earlier version of
506930da1778Schristos'flex', then read on.
50703c3a7b76Schristos
50713c3a7b76Schristos   The leak is about 16426 bytes.  That is, (8192 * 2 + 2) for the
507230da1778Schristosread-buffer, and about 40 for 'struct yy_buffer_state' (depending upon
50733c3a7b76Schristosalignment).  The leak is in the non-reentrant C scanner only (NOT in the
507430da1778Schristosreentrant scanner, NOT in the C++ scanner).  Since 'flex' doesn't know
50753c3a7b76Schristoswhen you are done, the buffer is never freed.
50763c3a7b76Schristos
507730da1778Schristos   However, the leak won't multiply since the buffer is reused no matter
507830da1778Schristoshow many times you call 'yylex()'.
50793c3a7b76Schristos
50803c3a7b76Schristos   If you want to reclaim the memory when you are completely done
50813c3a7b76Schristosscanning, then you might try this:
50823c3a7b76Schristos
50833c3a7b76Schristos     /* For non-reentrant C scanner only. */
50843c3a7b76Schristos     yy_delete_buffer(YY_CURRENT_BUFFER);
50853c3a7b76Schristos     yy_init = 1;
50863c3a7b76Schristos
508730da1778Schristos   Note: 'yy_init' is an "internal variable", and hasn't been tested in
50883c3a7b76Schristosthis situation.  It is possible that some other globals may need
50893c3a7b76Schristosresetting as well.
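
   For reference, the two figures quoted in this entry are related as
shown in the sketch below.  The 40-byte size of 'struct
yy_buffer_state' is the approximate value given above and will vary
with the platform.  The helper names are invented for the sketch.

```c
/* Bytes in the read buffer, using the formula quoted above:
 * twice the read size plus two extra bytes. */
static unsigned long read_buffer_bytes(unsigned long read_size)
{
    return read_size * 2 + 2;
}

/* Read buffer plus the (approximate) buffer-state structure. */
static unsigned long total_leak_bytes(unsigned long read_size,
                                      unsigned long state_struct_size)
{
    return read_buffer_bytes(read_size) + state_struct_size;
}
```

   With a read size of 8192 the buffer alone accounts for the 16386
bytes in the title, and adding roughly 40 bytes for the structure gives
the 16426 figure.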
50903c3a7b76Schristos
50913c3a7b76Schristos
50923c3a7b76SchristosFile: flex.info,  Node: How do I track the byte offset for lseek()?,  Next: How do I use my own I/O classes in a C++ scanner?,  Prev: Memory leak - 16386 bytes allocated by malloc.,  Up: FAQ
50933c3a7b76Schristos
50943c3a7b76SchristosHow do I track the byte offset for lseek()?
50953c3a7b76Schristos===========================================
50963c3a7b76Schristos
50973c3a7b76Schristos     >   We thought that it would be possible to have this number through the
50983c3a7b76Schristos     >   evaluation of the following expression:
50993c3a7b76Schristos     >
51003c3a7b76Schristos     >   seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
51013c3a7b76Schristos
51023c3a7b76Schristos   While this is the right idea, it has two problems.  The first is that
510330da1778Schristosit's possible that 'flex' will request less than 'YY_READ_BUF_SIZE'
510430da1778Schristosduring an invocation of 'YY_INPUT' (or that your input source will
510530da1778Schristosreturn less even though 'YY_READ_BUF_SIZE' bytes were requested).  The
510630da1778Schristossecond problem is that when refilling its internal buffer, 'flex' keeps
51073c3a7b76Schristossome characters from the previous buffer (because usually it's in the
510830da1778Schristosmiddle of a match, and needs those characters to construct 'yytext' for
510930da1778Schristosthe match once it's done).  Because of this, 'yy_c_buf_p -
51103c3a7b76SchristosYY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
51113c3a7b76Schristosalready read from the current buffer.
51123c3a7b76Schristos
51133c3a7b76Schristos   An alternative solution is to count the number of characters you've
51143c3a7b76Schristosmatched since starting to scan.  This can be done by using
511530da1778Schristos'YY_USER_ACTION'.  For example,
51163c3a7b76Schristos
51173c3a7b76Schristos     #define YY_USER_ACTION num_chars += yyleng;
51183c3a7b76Schristos
   (You need to be careful to update your bookkeeping if you use
'yymore()', 'yyless()', 'unput()', or 'input()'.)
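
   The bookkeeping adjustments might look like the sketch below.  It is
plain C that models the counter outside of any scanner.  The function
names are invented, and in a real scanner you would make the
corresponding adjustments in the actions that call these functions.
The 'unput()' case assumes the character pushed back originally came
from the input.

```c
static long num_chars = 0;   /* bytes consumed so far */

/* What the YY_USER_ACTION definition does for every match. */
static void count_match(int yyleng)
{
    num_chars += yyleng;
}

/* After yyless(n): all but the first n bytes of the match go back
 * to the input and will be counted again when rescanned.  Pass the
 * value yyleng had before the yyless() call. */
static void count_yyless(int yyleng, int n)
{
    num_chars -= yyleng - n;
}

/* After unput(c): one byte was pushed back onto the input. */
static void count_unput(void)
{
    num_chars -= 1;
}

/* After a successful input(): one byte was consumed outside a match. */
static void count_input(void)
{
    num_chars += 1;
}

/* After yymore(): the next yyleng will include the current text,
 * so remove it now to avoid counting it twice. */
static void count_yymore(int yyleng)
{
    num_chars -= yyleng;
}
```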
51213c3a7b76Schristos
51223c3a7b76Schristos
51233c3a7b76SchristosFile: flex.info,  Node: How do I use my own I/O classes in a C++ scanner?,  Next: How do I skip as many chars as possible?,  Prev: How do I track the byte offset for lseek()?,  Up: FAQ
51243c3a7b76Schristos
51253c3a7b76SchristosHow do I use my own I/O classes in a C++ scanner?
51263c3a7b76Schristos=================================================
51273c3a7b76Schristos
512830da1778SchristosWhen the flex C++ scanning class rewrite finally happens, then this sort
512930da1778Schristosof thing should become much easier.
51303c3a7b76Schristos
   You can do this by passing the various functions (such as
'LexerInput()' and 'LexerOutput()') NULL 'iostream*'s, and then dealing
with your own I/O classes surreptitiously (i.e., stashing them in
special member variables).  This works because the only assumption the
lexer makes about the iostreams is that they're ultimately passed to
'LexerInput()' and 'LexerOutput()', which then do whatever is necessary
with them.
51383c3a7b76Schristos
51393c3a7b76Schristos
51403c3a7b76SchristosFile: flex.info,  Node: How do I skip as many chars as possible?,  Next: deleteme00,  Prev: How do I use my own I/O classes in a C++ scanner?,  Up: FAQ
51413c3a7b76Schristos
51423c3a7b76SchristosHow do I skip as many chars as possible?
51433c3a7b76Schristos========================================
51443c3a7b76Schristos
51453c3a7b76SchristosHow do I skip as many chars as possible - without interfering with the
51463c3a7b76Schristosother patterns?
51473c3a7b76Schristos
51483c3a7b76Schristos   In the example below, we want to skip over characters until we see
51493c3a7b76Schristosthe phrase "endskip".  The following will _NOT_ work correctly (do you
51503c3a7b76Schristossee why not?)
51513c3a7b76Schristos
51523c3a7b76Schristos     /* INCORRECT SCANNER */
51533c3a7b76Schristos     %x SKIP
51543c3a7b76Schristos     %%
51553c3a7b76Schristos     <INITIAL>startskip   BEGIN(SKIP);
51563c3a7b76Schristos     ...
51573c3a7b76Schristos     <SKIP>"endskip"       BEGIN(INITIAL);
51583c3a7b76Schristos     <SKIP>.*             ;
51593c3a7b76Schristos
   The problem is that the pattern '.*' will eat up the word "endskip".
The simplest (but slow) fix is:
51623c3a7b76Schristos
51633c3a7b76Schristos     <SKIP>"endskip"      BEGIN(INITIAL);
51643c3a7b76Schristos     <SKIP>.              ;
51653c3a7b76Schristos
   A faster fix involves making the second rule match more, without
making it match "endskip" plus something else.  So for example:
51683c3a7b76Schristos
     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>[^e]+         ;
     <SKIP>.             ;   /* so you eat up e's, too */
51723c3a7b76Schristos
51733c3a7b76Schristos
51743c3a7b76SchristosFile: flex.info,  Node: deleteme00,  Next: Are certain equivalent patterns faster than others?,  Prev: How do I skip as many chars as possible?,  Up: FAQ
51753c3a7b76Schristos
51763c3a7b76Schristosdeleteme00
51773c3a7b76Schristos==========
51783c3a7b76Schristos
51793c3a7b76Schristos     QUESTION:
51803c3a7b76Schristos     When was flex born?
51813c3a7b76Schristos
51823c3a7b76Schristos     Vern Paxson took over
51833c3a7b76Schristos     the Software Tools lex project from Jef Poskanzer in 1982.  At that point it
51843c3a7b76Schristos     was written in Ratfor.  Around 1987 or so, Paxson translated it into C, and
51853c3a7b76Schristos     a legend was born :-).
51863c3a7b76Schristos
51873c3a7b76Schristos
51883c3a7b76SchristosFile: flex.info,  Node: Are certain equivalent patterns faster than others?,  Next: Is backing up a big deal?,  Prev: deleteme00,  Up: FAQ
51893c3a7b76Schristos
51903c3a7b76SchristosAre certain equivalent patterns faster than others?
51913c3a7b76Schristos===================================================
51923c3a7b76Schristos
51933c3a7b76Schristos     To: Adoram Rogel <adoram@orna.hybridge.com>
51943c3a7b76Schristos     Subject: Re: Flex 2.5.2 performance questions
51953c3a7b76Schristos     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
51963c3a7b76Schristos     Date: Wed, 18 Sep 96 10:51:02 PDT
51973c3a7b76Schristos     From: Vern Paxson <vern>
51983c3a7b76Schristos
51993c3a7b76Schristos     [Note, the most recent flex release is 2.5.4, which you can get from
52003c3a7b76Schristos     ftp.ee.lbl.gov.  It has bug fixes over 2.5.2 and 2.5.3.]
52013c3a7b76Schristos
52023c3a7b76Schristos     > 1. Using the pattern
52033c3a7b76Schristos     >    ([Ff](oot)?)?[Nn](ote)?(\.)?
52043c3a7b76Schristos     >    instead of
52053c3a7b76Schristos     >    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
52063c3a7b76Schristos     >    (in a very complicated flex program) caused the program to slow from
52073c3a7b76Schristos     >    300K+/min to 100K/min (no other changes were done).
52083c3a7b76Schristos
52093c3a7b76Schristos     These two are not equivalent.  For example, the first can match "footnote."
52103c3a7b76Schristos     but the second can only match "footnote".  This is almost certainly the
52113c3a7b76Schristos     cause in the discrepancy - the slower scanner run is matching more tokens,
52123c3a7b76Schristos     and/or having to do more backing up.
52133c3a7b76Schristos
52143c3a7b76Schristos     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
52153c3a7b76Schristos
52163c3a7b76Schristos     From a performance point of view, they're equivalent (modulo presumably
52173c3a7b76Schristos     minor effects such as memory cache hit rates; and the presence of trailing
52183c3a7b76Schristos     context, see below).  From a space point of view, the first is slightly
52193c3a7b76Schristos     preferable.
52203c3a7b76Schristos
52213c3a7b76Schristos     > 3. I have a pattern that look like this:
52223c3a7b76Schristos     >    pats {p1}|{p2}|{p3}|...|{p50}     (50 patterns ORd)
52233c3a7b76Schristos     >
52243c3a7b76Schristos     >    running yet another complicated program that includes the following rule:
52253c3a7b76Schristos     >    <snext>{and}/{no4}{bb}{pats}
52263c3a7b76Schristos     >
52273c3a7b76Schristos     >    gets me to "too complicated - over 32,000 states"...
52283c3a7b76Schristos
52293c3a7b76Schristos     I can't tell from this example whether the trailing context is variable-length
52303c3a7b76Schristos     or fixed-length (it could be the latter if {and} is fixed-length).  If it's
52313c3a7b76Schristos     variable length, which flex -p will tell you, then this reflects a basic
52323c3a7b76Schristos     performance problem, and if you can eliminate it by restructuring your
52333c3a7b76Schristos     scanner, you will see significant improvement.
52343c3a7b76Schristos
52353c3a7b76Schristos     >    so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
52363c3a7b76Schristos     >    10 patterns and changed the rule to be 5 rules.
52373c3a7b76Schristos     >    This did compile, but what is the rule of thumb here ?
52383c3a7b76Schristos
52393c3a7b76Schristos     The rule is to avoid trailing context other than fixed-length, in which for
52403c3a7b76Schristos     a/b, either the 'a' pattern or the 'b' pattern have a fixed length.  Use
52413c3a7b76Schristos     of the '|' operator automatically makes the pattern variable length, so in
52423c3a7b76Schristos     this case '[Ff]oot' is preferred to '(F|f)oot'.
52433c3a7b76Schristos
52443c3a7b76Schristos     > 4. I changed a rule that looked like this:
52453c3a7b76Schristos     >    <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
52463c3a7b76Schristos     >
52473c3a7b76Schristos     >    to the next 2 rules:
52483c3a7b76Schristos     >    <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
52493c3a7b76Schristos     >    <snext8>{and}{bb}/{ROMAN}         { BEGIN...
52503c3a7b76Schristos     >
52513c3a7b76Schristos     >    Again, I understand the using [^...] will cause a great performance loss
52523c3a7b76Schristos
52533c3a7b76Schristos     Actually, it doesn't cause any sort of performance loss.  It's a surprising
52543c3a7b76Schristos     fact about regular expressions that they always match in linear time
52553c3a7b76Schristos     regardless of how complex they are.
52563c3a7b76Schristos
52573c3a7b76Schristos     >    but are there any specific rules about it ?
52583c3a7b76Schristos
52593c3a7b76Schristos     See the "Performance Considerations" section of the man page, and also
52603c3a7b76Schristos     the example in MISC/fastwc/.
52613c3a7b76Schristos
52623c3a7b76Schristos     		Vern
52633c3a7b76Schristos
52643c3a7b76Schristos
52653c3a7b76SchristosFile: flex.info,  Node: Is backing up a big deal?,  Next: Can I fake multi-byte character support?,  Prev: Are certain equivalent patterns faster than others?,  Up: FAQ
52663c3a7b76Schristos
52673c3a7b76SchristosIs backing up a big deal?
52683c3a7b76Schristos=========================
52693c3a7b76Schristos
52703c3a7b76Schristos     To: Adoram Rogel <adoram@hybridge.com>
52713c3a7b76Schristos     Subject: Re: Flex 2.5.2 performance questions
52723c3a7b76Schristos     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
52733c3a7b76Schristos     Date: Thu, 19 Sep 96 09:58:00 PDT
52743c3a7b76Schristos     From: Vern Paxson <vern>
52753c3a7b76Schristos
52763c3a7b76Schristos     > a lot about the backing up problem.
52773c3a7b76Schristos     > I believe that there lies my biggest problem, and I'll try to improve
52783c3a7b76Schristos     > it.
52793c3a7b76Schristos
52803c3a7b76Schristos     Since you have variable trailing context, this is a bigger performance
52813c3a7b76Schristos     problem.  Fixing it is usually easier than fixing backing up, which in a
52823c3a7b76Schristos     complicated scanner (yours seems to fit the bill) can be extremely
52833c3a7b76Schristos     difficult to do correctly.
52843c3a7b76Schristos
52853c3a7b76Schristos     You also don't mention what flags you are using for your scanner.
52863c3a7b76Schristos     -f makes a large speed difference, and -Cfe buys you nearly as much
52873c3a7b76Schristos     speed but the resulting scanner is considerably smaller.
52883c3a7b76Schristos
52893c3a7b76Schristos     > I have an | operator in {and} and in {pats} so both of them are variable
52903c3a7b76Schristos     > length.
52913c3a7b76Schristos
52923c3a7b76Schristos     -p should have reported this.
52933c3a7b76Schristos
52943c3a7b76Schristos     > Is changing one of them to fixed-length is enough ?
52953c3a7b76Schristos
52963c3a7b76Schristos     Yes.
52973c3a7b76Schristos
52983c3a7b76Schristos     > Is it possible to change the 32,000 states limit ?
52993c3a7b76Schristos
53003c3a7b76Schristos     Yes.  I've appended instructions on how.  Before you make this change,
53013c3a7b76Schristos     though, you should think about whether there are ways to fundamentally
53023c3a7b76Schristos     simplify your scanner - those are certainly preferable!
53033c3a7b76Schristos
53043c3a7b76Schristos     		Vern
53053c3a7b76Schristos
     To increase the 32K limit (on a machine with 32-bit integers), increase
     the magnitude of the following in flexdef.h:
53083c3a7b76Schristos
53093c3a7b76Schristos     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
53103c3a7b76Schristos     #define MAXIMUM_MNS 31999
53113c3a7b76Schristos     #define BAD_SUBSCRIPT -32767
53123c3a7b76Schristos     #define MAX_SHORT 32700
53133c3a7b76Schristos
53143c3a7b76Schristos     Adding a 0 or two after each should do the trick.
53153c3a7b76Schristos
53163c3a7b76Schristos
53173c3a7b76SchristosFile: flex.info,  Node: Can I fake multi-byte character support?,  Next: deleteme01,  Prev: Is backing up a big deal?,  Up: FAQ
53183c3a7b76Schristos
53193c3a7b76SchristosCan I fake multi-byte character support?
53203c3a7b76Schristos========================================
53213c3a7b76Schristos
53223c3a7b76Schristos     To: Heeman_Lee@hp.com
53233c3a7b76Schristos     Subject: Re: flex - multi-byte support?
53243c3a7b76Schristos     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
53253c3a7b76Schristos     Date: Fri, 04 Oct 1996 11:42:18 PDT
53263c3a7b76Schristos     From: Vern Paxson <vern>
53273c3a7b76Schristos
53283c3a7b76Schristos     >      I assume as long as my *.l file defines the
53293c3a7b76Schristos     >      range of expected character code values (in octal format), flex will
53303c3a7b76Schristos     >      scan the file and read multi-byte characters correctly. But I have no
53313c3a7b76Schristos     >      confidence in this assumption.
53323c3a7b76Schristos
53333c3a7b76Schristos     Your lack of confidence is justified - this won't work.
53343c3a7b76Schristos
53353c3a7b76Schristos     Flex has in it a widespread assumption that the input is processed
53363c3a7b76Schristos     one byte at a time.  Fixing this is on the to-do list, but is involved,
53373c3a7b76Schristos     so it won't happen any time soon.  In the interim, the best I can suggest
53383c3a7b76Schristos     (unless you want to try fixing it yourself) is to write your rules in
53393c3a7b76Schristos     terms of pairs of bytes, using definitions in the first section:
53403c3a7b76Schristos
53413c3a7b76Schristos     	X	\xfe\xc2
53423c3a7b76Schristos     	...
53433c3a7b76Schristos     	%%
53443c3a7b76Schristos     	foo{X}bar	found_foo_fe_c2_bar();
53453c3a7b76Schristos
53463c3a7b76Schristos     etc.  Definitely a pain - sorry about that.
53473c3a7b76Schristos
53483c3a7b76Schristos     By the way, the email address you used for me is ancient, indicating you
53493c3a7b76Schristos     have a very old version of flex.  You can get the most recent, 2.5.4, from
53503c3a7b76Schristos     ftp.ee.lbl.gov.
53513c3a7b76Schristos
53523c3a7b76Schristos     		Vern
53533c3a7b76Schristos
53543c3a7b76Schristos
53553c3a7b76SchristosFile: flex.info,  Node: deleteme01,  Next: Can you discuss some flex internals?,  Prev: Can I fake multi-byte character support?,  Up: FAQ
53563c3a7b76Schristos
53573c3a7b76Schristosdeleteme01
53583c3a7b76Schristos==========
53593c3a7b76Schristos
53603c3a7b76Schristos     To: moleary@primus.com
53613c3a7b76Schristos     Subject: Re: Flex / Unicode compatibility question
53623c3a7b76Schristos     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
53633c3a7b76Schristos     Date: Tue, 22 Oct 1996 11:06:13 PDT
53643c3a7b76Schristos     From: Vern Paxson <vern>
53653c3a7b76Schristos
53663c3a7b76Schristos     Unfortunately flex at the moment has a widespread assumption within it
53673c3a7b76Schristos     that characters are processed 8 bits at a time.  I don't see any easy
53683c3a7b76Schristos     fix for this (other than writing your rules in terms of double characters -
53693c3a7b76Schristos     a pain).  I also don't know of a wider lex, though you might try surfing
53703c3a7b76Schristos     the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
53713c3a7b76Schristos     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
53723c3a7b76Schristos     Toolkit").
53733c3a7b76Schristos
53743c3a7b76Schristos     Fixing flex to handle wider characters is on the long-term to-do list.
53753c3a7b76Schristos     But since flex is a strictly spare-time project these days, this probably
53763c3a7b76Schristos     won't happen for quite a while, unless someone else does it first.
53773c3a7b76Schristos
53783c3a7b76Schristos     		Vern
53793c3a7b76Schristos
53803c3a7b76Schristos
53813c3a7b76SchristosFile: flex.info,  Node: Can you discuss some flex internals?,  Next: unput() messes up yy_at_bol,  Prev: deleteme01,  Up: FAQ
53823c3a7b76Schristos
53833c3a7b76SchristosCan you discuss some flex internals?
53843c3a7b76Schristos====================================
53853c3a7b76Schristos
53863c3a7b76Schristos     To: Johan Linde <jl@theophys.kth.se>
53873c3a7b76Schristos     Subject: Re: translation of flex
53883c3a7b76Schristos     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
53893c3a7b76Schristos     Date: Mon, 11 Nov 1996 10:33:50 PST
53903c3a7b76Schristos     From: Vern Paxson <vern>
53913c3a7b76Schristos
53923c3a7b76Schristos     > I'm working for the Swedish team translating GNU program, and I'm currently
53933c3a7b76Schristos     > working with flex. I have a few questions about some of the messages which
53943c3a7b76Schristos     > I hope you can answer.
53953c3a7b76Schristos
53963c3a7b76Schristos     All of the things you're wondering about, by the way, concerning flex
53973c3a7b76Schristos     internals - probably the only person who understands what they mean in
53983c3a7b76Schristos     English is me!  So I wouldn't worry too much about getting them right.
53993c3a7b76Schristos     That said ...
54003c3a7b76Schristos
54013c3a7b76Schristos     > #: main.c:545
54023c3a7b76Schristos     > msgid "  %d protos created\n"
54033c3a7b76Schristos     >
54043c3a7b76Schristos     > Does proto mean prototype?
54053c3a7b76Schristos
54063c3a7b76Schristos     Yes - prototypes of state compression tables.
54073c3a7b76Schristos
54083c3a7b76Schristos     > #: main.c:539
54093c3a7b76Schristos     > msgid "  %d/%d (peak %d) template nxt-chk entries created\n"
54103c3a7b76Schristos     >
54113c3a7b76Schristos     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
54123c3a7b76Schristos     > However, 'template next-check entries' doesn't make much sense to me. To be
54133c3a7b76Schristos     > able to find a good translation I need to know a little bit more about it.
54143c3a7b76Schristos
54153c3a7b76Schristos     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
54163c3a7b76Schristos     scanner tables.  It involves creating two pairs of tables.  The first has
54173c3a7b76Schristos     "base" and "default" entries, the second has "next" and "check" entries.
54183c3a7b76Schristos     The "base" entry is indexed by the current state and yields an index into
54193c3a7b76Schristos     the next/check table.  The "default" entry gives what to do if the state
54203c3a7b76Schristos     transition isn't found in next/check.  The "next" entry gives the next
54213c3a7b76Schristos     state to enter, but only if the "check" entry verifies that this entry is
54223c3a7b76Schristos     correct for the current state.  Flex creates templates of series of
54233c3a7b76Schristos     next/check entries and then encodes differences from these templates as a
54243c3a7b76Schristos     way to compress the tables.
54253c3a7b76Schristos
54263c3a7b76Schristos     > #: main.c:533
54273c3a7b76Schristos     > msgid "  %d/%d base-def entries created\n"
54283c3a7b76Schristos     >
54293c3a7b76Schristos     > The same problem here for 'base-def'.
54303c3a7b76Schristos
54313c3a7b76Schristos     See above.
54323c3a7b76Schristos
54333c3a7b76Schristos     		Vern
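
The base/default/next/check lookup described in the reply above can be
sketched as follows.  This is a minimal illustration with hand-built tables
for a three-state, three-symbol automaton; the array contents, the NO_STATE
marker and the function lookup_next() are assumptions made for the example,
and flex's real tables (with their templates and prototypes) are more
elaborate.

```c
#include <assert.h>

#define NO_STATE (-1)   /* hypothetical marker: no default, or unused slot */
#define N_SYMS   3

/* State 0 stores all three transitions.  State 1 stores only the one
 * transition that differs from state 0 and defaults to state 0 for the
 * rest.  State 2 is identical to state 0, so it stores nothing. */
static const int base[] = { 0, 2, 0 };
static const int def[]  = { NO_STATE, 0, 0 };
static const int nxt[]  = { 1, 0, 0, 2, 0 };
static const int chk[]  = { 0, 0, 0, 1, NO_STATE };

/* Follow "default" links until a slot whose "check" entry confirms that
 * it belongs to the state being examined. */
int lookup_next(int state, int sym)
{
    assert(sym >= 0 && sym < N_SYMS);
    while (state != NO_STATE) {
        int slot = base[state] + sym;

        if (chk[slot] == state)
            return nxt[slot];
        state = def[state];
    }
    return NO_STATE;
}
```

Here lookup_next(1, 1) is answered from state 1's own slot, while
lookup_next(1, 0) fails the check and is answered through the default
state 0.  Five next/check slots stand in for a full 3 x 3 table.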
54343c3a7b76Schristos
54353c3a7b76Schristos
54363c3a7b76SchristosFile: flex.info,  Node: unput() messes up yy_at_bol,  Next: The | operator is not doing what I want,  Prev: Can you discuss some flex internals?,  Up: FAQ
54373c3a7b76Schristos
54383c3a7b76Schristosunput() messes up yy_at_bol
54393c3a7b76Schristos===========================
54403c3a7b76Schristos
54413c3a7b76Schristos     To: Xinying Li <xli@npac.syr.edu>
54423c3a7b76Schristos     Subject: Re: FLEX ?
54433c3a7b76Schristos     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
54443c3a7b76Schristos     Date: Wed, 13 Nov 1996 19:51:54 PST
54453c3a7b76Schristos     From: Vern Paxson <vern>
54463c3a7b76Schristos
54473c3a7b76Schristos     > "unput()" them to input flow, question occurs. If I do this after I scan
54483c3a7b76Schristos     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
54493c3a7b76Schristos     > means the carriage flag has gone.
54503c3a7b76Schristos
54513c3a7b76Schristos     You can control this by calling yy_set_bol().  It's described in the manual.
54523c3a7b76Schristos
54533c3a7b76Schristos     >      And if in pre-reading it goes to the end of file, is anything done
     > to control the end of current buffer and end of file?
54553c3a7b76Schristos
54563c3a7b76Schristos     No, there's no way to put back an end-of-file.
54573c3a7b76Schristos
54583c3a7b76Schristos     >      By the way I am using flex 2.5.2 and using the "-l".
54593c3a7b76Schristos
54603c3a7b76Schristos     The latest release is 2.5.4, by the way.  It fixes some bugs in 2.5.2 and
54613c3a7b76Schristos     2.5.3.  You can get it from ftp.ee.lbl.gov.
54623c3a7b76Schristos
54633c3a7b76Schristos     		Vern
54643c3a7b76Schristos
54653c3a7b76Schristos
54663c3a7b76SchristosFile: flex.info,  Node: The | operator is not doing what I want,  Next: Why can't flex understand this variable trailing context pattern?,  Prev: unput() messes up yy_at_bol,  Up: FAQ
54673c3a7b76Schristos
54683c3a7b76SchristosThe | operator is not doing what I want
54693c3a7b76Schristos=======================================
54703c3a7b76Schristos
54713c3a7b76Schristos     To: Alain.ISSARD@st.com
54723c3a7b76Schristos     Subject: Re: Start condition with FLEX
54733c3a7b76Schristos     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
54743c3a7b76Schristos     Date: Mon, 18 Nov 1996 10:41:34 PST
54753c3a7b76Schristos     From: Vern Paxson <vern>
54763c3a7b76Schristos
54773c3a7b76Schristos     > I am not able to use the start condition scope and to use the | (OR) with
54783c3a7b76Schristos     > rules having start conditions.
54793c3a7b76Schristos
54803c3a7b76Schristos     The problem is that if you use '|' as a regular expression operator, for
54813c3a7b76Schristos     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
54823c3a7b76Schristos     any blanks around it.  If you instead want the special '|' *action* (which
54833c3a7b76Schristos     from your scanner appears to be the case), which is a way of giving two
54843c3a7b76Schristos     different rules the same action:
54853c3a7b76Schristos
54863c3a7b76Schristos     	foo	|
54873c3a7b76Schristos     	bar	matched_foo_or_bar();
54883c3a7b76Schristos
54893c3a7b76Schristos     then '|' *must* be separated from the first rule by whitespace and *must*
54903c3a7b76Schristos     be followed by a new line.  You *cannot* write it as:
54913c3a7b76Schristos
54923c3a7b76Schristos     	foo | bar	matched_foo_or_bar();
54933c3a7b76Schristos
54943c3a7b76Schristos     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
54963c3a7b76Schristos     unlikely to be changed.
54973c3a7b76Schristos
     Your problems with start condition scope are simply due to the syntax
     errors from your use of '|', which confuse flex later on.
55003c3a7b76Schristos
55013c3a7b76Schristos     Let me know if you still have problems.
55023c3a7b76Schristos
55033c3a7b76Schristos     		Vern
55043c3a7b76Schristos
55053c3a7b76Schristos
55063c3a7b76SchristosFile: flex.info,  Node: Why can't flex understand this variable trailing context pattern?,  Next: The ^ operator isn't working,  Prev: The | operator is not doing what I want,  Up: FAQ
55073c3a7b76Schristos
55083c3a7b76SchristosWhy can't flex understand this variable trailing context pattern?
55093c3a7b76Schristos=================================================================
55103c3a7b76Schristos
55113c3a7b76Schristos     To: Gregory Margo <gmargo@newton.vip.best.com>
55123c3a7b76Schristos     Subject: Re: flex-2.5.3 bug report
55133c3a7b76Schristos     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
55143c3a7b76Schristos     Date: Sat, 23 Nov 1996 17:07:32 PST
55153c3a7b76Schristos     From: Vern Paxson <vern>
55163c3a7b76Schristos
55173c3a7b76Schristos     > Enclosed is a lex file that "real" lex will process, but I cannot get
55183c3a7b76Schristos     > flex to process it.  Could you try it and maybe point me in the right direction?
55193c3a7b76Schristos
55203c3a7b76Schristos     Your problem is that some of the definitions in the scanner use the '/'
55213c3a7b76Schristos     trailing context operator, and have it enclosed in ()'s.  Flex does not
55223c3a7b76Schristos     allow this operator to be enclosed in ()'s because doing so allows undefined
55233c3a7b76Schristos     regular expressions such as "(a/b)+".  So the solution is to remove the
55243c3a7b76Schristos     parentheses.  Note that you must also be building the scanner with the -l
55253c3a7b76Schristos     option for AT&T lex compatibility.  Without this option, flex automatically
55263c3a7b76Schristos     encloses the definitions in parentheses.
55273c3a7b76Schristos
55283c3a7b76Schristos     		Vern
55293c3a7b76Schristos
55303c3a7b76Schristos
55313c3a7b76SchristosFile: flex.info,  Node: The ^ operator isn't working,  Next: Trailing context is getting confused with trailing optional patterns,  Prev: Why can't flex understand this variable trailing context pattern?,  Up: FAQ
55323c3a7b76Schristos
55333c3a7b76SchristosThe ^ operator isn't working
55343c3a7b76Schristos============================
55353c3a7b76Schristos
55363c3a7b76Schristos     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
55373c3a7b76Schristos     Subject: Re: Flex Bug ?
55383c3a7b76Schristos     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
55393c3a7b76Schristos     Date: Tue, 26 Nov 1996 11:15:05 PST
55403c3a7b76Schristos     From: Vern Paxson <vern>
55413c3a7b76Schristos
55423c3a7b76Schristos     > In my lexer code, i have the line :
55433c3a7b76Schristos     > ^\*.*          { }
55443c3a7b76Schristos     >
55453c3a7b76Schristos     > Thus all lines starting with an astrix (*) are comment lines.
55463c3a7b76Schristos     > This does not work !
55473c3a7b76Schristos
55483c3a7b76Schristos     I can't get this problem to reproduce - it works fine for me.  Note
55493c3a7b76Schristos     though that if what you have is slightly different:
55503c3a7b76Schristos
55513c3a7b76Schristos     	COMMENT	^\*.*
55523c3a7b76Schristos     	%%
55533c3a7b76Schristos     	{COMMENT}	{ }
55543c3a7b76Schristos
55553c3a7b76Schristos     then it won't work, because flex pushes back macro definitions enclosed
55563c3a7b76Schristos     in ()'s, so the rule becomes
55573c3a7b76Schristos
55583c3a7b76Schristos     	(^\*.*)		{ }
55593c3a7b76Schristos
55603c3a7b76Schristos     and now that the '^' operator is not at the immediate beginning of the
55613c3a7b76Schristos     line, it's interpreted as just a regular character.  You can avoid this
55623c3a7b76Schristos     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
55633c3a7b76Schristos
55643c3a7b76Schristos     		Vern
55653c3a7b76Schristos
55663c3a7b76Schristos
55673c3a7b76SchristosFile: flex.info,  Node: Trailing context is getting confused with trailing optional patterns,  Next: Is flex GNU or not?,  Prev: The ^ operator isn't working,  Up: FAQ
55683c3a7b76Schristos
55693c3a7b76SchristosTrailing context is getting confused with trailing optional patterns
55703c3a7b76Schristos====================================================================
55713c3a7b76Schristos
55723c3a7b76Schristos     To: Adoram Rogel <adoram@hybridge.com>
55733c3a7b76Schristos     Subject: Re: Flex 2.5.4 BOF ???
55743c3a7b76Schristos     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
55753c3a7b76Schristos     Date: Wed, 27 Nov 1996 10:56:25 PST
55763c3a7b76Schristos     From: Vern Paxson <vern>
55773c3a7b76Schristos
55783c3a7b76Schristos     >     Organization(s)?/[a-z]
55793c3a7b76Schristos     >
55803c3a7b76Schristos     > This matched "Organizations" (looking in debug mode, the trailing s
55813c3a7b76Schristos     > was matched with trailing context instead of the optional (s) in the
55823c3a7b76Schristos     > end of the word.
55833c3a7b76Schristos
55843c3a7b76Schristos     That should only happen with lex.  Flex can properly match this pattern.
55853c3a7b76Schristos     (That might be what you're saying, I'm just not sure.)
55863c3a7b76Schristos
55873c3a7b76Schristos     > Is there a way to avoid this dangerous trailing context problem ?
55883c3a7b76Schristos
55893c3a7b76Schristos     Unfortunately, there's no easy way.  On the other hand, I don't see why
55903c3a7b76Schristos     it should be a problem.  Lex's matching is clearly wrong, and I'd hope
55913c3a7b76Schristos     that usually the intent remains the same as expressed with the pattern,
55923c3a7b76Schristos     so flex's matching will be correct.
55933c3a7b76Schristos
55943c3a7b76Schristos     		Vern
55953c3a7b76Schristos
55963c3a7b76Schristos
55973c3a7b76SchristosFile: flex.info,  Node: Is flex GNU or not?,  Next: ERASEME53,  Prev: Trailing context is getting confused with trailing optional patterns,  Up: FAQ
55983c3a7b76Schristos
55993c3a7b76SchristosIs flex GNU or not?
56003c3a7b76Schristos===================
56013c3a7b76Schristos
56023c3a7b76Schristos     To: Cameron MacKinnon <mackin@interlog.com>
56033c3a7b76Schristos     Subject: Re: Flex documentation bug
56043c3a7b76Schristos     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
56053c3a7b76Schristos     Date: Sun, 01 Dec 1996 22:29:39 PST
56063c3a7b76Schristos     From: Vern Paxson <vern>
56073c3a7b76Schristos
56083c3a7b76Schristos     > I'm not sure how or where to submit bug reports (documentation or
56093c3a7b76Schristos     > otherwise) for the GNU project stuff ...
56103c3a7b76Schristos
56113c3a7b76Schristos     Well, strictly speaking flex isn't part of the GNU project.  They just
56123c3a7b76Schristos     distribute it because no one's written a decent GPL'd lex replacement.
56133c3a7b76Schristos     So you should send bugs directly to me.  Those sent to the GNU folks
     sometimes find their way to me, but some may drop between the cracks.
56153c3a7b76Schristos
56163c3a7b76Schristos     > In GNU Info, under the section 'Start Conditions', and also in the man
56173c3a7b76Schristos     > page (mine's dated April '95) is a nice little snippet showing how to
56183c3a7b76Schristos     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
56193c3a7b76Schristos     > size. Unfortunately, no overflow checking is ever done ...
56203c3a7b76Schristos
56213c3a7b76Schristos     This is already mentioned in the manual:
56223c3a7b76Schristos
56233c3a7b76Schristos     Finally, here's an example of how to  match  C-style  quoted
56243c3a7b76Schristos     strings using exclusive start conditions, including expanded
56253c3a7b76Schristos     escape sequences (but not including checking  for  a  string
56263c3a7b76Schristos     that's too long):
56273c3a7b76Schristos
     The reason for not doing the overflow checking is that it would needlessly
     clutter up an example whose main purpose is just to demonstrate how to
     use flex.
56313c3a7b76Schristos
56323c3a7b76Schristos     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
56333c3a7b76Schristos
56343c3a7b76Schristos     		Vern
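
For readers who do want the check that the manual's example leaves out, here
is a minimal sketch of a bounded append.  MAX_STR_CONST is the name used in
the question above, but its value here is deliberately tiny, and the helper
functions string_begin(), string_append() and string_length() are
hypothetical names, not part of flex or of the manual's example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STR_CONST 8   /* deliberately tiny for the demonstration */

static char string_buf[MAX_STR_CONST];
static size_t string_len = 0;

/* Start collecting a new string literal. */
void string_begin(void)
{
    string_len = 0;
    string_buf[0] = '\0';
}

/* Append one character.  Returns 1 on success, or 0 if the literal would
 * overflow the buffer (one byte is reserved for the terminating NUL). */
int string_append(char c)
{
    assert(string_len < MAX_STR_CONST);
    if (string_len + 1 >= MAX_STR_CONST)
        return 0;
    string_buf[string_len++] = c;
    string_buf[string_len] = '\0';
    return 1;
}

/* Number of characters collected so far, not counting the NUL. */
size_t string_length(void)
{
    return string_len;
}
```

An action in the string start condition could call string_append() for each
character and report an error when it returns 0, rather than storing
through an unchecked pointer.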
56353c3a7b76Schristos
56363c3a7b76Schristos
56373c3a7b76SchristosFile: flex.info,  Node: ERASEME53,  Next: I need to scan if-then-else blocks and while loops,  Prev: Is flex GNU or not?,  Up: FAQ
56383c3a7b76Schristos
56393c3a7b76SchristosERASEME53
56403c3a7b76Schristos=========
56413c3a7b76Schristos
56423c3a7b76Schristos     To: tsv@cs.UManitoba.CA
56433c3a7b76Schristos     Subject: Re: Flex (reg)..
56443c3a7b76Schristos     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
56453c3a7b76Schristos     Date: Thu, 06 Mar 1997 15:54:19 PST
56463c3a7b76Schristos     From: Vern Paxson <vern>
56473c3a7b76Schristos
56483c3a7b76Schristos     > [:alpha:] ([:alnum:] | \\_)*
56493c3a7b76Schristos
56503c3a7b76Schristos     If your rule really has embedded blanks as shown above, then it won't
56513c3a7b76Schristos     work, as the first blank delimits the rule from the action.  (It wouldn't
56523c3a7b76Schristos     even compile ...)  You need instead:
56533c3a7b76Schristos
56543c3a7b76Schristos     [:alpha:]([:alnum:]|\\_)*
56553c3a7b76Schristos
56563c3a7b76Schristos     and that should work fine - there's no restriction on what can go inside
56573c3a7b76Schristos     of ()'s except for the trailing context operator, '/'.
56583c3a7b76Schristos
56593c3a7b76Schristos     		Vern
56603c3a7b76Schristos
56613c3a7b76Schristos
56623c3a7b76SchristosFile: flex.info,  Node: I need to scan if-then-else blocks and while loops,  Next: ERASEME55,  Prev: ERASEME53,  Up: FAQ
56633c3a7b76Schristos
56643c3a7b76SchristosI need to scan if-then-else blocks and while loops
56653c3a7b76Schristos==================================================
56663c3a7b76Schristos
56673c3a7b76Schristos     To: "Mike Stolnicki" <mstolnic@ford.com>
56683c3a7b76Schristos     Subject: Re: FLEX help
56693c3a7b76Schristos     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
56703c3a7b76Schristos     Date: Fri, 30 May 1997 10:46:35 PDT
56713c3a7b76Schristos     From: Vern Paxson <vern>
56723c3a7b76Schristos
56733c3a7b76Schristos     > We'd like to add "if-then-else", "while", and "for" statements to our
56743c3a7b76Schristos     > language ...
56753c3a7b76Schristos     > We've investigated many possible solutions.  The one solution that seems
56763c3a7b76Schristos     > the most reasonable involves knowing the position of a TOKEN in yyin.
56773c3a7b76Schristos
56783c3a7b76Schristos     I strongly advise you to instead build a parse tree (abstract syntax tree)
56793c3a7b76Schristos     and loop over that instead.  You'll find this has major benefits in keeping
56803c3a7b76Schristos     your interpreter simple and extensible.
56813c3a7b76Schristos
     That said, the get_position and set_position functionality you mention
     has been on the to-do list for a while.  As flex is a purely spare-time
     project for me, there are no guarantees about when it will be added (in
     particular, it for sure won't be for many months to come).
56863c3a7b76Schristos
56873c3a7b76Schristos     		Vern
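
The advice above, to build a tree and loop over it rather than reposition
the scanner in yyin, can be sketched as follows.  This is a minimal
illustration under stated assumptions: the node kinds, struct env, and the
functions run() and sum_down_from() are all hypothetical, and a real
interpreter would have the parser build the tree instead of static
initializers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a toy language with two variables. */
enum kind { N_DECR, N_ADD_TO_SUM, N_SEQ, N_WHILE_POSITIVE };

struct node {
    enum kind kind;
    const struct node *left;   /* N_SEQ: first statement; loop: body */
    const struct node *right;  /* N_SEQ: second statement */
};

struct env {
    int counter;
    int sum;
};

/* Execute a statement tree against the environment. */
void run(const struct node *n, struct env *e)
{
    assert(n != NULL && e != NULL);
    switch (n->kind) {
    case N_DECR:
        e->counter -= 1;
        break;
    case N_ADD_TO_SUM:
        e->sum += e->counter;
        break;
    case N_SEQ:
        run(n->left, e);
        run(n->right, e);
        break;
    case N_WHILE_POSITIVE:
        /* The loop re-walks the same subtree; no input is re-scanned. */
        while (e->counter > 0)
            run(n->left, e);
        break;
    }
}

/* Tree for: while (counter > 0) { sum += counter; counter -= 1; } */
int sum_down_from(int start)
{
    static const struct node add  = { N_ADD_TO_SUM, NULL, NULL };
    static const struct node decr = { N_DECR, NULL, NULL };
    static const struct node body = { N_SEQ, &add, &decr };
    static const struct node loop = { N_WHILE_POSITIVE, &body, NULL };
    struct env e;

    e.counter = start;
    e.sum = 0;
    run(&loop, &e);
    return e.sum;
}
```

Because the loop body is an ordinary subtree, repeating it needs no
knowledge of token positions in the input.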
56883c3a7b76Schristos
56893c3a7b76Schristos
56903c3a7b76SchristosFile: flex.info,  Node: ERASEME55,  Next: ERASEME56,  Prev: I need to scan if-then-else blocks and while loops,  Up: FAQ
56913c3a7b76Schristos
56923c3a7b76SchristosERASEME55
56933c3a7b76Schristos=========
56943c3a7b76Schristos
56953c3a7b76Schristos     To: Colin Paul Adams <colin@colina.demon.co.uk>
56963c3a7b76Schristos     Subject: Re: Flex C++ classes and Bison
56973c3a7b76Schristos     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
56983c3a7b76Schristos     Date: Fri, 15 Aug 1997 10:48:19 PDT
56993c3a7b76Schristos     From: Vern Paxson <vern>
57003c3a7b76Schristos
57013c3a7b76Schristos     > #define YY_DECL   int yylex (YYSTYPE *lvalp, struct parser_control
57023c3a7b76Schristos     > *parm)
57033c3a7b76Schristos     >
57043c3a7b76Schristos     > I have been trying  to get this to work as a C++ scanner, but it does
57053c3a7b76Schristos     > not appear to be possible (warning that it matches no declarations in
57063c3a7b76Schristos     > yyFlexLexer, or something like that).
57073c3a7b76Schristos     >
57083c3a7b76Schristos     > Is this supposed to be possible, or is it being worked on (I DID
57093c3a7b76Schristos     > notice the comment that scanner classes are still experimental, so I'm
57103c3a7b76Schristos     > not too hopeful)?
57113c3a7b76Schristos
57123c3a7b76Schristos     What you need to do is derive a subclass from yyFlexLexer that provides
57133c3a7b76Schristos     the above yylex() method, squirrels away lvalp and parm into member
57143c3a7b76Schristos     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
57153c3a7b76Schristos
57163c3a7b76Schristos     		Vern
57173c3a7b76Schristos
57183c3a7b76Schristos
57193c3a7b76SchristosFile: flex.info,  Node: ERASEME56,  Next: ERASEME57,  Prev: ERASEME55,  Up: FAQ
57203c3a7b76Schristos
57213c3a7b76SchristosERASEME56
57223c3a7b76Schristos=========
57233c3a7b76Schristos
57243c3a7b76Schristos     To: Mikael.Latvala@lmf.ericsson.se
57253c3a7b76Schristos     Subject: Re: Possible mistake in Flex v2.5 document
57263c3a7b76Schristos     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
57273c3a7b76Schristos     Date: Fri, 05 Sep 1997 10:01:54 PDT
57283c3a7b76Schristos     From: Vern Paxson <vern>
57293c3a7b76Schristos
57303c3a7b76Schristos     > In that example you show how to count comment lines when using
57313c3a7b76Schristos     > C style /* ... */ comments. My question is, shouldn't you take into
57323c3a7b76Schristos     > account a scenario where end of a comment marker occurs inside
57333c3a7b76Schristos     > character or string literals?
57343c3a7b76Schristos
     The scanner certainly needs to also scan character and string literals.
     Whichever way it does that (there's an example in the man page for
     strings), the lexer will recognize the beginning of the literal before it
     runs across the embedded "/*".  Consequently, it will finish scanning the
     literal before it even considers the possibility of matching "/*".
57403c3a7b76Schristos
57413c3a7b76Schristos     Example:
57423c3a7b76Schristos
57433c3a7b76Schristos     	'([^']*|{ESCAPE_SEQUENCE})'
57443c3a7b76Schristos
57453c3a7b76Schristos     will match all the text between the ''s (inclusive).  So the lexer
57463c3a7b76Schristos     considers this as a token beginning at the first ', and doesn't even
57473c3a7b76Schristos     attempt to match other tokens inside it.
57483c3a7b76Schristos
     I think this subtlety is not worth putting in the manual, as I suspect
57503c3a7b76Schristos     it would confuse more people than it would enlighten.
57513c3a7b76Schristos
57523c3a7b76Schristos     		Vern
57533c3a7b76Schristos
57543c3a7b76Schristos
57553c3a7b76SchristosFile: flex.info,  Node: ERASEME57,  Next: Is there a repository for flex scanners?,  Prev: ERASEME56,  Up: FAQ
57563c3a7b76Schristos
57573c3a7b76SchristosERASEME57
57583c3a7b76Schristos=========
57593c3a7b76Schristos
57603c3a7b76Schristos     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
57613c3a7b76Schristos     Subject: Re: flex limitations
57623c3a7b76Schristos     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
57633c3a7b76Schristos     Date: Mon, 08 Sep 1997 11:38:08 PDT
57643c3a7b76Schristos     From: Vern Paxson <vern>
57653c3a7b76Schristos
57663c3a7b76Schristos     > %%
57673c3a7b76Schristos     > [a-zA-Z]+       /* skip a line */
57683c3a7b76Schristos     >                 {  printf("got %s\n", yytext); }
57693c3a7b76Schristos     > %%
57703c3a7b76Schristos
57713c3a7b76Schristos     What version of flex are you using?  If I feed this to 2.5.4, it complains:
57723c3a7b76Schristos
57733c3a7b76Schristos     	"bug.l", line 5: EOF encountered inside an action
57743c3a7b76Schristos     	"bug.l", line 5: unrecognized rule
57753c3a7b76Schristos     	"bug.l", line 5: fatal parse error
57763c3a7b76Schristos
57773c3a7b76Schristos     Not the world's greatest error message, but it manages to flag the problem.
57783c3a7b76Schristos
57793c3a7b76Schristos     (With the introduction of start condition scopes, flex can't accommodate
57803c3a7b76Schristos     an action on a separate line, since it's ambiguous with an indented rule.)
57813c3a7b76Schristos
57823c3a7b76Schristos     You can get 2.5.4 from ftp.ee.lbl.gov.
57833c3a7b76Schristos
57843c3a7b76Schristos     		Vern
57853c3a7b76Schristos
57863c3a7b76Schristos
57873c3a7b76SchristosFile: flex.info,  Node: Is there a repository for flex scanners?,  Next: How can I conditionally compile or preprocess my flex input file?,  Prev: ERASEME57,  Up: FAQ
57883c3a7b76Schristos
57893c3a7b76SchristosIs there a repository for flex scanners?
57903c3a7b76Schristos========================================
57913c3a7b76Schristos
57923c3a7b76SchristosNot that we know of.  You might try asking on comp.compilers.
57933c3a7b76Schristos
57943c3a7b76Schristos
57953c3a7b76SchristosFile: flex.info,  Node: How can I conditionally compile or preprocess my flex input file?,  Next: Where can I find grammars for lex and yacc?,  Prev: Is there a repository for flex scanners?,  Up: FAQ
57963c3a7b76Schristos
57973c3a7b76SchristosHow can I conditionally compile or preprocess my flex input file?
57983c3a7b76Schristos=================================================================
57993c3a7b76Schristos
58003c3a7b76SchristosFlex doesn't have a preprocessor like C does.  You might try using m4,
58013c3a7b76Schristosor the C preprocessor plus a sed script to clean up the result.
58023c3a7b76Schristos
58033c3a7b76Schristos
58043c3a7b76SchristosFile: flex.info,  Node: Where can I find grammars for lex and yacc?,  Next: I get an end-of-buffer message for each character scanned.,  Prev: How can I conditionally compile or preprocess my flex input file?,  Up: FAQ
58053c3a7b76Schristos
58063c3a7b76SchristosWhere can I find grammars for lex and yacc?
58073c3a7b76Schristos===========================================
58083c3a7b76Schristos
58093c3a7b76SchristosIn the sources for flex and bison.
58103c3a7b76Schristos
58113c3a7b76Schristos
58123c3a7b76SchristosFile: flex.info,  Node: I get an end-of-buffer message for each character scanned.,  Next: unnamed-faq-62,  Prev: Where can I find grammars for lex and yacc?,  Up: FAQ
58133c3a7b76Schristos
58143c3a7b76SchristosI get an end-of-buffer message for each character scanned.
58153c3a7b76Schristos==========================================================
58163c3a7b76Schristos
This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
"interactive", or if the streams library on your platform always returns
1 for yyin->gcount().
58213c3a7b76Schristos
58223c3a7b76Schristos   Solution: override LexerInput() with a version that returns whole
58233c3a7b76Schristosbuffers.
58243c3a7b76Schristos
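   As a rough sketch of what such an override could delegate to (an
illustration, not code from the flex distribution), the helper below
fills the caller's buffer with a single read() call and reports how
many characters it stored.  The name fill_buffer is made up here, and
the sketch assumes the LexerInput(char* buf, int max_size) shape; check
the FlexLexer.h that ships with your flex for the exact signature.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read up to max_size characters in one call and
// report how many were actually stored, so the caller sees whole
// buffers rather than one character per call.
// Returns 0 at end of input and -1 if the stream is in a bad state.
int fill_buffer(std::istream& in, char* buf, int max_size)
{
    assert(buf != nullptr);
    if (max_size <= 0 || in.eof() || in.fail())
        return 0;
    in.read(buf, max_size);          // may stop early at end of input
    if (in.bad())
        return -1;
    return static_cast<int>(in.gcount());
}

// Usage sketch: one call drains a short input completely.
std::string demo_fill_buffer()
{
    std::istringstream in("hello world");
    char buf[64];
    int n = fill_buffer(in, buf, sizeof buf);
    return std::string(buf, n > 0 ? n : 0);
}
```

   A LexerInput() override in a class derived from yyFlexLexer could
forward its arguments, together with the scanner's input stream, to a
helper of this kind.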
58253c3a7b76Schristos
58263c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-62,  Next: unnamed-faq-63,  Prev: I get an end-of-buffer message for each character scanned.,  Up: FAQ
58273c3a7b76Schristos
58283c3a7b76Schristosunnamed-faq-62
58293c3a7b76Schristos==============
58303c3a7b76Schristos
58313c3a7b76Schristos     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
58323c3a7b76Schristos     Subject: Re: Flex maximums
58333c3a7b76Schristos     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
58343c3a7b76Schristos     Date: Mon, 17 Nov 1997 17:16:15 PST
58353c3a7b76Schristos     From: Vern Paxson <vern>
58363c3a7b76Schristos
58373c3a7b76Schristos     > I took a quick look into the flex-sources and altered some #defines in
     > flexdef.h:
58393c3a7b76Schristos     >
58403c3a7b76Schristos     > 	#define INITIAL_MNS 64000
58413c3a7b76Schristos     > 	#define MNS_INCREMENT 1024000
58423c3a7b76Schristos     > 	#define MAXIMUM_MNS 64000
58433c3a7b76Schristos
     The fix is to add a couple of zeroes to each of these:
58453c3a7b76Schristos
58463c3a7b76Schristos     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
58473c3a7b76Schristos     #define MAXIMUM_MNS 31999
58483c3a7b76Schristos     #define BAD_SUBSCRIPT -32767
58493c3a7b76Schristos     #define MAX_SHORT 32700
58503c3a7b76Schristos
58513c3a7b76Schristos     and, if you get complaints about too many rules, make the following change too:
58523c3a7b76Schristos
58533c3a7b76Schristos     	#define YY_TRAILING_MASK 0x200000
58543c3a7b76Schristos     	#define YY_TRAILING_HEAD_MASK 0x400000
58553c3a7b76Schristos
58563c3a7b76Schristos     - Vern
58573c3a7b76Schristos
58583c3a7b76Schristos
58593c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-63,  Next: unnamed-faq-64,  Prev: unnamed-faq-62,  Up: FAQ
58603c3a7b76Schristos
58613c3a7b76Schristosunnamed-faq-63
58623c3a7b76Schristos==============
58633c3a7b76Schristos
58643c3a7b76Schristos     To: jimmey@lexis-nexis.com (Jimmey Todd)
58653c3a7b76Schristos     Subject: Re: FLEX question regarding istream vs ifstream
58663c3a7b76Schristos     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
58673c3a7b76Schristos     Date: Mon, 15 Dec 1997 13:21:35 PST
58683c3a7b76Schristos     From: Vern Paxson <vern>
58693c3a7b76Schristos
58703c3a7b76Schristos     >         stdin_handle = YY_CURRENT_BUFFER;
58713c3a7b76Schristos     >         ifstream fin( "aFile" );
58723c3a7b76Schristos     >         yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
58733c3a7b76Schristos     >
     > What I'm wanting to do is pass the contents of a file thru one set
     > of rules and then pass stdin thru another set... It works great if I
     > don't use the C++ classes. But since everything else that I'm doing is
     > in C++, I thought I'd be consistent.
     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as its
     > first argument (as stated in the man page). However, fin is an ifstream
58813c3a7b76Schristos     > object. Any ideas on what I might be doing wrong? Any help would be
58823c3a7b76Schristos     > appreciated. Thanks!!
58833c3a7b76Schristos
58843c3a7b76Schristos     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
58853c3a7b76Schristos     Then its type will be compatible with the expected istream*, because ifstream
58863c3a7b76Schristos     is derived from istream.
58873c3a7b76Schristos
58883c3a7b76Schristos     		Vern
58893c3a7b76Schristos
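   The pointer conversion described in the reply above can be checked
in isolation.  In this sketch count_chars is a hypothetical stand-in
for any routine that, like the C++ scanner's yy_create_buffer(), takes
an istream* as its first argument; it is not part of flex.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <type_traits>

// Hypothetical stand-in for a routine that wants an istream*.
int count_chars(std::istream* in)
{
    assert(in != nullptr);
    int n = 0;
    char c;
    while (in->get(c))
        ++n;
    return n;
}

// An ifstream* converts implicitly to an istream* because ifstream is
// publicly derived from istream.
static_assert(std::is_convertible<std::ifstream*, std::istream*>::value,
              "ifstream* converts to istream*");

int demo_count()
{
    std::istringstream text("abc");   // any istream-derived object works
    return count_chars(&text);        // note the address-of operator
}
```

   With a real file the call has the same shape: declare
"std::ifstream fin(...)" and pass "&fin".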
58903c3a7b76Schristos
58913c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-64,  Next: unnamed-faq-65,  Prev: unnamed-faq-63,  Up: FAQ
58923c3a7b76Schristos
58933c3a7b76Schristosunnamed-faq-64
58943c3a7b76Schristos==============
58953c3a7b76Schristos
58963c3a7b76Schristos     To: Enda Fadian <fadiane@piercom.ie>
58973c3a7b76Schristos     Subject: Re: Question related to Flex man page?
58983c3a7b76Schristos     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
58993c3a7b76Schristos     Date: Tue, 16 Dec 1997 14:17:09 PST
59003c3a7b76Schristos     From: Vern Paxson <vern>
59013c3a7b76Schristos
     > Can you explain to me what is meant by a long-jump in relation to flex?
59033c3a7b76Schristos
59043c3a7b76Schristos     Using the longjmp() function while inside yylex() or a routine called by it.
59053c3a7b76Schristos
     > What is the flex activation frame?
59073c3a7b76Schristos
59083c3a7b76Schristos     Just yylex()'s stack frame.
59093c3a7b76Schristos
     > As far as I can see yyrestart will bring me back to the start of the input
     > file and using flex++ is not really an option!
59123c3a7b76Schristos
59133c3a7b76Schristos     No, yyrestart() doesn't imply a rewind, even though its name might sound
59143c3a7b76Schristos     like it does.  It tells the scanner to flush its internal buffers and
59153c3a7b76Schristos     start reading from the given file at its present location.
59163c3a7b76Schristos
59173c3a7b76Schristos     		Vern
59183c3a7b76Schristos
59193c3a7b76Schristos
59203c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-65,  Next: unnamed-faq-66,  Prev: unnamed-faq-64,  Up: FAQ
59213c3a7b76Schristos
59223c3a7b76Schristosunnamed-faq-65
59233c3a7b76Schristos==============
59243c3a7b76Schristos
59253c3a7b76Schristos     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
59263c3a7b76Schristos     Subject: Re: Need urgent Help
59273c3a7b76Schristos     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
59283c3a7b76Schristos     Date: Sun, 21 Dec 1997 21:30:46 PST
59293c3a7b76Schristos     From: Vern Paxson <vern>
59303c3a7b76Schristos
59313c3a7b76Schristos     > /usr/lib/yaccpar: In function `int yyparse()':
59323c3a7b76Schristos     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
59333c3a7b76Schristos     >
59343c3a7b76Schristos     > ld: Undefined symbol
59353c3a7b76Schristos     >    _yylex
59363c3a7b76Schristos     >    _yyparse
59373c3a7b76Schristos     >    _yyin
59383c3a7b76Schristos
59393c3a7b76Schristos     This is a known problem with Solaris C++ (and/or Solaris yacc).  I believe
59403c3a7b76Schristos     the fix is to explicitly insert some 'extern "C"' statements for the
59413c3a7b76Schristos     corresponding routines/symbols.
59423c3a7b76Schristos
59433c3a7b76Schristos     		Vern
59443c3a7b76Schristos
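   A minimal sketch of such declarations, as they might appear in the
C++ source that refers to the scanner and parser.  It assumes the usual
lex/yacc names and types (yylex, yyparse, yyin) and that the generated
code is built with C linkage; which symbols actually need this depends
on which files are compiled as C and which as C++.

```cpp
#include <cstdio>
#include <type_traits>

// Declarations only; the definitions are expected to come from the
// lex/yacc-generated objects at link time.
extern "C" {
    int yylex(void);
    int yyparse(void);
    extern std::FILE *yyin;
}

// Compile-time checks that the declarations have the intended types.
static_assert(std::is_same<decltype(yylex()), int>::value,
              "yylex returns int");
static_assert(std::is_same<decltype(yyin), std::FILE*>::value,
              "yyin is a FILE*");
```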
59453c3a7b76Schristos
59463c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-66,  Next: unnamed-faq-67,  Prev: unnamed-faq-65,  Up: FAQ
59473c3a7b76Schristos
59483c3a7b76Schristosunnamed-faq-66
59493c3a7b76Schristos==============
59503c3a7b76Schristos
59513c3a7b76Schristos     To: mc0307@mclink.it
59523c3a7b76Schristos     Cc: gnu@prep.ai.mit.edu
59533c3a7b76Schristos     Subject: Re: [mc0307@mclink.it: Help request]
59543c3a7b76Schristos     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
59553c3a7b76Schristos     Date: Sun, 21 Dec 1997 22:33:37 PST
59563c3a7b76Schristos     From: Vern Paxson <vern>
59573c3a7b76Schristos
59583c3a7b76Schristos     > This is my definition for float and integer types:
59593c3a7b76Schristos     > . . .
59603c3a7b76Schristos     > NZD          [1-9]
59613c3a7b76Schristos     > ...
     > I've tested my program on other lex versions (on UNIX Sun Solaris and HP
     > UNIX) and it works well, so I think that my definitions are correct.
     > Are there any differences between Lex and Flex?
59653c3a7b76Schristos
59663c3a7b76Schristos     There are indeed differences, as discussed in the man page.  The one
59673c3a7b76Schristos     you are probably running into is that when flex expands a name definition,
59683c3a7b76Schristos     it puts parentheses around the expansion, while lex does not.  There's
59693c3a7b76Schristos     an example in the man page of how this can lead to different matching.
59703c3a7b76Schristos     Flex's behavior complies with the POSIX standard (or at least with the
59713c3a7b76Schristos     last POSIX draft I saw).
59723c3a7b76Schristos
59733c3a7b76Schristos     		Vern
59743c3a7b76Schristos
59753c3a7b76Schristos
59763c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-67,  Next: unnamed-faq-68,  Prev: unnamed-faq-66,  Up: FAQ
59773c3a7b76Schristos
59783c3a7b76Schristosunnamed-faq-67
59793c3a7b76Schristos==============
59803c3a7b76Schristos
59813c3a7b76Schristos     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
59823c3a7b76Schristos     Subject: Re: Thanks
59833c3a7b76Schristos     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
59843c3a7b76Schristos     Date: Mon, 22 Dec 1997 14:35:05 PST
59853c3a7b76Schristos     From: Vern Paxson <vern>
59863c3a7b76Schristos
     > Thank you very much for your help. I compile and link well with C++ while
     > declaring 'yylex ...' extern, but a little problem remains. I get a
     > segmentation fault when executing (I linked with the lfl library) while it
     > works well when using LEX instead of flex. Do you have any idea about the
     > reason for this?
59923c3a7b76Schristos
59933c3a7b76Schristos     The one possible reason for this that comes to mind is if you've defined
59943c3a7b76Schristos     yytext as "extern char yytext[]" (which is what lex uses) instead of
59953c3a7b76Schristos     "extern char *yytext" (which is what flex uses).  If it's not that, then
59963c3a7b76Schristos     I'm afraid I don't know what the problem might be.
59973c3a7b76Schristos
59983c3a7b76Schristos     		Vern
59993c3a7b76Schristos
60003c3a7b76Schristos
60013c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-68,  Next: unnamed-faq-69,  Prev: unnamed-faq-67,  Up: FAQ
60023c3a7b76Schristos
60033c3a7b76Schristosunnamed-faq-68
60043c3a7b76Schristos==============
60053c3a7b76Schristos
60063c3a7b76Schristos     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
60073c3a7b76Schristos     Subject: Re: flex 2.5: c++ scanners & start conditions
60083c3a7b76Schristos     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
60093c3a7b76Schristos     Date: Tue, 06 Jan 1998 19:19:30 PST
60103c3a7b76Schristos     From: Vern Paxson <vern>
60113c3a7b76Schristos
60123c3a7b76Schristos     > The problem is that when I do this (using %option c++) start
60133c3a7b76Schristos     > conditions seem to not apply.
60143c3a7b76Schristos
60153c3a7b76Schristos     The BEGIN macro modifies the yy_start variable.  For C scanners, this
60163c3a7b76Schristos     is a static with scope visible through the whole file.  For C++ scanners,
60173c3a7b76Schristos     it's a member variable, so it only has visible scope within a member
60183c3a7b76Schristos     function.  Your lexbegin() routine is not a member function when you
60193c3a7b76Schristos     build a C++ scanner, so it's not modifying the correct yy_start.  The
60203c3a7b76Schristos     diagnostic that indicates this is that you found you needed to add
60213c3a7b76Schristos     a declaration of yy_start in order to get your scanner to compile when
60223c3a7b76Schristos     using C++; instead, the correct fix is to make lexbegin() a member
60233c3a7b76Schristos     function (by deriving from yyFlexLexer).
60243c3a7b76Schristos
60253c3a7b76Schristos     		Vern
60263c3a7b76Schristos
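   The scoping point in the reply above can be modelled without flex.
In this sketch ScannerBase is a hypothetical stand-in for yyFlexLexer:
it keeps the start state in a member rather than a file-scope static.
In a real scanner the body of lexbegin() would use BEGIN instead of
assigning to the member directly.

```cpp
#include <cassert>

// Hypothetical stand-in for yyFlexLexer.
class ScannerBase {
public:
    virtual ~ScannerBase() {}
    int start_state() const { return yy_start; }
protected:
    int yy_start = 0;
};

// A free function has no yy_start in scope, and declaring a global one
// just creates an unrelated variable.  A member of a derived class
// reaches the scanner object's own copy.
class MyScanner : public ScannerBase {
public:
    void lexbegin(int state)
    {
        assert(state >= 0);
        yy_start = state;
    }
};

int demo_lexbegin()
{
    MyScanner a, b;
    a.lexbegin(2);                                   // changes a only
    return a.start_state() * 10 + b.start_state();   // 2 and 0
}
```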
60273c3a7b76Schristos
60283c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-69,  Next: unnamed-faq-70,  Prev: unnamed-faq-68,  Up: FAQ
60293c3a7b76Schristos
60303c3a7b76Schristosunnamed-faq-69
60313c3a7b76Schristos==============
60323c3a7b76Schristos
60333c3a7b76Schristos     To: "Boris Zinin" <boris@ippe.rssi.ru>
60343c3a7b76Schristos     Subject: Re: current position in flex buffer
60353c3a7b76Schristos     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
60363c3a7b76Schristos     Date: Mon, 12 Jan 1998 12:03:15 PST
60373c3a7b76Schristos     From: Vern Paxson <vern>
60383c3a7b76Schristos
60393c3a7b76Schristos     > The problem is how to determine the current position in flex active
60403c3a7b76Schristos     > buffer when a rule is matched....
60413c3a7b76Schristos
60423c3a7b76Schristos     You will need to keep track of this explicitly, such as by redefining
60433c3a7b76Schristos     YY_USER_ACTION to count the number of characters matched.
60443c3a7b76Schristos
60453c3a7b76Schristos     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
60463c3a7b76Schristos
60473c3a7b76Schristos     		Vern
60483c3a7b76Schristos
60493c3a7b76Schristos
60503c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-70,  Next: unnamed-faq-71,  Prev: unnamed-faq-69,  Up: FAQ
60513c3a7b76Schristos
60523c3a7b76Schristosunnamed-faq-70
60533c3a7b76Schristos==============
60543c3a7b76Schristos
60553c3a7b76Schristos     To: Bik.Dhaliwal@bis.org
60563c3a7b76Schristos     Subject: Re: Flex question
60573c3a7b76Schristos     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
60583c3a7b76Schristos     Date: Tue, 27 Jan 1998 22:41:52 PST
60593c3a7b76Schristos     From: Vern Paxson <vern>
60603c3a7b76Schristos
60613c3a7b76Schristos     > That requirement involves knowing
60623c3a7b76Schristos     > the character position at which a particular token was matched
60633c3a7b76Schristos     > in the lexer.
60643c3a7b76Schristos
60653c3a7b76Schristos     The way you have to do this is by explicitly keeping track of where
60663c3a7b76Schristos     you are in the file, by counting the number of characters scanned
60673c3a7b76Schristos     for each token (available in yyleng).  It may prove convenient to
60683c3a7b76Schristos     do this by redefining YY_USER_ACTION, as described in the manual.
60693c3a7b76Schristos
60703c3a7b76Schristos     		Vern
60713c3a7b76Schristos
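   The bookkeeping described above can be tried out without flex.  The
sketch below is a toy hand-written scanner, not flex output: the
*_demo names are hypothetical stand-ins for yyleng and YY_USER_ACTION,
and the macro is expanded before each "action" the way flex expands
YY_USER_ACTION before each rule's action.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for scanner state.  In a flex scanner yyleng is supplied
// by flex.
static std::size_t yyleng_demo = 0;   // length of the current match
static std::size_t token_start = 0;   // offset where the match began
static std::size_t next_offset = 0;   // offset just past the match

// Same idea as redefining YY_USER_ACTION: runs before every action.
#define YY_USER_ACTION_DEMO \
    token_start = next_offset; next_offset += yyleng_demo;

// Toy scanner: splits on runs of blanks and records each word's offset.
std::vector<std::size_t> word_offsets(const std::string& input)
{
    std::vector<std::size_t> starts;
    token_start = next_offset = 0;
    std::size_t i = 0;
    while (i < input.size()) {
        const bool blank = (input[i] == ' ');
        std::size_t j = i;
        while (j < input.size() && (input[j] == ' ') == blank)
            ++j;
        yyleng_demo = j - i;          // what flex would put in yyleng
        YY_USER_ACTION_DEMO
        if (!blank)
            starts.push_back(token_start);   // the "action" for a word
        i = j;
    }
    return starts;
}
```

   For the input "ab cde f" this records offsets 0, 3 and 7.  Actions
that push text back onto the input would need the running count
adjusted to match.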
60723c3a7b76Schristos
60733c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-71,  Next: unnamed-faq-72,  Prev: unnamed-faq-70,  Up: FAQ
60743c3a7b76Schristos
60753c3a7b76Schristosunnamed-faq-71
60763c3a7b76Schristos==============
60773c3a7b76Schristos
60783c3a7b76Schristos     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
60793c3a7b76Schristos     Subject: Re: flex: how to control start condition from parser?
60803c3a7b76Schristos     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
60813c3a7b76Schristos     Date: Tue, 27 Jan 1998 22:45:37 PST
60823c3a7b76Schristos     From: Vern Paxson <vern>
60833c3a7b76Schristos
60843c3a7b76Schristos     > It seems useful for the parser to be able to tell the lexer about such
60853c3a7b76Schristos     > context dependencies, because then they don't have to be limited to
60863c3a7b76Schristos     > local or sequential context.
60873c3a7b76Schristos
60883c3a7b76Schristos     One way to do this is to have the parser call a stub routine that's
     included in the scanner's .l file, and consequently that has access to
60903c3a7b76Schristos     BEGIN.  The only ugliness is that the parser can't pass in the state
60913c3a7b76Schristos     it wants, because those aren't visible - but if you don't have many
60923c3a7b76Schristos     such states, then using a different set of names doesn't seem like
     too much of a burden.

     While generating a .h file like you suggest is certainly cleaner,
60963c3a7b76Schristos     flex development has come to a virtual stand-still :-(, so a workaround
60973c3a7b76Schristos     like the above is much more pragmatic than waiting for a new feature.
60983c3a7b76Schristos
60993c3a7b76Schristos     		Vern
61003c3a7b76Schristos
61013c3a7b76Schristos
61023c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-72,  Next: unnamed-faq-73,  Prev: unnamed-faq-71,  Up: FAQ
61033c3a7b76Schristos
61043c3a7b76Schristosunnamed-faq-72
61053c3a7b76Schristos==============
61063c3a7b76Schristos
61073c3a7b76Schristos     To: Barbara Denny <denny@3com.com>
61083c3a7b76Schristos     Subject: Re: freebsd flex bug?
61093c3a7b76Schristos     In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
61103c3a7b76Schristos     Date: Fri, 30 Jan 1998 12:42:32 PST
61113c3a7b76Schristos     From: Vern Paxson <vern>
61123c3a7b76Schristos
61133c3a7b76Schristos     > lex.yy.c:1996: parse error before `='
61143c3a7b76Schristos
61153c3a7b76Schristos     This is the key, identifying this error.  (It may help to pinpoint
61163c3a7b76Schristos     it by using flex -L, so it doesn't generate #line directives in its
61173c3a7b76Schristos     output.)  I will bet you heavy money that you have a start condition
61183c3a7b76Schristos     name that is also a variable name, or something like that; flex spits
61193c3a7b76Schristos     out #define's for each start condition name, mapping them to a number,
61203c3a7b76Schristos     so you can wind up with:
61213c3a7b76Schristos
61223c3a7b76Schristos     	%x foo
61233c3a7b76Schristos     	%%
61243c3a7b76Schristos     		...
61253c3a7b76Schristos     	%%
61263c3a7b76Schristos     	void bar()
61273c3a7b76Schristos     		{
61283c3a7b76Schristos     		int foo = 3;
61293c3a7b76Schristos     		}
61303c3a7b76Schristos
     and the penultimate line will turn into "int 1 = 3" after C preprocessing,
     since flex will put "#define foo 1" in the generated scanner.
61333c3a7b76Schristos
61343c3a7b76Schristos     		Vern
61353c3a7b76Schristos
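   One way to stay clear of this clash is to give start conditions
names that cannot collide with ordinary identifiers.  The SC_ prefix
below is only a convention assumed for this sketch, not something flex
requires, and the numeric value is illustrative.

```cpp
// Roughly what the generated scanner would contain for "%x SC_FOO".
#define SC_FOO 1

// With the start condition spelled SC_FOO, an ordinary identifier
// named foo is left alone by the preprocessor.
int bar()
{
    int foo = 3;
    return foo + SC_FOO;
}
```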
61363c3a7b76Schristos
61373c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-73,  Next: unnamed-faq-74,  Prev: unnamed-faq-72,  Up: FAQ
61383c3a7b76Schristos
61393c3a7b76Schristosunnamed-faq-73
61403c3a7b76Schristos==============
61413c3a7b76Schristos
61423c3a7b76Schristos     To: Maurice Petrie <mpetrie@infoscigroup.com>
61433c3a7b76Schristos     Subject: Re: Lost flex .l file
61443c3a7b76Schristos     In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
61453c3a7b76Schristos     Date: Mon, 02 Feb 1998 11:15:12 PST
61463c3a7b76Schristos     From: Vern Paxson <vern>
61473c3a7b76Schristos
61483c3a7b76Schristos     > I am curious as to
61493c3a7b76Schristos     > whether there is a simple way to backtrack from the generated source to
61503c3a7b76Schristos     > reproduce the lost list of tokens we are searching on.
61513c3a7b76Schristos
61523c3a7b76Schristos     In theory, it's straight-forward to go from the DFA representation
61533c3a7b76Schristos     back to a regular-expression representation - the two are isomorphic.
61543c3a7b76Schristos     In practice, a huge headache, because you have to unpack all the tables
61553c3a7b76Schristos     back into a single DFA representation, and then write a program to munch
61563c3a7b76Schristos     on that and translate it into an RE.
61573c3a7b76Schristos
61583c3a7b76Schristos     Sorry for the less-than-happy news ...
61593c3a7b76Schristos
61603c3a7b76Schristos     		Vern
61613c3a7b76Schristos
61623c3a7b76Schristos
61633c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-74,  Next: unnamed-faq-75,  Prev: unnamed-faq-73,  Up: FAQ
61643c3a7b76Schristos
61653c3a7b76Schristosunnamed-faq-74
61663c3a7b76Schristos==============
61673c3a7b76Schristos
61683c3a7b76Schristos     To: jimmey@lexis-nexis.com (Jimmey Todd)
61693c3a7b76Schristos     Subject: Re: Flex performance question
61703c3a7b76Schristos     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
61713c3a7b76Schristos     Date: Thu, 19 Feb 1998 08:48:51 PST
61723c3a7b76Schristos     From: Vern Paxson <vern>
61733c3a7b76Schristos
61743c3a7b76Schristos     > What I have found, is that the smaller the data chunk, the faster the
61753c3a7b76Schristos     > program executes. This is the opposite of what I expected. Should this be
61763c3a7b76Schristos     > happening this way?
61773c3a7b76Schristos
61783c3a7b76Schristos     This is exactly what will happen if your input file has embedded NULs.
61793c3a7b76Schristos     From the man page:
61803c3a7b76Schristos
61813c3a7b76Schristos     A final note: flex is slow when matching NUL's, particularly
61823c3a7b76Schristos     when  a  token  contains multiple NUL's.  It's best to write
61833c3a7b76Schristos     rules which match short amounts of text if it's  anticipated
61843c3a7b76Schristos     that the text will often include NUL's.
61853c3a7b76Schristos
61863c3a7b76Schristos     So that's the first thing to look for.
61873c3a7b76Schristos
61883c3a7b76Schristos     		Vern
61893c3a7b76Schristos
61903c3a7b76Schristos
61913c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-75,  Next: unnamed-faq-76,  Prev: unnamed-faq-74,  Up: FAQ
61923c3a7b76Schristos
61933c3a7b76Schristosunnamed-faq-75
61943c3a7b76Schristos==============
61953c3a7b76Schristos
61963c3a7b76Schristos     To: jimmey@lexis-nexis.com (Jimmey Todd)
61973c3a7b76Schristos     Subject: Re: Flex performance question
61983c3a7b76Schristos     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
61993c3a7b76Schristos     Date: Thu, 19 Feb 1998 15:42:25 PST
62003c3a7b76Schristos     From: Vern Paxson <vern>
62013c3a7b76Schristos
62023c3a7b76Schristos     So there are several problems.
62033c3a7b76Schristos
62043c3a7b76Schristos     First, to go fast, you want to match as much text as possible, which
62053c3a7b76Schristos     your scanners don't in the case that what they're scanning is *not*
62063c3a7b76Schristos     a <RN> tag.  So you want a rule like:
62073c3a7b76Schristos
62083c3a7b76Schristos     	[^<]+
62093c3a7b76Schristos
62103c3a7b76Schristos     Second, C++ scanners are particularly slow if they're interactive,
62113c3a7b76Schristos     which they are by default.  Using -B speeds it up by a factor of 3-4
62123c3a7b76Schristos     on my workstation.
62133c3a7b76Schristos
62143c3a7b76Schristos     Third, C++ scanners that use the istream interface are slow, because
     of how poorly implemented istreams are.  I built two versions of
62163c3a7b76Schristos     the following scanner:
62173c3a7b76Schristos
62183c3a7b76Schristos     	%%
62193c3a7b76Schristos     	.*\n
62203c3a7b76Schristos     	.*
62213c3a7b76Schristos     	%%
62223c3a7b76Schristos
62233c3a7b76Schristos     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
62243c3a7b76Schristos     The C++ istream version, using -B, takes 3.8 seconds.
62253c3a7b76Schristos
62263c3a7b76Schristos     		Vern
62273c3a7b76Schristos
62283c3a7b76Schristos
62293c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-76,  Next: unnamed-faq-77,  Prev: unnamed-faq-75,  Up: FAQ
62303c3a7b76Schristos
62313c3a7b76Schristosunnamed-faq-76
62323c3a7b76Schristos==============
62333c3a7b76Schristos
62343c3a7b76Schristos     To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
62353c3a7b76Schristos     Subject: Re: FLEX 2.5 & THE YEAR 2000
62363c3a7b76Schristos     In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
62373c3a7b76Schristos     Date: Wed, 03 Jun 1998 10:22:26 PDT
62383c3a7b76Schristos     From: Vern Paxson <vern>
62393c3a7b76Schristos
62403c3a7b76Schristos     > I am researching the Y2K problem with General Electric R&D
62413c3a7b76Schristos     > and need to know if there are any known issues concerning
62423c3a7b76Schristos     > the above mentioned software and Y2K regardless of version.
62433c3a7b76Schristos
     There shouldn't be; all it ever does with the date is ask the system
62453c3a7b76Schristos     for it and then print it out.
62463c3a7b76Schristos
62473c3a7b76Schristos     		Vern
62483c3a7b76Schristos
62493c3a7b76Schristos
62503c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-77,  Next: unnamed-faq-78,  Prev: unnamed-faq-76,  Up: FAQ
62513c3a7b76Schristos
62523c3a7b76Schristosunnamed-faq-77
62533c3a7b76Schristos==============
62543c3a7b76Schristos
62553c3a7b76Schristos     To: "Hans Dermot Doran" <htd@ibhdoran.com>
62563c3a7b76Schristos     Subject: Re: flex problem
62573c3a7b76Schristos     In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
62583c3a7b76Schristos     Date: Tue, 21 Jul 1998 14:23:34 PDT
62593c3a7b76Schristos     From: Vern Paxson <vern>
62603c3a7b76Schristos
62613c3a7b76Schristos     > To overcome this, I gets() the stdin into a string and lex the string. The
62623c3a7b76Schristos     > string is lexed OK except that the end of string isn't lexed properly
     > (yy_scan_string()), that is the lexer doesn't recognise the end of string.
62643c3a7b76Schristos
62653c3a7b76Schristos     Flex doesn't contain mechanisms for recognizing buffer endpoints.  But if
62663c3a7b76Schristos     you use fgets instead (which you should anyway, to protect against buffer
62673c3a7b76Schristos     overflows), then the final \n will be preserved in the string, and you can
62683c3a7b76Schristos     scan that in order to find the end of the string.
62693c3a7b76Schristos
62703c3a7b76Schristos     		Vern
62713c3a7b76Schristos
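   A small sketch of the fgets approach from the reply above.  The
helper name read_line and its return convention are made up for this
example; the point is only that fgets() bounds the read and keeps the
trailing newline, so the scanner can match \n as its end-of-string
marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Reads one line with fgets().  Returns -1 at end of input (or on a
// read error), 1 if the line ends with '\n', and 0 if it does not (the
// line was longer than the buffer, or the input ended without a
// newline).
int read_line(std::FILE* in, char* buf, int size)
{
    assert(in != nullptr && buf != nullptr && size > 0);
    if (std::fgets(buf, size, in) == nullptr)
        return -1;
    const std::size_t len = std::strlen(buf);
    return (len > 0 && buf[len - 1] == '\n') ? 1 : 0;
}
```

   When read_line() returns 1, the buffer can be handed to
yy_scan_string() and a rule matching \n will see the end of the line.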
62723c3a7b76Schristos
62733c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-78,  Next: unnamed-faq-79,  Prev: unnamed-faq-77,  Up: FAQ
62743c3a7b76Schristos
62753c3a7b76Schristosunnamed-faq-78
62763c3a7b76Schristos==============
62773c3a7b76Schristos
62783c3a7b76Schristos     To: soumen@almaden.ibm.com
62793c3a7b76Schristos     Subject: Re: Flex++ 2.5.3 instance member vs. static member
62803c3a7b76Schristos     In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
62813c3a7b76Schristos     Date: Tue, 28 Jul 1998 01:10:34 PDT
62823c3a7b76Schristos     From: Vern Paxson <vern>
62833c3a7b76Schristos
62843c3a7b76Schristos     > %{
62853c3a7b76Schristos     > int mylineno = 0;
62863c3a7b76Schristos     > %}
62873c3a7b76Schristos     > ws      [ \t]+
62883c3a7b76Schristos     > alpha   [A-Za-z]
62893c3a7b76Schristos     > dig     [0-9]
62903c3a7b76Schristos     > %%
62913c3a7b76Schristos     >
62923c3a7b76Schristos     > Now you'd expect mylineno to be a member of each instance of class
62933c3a7b76Schristos     > yyFlexLexer, but is this the case?  A look at the lex.yy.cc file seems to
62943c3a7b76Schristos     > indicate otherwise; unless I am missing something the declaration of
62953c3a7b76Schristos     > mylineno seems to be outside any class scope.
62963c3a7b76Schristos     >
62973c3a7b76Schristos     > How will this work if I want to run a multi-threaded application with each
62983c3a7b76Schristos     > thread creating a FlexLexer instance?
62993c3a7b76Schristos
63003c3a7b76Schristos     Derive your own subclass and make mylineno a member variable of it.
63013c3a7b76Schristos
63023c3a7b76Schristos     		Vern
63033c3a7b76Schristos
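   A sketch of the subclass idea, again with a hypothetical ScannerBase
standing in for yyFlexLexer.  The counter lives in each scanner object,
unlike a variable declared in the %{ ... %} block, which ends up at
file scope and is shared by every instance.

```cpp
// Hypothetical stand-in for yyFlexLexer.
class ScannerBase {
public:
    virtual ~ScannerBase() {}
};

// Each scanner object carries its own line counter.
class LineCountingScanner : public ScannerBase {
public:
    void note_newline() { ++mylineno; }
    int lineno() const { return mylineno; }
private:
    int mylineno = 0;
};

int demo_lineno()
{
    LineCountingScanner a, b;
    a.note_newline();
    a.note_newline();
    b.note_newline();
    return a.lineno() * 10 + b.lineno();   // a counted 2, b counted 1
}
```

   For rule actions to reach such a member, yylex() has to be a member
of the derived class; flex's "%option yyclass" is intended for that.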
63043c3a7b76Schristos
63053c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-79,  Next: unnamed-faq-80,  Prev: unnamed-faq-78,  Up: FAQ
63063c3a7b76Schristos
63073c3a7b76Schristosunnamed-faq-79
63083c3a7b76Schristos==============
63093c3a7b76Schristos
63103c3a7b76Schristos     To: Adoram Rogel <adoram@hybridge.com>
63113c3a7b76Schristos     Subject: Re: More than 32K states change hangs
63123c3a7b76Schristos     In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
63133c3a7b76Schristos     Date: Tue, 04 Aug 1998 22:28:45 PDT
63143c3a7b76Schristos     From: Vern Paxson <vern>
63153c3a7b76Schristos
63163c3a7b76Schristos     > Vern Paxson,
63173c3a7b76Schristos     >
     > I followed your advice, posted on Usenet by you, and emailed to me
63193c3a7b76Schristos     > personally by you, on how to overcome the 32K states limit. I'm running
63203c3a7b76Schristos     > on Linux machines.
63213c3a7b76Schristos     > I took the full source of version 2.5.4 and did the following changes in
63223c3a7b76Schristos     > flexdef.h:
63233c3a7b76Schristos     > #define JAMSTATE -327660
63243c3a7b76Schristos     > #define MAXIMUM_MNS 319990
63253c3a7b76Schristos     > #define BAD_SUBSCRIPT -327670
63263c3a7b76Schristos     > #define MAX_SHORT 327000
63273c3a7b76Schristos     >
63283c3a7b76Schristos     > and compiled.
63293c3a7b76Schristos     > All looked fine, including check and bigcheck, so I installed.
63303c3a7b76Schristos
63313c3a7b76Schristos     Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
63323c3a7b76Schristos     archives I see that I did indeed recommend doing so.  Try setting it back
     to 32700; that should be enough that you no longer need -Ca.  If it still
63343c3a7b76Schristos     hangs, then the interesting question is - where?
63353c3a7b76Schristos
     > Compiling the same hanging program with an out-of-the-box (RedHat 4.2
63373c3a7b76Schristos     > distribution of Linux)
63383c3a7b76Schristos     > flex 2.5.4 binary works.
63393c3a7b76Schristos
63403c3a7b76Schristos     Since Linux comes with source code, you should diff it against what
63413c3a7b76Schristos     you have to see what problems they missed.
63423c3a7b76Schristos
     > Should I always compile with the -Ca option now? Even short and simple
     > filters?
63453c3a7b76Schristos
63463c3a7b76Schristos     No, definitely not.  It's meant to be for those situations where you
63473c3a7b76Schristos     absolutely must squeeze every last cycle out of your scanner.
63483c3a7b76Schristos
63493c3a7b76Schristos     		Vern
63503c3a7b76Schristos
63513c3a7b76Schristos
63523c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-80,  Next: unnamed-faq-81,  Prev: unnamed-faq-79,  Up: FAQ
63533c3a7b76Schristos
63543c3a7b76Schristosunnamed-faq-80
63553c3a7b76Schristos==============
63563c3a7b76Schristos
63573c3a7b76Schristos     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
63583c3a7b76Schristos     Subject: Re: flex output for static code portion
63593c3a7b76Schristos     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
63603c3a7b76Schristos     Date: Mon, 17 Aug 1998 23:57:42 PDT
63613c3a7b76Schristos     From: Vern Paxson <vern>
63623c3a7b76Schristos
63633c3a7b76Schristos     > I would like to use flex under the hood to generate a binary file
63643c3a7b76Schristos     > containing the data structures that control the parse.
63653c3a7b76Schristos
63663c3a7b76Schristos     This has been on the wish-list for a long time.  In principle it's
63673c3a7b76Schristos     straight-forward - you redirect mkdata() et al's I/O to another file,
63683c3a7b76Schristos     and modify the skeleton to have a start-up function that slurps these
63693c3a7b76Schristos     into dynamic arrays.  The concerns are (1) the scanner generation code
63703c3a7b76Schristos     is hairy and full of corner cases, so it's easy to get surprised when
63713c3a7b76Schristos     going down this path :-( ; and (2) being careful about buffering so
63723c3a7b76Schristos     that when the tables change you make sure the scanner starts in the
63733c3a7b76Schristos     correct state and reading at the right point in the input file.
63743c3a7b76Schristos
63753c3a7b76Schristos     > I was wondering if you know of anyone who has used flex in this way.
63763c3a7b76Schristos
63773c3a7b76Schristos     I don't - but it seems like a reasonable project to undertake (unlike
63783c3a7b76Schristos     numerous other flex tweaks :-).
63793c3a7b76Schristos
63803c3a7b76Schristos     		Vern


File: flex.info,  Node: unnamed-faq-81,  Next: unnamed-faq-82,  Prev: unnamed-faq-80,  Up: FAQ

unnamed-faq-81
==============

     Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
     	by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
     	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
     Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
     	by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
     	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
     Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
     Subject: "flex scanner push-back overflow"
     To: vern@ee.lbl.gov
     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
     X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
     X-Mailer: ELM [version 2.4ME+ PL28 (25)]
     MIME-Version: 1.0
     Content-Type: text/plain; charset=US-ASCII
     Content-Transfer-Encoding: 7bit

     Hi Vern,

     Yesterday, I encountered a strange problem: I use the macro processor m4
     to include some lengthy lists into a .l file. Following is a flex macro
     definition that causes some serious pain in my neck:

     AUTHOR           ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])

     The complete list contains about 10kB. When I try to "flex" this file
     (on a Solaris 2.6 machine, using a modified flex 2.5.4; I only increased
     some of the predefined values in flexdef.h) I get the error:

     myflex/flex -8  sentag.tmp.l
     flex scanner push-back overflow

     When I remove the slashes in the macro definition everything works fine.
     As I understand it, the double quotes escape the slash-character so it
     really means "/" and not "trailing context". Furthermore, I tried to
     escape the slashes with backslashes, but to no avail: the same error
     message appeared when flexing the code.

     Do you have an idea what's going on here?

     Greetings from Germany,
     	Georg
     --
     Georg Rehm                                     georg@cl-ki.uni-osnabrueck.de
     Institute for Semantic Information Processing, University of Osnabrueck, FRG


File: flex.info,  Node: unnamed-faq-82,  Next: unnamed-faq-83,  Prev: unnamed-faq-81,  Up: FAQ

unnamed-faq-82
==============

     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: "flex scanner push-back overflow"
     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
     Date: Thu, 20 Aug 1998 07:05:35 PDT
     From: Vern Paxson <vern>

     > myflex/flex -8  sentag.tmp.l
     > flex scanner push-back overflow

     Flex itself uses a flex scanner.  That scanner is running out of buffer
     space when it tries to unput() the humongous macro you've defined.  When
     you remove the '/'s, you make it small enough so that it fits in the buffer;
     removing spaces would do the same thing.

     The fix is either to rethink why you're using such a big macro (perhaps
     there's a better way to do it), or to rebuild flex's own scan.c with a
     larger value for

     	#define YY_BUF_SIZE 16384

     - Vern


File: flex.info,  Node: unnamed-faq-83,  Next: unnamed-faq-84,  Prev: unnamed-faq-82,  Up: FAQ

unnamed-faq-83
==============

     To: Jan Kort <jan@research.techforce.nl>
     Subject: Re: Flex
     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
     Date: Sat, 05 Sep 1998 00:59:49 PDT
     From: Vern Paxson <vern>

     > %%
     >
     > "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
     > ^\n             { fprintf(stderr, "empty line\n"); }
     > .               { }
     > \n              { fprintf(stderr, "new line\n"); }
     >
     > %%
     > -- input ---------------------------------------
     > TEST1
     > -- output --------------------------------------
     > TEST1
     > empty line
     > ------------------------------------------------

     IMHO, it's not clear whether or not this is in fact a bug.  It depends
     on whether you view yyless() as backing up in the input stream, or as
     pushing new characters onto the beginning of the input stream.  Flex
     interprets it as the latter (for implementation convenience, I'll admit),
     and so considers the newline as in fact matching at the beginning of a
     line, as after all the last token scanned an entire line and so the
     scanner is now at the beginning of a new line.

     I agree that this is counter-intuitive for yyless(), given its
     functional description (it's less so for unput(), depending on whether
     you're unput()'ing new text or scanned text).  But I don't plan to
     change it any time soon, as it's a pain to do so.  Consequently,
     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
     your scanner into the behavior you desire.

     Sorry for the less-than-completely-satisfactory answer.

     		Vern


File: flex.info,  Node: unnamed-faq-84,  Next: unnamed-faq-85,  Prev: unnamed-faq-83,  Up: FAQ

unnamed-faq-84
==============

     To: Patrick Krusenotto <krusenot@mac-info-link.de>
     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
     Date: Thu, 24 Sep 1998 23:28:43 PDT
     From: Vern Paxson <vern>

     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
     > trying to make my scanner restart with a new file after my parser stops
     > with a parse error. When my compiler restarts, the parser always
     > receives the token after the token (in the old file!) that caused the
     > parser error.

     I suspect the problem is that your parser has read ahead in order
     to attempt to resolve an ambiguity, and when it's restarted it picks
     up with that token rather than reading a fresh one.  If you're using
     yacc, then the special "error" production can sometimes be used to
     consume tokens in an attempt to get the parser into a consistent state.

     		Vern


File: flex.info,  Node: unnamed-faq-85,  Next: unnamed-faq-86,  Prev: unnamed-faq-84,  Up: FAQ

unnamed-faq-85
==============

     To: Henric Jungheim <junghelh@pe-nelson.com>
     Subject: Re: flex 2.5.4a
     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
     Date: Tue, 27 Oct 1998 16:50:14 PST
     From: Vern Paxson <vern>

     > This brings up a feature request:  How about a command line
     > option to specify the filename when reading from stdin?  That way one
     > doesn't need to create a temporary file in order to get the "#line"
     > directives to make sense.

     Use -o combined with -t (per the man page description of -o).

     > P.S., Is there any simple way to use non-blocking IO to parse multiple
     > streams?

     Simple, no.

     One approach might be to return a magic character on EWOULDBLOCK and
     have a rule

     	.*<magic-character>	// put back .*, eat magic character

     This is off the top of my head, not sure it'll work.

     		Vern
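
The input side of that idea might look like the sketch below, which
could serve as the body of a custom YY_INPUT.  It has not been tried
against a real scanner, in keeping with the caveat above.  The names
nb_read(), set_nonblocking() and MAGIC_CHAR are hypothetical, and the
sketch assumes the magic byte never occurs in genuine input.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define MAGIC_CHAR '\001'   /* assumed never to appear in real input */

/* Put a descriptor into non-blocking mode; returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Read up to max_size bytes into buf and return how many were stored.
 * When the descriptor has nothing available right now, hand back a
 * single MAGIC_CHAR instead of blocking.  A return of 0 means end of
 * file; other errors are also reported as 0 in this sketch. */
int nb_read(int fd, char *buf, int max_size)
{
    ssize_t n;

    do {
        n = read(fd, buf, (size_t)max_size);
    } while (n < 0 && errno == EINTR);      /* retry interrupted reads */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        buf[0] = MAGIC_CHAR;
        return 1;
    }
    return n < 0 ? 0 : (int)n;
}
```

A scanner would then redefine YY_INPUT so that its result argument is
set from nb_read(), and the rule that matches the magic character would
decide which stream to service next.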


File: flex.info,  Node: unnamed-faq-86,  Next: unnamed-faq-87,  Prev: unnamed-faq-85,  Up: FAQ

unnamed-faq-86
==============

     To: "Repko, Billy D" <billy.d.repko@intel.com>
     Subject: Re: Compiling scanners
     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
     Date: Thu, 14 Jan 1999 00:25:30 PST
     From: Vern Paxson <vern>

     > It appears that maybe it cannot find the lfl library.

     The Makefile in the distribution builds it, so you should have it.
     It's exceedingly trivial, just a main() that calls yylex() and
     a yywrap() that always returns 1.

     > %%
     >       \n      ++num_lines; ++num_chars;
     >       .       ++num_chars;

     You can't indent your rules like this - that's where the errors are coming
     from.  Flex copies indented text to the output file; that's how you do
     things like

     	int num_lines_seen = 0;

     to declare local variables.

     		Vern


File: flex.info,  Node: unnamed-faq-87,  Next: unnamed-faq-88,  Prev: unnamed-faq-86,  Up: FAQ

unnamed-faq-87
==============

     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
     Subject: Re: flex input buffer
     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
     Date: Tue, 09 Feb 1999 21:03:37 PST
     From: Vern Paxson <vern>

     > In the flex.skl file the size of the default input buffers is set.  Can you
     > explain why this size is set and why it is such a high number.

     It's large to optimize performance when scanning large files.  You can
     safely make it a lot lower if needed.

     		Vern


File: flex.info,  Node: unnamed-faq-88,  Next: unnamed-faq-90,  Prev: unnamed-faq-87,  Up: FAQ

unnamed-faq-88
==============

     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
     Subject: Re: Flex error message
     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
     Date: Thu, 25 Feb 1999 00:11:31 PST
     From: Vern Paxson <vern>

     > I'm extending a larger scanner written in Flex and I keep running into
     > problems. More specifically, I get the error message:
     > "flex: input rules are too complicated (>= 32000 NFA states)"

     Increase the definitions in flexdef.h for:

     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767

     recompile everything, and it should all work.

     		Vern


File: flex.info,  Node: unnamed-faq-90,  Next: unnamed-faq-91,  Prev: unnamed-faq-88,  Up: FAQ

unnamed-faq-90
==============

     To: "Dmitriy Goldobin" <gold@ems.chel.su>
     Subject: Re: FLEX trouble
     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
     Date: Tue, 01 Jun 1999 00:15:07 PDT
     From: Vern Paxson <vern>

     >   I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
     > but rule "/*"(.|\n)*"*/" don't work ?

     The second of these will have to scan the entire input stream (because
     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
     it ends with "*/", terminating the comment.  That potentially will overflow
     the input buffer.

     >   More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
     > 'unrecognized rule'.

     You can't use the '/' operator inside parentheses.  It's not clear
     what "(a/b)*" actually means.

     >   I now use workaround with state <comment>, but single-rule is
     > better, i think.

     Single-rule is nice but will always have the problem of either setting
     restrictions on comments (like not allowing multi-line comments) and/or
     running the risk of consuming the entire input stream, as noted above.

     		Vern
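
The <comment> start-state workaround mentioned in the question amounts
to a two-state machine that consumes a comment piece by piece, so no
single match ever has to span the whole comment.  The hand-written C
function below is a sketch of that idea, not flex output; the names
strip_comments(), INITIAL and COMMENT are chosen for the example
(INITIAL merely echoes the name of flex's default start condition).

```c
#include <stddef.h>

enum scan_state { INITIAL, COMMENT };   /* mirrors two start conditions */

/* Copy `in` to `out` with C-style comments removed and return the
 * number of comments that were properly closed.  `out` must have room
 * for strlen(in) + 1 bytes.  Input is handled one character at a time,
 * so comment length (including embedded newlines) does not matter. */
size_t strip_comments(const char *in, char *out)
{
    enum scan_state state = INITIAL;
    size_t closed = 0;

    while (*in) {
        if (state == INITIAL) {
            if (in[0] == '/' && in[1] == '*') {
                state = COMMENT;        /* like BEGIN(comment) */
                in += 2;
            } else {
                *out++ = *in++;         /* ordinary text: keep it */
            }
        } else {
            if (in[0] == '*' && in[1] == '/') {
                state = INITIAL;        /* like BEGIN(INITIAL) */
                closed++;
                in += 2;
            } else {
                in++;                   /* inside comment: discard */
            }
        }
    }
    *out = '\0';
    return closed;
}
```

An unterminated comment simply swallows the rest of the input here; a
real scanner would more likely report it from an <<EOF>> rule.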


File: flex.info,  Node: unnamed-faq-91,  Next: unnamed-faq-92,  Prev: unnamed-faq-90,  Up: FAQ

unnamed-faq-91
==============

     Received: from mc-qout4.whowhere.com (mc-qout4.whowhere.com [209.185.123.18])
     	by ee.lbl.gov (8.9.3/8.9.3) with SMTP id IAA05100
     	for <vern@ee.lbl.gov>; Tue, 15 Jun 1999 08:56:06 -0700 (PDT)
     Received: from Unknown/Local ([?.?.?.?]) by my-deja.com; Tue Jun 15 08:55:43 1999
     To: vern@ee.lbl.gov
     Date: Tue, 15 Jun 1999 08:55:43 -0700
     From: "Aki Niimura" <neko@my-deja.com>
     Message-ID: <KNONDOHDOBGAEAAA@my-deja.com>
     Mime-Version: 1.0
     Cc:
     X-Sent-Mail: on
     Reply-To:
     X-Mailer: MailCity Service
     Subject: A question on flex C++ scanner
     X-Sender-Ip: 12.72.207.61
     Organization: My Deja Email  (http://www.my-deja.com:80)
     Content-Type: text/plain; charset=us-ascii
     Content-Transfer-Encoding: 7bit

     Dear Dr. Paxson,

     I have been using flex for years.
     It works very well on many projects.
     Most case, I used it to generate a scanner on C language.
     However, one project I needed to generate a scanner
     on C++ language. Thanks to your enhancement, flex did
     the job.

     Currently, I'm working on enhancing my previous project.
     I need to deal with multiple input streams (recursive
     inclusion) in this scanner (C++).
     I did similar thing for another scanner (C) as you
     explained in your documentation.

     The generated scanner (C++) has necessary methods:
     - switch_to_buffer(struct yy_buffer_state *b)
     - yy_create_buffer(istream *is, int sz)
     - yy_delete_buffer(struct yy_buffer_state *b)

     However, I couldn't figure out how to access current
     buffer (yy_current_buffer).

     yy_current_buffer is a protected member of yyFlexLexer.
     I can't access it directly.
     Then, I thought yy_create_buffer() with is = 0 might
     return current stream buffer. But it seems not as far
     as I checked the source. (flex 2.5.4)

     I went through the Web in addition to Flex documentation.
     However, it hasn't been successful, so far.

     It is not my intention to bother you, but, can you
     comment about how to obtain the current stream buffer?

     Your response would be highly appreciated.

     Best regards,
     Aki Niimura

     --== Sent via Deja.com http://www.deja.com/ ==--
     Share what you know. Learn what you don't.


File: flex.info,  Node: unnamed-faq-92,  Next: unnamed-faq-93,  Prev: unnamed-faq-91,  Up: FAQ

unnamed-faq-92
==============

     To: neko@my-deja.com
     Subject: Re: A question on flex C++ scanner
     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
     Date: Tue, 15 Jun 1999 09:04:24 PDT
     From: Vern Paxson <vern>

     > However, I couldn't figure out how to access current
     > buffer (yy_current_buffer).

     Derive your own subclass from yyFlexLexer.

     		Vern


File: flex.info,  Node: unnamed-faq-93,  Next: unnamed-faq-94,  Prev: unnamed-faq-92,  Up: FAQ

unnamed-faq-93
==============

     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
     Subject: Re: You're the man to see?
     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
     Date: Wed, 23 Jun 1999 09:01:40 PDT
     From: Vern Paxson <vern>

     > I hope you can help me.  I am using Flex and Bison to produce an interpreted
     > language.  However all goes well until I try to implement an IF statement or
     > a WHILE.  I cannot get this to work as the parser parses all the conditions
     > eg. the TRUE and FALSE conditions to check for a rule match.  So I cannot
     > make a decision!!

     You need to use the parser to build a parse tree (= abstract syntax tree),
     and when that's all done you recursively evaluate the tree, binding variables
     to values at that time.

     		Vern
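
A minimal sketch of this build-then-evaluate approach follows.  It is
an illustration only: the node kinds, the mk() constructor that a
bison action might call, and the vars[] table of 26 integer variables
are all assumptions made for the example.

```c
#include <stdlib.h>

enum node_kind { N_NUM, N_VAR, N_ADD, N_LESS, N_ASSIGN, N_IF, N_WHILE };

struct node {
    enum node_kind kind;
    int value;               /* N_NUM: literal; N_VAR/N_ASSIGN: slot 0..25 */
    struct node *a, *b, *c;  /* operands; N_IF uses a=cond, b=then, c=else */
};

int vars[26];                /* variable bindings, set at evaluation time */

/* Build one tree node.  Parser actions would call this instead of
 * executing anything themselves. */
struct node *mk(enum node_kind kind, int value,
                struct node *a, struct node *b, struct node *c)
{
    struct node *n = malloc(sizeof *n);

    if (!n)
        abort();
    n->kind = kind;
    n->value = value;
    n->a = a;
    n->b = b;
    n->c = c;
    return n;
}

/* Walk the finished tree.  Only the branch selected by the condition is
 * evaluated, and a loop body is re-evaluated on every iteration. */
int eval(const struct node *n)
{
    int r = 0;

    if (!n)
        return 0;
    switch (n->kind) {
    case N_NUM:    return n->value;
    case N_VAR:    return vars[n->value];
    case N_ADD:    return eval(n->a) + eval(n->b);
    case N_LESS:   return eval(n->a) < eval(n->b);
    case N_ASSIGN: return vars[n->value] = eval(n->a);
    case N_IF:     return eval(n->a) ? eval(n->b) : eval(n->c);
    case N_WHILE:
        while (eval(n->a))
            r = eval(n->b);
        return r;
    }
    return 0;
}
```

Because nothing runs until eval() is called on the root, the parser is
free to reduce both arms of an IF; the decision is made later, during
the tree walk.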


File: flex.info,  Node: unnamed-faq-94,  Next: unnamed-faq-95,  Prev: unnamed-faq-93,  Up: FAQ

unnamed-faq-94
==============

     To: Petr Danecek <petr@ics.cas.cz>
     Subject: Re: flex - question
     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
     Date: Fri, 02 Jul 1999 16:52:13 PDT
     From: Vern Paxson <vern>

     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponential
     > growth.

     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours has because at any given time the scanner can
     be in the middle of all sorts of combinations of the different
     rules) blow up exponentially.

     For your rules, there is an easy fix.  Change the ".*" that comes after
     the directory name to "[^ ]*".  With that in place, the rules are no
     longer nearly so ambiguous, because then once one of the directories
     has been matched, no other can be matched (since they all require a
     leading blank).

     If that's not an acceptable solution, then you can enter a start state
     to pick up the .*\n after each directory is matched.

     Also note that for speed, you'll want to add a ".*" rule at the end,
     otherwise input that doesn't match any of the patterns will be matched
     very slowly, a character at a time.

     		Vern


File: flex.info,  Node: unnamed-faq-95,  Next: unnamed-faq-96,  Prev: unnamed-faq-94,  Up: FAQ

unnamed-faq-95
==============

     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
     Date: Thu, 08 Jul 1999 08:20:39 PDT
     From: Vern Paxson <vern>

     > I was hoping you could help me with my problem.
     >
     > I tried compiling (gnu)flex on a Solaris 2.4 machine
     > but when I ran make (after configure) I got an error.
     >
     > --------------------------------------------------------------
     > gcc -c -I. -I. -g -O parse.c
     > ./flex -t -p  ./scan.l >scan.c
     > sh: ./flex: not found
     > *** Error code 1
     > make: Fatal error: Command failed for target `scan.c'
     > -------------------------------------------------------------
     >
     > What's strange to me is that I'm only
     > trying to install flex now. I then edited the Makefile
     > and changed where it says "FLEX = flex" to "FLEX = lex"
     > ( lex: the native Solaris one ) but then it complains about
     > the "-p" option. Is there any way I can compile flex without
     > using flex or lex?
     >
     > Thanks so much for your time.

     You managed to step on the bootstrap sequence, which first copies
     initscan.c to scan.c in order to build flex.  Try fetching a fresh
     distribution from ftp.ee.lbl.gov.  (Or you can first try removing
     ".bootstrap" and doing a make again.)

     		Vern


File: flex.info,  Node: unnamed-faq-96,  Next: unnamed-faq-97,  Prev: unnamed-faq-95,  Up: FAQ

unnamed-faq-96
==============

     To: Tielman Koekemoer <tielman@spi.co.za>
     Subject: Re: Please help.
     In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
     Date: Fri, 09 Jul 1999 00:27:20 PDT
     From: Vern Paxson <vern>

     > First I removed .bootstrap (and ran make) - no luck. I downloaded the
     > software but I still have the same problem. Is there anything else I
     > could try.

     Try:

     	cp initscan.c scan.c
     	touch scan.c
     	make scan.o

     If this last tries to first build scan.c from scan.l using ./flex, then
     your "make" is broken, in which case compile scan.c to scan.o by hand.

     		Vern


File: flex.info,  Node: unnamed-faq-97,  Next: unnamed-faq-98,  Prev: unnamed-faq-96,  Up: FAQ

unnamed-faq-97
==============

     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
     Subject: Re: Error
     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
     Date: Tue, 20 Jul 1999 00:18:26 PDT
     From: Vern Paxson <vern>

     > I am getting a compilation error. The error is given as "unknown symbol- yylex".

     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls an instance
     of the scanner (e.g., "scanner->yylex()").

     		Vern


File: flex.info,  Node: unnamed-faq-98,  Next: unnamed-faq-99,  Prev: unnamed-faq-97,  Up: FAQ

unnamed-faq-98
==============

     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
     Subject: Re: lex
     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
     Date: Tue, 23 Nov 1999 15:54:30 PST
     From: Vern Paxson <vern>

     Well, your problem is the

     switch (yybgin-yysvec-1) {      /* witchcraft */

     at the beginning of lex rules.  "witchcraft" == "non-portable".  It's
     assuming knowledge of the AT&T lex's internal variables.

     For flex, you can probably do the equivalent using a switch on YYSTATE.

     		Vern
69413c3a7b76Schristos
69423c3a7b76Schristos
69433c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-99,  Next: unnamed-faq-100,  Prev: unnamed-faq-98,  Up: FAQ
69443c3a7b76Schristos
69453c3a7b76Schristosunnamed-faq-99
69463c3a7b76Schristos==============
69473c3a7b76Schristos
69483c3a7b76Schristos     To: archow@hss.hns.com
69493c3a7b76Schristos     Subject: Re: Regarding distribution of flex and yacc based grammars
69503c3a7b76Schristos     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
69513c3a7b76Schristos     Date: Wed, 22 Dec 1999 01:56:24 PST
69523c3a7b76Schristos     From: Vern Paxson <vern>
69533c3a7b76Schristos
69543c3a7b76Schristos     > When we provide the customer with an object code distribution, is it
69553c3a7b76Schristos     > necessary for us to provide source
69563c3a7b76Schristos     > for the generated C files from flex and bison since they are generated by
69573c3a7b76Schristos     > flex and bison ?
69583c3a7b76Schristos
69593c3a7b76Schristos     For flex, no.  I don't know what the current state of this is for bison.
69603c3a7b76Schristos
     > Also, is there any requirement for us to necessarily provide source for
     > the grammar files which are fed into flex and bison ?
69633c3a7b76Schristos
69643c3a7b76Schristos     Again, for flex, no.
69653c3a7b76Schristos
69663c3a7b76Schristos     See the file "COPYING" in the flex distribution for the legalese.
69673c3a7b76Schristos
69683c3a7b76Schristos     		Vern
69693c3a7b76Schristos
69703c3a7b76Schristos
69713c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-100,  Next: unnamed-faq-101,  Prev: unnamed-faq-99,  Up: FAQ
69723c3a7b76Schristos
69733c3a7b76Schristosunnamed-faq-100
69743c3a7b76Schristos===============
69753c3a7b76Schristos
69763c3a7b76Schristos     To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
69773c3a7b76Schristos     Subject: Re: Flex, and self referencing rules
69783c3a7b76Schristos     In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
69793c3a7b76Schristos     Date: Sat, 19 Feb 2000 18:33:16 PST
69803c3a7b76Schristos     From: Vern Paxson <vern>
69813c3a7b76Schristos
69823c3a7b76Schristos     > However, I do not use unput anywhere. I do use self-referencing
69833c3a7b76Schristos     > rules like this:
69843c3a7b76Schristos     >
69853c3a7b76Schristos     > UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})
69863c3a7b76Schristos
69873c3a7b76Schristos     You can't do this - flex is *not* a parser like yacc (which does indeed
69883c3a7b76Schristos     allow recursion), it is a scanner that's confined to regular expressions.
69893c3a7b76Schristos
69903c3a7b76Schristos     		Vern
69913c3a7b76Schristos
69923c3a7b76Schristos
69933c3a7b76SchristosFile: flex.info,  Node: unnamed-faq-101,  Next: What is the difference between YYLEX_PARAM and YY_DECL?,  Prev: unnamed-faq-100,  Up: FAQ
69943c3a7b76Schristos
69953c3a7b76Schristosunnamed-faq-101
69963c3a7b76Schristos===============
69973c3a7b76Schristos
69983c3a7b76Schristos     To: slg3@lehigh.edu (SAMUEL L. GULDEN)
69993c3a7b76Schristos     Subject: Re: Flex problem
70003c3a7b76Schristos     In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
70013c3a7b76Schristos     Date: Thu, 02 Mar 2000 23:00:46 PST
70023c3a7b76Schristos     From: Vern Paxson <vern>
70033c3a7b76Schristos
70043c3a7b76Schristos     If this is exactly your program:
70053c3a7b76Schristos
70063c3a7b76Schristos     > digit [0-9]
70073c3a7b76Schristos     > digits {digit}+
70083c3a7b76Schristos     > whitespace [ \t\n]+
70093c3a7b76Schristos     >
70103c3a7b76Schristos     > %%
70113c3a7b76Schristos     > "[" { printf("open_brac\n");}
70123c3a7b76Schristos     > "]" { printf("close_brac\n");}
70133c3a7b76Schristos     > "+" { printf("addop\n");}
70143c3a7b76Schristos     > "*" { printf("multop\n");}
70153c3a7b76Schristos     > {digits} { printf("NUMBER = %s\n", yytext);}
70163c3a7b76Schristos     > whitespace ;
70173c3a7b76Schristos
70183c3a7b76Schristos     then the problem is that the last rule needs to be "{whitespace}" !
70193c3a7b76Schristos
70203c3a7b76Schristos     		Vern
70213c3a7b76Schristos
70223c3a7b76Schristos
70233c3a7b76SchristosFile: flex.info,  Node: What is the difference between YYLEX_PARAM and YY_DECL?,  Next: Why do I get "conflicting types for yylex" error?,  Prev: unnamed-faq-101,  Up: FAQ
70243c3a7b76Schristos
70253c3a7b76SchristosWhat is the difference between YYLEX_PARAM and YY_DECL?
70263c3a7b76Schristos=======================================================
70273c3a7b76Schristos
70283c3a7b76SchristosYYLEX_PARAM is not a flex symbol.  It is for Bison.  It tells Bison to
70293c3a7b76Schristospass extra params when it calls yylex() from the parser.
70303c3a7b76Schristos
70313c3a7b76Schristos   YY_DECL is the Flex declaration of yylex.  The default is similar to
70323c3a7b76Schristosthis:
70333c3a7b76Schristos
     #define YY_DECL int yylex ()
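
   As a rough illustration of how such a macro works: the generated
scanner uses 'YY_DECL' as the header of the scanning routine, so
redefining it changes the name, return type, or parameters of that
routine.  The sketch below is plain C, not flex output.  The type
'scan_ctx', its 'calls' field, and the stand-in function body are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-scanner context; not part of flex. */
typedef struct scan_ctx {
    int calls;   /* how many times the scanner was invoked */
} scan_ctx;

/* In a real .l file this define would go in the definitions section. */
#define YY_DECL int yylex(scan_ctx *ctx)

/* The macro followed by ';' yields a matching prototype, which is
   what a parser source file needs to see before it calls yylex. */
YY_DECL;

/* Stand-in for the routine flex would generate.  The same macro
   supplies the function header, so the prototype and the definition
   cannot drift apart. */
YY_DECL
{
    assert(ctx != NULL);
    ctx->calls++;
    return 0;   /* 0 conventionally signals end of input */
}
```

   A caller would then write 'scan_ctx c = {0}; yylex(&c);'.  Keeping
one definition of the macro in a shared header is one way to keep the
prototype seen by the parser and the generated definition in agreement.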
70353c3a7b76Schristos
70363c3a7b76Schristos
70373c3a7b76SchristosFile: flex.info,  Node: Why do I get "conflicting types for yylex" error?,  Next: How do I access the values set in a Flex action from within a Bison action?,  Prev: What is the difference between YYLEX_PARAM and YY_DECL?,  Up: FAQ
70383c3a7b76Schristos
70393c3a7b76SchristosWhy do I get "conflicting types for yylex" error?
70403c3a7b76Schristos=================================================
70413c3a7b76Schristos
70423c3a7b76SchristosThis is a compiler error regarding a generated Bison parser, not a Flex
70433c3a7b76Schristosscanner.  It means you need a prototype of yylex() in the top of the
70443c3a7b76SchristosBison file.  Be sure the prototype matches YY_DECL.
70453c3a7b76Schristos
70463c3a7b76Schristos
70473c3a7b76SchristosFile: flex.info,  Node: How do I access the values set in a Flex action from within a Bison action?,  Prev: Why do I get "conflicting types for yylex" error?,  Up: FAQ
70483c3a7b76Schristos
70493c3a7b76SchristosHow do I access the values set in a Flex action from within a Bison action?
70503c3a7b76Schristos===========================================================================
70513c3a7b76Schristos
70523c3a7b76SchristosWith $1, $2, $3, etc.  These are called "Semantic Values" in the Bison
705330da1778Schristosmanual.  See *note (bison)Top::.
70543c3a7b76Schristos
70553c3a7b76Schristos
70563c3a7b76SchristosFile: flex.info,  Node: Appendices,  Next: Indices,  Prev: FAQ,  Up: Top
70573c3a7b76Schristos
70583c3a7b76SchristosAppendix A Appendices
70593c3a7b76Schristos*********************
70603c3a7b76Schristos
70613c3a7b76Schristos* Menu:
70623c3a7b76Schristos
70633c3a7b76Schristos* Makefiles and Flex::
70643c3a7b76Schristos* Bison Bridge::
70653c3a7b76Schristos* M4 Dependency::
70663c3a7b76Schristos* Common Patterns::
70673c3a7b76Schristos
70683c3a7b76Schristos
70693c3a7b76SchristosFile: flex.info,  Node: Makefiles and Flex,  Next: Bison Bridge,  Prev: Appendices,  Up: Appendices
70703c3a7b76Schristos
70713c3a7b76SchristosA.1 Makefiles and Flex
70723c3a7b76Schristos======================
70733c3a7b76Schristos
70743c3a7b76SchristosIn this appendix, we provide tips for writing Makefiles to build your
70753c3a7b76Schristosscanners.
70763c3a7b76Schristos
707730da1778Schristos   In a traditional build environment, we say that the '.c' files are
707830da1778Schristosthe sources, and the '.o' files are the intermediate files.  When using
707930da1778Schristos'flex', however, the '.l' files are the sources, and the generated '.c'
708030da1778Schristosfiles (along with the '.o' files) are the intermediate files.  This
70813c3a7b76Schristosrequires you to carefully plan your Makefile.
70823c3a7b76Schristos
708330da1778Schristos   Modern 'make' programs understand that 'foo.l' is intended to
708430da1778Schristosgenerate 'lex.yy.c' or 'foo.c', and will behave accordingly(1)(2).  The
708530da1778Schristosfollowing Makefile does not explicitly instruct 'make' how to build
'scan.c' from 'scan.l'.  Instead, it relies on the implicit rules of the
708730da1778Schristos'make' program to build the intermediate file, 'scan.c':
70883c3a7b76Schristos
70893c3a7b76Schristos         # Basic Makefile -- relies on implicit rules
70903c3a7b76Schristos         # Creates "myprogram" from "scan.l" and "myprogram.c"
70913c3a7b76Schristos         #
70923c3a7b76Schristos         LEX=flex
70933c3a7b76Schristos         myprogram: scan.o myprogram.o
70943c3a7b76Schristos         scan.o: scan.l
70953c3a7b76Schristos
709630da1778Schristos
70973c3a7b76Schristos   For simple cases, the above may be sufficient.  For other cases, you
709830da1778Schristosmay have to explicitly instruct 'make' how to build your scanner.  The
70993c3a7b76Schristosfollowing is an example of a Makefile containing explicit rules:
71003c3a7b76Schristos
71013c3a7b76Schristos         # Basic Makefile -- provides explicit rules
71023c3a7b76Schristos         # Creates "myprogram" from "scan.l" and "myprogram.c"
71033c3a7b76Schristos         #
71043c3a7b76Schristos         LEX=flex
71053c3a7b76Schristos         myprogram: scan.o myprogram.o
71063c3a7b76Schristos                 $(CC) -o $@  $(LDFLAGS) $^
71073c3a7b76Schristos
71083c3a7b76Schristos         myprogram.o: myprogram.c
71093c3a7b76Schristos                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
71103c3a7b76Schristos
71113c3a7b76Schristos         scan.o: scan.c
71123c3a7b76Schristos                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
71133c3a7b76Schristos
71143c3a7b76Schristos         scan.c: scan.l
71153c3a7b76Schristos                 $(LEX) $(LFLAGS) -o $@ $^
71163c3a7b76Schristos
71173c3a7b76Schristos         clean:
71183c3a7b76Schristos                 $(RM) *.o scan.c
71193c3a7b76Schristos
712030da1778Schristos
712130da1778Schristos   Notice in the above example that 'scan.c' is in the 'clean' target.
712230da1778SchristosThis is because we consider the file 'scan.c' to be an intermediate
71233c3a7b76Schristosfile.
71243c3a7b76Schristos
712530da1778Schristos   Finally, we provide a realistic example of a 'flex' scanner used with
712630da1778Schristosa 'bison' parser(3).  There is a tricky problem we have to deal with.
712730da1778SchristosSince a 'flex' scanner will typically include a header file (e.g.,
712830da1778Schristos'y.tab.h') generated by the parser, we need to be sure that the header
712930da1778Schristosfile is generated BEFORE the scanner is compiled.  We handle this case
713030da1778Schristosin the following example:
71313c3a7b76Schristos
71323c3a7b76Schristos         # Makefile example -- scanner and parser.
71333c3a7b76Schristos         # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
71343c3a7b76Schristos         #
71353c3a7b76Schristos         LEX     = flex
71363c3a7b76Schristos         YACC    = bison -y
71373c3a7b76Schristos         YFLAGS  = -d
71383c3a7b76Schristos         objects = scan.o parse.o myprogram.o
71393c3a7b76Schristos
71403c3a7b76Schristos         myprogram: $(objects)
71413c3a7b76Schristos         scan.o: scan.l parse.c
71423c3a7b76Schristos         parse.o: parse.y
71433c3a7b76Schristos         myprogram.o: myprogram.c
71443c3a7b76Schristos
714530da1778Schristos
   In the above example, notice the line

         scan.o: scan.l parse.c

which lists the file 'parse.c' (the generated parser) as a dependency of
'scan.o'.  We want to ensure that the parser, and with it the header
file, is created before the scanner is compiled, and the above line
achieves this in practice.  Implicit rules differ between implementations
of 'make', so verify the behavior with yours.
71553c3a7b76Schristos
715630da1778Schristos   For more details on writing Makefiles, see *note (make)Top::.
71573c3a7b76Schristos
71583c3a7b76Schristos   ---------- Footnotes ----------
71593c3a7b76Schristos
716030da1778Schristos   (1) GNU 'make' and GNU 'automake' are two such programs that provide
71613c3a7b76Schristosimplicit rules for flex-generated scanners.
71623c3a7b76Schristos
716330da1778Schristos   (2) GNU 'automake' may generate code to execute flex in
71643c3a7b76Schristoslex-compatible mode, or to stdout.  If this is not what you want, then
you should provide an explicit rule in your Makefile.am.
71663c3a7b76Schristos
71673c3a7b76Schristos   (3) This example also applies to yacc parsers.
71683c3a7b76Schristos
71693c3a7b76Schristos
71703c3a7b76SchristosFile: flex.info,  Node: Bison Bridge,  Next: M4 Dependency,  Prev: Makefiles and Flex,  Up: Appendices
71713c3a7b76Schristos
71723c3a7b76SchristosA.2 C Scanners with Bison Parsers
71733c3a7b76Schristos=================================
71743c3a7b76Schristos
717530da1778SchristosThis section describes the 'flex' features useful when integrating
717630da1778Schristos'flex' with 'GNU bison'(1).  Skip this section if you are not using
717730da1778Schristos'bison' with your scanner.  Here we discuss only the 'flex' half of the
717830da1778Schristos'flex' and 'bison' pair.  We do not discuss 'bison' in any detail.  For
717930da1778Schristosmore information about generating 'bison' parsers, see *note
718030da1778Schristos(bison)Top::.
71813c3a7b76Schristos
   A 'bison'-compatible scanner is generated by declaring '%option
718330da1778Schristosbison-bridge' or by supplying '--bison-bridge' when invoking 'flex' from
718430da1778Schristosthe command line.  This instructs 'flex' that the macro 'yylval' may be
718530da1778Schristosused.  The data type for 'yylval', 'YYSTYPE', is typically defined in a
718630da1778Schristosheader file, included in section 1 of the 'flex' input file.  For a list
718730da1778Schristosof functions and macros available, *Note bison-functions::.
71883c3a7b76Schristos
71893c3a7b76Schristos   The declaration of yylex becomes,
71903c3a7b76Schristos
71913c3a7b76Schristos           int yylex ( YYSTYPE * lvalp, yyscan_t scanner );
71923c3a7b76Schristos
719330da1778Schristos   If '%option bison-locations' is specified, then the declaration
71943c3a7b76Schristosbecomes,
71953c3a7b76Schristos
71963c3a7b76Schristos           int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );
71973c3a7b76Schristos
719830da1778Schristos   Note that the macros 'yylval' and 'yylloc' evaluate to pointers.
719930da1778SchristosSupport for 'yylloc' is optional in 'bison', so it is optional in 'flex'
720030da1778Schristosas well.  The following is an example of a 'flex' scanner that is
720130da1778Schristoscompatible with 'bison'.
72023c3a7b76Schristos
         /* Scanner for "C" assignment statements... sort of. */
         %{
         #include <stdlib.h>  /* atoi */
         #include <string.h>  /* strdup */
         #include "y.tab.h"   /* Generated by bison. */
         %}

         %option bison-bridge bison-locations
         %%

         [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
         [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
         "="|";"       { return yytext[0];}
         .  {}
         %%
72163c3a7b76Schristos
721730da1778Schristos   As you can see, there really is no magic here.  We just use 'yylval'
721830da1778Schristosas we would any other variable.  The data type of 'yylval' is generated
721930da1778Schristosby 'bison', and included in the file 'y.tab.h'.  Here is the
722030da1778Schristoscorresponding 'bison' parser:
72213c3a7b76Schristos
72223c3a7b76Schristos         /* Parser to convert "C" assignments to lisp. */
72233c3a7b76Schristos         %{
72243c3a7b76Schristos         /* Pass the argument to yyparse through to yylex. */
72253c3a7b76Schristos         #define YYPARSE_PARAM scanner
72263c3a7b76Schristos         #define YYLEX_PARAM   scanner
72273c3a7b76Schristos         %}
72283c3a7b76Schristos         %locations
72293c3a7b76Schristos         %pure_parser
72303c3a7b76Schristos         %union {
72313c3a7b76Schristos             int num;
72323c3a7b76Schristos             char* str;
72333c3a7b76Schristos         }
72343c3a7b76Schristos         %token <str> STRING
72353c3a7b76Schristos         %token <num> NUMBER
72363c3a7b76Schristos         %%
72373c3a7b76Schristos         assignment:
             STRING '=' NUMBER ';' {
                 printf( "(setf %s %d)", $1, $3 );
             }
72413c3a7b76Schristos         ;
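
   The calling convention can be tried out without running either tool.
The following plain C sketch hand-writes a stand-in for the scanner,
using a 'bison-bridge' style signature without locations and with a
simple state pointer in place of 'yyscan_t'.  A small driver plays the
role of the parser action.  The names 'toy_lex', 'toy_state', 'expect',
and 'toy_parse', as well as the token codes, are hypothetical.  In a
real project the token values and 'YYSTYPE' come from the header that
'bison' generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the %union in the parser above. */
typedef union YYSTYPE {
    int   num;
    char *str;
} YYSTYPE;

/* Hypothetical token codes; bison would normally assign these. */
enum { STRING = 258, NUMBER = 259 };

/* Hypothetical scanner state: just a cursor into the input. */
typedef struct toy_state { const char *p; } toy_state;

/* Hand-written stand-in for yylex: stores the semantic value through
   the lvalp pointer, as the flex actions do with yylval. */
int toy_lex(YYSTYPE *lvalp, toy_state *st)
{
    assert(lvalp != NULL && st != NULL && st->p != NULL);
    while (*st->p && *st->p != '=' && *st->p != ';' &&
           !isalnum((unsigned char)*st->p))
        st->p++;                        /* like the catch-all rule */
    if (*st->p == '\0')
        return 0;                       /* end of input */
    if (*st->p == '=' || *st->p == ';')
        return *st->p++;                /* single-character tokens */

    const char *start = st->p;
    int all_digits = 1;
    while (isalnum((unsigned char)*st->p)) {
        if (!isdigit((unsigned char)*st->p))
            all_digits = 0;
        st->p++;
    }
    size_t len = (size_t)(st->p - start);
    char *text = malloc(len + 1);
    if (text == NULL)
        return 0;                       /* treat allocation failure as EOF */
    memcpy(text, start, len);
    text[len] = '\0';
    if (all_digits) {                   /* digits only: NUMBER wins */
        lvalp->num = atoi(text);
        free(text);
        return NUMBER;
    }
    lvalp->str = text;                  /* caller owns the copy */
    return STRING;
}

/* Fetch one token and report whether it is the wanted one. */
int expect(int want, YYSTYPE *v, toy_state *st)
{
    int tok = toy_lex(v, st);
    if (tok == STRING && tok != want)
        free(v->str);                   /* discard an unexpected string */
    return tok == want;
}

/* Plays the role of the "assignment" rule and its action. */
int toy_parse(const char *input, char *out, size_t outsz)
{
    toy_state st = { input };
    YYSTYPE name, value, unused;
    int ok = 0;

    if (!expect(STRING, &name, &st))
        return 0;
    if (expect('=', &unused, &st) && expect(NUMBER, &value, &st) &&
        expect(';', &unused, &st)) {
        snprintf(out, outsz, "(setf %s %d)", name.str, value.num);
        ok = 1;
    }
    free(name.str);
    return ok;
}
```

   Calling 'toy_parse("x = 42;", buf, sizeof buf)' fills 'buf' with the
lisp form.  The point of the sketch is only that the scanner writes
through a pointer supplied by its caller instead of a global variable.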
72423c3a7b76Schristos
72433c3a7b76Schristos   ---------- Footnotes ----------
72443c3a7b76Schristos
72453c3a7b76Schristos   (1) The features described here are purely optional, and are by no
72463c3a7b76Schristosmeans the only way to use flex with bison.  We merely provide some glue
72473c3a7b76Schristosto ease development of your parser-scanner pair.
72483c3a7b76Schristos
72493c3a7b76Schristos
72503c3a7b76SchristosFile: flex.info,  Node: M4 Dependency,  Next: Common Patterns,  Prev: Bison Bridge,  Up: Appendices
72513c3a7b76Schristos
72523c3a7b76SchristosA.3 M4 Dependency
72533c3a7b76Schristos=================
72543c3a7b76Schristos
725530da1778SchristosThe macro processor 'm4'(1) must be installed wherever flex is
725630da1778Schristosinstalled.  'flex' invokes 'm4', found by searching the directories in
725730da1778Schristosthe 'PATH' environment variable.  Any code you place in section 1 or in
72583c3a7b76Schristosthe actions will be sent through m4.  Please follow these rules to
725930da1778Schristosprotect your code from unwanted 'm4' processing.
72603c3a7b76Schristos
   * Do not use symbols that begin with 'm4_', such as 'm4_define' or
     'm4_include', since those are reserved for 'm4' macro names.  If
     for some reason you need 'm4_' as a prefix, use a preprocessor
     '#define' to get your symbol past 'm4' unmangled.
72653c3a7b76Schristos
726630da1778Schristos   * Do not use the strings '[[' or ']]' anywhere in your code.  The
72673c3a7b76Schristos     former is not valid in C, except within comments and strings, but
726830da1778Schristos     the latter is valid in code such as 'x[y[z]]'.  The solution is
726930da1778Schristos     simple.  To get the literal string '"]]"', use '"]""]"'.  To get
727030da1778Schristos     the array notation 'x[y[z]]', use 'x[y[z] ]'.  Flex will attempt to
72713c3a7b76Schristos     detect these sequences in user code, and escape them.  However,
727230da1778Schristos     it's best to avoid this complexity where possible, by removing such
727330da1778Schristos     sequences from your code.
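
   The rules above can be sketched in plain C as follows.  This is only
one possible reading of the advice.  The helper macro 'M4SYM' and the
function names are hypothetical, and token pasting is an assumption
about how a '#define' might be used to keep the reserved prefix out of
the source text.  The comments in the code also avoid the reserved
sequences, since comments inside actions pass through 'm4' as well.

```c
#include <string.h>

/* Build an identifier that starts with "m4" followed by an underscore
   without that prefix appearing literally in the source: the C
   preprocessor pastes the pieces together after m4 has run. */
#define M4SYM(name) m4 ## _ ## name

int M4SYM(count) = 3;       /* a variable whose name carries the prefix */

/* Nested subscripts written with a space between the closing
   brackets, so the doubled-bracket sequence never appears. */
int nested_lookup(void)
{
    int z = 1;
    int y[] = { 0, 2 };
    int x[] = { 10, 20, 30 };
    return x[y[z] ];        /* y[1] is 2, so this is x[2] */
}

/* Adjacent string literals are concatenated by the compiler, so this
   returns the two-character string of closing brackets. */
const char *closing_brackets(void)
{
    return "]" "]";
}
```

   Neither rewrite changes the compiled program.  They only change the
character sequences that 'm4' sees.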
72743c3a7b76Schristos
727530da1778Schristos   'm4' is only required at the time you run 'flex'.  The generated
727630da1778Schristosscanner is ordinary C or C++, and does _not_ require 'm4'.
72773c3a7b76Schristos
72783c3a7b76Schristos   ---------- Footnotes ----------
72793c3a7b76Schristos
72803c3a7b76Schristos   (1) The use of m4 is subject to change in future revisions of flex.
72813c3a7b76SchristosIt is not part of the public API of flex.  Do not depend on it.
72823c3a7b76Schristos
72833c3a7b76Schristos
72843c3a7b76SchristosFile: flex.info,  Node: Common Patterns,  Prev: M4 Dependency,  Up: Appendices
72853c3a7b76Schristos
72863c3a7b76SchristosA.4 Common Patterns
72873c3a7b76Schristos===================
72883c3a7b76Schristos
72893c3a7b76SchristosThis appendix provides examples of common regular expressions you might
72903c3a7b76Schristosuse in your scanner.
72913c3a7b76Schristos
72923c3a7b76Schristos* Menu:
72933c3a7b76Schristos
72943c3a7b76Schristos* Numbers::
72953c3a7b76Schristos* Identifiers::
72963c3a7b76Schristos* Quoted Constructs::
72973c3a7b76Schristos* Addresses::
72983c3a7b76Schristos
72993c3a7b76Schristos
73003c3a7b76SchristosFile: flex.info,  Node: Numbers,  Next: Identifiers,  Up: Common Patterns
73013c3a7b76Schristos
73023c3a7b76SchristosA.4.1 Numbers
73033c3a7b76Schristos-------------
73043c3a7b76Schristos
73053c3a7b76SchristosC99 decimal constant
730630da1778Schristos     '([[:digit:]]{-}[0])[[:digit:]]*'
73073c3a7b76Schristos
73083c3a7b76SchristosC99 hexadecimal constant
730930da1778Schristos     '0[xX][[:xdigit:]]+'
73103c3a7b76Schristos
73113c3a7b76SchristosC99 octal constant
731230da1778Schristos     '0[01234567]*'
73133c3a7b76Schristos
73143c3a7b76SchristosC99 floating point constant
      dseq      ([[:digit:]]+)
      dseq_opt  ([[:digit:]]*)
      frac      (({dseq_opt}"."{dseq})|{dseq}".")
      exp       ([eE][+-]?{dseq})
      exp_opt   ({exp}?)
      fsuff     [flFL]
      fsuff_opt ({fsuff}?)
      hpref     (0[xX])
      hdseq     ([[:xdigit:]]+)
      hdseq_opt ([[:xdigit:]]*)
      hfrac     (({hdseq_opt}"."{hdseq})|({hdseq}"."))
      bexp      ([pP][+-]?{dseq})
      dfc       (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
      hfc       (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))

      c99_floating_point_constant  ({dfc}|{hfc})
73313c3a7b76Schristos
73323c3a7b76Schristos     See C99 section 6.4.4.2 for the gory details.
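
     The three integer-constant patterns above can be checked outside
     of flex with the POSIX 'regex.h' interface.  This is a sketch
     under assumptions: the patterns are translated to POSIX extended
     syntax, '[1-9]' stands in for the flex set difference
     '[[:digit:]]{-}[0]', anchors are added to require a whole-string
     match, and the function names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the whole of text matches the POSIX extended regular
   expression in pattern, otherwise 0. */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                   /* treat a bad pattern as no match */
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* Classify an unsuffixed C99 integer constant.
   Returns 'd' (decimal), 'x' (hexadecimal), 'o' (octal), or 0. */
int int_constant_kind(const char *text)
{
    /* POSIX has no {-} operator, so the nonzero first digit is
       spelled out as a range. */
    if (matches_whole("^[1-9][[:digit:]]*$", text))
        return 'd';
    if (matches_whole("^0[xX][[:xdigit:]]+$", text))
        return 'x';
    if (matches_whole("^0[01234567]*$", text))
        return 'o';
    return 0;
}
```

     Note that a lone '0' is classified as octal, which follows from
     the patterns as written.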
73333c3a7b76Schristos
73343c3a7b76Schristos
73353c3a7b76SchristosFile: flex.info,  Node: Identifiers,  Next: Quoted Constructs,  Prev: Numbers,  Up: Common Patterns
73363c3a7b76Schristos
73373c3a7b76SchristosA.4.2 Identifiers
73383c3a7b76Schristos-----------------
73393c3a7b76Schristos
73403c3a7b76SchristosC99 Identifier
73413c3a7b76Schristos     ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
73423c3a7b76Schristos     nondigit    [_[:alpha:]]
73433c3a7b76Schristos     c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*
73443c3a7b76Schristos
73453c3a7b76Schristos     Technically, the above pattern does not encompass all possible C99
73463c3a7b76Schristos     identifiers, since C99 allows for "implementation-defined"
73473c3a7b76Schristos     characters.  In practice, C compilers follow the above pattern,
734830da1778Schristos     with the addition of the '$' character.
73493c3a7b76Schristos
73503c3a7b76SchristosUTF-8 Encoded Unicode Code Point
73513c3a7b76Schristos     [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
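
     To make the alternatives of the pattern above easier to follow,
     here is a hand-written C sketch that accepts the same byte
     sequences.  The function names are hypothetical.  The lower and
     upper bounds on the second byte are what exclude overlong
     encodings, UTF-16 surrogates, and code points above U+10FFFF.

```c
#include <stddef.h>

/* True if b is a UTF-8 continuation byte, [\x80-\xBF]. */
static int is_cont(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/* Length in bytes (1 to 4) of the sequence at s that the pattern
   accepts, or 0 if the first bytes of s match no alternative.
   n is the number of bytes available at s. */
size_t utf8_match_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return 0;
    unsigned char b0 = s[0];

    /* [\x09\x0A\x0D\x20-\x7E] : tab, newline, CR, printable ASCII */
    if (b0 == 0x09 || b0 == 0x0A || b0 == 0x0D ||
        (b0 >= 0x20 && b0 <= 0x7E))
        return 1;

    /* [\xC2-\xDF][\x80-\xBF] */
    if (b0 >= 0xC2 && b0 <= 0xDF)
        return (n >= 2 && is_cont(s[1])) ? 2 : 0;

    /* Three-byte forms, lead byte E0 through EF. */
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        unsigned char lo = 0x80, hi = 0xBF;
        if (n < 3 || !is_cont(s[2]))
            return 0;
        if (b0 == 0xE0) lo = 0xA0;  /* excludes overlong encodings */
        if (b0 == 0xED) hi = 0x9F;  /* excludes surrogates */
        return (s[1] >= lo && s[1] <= hi) ? 3 : 0;
    }

    /* Four-byte forms, lead byte F0 through F4. */
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        unsigned char lo = 0x80, hi = 0xBF;
        if (n < 4 || !is_cont(s[2]) || !is_cont(s[3]))
            return 0;
        if (b0 == 0xF0) lo = 0x90;  /* excludes overlong encodings */
        if (b0 == 0xF4) hi = 0x8F;  /* excludes values above U+10FFFF */
        return (s[1] >= lo && s[1] <= hi) ? 4 : 0;
    }
    return 0;
}
```

     Like the pattern, the sketch rejects most ASCII control characters
     and the DEL byte.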
73523c3a7b76Schristos
73533c3a7b76Schristos
73543c3a7b76SchristosFile: flex.info,  Node: Quoted Constructs,  Next: Addresses,  Prev: Identifiers,  Up: Common Patterns
73553c3a7b76Schristos
73563c3a7b76SchristosA.4.3 Quoted Constructs
73573c3a7b76Schristos-----------------------
73583c3a7b76Schristos
73593c3a7b76SchristosC99 String Literal
     'L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([01234567]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'
73613c3a7b76Schristos
73623c3a7b76SchristosC99 Comment
     '("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'
73643c3a7b76Schristos
736530da1778Schristos     Note that in C99, a '//'-style comment may be split across lines,
736630da1778Schristos     and, contrary to popular belief, does not include the trailing '\n'
736730da1778Schristos     character.
73683c3a7b76Schristos
736930da1778Schristos     A better way to scan '/* */' comments is by line, rather than
73703c3a7b76Schristos     matching possibly huge comments all at once.  This will allow you
737130da1778Schristos     to scan comments of unlimited length, as long as line breaks appear
737230da1778Schristos     at sane intervals.  This is also more efficient when used with
737330da1778Schristos     automatic line number processing.  *Note option-yylineno::.
73743c3a7b76Schristos
     <INITIAL>{
         "/*"      BEGIN(COMMENT);
     }
     <COMMENT>{
         "*/"      BEGIN(INITIAL);
         [^*\n]+   ;
         "*"       ;
         \n        ;
     }
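
     For comparison, here is a hand-written C sketch of the same idea:
     a two-state machine that remembers whether the previous character
     was a star, and counts line breaks along the way.  The function
     name and its interface are hypothetical and are not part of flex.

```c
#include <assert.h>
#include <stddef.h>

/* s must start with the two-character comment opener.  Returns the
   number of characters up to and including the closer, or 0 if the
   text does not start with the opener or the comment never ends.
   The count of line breaks seen is stored through newlines. */
size_t skip_block_comment(const char *s, int *newlines)
{
    enum { IN_COMMENT, SAW_STAR } state = IN_COMMENT;
    size_t i;

    assert(s != NULL && newlines != NULL);
    *newlines = 0;
    if (s[0] != '/' || s[1] != '*')
        return 0;
    for (i = 2; s[i] != '\0'; i++) {
        if (s[i] == '\n')
            ++*newlines;
        if (state == SAW_STAR && s[i] == '/')
            return i + 1;           /* found the closer */
        /* A run of stars keeps the machine in SAW_STAR, so a closer
           preceded by extra stars is still recognized. */
        state = (s[i] == '*') ? SAW_STAR : IN_COMMENT;
    }
    return 0;                       /* unterminated comment */
}
```

     Starting the loop after the opener matters: the star of the
     opener must not be allowed to pair with a slash that follows it
     immediately.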
73843c3a7b76Schristos
73853c3a7b76Schristos
73863c3a7b76SchristosFile: flex.info,  Node: Addresses,  Prev: Quoted Constructs,  Up: Common Patterns
73873c3a7b76Schristos
73883c3a7b76SchristosA.4.4 Addresses
73893c3a7b76Schristos---------------
73903c3a7b76Schristos
73913c3a7b76SchristosIPv4 Address
7392dded093eSchristos     dec-octet     [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
7393dded093eSchristos     IPv4address   {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}
73943c3a7b76Schristos
73953c3a7b76SchristosIPv6 Address
7396dded093eSchristos     h16           [0-9A-Fa-f]{1,4}
7397dded093eSchristos     ls32          {h16}:{h16}|{IPv4address}
7398dded093eSchristos     IPv6address   ({h16}:){6}{ls32}|
7399dded093eSchristos                   ::({h16}:){5}{ls32}|
7400dded093eSchristos                   ({h16})?::({h16}:){4}{ls32}|
7401dded093eSchristos                   (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
7402dded093eSchristos                   (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
7403dded093eSchristos                   (({h16}:){0,3}{h16})?::{h16}:{ls32}|
7404dded093eSchristos                   (({h16}:){0,4}{h16})?::{ls32}|
7405dded093eSchristos                   (({h16}:){0,5}{h16})?::{h16}|
7406dded093eSchristos                   (({h16}:){0,6}{h16})?::
74073c3a7b76Schristos
7408dded093eSchristos     See RFC 2373 (http://www.ietf.org/rfc/rfc2373.txt) for details.
740930da1778Schristos     Note that you have to fold the definition of 'IPv6address' into one
7410dded093eSchristos     line and that it also matches the "unspecified address" "::".
74113c3a7b76Schristos
74123c3a7b76SchristosURI
741330da1778Schristos     '(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
74143c3a7b76Schristos
74153c3a7b76Schristos     This pattern is nearly useless, since it allows just about any
74163c3a7b76Schristos     character to appear in a URI, including spaces and control
741730da1778Schristos     characters.  See RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt) for
741830da1778Schristos     details.
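
     The 'dec-octet' and 'IPv4address' definitions above can also be
     expressed as a small hand-written validator, which may help when
     checking what the pattern accepts.  This is a sketch with a
     hypothetical function name.  It tests a whole string, whereas a
     flex rule matches a prefix of the remaining input.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if s is exactly four dec-octets separated by dots, where
   each octet is 0 to 255 with no leading zeros.  Returns 0 otherwise. */
int is_ipv4_address(const char *s)
{
    int part;

    assert(s != NULL);
    for (part = 0; part < 4; part++) {
        const char *start = s;
        int value = 0;

        /* Read at most four digits; four is already too many. */
        while (*s >= '0' && *s <= '9' && s - start < 4) {
            value = value * 10 + (*s - '0');
            s++;
        }
        if (s == start || s - start > 3)
            return 0;               /* empty or too many digits */
        if (*start == '0' && s - start > 1)
            return 0;               /* leading zero, such as "01" */
        if (value > 255)
            return 0;               /* beyond the 25[0-5] alternative */
        if (part < 3) {
            if (*s != '.')
                return 0;
            s++;
        }
    }
    return *s == '\0';              /* nothing may follow the last octet */
}
```

     For example, 'is_ipv4_address("192.168.0.1")' returns 1, while
     "256.1.1.1" and "01.2.3.4" are rejected, as they are by the
     pattern.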
74193c3a7b76Schristos
74203c3a7b76Schristos
74213c3a7b76SchristosFile: flex.info,  Node: Indices,  Prev: Appendices,  Up: Top
74223c3a7b76Schristos
74233c3a7b76SchristosIndices
74243c3a7b76Schristos*******
74253c3a7b76Schristos
74263c3a7b76Schristos* Menu:
74273c3a7b76Schristos
74283c3a7b76Schristos* Concept Index::
74293c3a7b76Schristos* Index of Functions and Macros::
74303c3a7b76Schristos* Index of Variables::
74313c3a7b76Schristos* Index of Data Types::
74323c3a7b76Schristos* Index of Hooks::
74333c3a7b76Schristos* Index of Scanner Options::
74343c3a7b76Schristos
743530da1778Schristos
743630da1778SchristosFile: flex.info,  Node: Concept Index,  Next: Index of Functions and Macros,  Prev: Indices,  Up: Indices
743730da1778Schristos
743830da1778SchristosConcept Index
743930da1778Schristos=============
744030da1778Schristos
[index]
744230da1778Schristos* Menu:
744330da1778Schristos
744430da1778Schristos* $ as normal character in patterns:     Patterns.            (line 275)
744530da1778Schristos* %array, advantages of:                 Matching.            (line  43)
744630da1778Schristos* %array, use of:                        Matching.            (line  29)
744730da1778Schristos* %array, with C++:                      Matching.            (line  65)
* %option noyywrap:                      Generated Scanner.   (line  93)
744930da1778Schristos* %pointer, and unput():                 Actions.             (line 162)
745030da1778Schristos* %pointer, use of:                      Matching.            (line  29)
745130da1778Schristos* %top:                                  Definitions Section. (line  44)
745230da1778Schristos* %{ and %}, in Definitions Section:     Definitions Section. (line  40)
745330da1778Schristos* %{ and %}, in Rules Section:           Actions.             (line  26)
745430da1778Schristos* <<EOF>>, use of:                       EOF.                 (line  33)
745530da1778Schristos* [] in patterns:                        Patterns.            (line  15)
745630da1778Schristos* ^ as non-special character in patterns: Patterns.           (line 275)
745730da1778Schristos* |, in actions:                         Actions.             (line  33)
745830da1778Schristos* |, use of:                             Actions.             (line  83)
745930da1778Schristos* accessor functions, use of:            Accessor Methods.    (line  18)
746030da1778Schristos* actions:                               Actions.             (line   6)
746130da1778Schristos* actions, embedded C strings:           Actions.             (line  26)
746230da1778Schristos* actions, redefining YY_BREAK:          Misc Macros.         (line  49)
746330da1778Schristos* actions, use of { and }:               Actions.             (line  26)
746430da1778Schristos* aliases, how to define:                Definitions Section. (line  10)
746530da1778Schristos* arguments, command-line:               Scanner Options.     (line   6)
746630da1778Schristos* array, default size for yytext:        User Values.         (line  13)
746730da1778Schristos* backing up, eliminating:               Performance.         (line  54)
746830da1778Schristos* backing up, eliminating by adding error rules: Performance. (line 104)
746930da1778Schristos* backing up, eliminating with catch-all rule: Performance.   (line 118)
747030da1778Schristos* backing up, example of eliminating:    Performance.         (line  49)
747130da1778Schristos* BEGIN:                                 Actions.             (line  57)
* BEGIN, explanation:                    Start Conditions.    (line  84)
* beginning of line, in patterns:        Patterns.            (line 127)
* bison, bridging with flex:             Bison Bridge.        (line   6)
* bison, parser:                         Bison Bridge.        (line  53)
* bison, scanner to be called from bison: Bison Bridge.       (line  34)
* BOL, checking the BOL flag:            Misc Macros.         (line  46)
* BOL, in patterns:                      Patterns.            (line 127)
* BOL, setting it:                       Misc Macros.         (line  40)
* braces in patterns:                    Patterns.            (line  42)
* bugs, reporting:                       Reporting Bugs.      (line   6)
* C code in flex input:                  Definitions Section. (line  40)
* C++:                                   Cxx.                 (line   9)
* C++ and %array:                        User Values.         (line  23)
* C++ I/O, customizing:                  How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* C++ scanners, including multiple scanners: Cxx.             (line 197)
* C++ scanners, use of:                  Cxx.                 (line 128)
* c++, experimental form of scanner class: Cxx.               (line   6)
* C++, multiple different scanners:      Cxx.                 (line 192)
* C-strings, in actions:                 Actions.             (line  26)
* case-insensitive, effect on character classes: Patterns.    (line 216)
* character classes in patterns:         Patterns.            (line 186)
* character classes in patterns, syntax of: Patterns.         (line  15)
* character classes, equivalence of:     Patterns.            (line 205)
* clearing an input buffer:              Multiple Input Buffers.
                                                              (line  66)
* command-line options:                  Scanner Options.     (line   6)
* comments in flex input:                Definitions Section. (line  37)
* comments in the input:                 Comments in the Input.
                                                              (line  24)
* comments, discarding:                  Actions.             (line 176)
* comments, example of scanning C comments: Start Conditions. (line 140)
* comments, in actions:                  Actions.             (line  26)
* comments, in rules section:            Comments in the Input.
                                                              (line  11)
* comments, syntax of:                   Comments in the Input.
                                                              (line   6)
* comments, valid uses of:               Comments in the Input.
                                                              (line  24)
* compressing whitespace:                Actions.             (line  22)
* concatenation, in patterns:            Patterns.            (line 111)
* copyright of flex:                     Copyright.           (line   6)
* counting characters and lines:         Simple Examples.     (line  23)
* customizing I/O in C++ scanners:       How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* default rule:                          Simple Examples.     (line  15)
* default rule <1>:                      Matching.            (line  20)
* defining pattern aliases:              Definitions Section. (line  21)
* Definitions, in flex input:            Definitions Section. (line   6)
* deleting lines from input:             Actions.             (line  13)
* discarding C comments:                 Actions.             (line 176)
* distributing flex:                     Copyright.           (line   6)
* ECHO:                                  Actions.             (line  54)
* ECHO, and yyout:                       Generated Scanner.   (line 101)
* embedding C code in flex input:        Definitions Section. (line  40)
* end of file, in patterns:              Patterns.            (line 150)
* end of line, in negated character classes: Patterns.        (line 237)
* end of line, in patterns:              Patterns.            (line 131)
* end-of-file, and yyrestart():          Generated Scanner.   (line  42)
* EOF and yyrestart():                   Generated Scanner.   (line  42)
* EOF in patterns, syntax of:            Patterns.            (line 150)
* EOF, example using multiple input buffers: Multiple Input Buffers.
                                                              (line  81)
* EOF, explanation:                      EOF.                 (line   6)
* EOF, pushing back:                     Actions.             (line 170)
* EOL, in negated character classes:     Patterns.            (line 237)
* EOL, in patterns:                      Patterns.            (line 131)
* error messages, end of buffer missed:  Lex and Posix.       (line  50)
* error reporting, diagnostic messages:  Diagnostics.         (line   6)
* error reporting, in C++:               Cxx.                 (line 112)
* error rules, to eliminate backing up:  Performance.         (line 102)
* escape sequences in patterns, syntax of: Patterns.          (line  57)
* exiting with yyterminate():            Actions.             (line 212)
* experimental form of c++ scanner class: Cxx.                (line   6)
* extended scope of start conditions:    Start Conditions.    (line 270)
* file format:                           Format.              (line   6)
* file format, serialized tables:        Tables File Format.  (line   6)
* flushing an input buffer:              Multiple Input Buffers.
                                                              (line  66)
* flushing the internal buffer:          Actions.             (line 206)
* format of flex input:                  Format.              (line   6)
* format of input file:                  Format.              (line   9)
* freeing tables:                        Loading and Unloading Serialized Tables.
                                                              (line   6)
* getting current start state with YY_START: Start Conditions.
                                                              (line 189)
* halting with yyterminate():            Actions.             (line 212)
* handling include files with multiple input buffers: Multiple Input Buffers.
                                                              (line  87)
* handling include files with multiple input buffers <1>: Multiple Input Buffers.
                                                              (line 122)
* header files, with C++:                Cxx.                 (line 197)
* include files, with C++:               Cxx.                 (line 197)
* input file, Definitions section:       Definitions Section. (line   6)
* input file, Rules Section:             Rules Section.       (line   6)
* input file, user code Section:         User Code Section.   (line   6)
* input():                               Actions.             (line 173)
* input(), and C++:                      Actions.             (line 202)
* input, format of:                      Format.              (line   6)
* input, matching:                       Matching.            (line   6)
* keywords, for performance:             Performance.         (line 200)
* lex (traditional) and POSIX:           Lex and Posix.       (line   6)
* LexerInput, overriding:                How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* LexerOutput, overriding:               How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* limitations of flex:                   Limitations.         (line   6)
* literal text in patterns, syntax of:   Patterns.            (line  54)
* loading tables at runtime:             Loading and Unloading Serialized Tables.
                                                              (line   6)
* m4:                                    M4 Dependency.       (line   6)
* Makefile, example of implicit rules:   Makefiles and Flex.  (line  21)
* Makefile, explicit example:            Makefiles and Flex.  (line  33)
* Makefile, syntax:                      Makefiles and Flex.  (line   6)
* matching C-style double-quoted strings: Start Conditions.   (line 203)
* matching, and trailing context:        Matching.            (line   6)
* matching, length of:                   Matching.            (line   6)
* matching, multiple matches:            Matching.            (line   6)
* member functions, C++:                 Cxx.                 (line   9)
* memory management:                     Memory Management.   (line   6)
* memory, allocating input buffers:      Multiple Input Buffers.
                                                              (line  19)
* memory, considerations for reentrant scanners: Init and Destroy Functions.
                                                              (line   6)
* memory, deleting input buffers:        Multiple Input Buffers.
                                                              (line  46)
* memory, for start condition stacks:    Start Conditions.    (line 301)
* memory, serialized tables:             Serialized Tables.   (line   6)
* memory, serialized tables <1>:         Loading and Unloading Serialized Tables.
                                                              (line   6)
* methods, c++:                          Cxx.                 (line   9)
* minimal scanner:                       Matching.            (line  24)
* multiple input streams:                Multiple Input Buffers.
                                                              (line   6)
* name definitions, not POSIX:           Lex and Posix.       (line  75)
* negating ranges in patterns:           Patterns.            (line  23)
* newline, matching in patterns:         Patterns.            (line 135)
* non-POSIX features of flex:            Lex and Posix.       (line 142)
* noyywrap, %option:                     Generated Scanner.   (line  93)
* NULL character in patterns, syntax of: Patterns.            (line  62)
* octal characters in patterns:          Patterns.            (line  65)
* options, command-line:                 Scanner Options.     (line   6)
* overriding LexerInput:                 How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* overriding LexerOutput:                How do I use my own I/O classes in a C++ scanner?.
                                                              (line   9)
* overriding the memory routines:        Overriding The Default Memory Management.
                                                              (line  38)
* Pascal-like language:                  Simple Examples.     (line  49)
* pattern aliases, defining:             Definitions Section. (line  21)
* pattern aliases, expansion of:         Patterns.            (line  51)
* pattern aliases, how to define:        Definitions Section. (line  10)
* pattern aliases, use of:               Definitions Section. (line  28)
* patterns and actions on different lines: Lex and Posix.     (line 101)
* patterns, character class equivalence: Patterns.            (line 205)
* patterns, common:                      Common Patterns.     (line   6)
* patterns, end of line:                 Patterns.            (line 300)
* patterns, grouping and precedence:     Patterns.            (line 167)
* patterns, in rules section:            Patterns.            (line   6)
* patterns, invalid trailing context:    Patterns.            (line 285)
* patterns, matching:                    Matching.            (line   6)
* patterns, precedence of operators:     Patterns.            (line 161)
* patterns, repetitions with grouping:   Patterns.            (line 184)
* patterns, special characters treated as non-special: Patterns.
                                                              (line 293)
* patterns, syntax:                      Patterns.            (line   9)
* patterns, syntax <1>:                  Patterns.            (line   9)
* patterns, tuning for performance:      Performance.         (line  49)
* patterns, valid character classes:     Patterns.            (line 192)
* performance optimization, matching longer tokens: Performance.
                                                              (line 167)
* performance optimization, recognizing keywords: Performance.
                                                              (line 205)
* performance, backing up:               Performance.         (line  49)
* performance, considerations:           Performance.         (line   6)
* performance, using keywords:           Performance.         (line 200)
* popping an input buffer:               Multiple Input Buffers.
                                                              (line  60)
* POSIX and lex:                         Lex and Posix.       (line   6)
* POSIX compliance:                      Lex and Posix.       (line 142)
* POSIX, character classes in patterns, syntax of: Patterns.  (line  15)
* preprocessor macros, for use in actions: Actions.           (line  50)
* pushing an input buffer:               Multiple Input Buffers.
                                                              (line  52)
* pushing back characters with unput:    Actions.             (line 143)
* pushing back characters with unput():  Actions.             (line 147)
* pushing back characters with yyless:   Actions.             (line 131)
* pushing back EOF:                      Actions.             (line 170)
* ranges in patterns:                    Patterns.            (line  19)
* ranges in patterns, negating:          Patterns.            (line  23)
* recognizing C comments:                Start Conditions.    (line 143)
* reentrant scanners, multiple interleaved scanners: Reentrant Uses.
                                                              (line  10)
* reentrant scanners, recursive invocation: Reentrant Uses.   (line  30)
* reentrant, accessing flex variables:   Global Replacement.  (line   6)
* reentrant, accessor functions:         Accessor Methods.    (line   6)
* reentrant, API explanation:            Reentrant Overview.  (line   6)
* reentrant, calling functions:          Extra Reentrant Argument.
                                                              (line   6)
* reentrant, example of:                 Reentrant Example.   (line   6)
* reentrant, explanation:                Reentrant.           (line   6)
* reentrant, extra data:                 Extra Data.          (line   6)
* reentrant, initialization:             Init and Destroy Functions.
                                                              (line   6)
* regular expressions, in patterns:      Patterns.            (line   6)
* REJECT:                                Actions.             (line  61)
* REJECT, calling multiple times:        Actions.             (line  83)
* REJECT, performance costs:             Performance.         (line  12)
* reporting bugs:                        Reporting Bugs.      (line   6)
* restarting the scanner:                Lex and Posix.       (line  54)
* RETURN, within actions:                Generated Scanner.   (line  57)
* rules, default:                        Simple Examples.     (line  15)
* rules, in flex input:                  Rules Section.       (line   6)
* scanner, definition of:                Introduction.        (line   6)
* sections of flex input:                Format.              (line   6)
* serialization:                         Serialized Tables.   (line   6)
* serialization of tables:               Creating Serialized Tables.
                                                              (line   6)
* serialized tables, multiple scanners:  Creating Serialized Tables.
                                                              (line  26)
* stack, input buffer pop:               Multiple Input Buffers.
                                                              (line  60)
* stack, input buffer push:              Multiple Input Buffers.
                                                              (line  52)
* stacks, routines for manipulating:     Start Conditions.    (line 286)
* start condition, applying to multiple patterns: Start Conditions.
                                                              (line 258)
* start conditions:                      Start Conditions.    (line   6)
* start conditions, behavior of default rule: Start Conditions.
                                                              (line  82)
* start conditions, exclusive:           Start Conditions.    (line  53)
* start conditions, for different interpretations of same input: Start Conditions.
                                                              (line 112)
* start conditions, in patterns:         Patterns.            (line 140)
* start conditions, inclusive:           Start Conditions.    (line  44)
* start conditions, inclusive vs. exclusive: Start Conditions.
                                                              (line  24)
* start conditions, integer values:      Start Conditions.    (line 163)
* start conditions, multiple:            Start Conditions.    (line  17)
* start conditions, special wildcard condition: Start Conditions.
                                                              (line  68)
* start conditions, use of a stack:      Start Conditions.    (line 286)
* start conditions, use of wildcard condition (<*>): Start Conditions.
                                                              (line  72)
* start conditions, using BEGIN:         Start Conditions.    (line  95)
* stdin, default for yyin:               Generated Scanner.   (line  37)
* stdout, as default for yyout:          Generated Scanner.   (line 101)
* strings, scanning strings instead of files: Multiple Input Buffers.
                                                              (line 175)
* tables, creating serialized:           Creating Serialized Tables.
                                                              (line   6)
* tables, file format:                   Tables File Format.  (line   6)
* tables, freeing:                       Loading and Unloading Serialized Tables.
                                                              (line   6)
* tables, loading and unloading:         Loading and Unloading Serialized Tables.
                                                              (line   6)
* terminating with yyterminate():        Actions.             (line 212)
* token:                                 Matching.            (line  14)
* trailing context, in patterns:         Patterns.            (line 118)
* trailing context, limits of:           Patterns.            (line 275)
* trailing context, matching:            Matching.            (line   6)
* trailing context, performance costs:   Performance.         (line  12)
* trailing context, variable length:     Performance.         (line 141)
* unput():                               Actions.             (line 143)
* unput(), and %pointer:                 Actions.             (line 162)
* unput(), pushing back characters:      Actions.             (line 147)
* user code, in flex input:              User Code Section.   (line   6)
* username expansion:                    Simple Examples.     (line   8)
* using integer values of start condition names: Start Conditions.
                                                              (line 163)
* verbatim text in patterns, syntax of:  Patterns.            (line  54)
* warning, dangerous trailing context:   Limitations.         (line  20)
* warning, rule cannot be matched:       Diagnostics.         (line  14)
* warnings, diagnostic messages:         Diagnostics.         (line   6)
* whitespace, compressing:               Actions.             (line  22)
* yacc interface:                        Yacc.                (line  17)
* yacc, interface:                       Yacc.                (line   6)
* yyalloc, overriding:                   Overriding The Default Memory Management.
                                                              (line   6)
* yyfree, overriding:                    Overriding The Default Memory Management.
                                                              (line   6)
* yyin:                                  Generated Scanner.   (line  37)
* yyinput():                             Actions.             (line 202)
* yyleng:                                Matching.            (line  14)
* yyleng, modification of:               Actions.             (line  47)
* yyless():                              Actions.             (line 125)
* yyless(), pushing back characters:     Actions.             (line 131)
* yylex(), in generated scanner:         Generated Scanner.   (line   6)
* yylex(), overriding:                   Generated Scanner.   (line  16)
* yylex, overriding the prototype of:    Generated Scanner.   (line  20)
* yylineno, in a reentrant scanner:      Reentrant Functions. (line  36)
* yylineno, performance costs:           Performance.         (line  12)
* yymore():                              Actions.             (line 104)
* yymore() to append token to previous token: Actions.        (line 110)
* yymore(), mega-kludge:                 Actions.             (line 110)
* yymore, and yyleng:                    Actions.             (line  47)
* yymore, performance penalty of:        Actions.             (line 119)
* yyout:                                 Generated Scanner.   (line 101)
* yyrealloc, overriding:                 Overriding The Default Memory Management.
                                                              (line   6)
* yyrestart():                           Generated Scanner.   (line  42)
* yyterminate():                         Actions.             (line 212)
* yytext:                                Matching.            (line  14)
* yytext, default array size:            User Values.         (line  13)
* yytext, memory considerations:         A Note About yytext And Memory.
                                                              (line   6)
* yytext, modification of:               Actions.             (line  42)
* yytext, two types of:                  Matching.            (line  29)
* yywrap():                              Generated Scanner.   (line  85)
* yywrap, default for:                   Generated Scanner.   (line  93)
* YY_CURRENT_BUFFER, and multiple buffers: Multiple Input Buffers.
                                                              (line  78)
* YY_EXTRA_TYPE, defining your own type: Extra Data.          (line  33)
* YY_FLUSH_BUFFER:                       Actions.             (line 206)
* YY_INPUT:                              Generated Scanner.   (line  61)
* YY_INPUT, overriding:                  Generated Scanner.   (line  71)
* YY_START, example:                     Start Conditions.    (line 185)
* YY_USER_ACTION to track each time a rule is matched: Misc Macros.
                                                              (line  14)