1This is flex.info, produced by makeinfo version 4.8 from flex.texi.
2
3INFO-DIR-SECTION Programming
4START-INFO-DIR-ENTRY
5* flex: (flex).      Fast lexical analyzer generator (lex replacement).
6END-INFO-DIR-ENTRY
7
8   The flex manual is placed under the same licensing conditions as the
9rest of flex:
10
11   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 The Flex
12Project.
13
14   Copyright (C) 1990, 1997 The Regents of the University of California.
15All rights reserved.
16
17   This code is derived from software contributed to Berkeley by Vern
18Paxson.
19
20   The United States Government has rights in this work pursuant to
21contract no. DE-AC03-76SF00098 between the United States Department of
22Energy and the University of California.
23
24   Redistribution and use in source and binary forms, with or without
25modification, are permitted provided that the following conditions are
26met:
27
28  1.  Redistributions of source code must retain the above copyright
29     notice, this list of conditions and the following disclaimer.
30
31  2. Redistributions in binary form must reproduce the above copyright
32     notice, this list of conditions and the following disclaimer in the
33     documentation and/or other materials provided with the
34     distribution.
35
36   Neither the name of the University nor the names of its contributors
37may be used to endorse or promote products derived from this software
38without specific prior written permission.
39
40   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
41WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
42MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
43
44
45File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)
46
47flex
48****
49
50This manual describes `flex', a tool for generating programs that
51perform pattern-matching on text.  The manual includes both tutorial and
52reference sections.
53
54   This edition of `The flex Manual' documents `flex' version 2.5.35.
55It was last updated on 10 September 2007.
56
57   This manual was written by Vern Paxson, Will Estes and John Millaway.
58
59* Menu:
60
61* Copyright::
62* Reporting Bugs::
63* Introduction::
64* Simple Examples::
65* Format::
66* Patterns::
67* Matching::
68* Actions::
69* Generated Scanner::
70* Start Conditions::
71* Multiple Input Buffers::
72* EOF::
73* Misc Macros::
74* User Values::
75* Yacc::
76* Scanner Options::
77* Performance::
78* Cxx::
79* Reentrant::
80* Lex and Posix::
81* Memory Management::
82* Serialized Tables::
83* Diagnostics::
84* Limitations::
85* Bibliography::
86* FAQ::
87* Appendices::
88* Indices::
89
90 --- The Detailed Node Listing ---
91
92Format of the Input File
93
94* Definitions Section::
95* Rules Section::
96* User Code Section::
97* Comments in the Input::
98
99Scanner Options
100
101* Options for Specifying Filenames::
102* Options Affecting Scanner Behavior::
103* Code-Level And API Options::
104* Options for Scanner Speed and Size::
105* Debugging Options::
106* Miscellaneous Options::
107
108Reentrant C Scanners
109
110* Reentrant Uses::
111* Reentrant Overview::
112* Reentrant Example::
113* Reentrant Detail::
114* Reentrant Functions::
115
116The Reentrant API in Detail
117
118* Specify Reentrant::
119* Extra Reentrant Argument::
120* Global Replacement::
121* Init and Destroy Functions::
122* Accessor Methods::
123* Extra Data::
124* About yyscan_t::
125
126Memory Management
127
128* The Default Memory Management::
129* Overriding The Default Memory Management::
130* A Note About yytext And Memory::
131
132Serialized Tables
133
134* Creating Serialized Tables::
135* Loading and Unloading Serialized Tables::
136* Tables File Format::
137
138FAQ
139
140* When was flex born?::
141* How do I expand backslash-escape sequences in C-style quoted strings?::
142* Why do flex scanners call fileno if it is not ANSI compatible?::
143* Does flex support recursive pattern definitions?::
144* How do I skip huge chunks of input (tens of megabytes) while using flex?::
145* Flex is not matching my patterns in the same order that I defined them.::
146* My actions are executing out of order or sometimes not at all.::
147* How can I have multiple input sources feed into the same scanner at the same time?::
148* Can I build nested parsers that work with the same input file?::
149* How can I match text only at the end of a file?::
150* How can I make REJECT cascade across start condition boundaries?::
151* Why cant I use fast or full tables with interactive mode?::
152* How much faster is -F or -f than -C?::
153* If I have a simple grammar cant I just parse it with flex?::
154* Why doesn't yyrestart() set the start state back to INITIAL?::
155* How can I match C-style comments?::
156* The period isn't working the way I expected.::
157* Can I get the flex manual in another format?::
158* Does there exist a "faster" NDFA->DFA algorithm?::
159* How does flex compile the DFA so quickly?::
160* How can I use more than 8192 rules?::
161* How do I abandon a file in the middle of a scan and switch to a new file?::
162* How do I execute code only during initialization (only before the first scan)?::
163* How do I execute code at termination?::
164* Where else can I find help?::
165* Can I include comments in the "rules" section of the file?::
166* I get an error about undefined yywrap().::
167* How can I change the matching pattern at run time?::
168* How can I expand macros in the input?::
169* How can I build a two-pass scanner?::
170* How do I match any string not matched in the preceding rules?::
171* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
172* Is there a way to make flex treat NULL like a regular character?::
173* Whenever flex can not match the input it says "flex scanner jammed".::
174* Why doesn't flex have non-greedy operators like perl does?::
175* Memory leak - 16386 bytes allocated by malloc.::
176* How do I track the byte offset for lseek()?::
177* How do I use my own I/O classes in a C++ scanner?::
178* How do I skip as many chars as possible?::
179* deleteme00::
180* Are certain equivalent patterns faster than others?::
181* Is backing up a big deal?::
182* Can I fake multi-byte character support?::
183* deleteme01::
184* Can you discuss some flex internals?::
185* unput() messes up yy_at_bol::
186* The | operator is not doing what I want::
187* Why can't flex understand this variable trailing context pattern?::
188* The ^ operator isn't working::
189* Trailing context is getting confused with trailing optional patterns::
190* Is flex GNU or not?::
191* ERASEME53::
192* I need to scan if-then-else blocks and while loops::
193* ERASEME55::
194* ERASEME56::
195* ERASEME57::
196* Is there a repository for flex scanners?::
197* How can I conditionally compile or preprocess my flex input file?::
198* Where can I find grammars for lex and yacc?::
199* I get an end-of-buffer message for each character scanned.::
200* unnamed-faq-62::
201* unnamed-faq-63::
202* unnamed-faq-64::
203* unnamed-faq-65::
204* unnamed-faq-66::
205* unnamed-faq-67::
206* unnamed-faq-68::
207* unnamed-faq-69::
208* unnamed-faq-70::
209* unnamed-faq-71::
210* unnamed-faq-72::
211* unnamed-faq-73::
212* unnamed-faq-74::
213* unnamed-faq-75::
214* unnamed-faq-76::
215* unnamed-faq-77::
216* unnamed-faq-78::
217* unnamed-faq-79::
218* unnamed-faq-80::
219* unnamed-faq-81::
220* unnamed-faq-82::
221* unnamed-faq-83::
222* unnamed-faq-84::
223* unnamed-faq-85::
224* unnamed-faq-86::
225* unnamed-faq-87::
226* unnamed-faq-88::
227* unnamed-faq-90::
228* unnamed-faq-91::
229* unnamed-faq-92::
230* unnamed-faq-93::
231* unnamed-faq-94::
232* unnamed-faq-95::
233* unnamed-faq-96::
234* unnamed-faq-97::
235* unnamed-faq-98::
236* unnamed-faq-99::
237* unnamed-faq-100::
238* unnamed-faq-101::
239* What is the difference between YYLEX_PARAM and YY_DECL?::
240* Why do I get "conflicting types for yylex" error?::
241* How do I access the values set in a Flex action from within a Bison action?::
242
243Appendices
244
245* Makefiles and Flex::
246* Bison Bridge::
247* M4 Dependency::
248* Common Patterns::
249
250Indices
251
252* Concept Index::
253* Index of Functions and Macros::
254* Index of Variables::
255* Index of Data Types::
256* Index of Hooks::
257* Index of Scanner Options::
258
259
260File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top
261
2621 Copyright
263***********
264
265The flex manual is placed under the same licensing conditions as the
266rest of flex:
267
268   Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 The Flex
269Project.
270
271   Copyright (C) 1990, 1997 The Regents of the University of California.
272All rights reserved.
273
274   This code is derived from software contributed to Berkeley by Vern
275Paxson.
276
277   The United States Government has rights in this work pursuant to
278contract no. DE-AC03-76SF00098 between the United States Department of
279Energy and the University of California.
280
281   Redistribution and use in source and binary forms, with or without
282modification, are permitted provided that the following conditions are
283met:
284
285  1.  Redistributions of source code must retain the above copyright
286     notice, this list of conditions and the following disclaimer.
287
288  2. Redistributions in binary form must reproduce the above copyright
289     notice, this list of conditions and the following disclaimer in the
290     documentation and/or other materials provided with the
291     distribution.
292
293   Neither the name of the University nor the names of its contributors
294may be used to endorse or promote products derived from this software
295without specific prior written permission.
296
297   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
298WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
299MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
300
301
302File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top
303
3042 Reporting Bugs
305****************
306
307If you find a bug in `flex', please report it using the SourceForge Bug
308Tracking facilities which can be found on flex's SourceForge Page
309(http://sourceforge.net/projects/flex).
310
311
312File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top
313
3143 Introduction
315**************
316
317`flex' is a tool for generating "scanners".  A scanner is a program
318which recognizes lexical patterns in text.  The `flex' program reads
319the given input files, or its standard input if no file names are
320given, for a description of a scanner to generate.  The description is
321in the form of pairs of regular expressions and C code, called "rules".
322`flex' generates as output a C source file, `lex.yy.c' by default,
323which defines a routine `yylex()'.  This file can be compiled and
324linked with the flex runtime library to produce an executable.  When
325the executable is run, it analyzes its input for occurrences of the
326regular expressions.  Whenever it finds one, it executes the
327corresponding C code.
328
329
330File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top
331
3324 Some Simple Examples
333**********************
334
335First some simple examples to get the flavor of how one uses `flex'.
336
   The following `flex' input specifies a scanner which, when it
encounters the string `username', replaces it with the user's login
name:
340
341
342         %%
343         username    printf( "%s", getlogin() );
344
345   By default, any text not matched by a `flex' scanner is copied to
346the output, so the net effect of this scanner is to copy its input file
347to its output with each occurrence of `username' expanded.  In this
348input, there is just one rule.  `username' is the "pattern" and the
349`printf' is the "action".  The `%%' symbol marks the beginning of the
350rules.
351
352   Here's another simple example:
353
354
                 int num_lines = 0, num_chars = 0;

         %%
         \n      ++num_lines; ++num_chars;
         .       ++num_chars;

         %%
         int main( void )
                 {
                 yylex();
                 printf( "# of lines = %d, # of chars = %d\n",
                         num_lines, num_chars );
                 return 0;
                 }
368
369   This scanner counts the number of characters and the number of lines
370in its input. It produces no output other than the final report on the
371character and line counts.  The first line declares two globals,
372`num_lines' and `num_chars', which are accessible both inside `yylex()'
373and in the `main()' routine declared after the second `%%'.  There are
374two rules, one which matches a newline (`\n') and increments both the
375line count and the character count, and one which matches any character
376other than a newline (indicated by the `.' regular expression).
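
   For comparison, the effect of the two rules can be mirrored by a
hand-written C loop.  This is only a sketch of the scanner's observable
behavior, not the code that `flex' generates; the type `struct counts'
and the function `count_input' are names invented for this
illustration.

```c
#include <stddef.h>

/* Hypothetical helper type: the two counters from the example. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Mirror the two rules over a buffer of len bytes. */
struct counts count_input(const char *buf, size_t len)
{
    struct counts c = { 0, 0 };
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\n')
            ++c.num_lines;   /* rule: \n   ++num_lines; ++num_chars; */
        ++c.num_chars;       /* both rules count the character */
    }
    return c;
}
```

   Because every byte is matched by exactly one of the two rules, the
character count always equals the length of the input.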
377
378   A somewhat more complicated example:
379
380
         /* scanner for a toy Pascal-like language */

         %{
         /* atoi() and atof(), used below, are declared in <stdlib.h> */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^{}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             return 0;
             }
431
   This is the beginning of a simple scanner for a language like
Pascal.  It identifies different types of "tokens" and reports on what
it has seen.
435
436   The details of this example will be explained in the following
437sections.
438
439
440File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top
441
4425 Format of the Input File
443**************************
444
445The `flex' input file consists of three sections, separated by a line
446containing only `%%'.
447
448
449         definitions
450         %%
451         rules
452         %%
453         user code
454
455* Menu:
456
457* Definitions Section::
458* Rules Section::
459* User Code Section::
460* Comments in the Input::
461
462
463File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format
464
4655.1 Format of the Definitions Section
466=====================================
467
468The "definitions section" contains declarations of simple "name"
469definitions to simplify the scanner specification, and declarations of
470"start conditions", which are explained in a later section.
471
472   Name definitions have the form:
473
474
475         name definition
476
477   The `name' is a word beginning with a letter or an underscore (`_')
478followed by zero or more letters, digits, `_', or `-' (dash).  The
479definition is taken to begin at the first non-whitespace character
480following the name and continuing to the end of the line.  The
481definition can subsequently be referred to using `{name}', which will
482expand to `(definition)'.  For example,
483
484
485         DIGIT    [0-9]
486         ID       [a-z][a-z0-9]*
487
488   Defines `DIGIT' to be a regular expression which matches a single
489digit, and `ID' to be a regular expression which matches a letter
490followed by zero-or-more letters-or-digits.  A subsequent reference to
491
492
493         {DIGIT}+"."{DIGIT}*
494
495   is identical to
496
497
498         ([0-9])+"."([0-9])*
499
500   and matches one-or-more digits followed by a `.' followed by
501zero-or-more digits.
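
   To make the expanded pattern concrete, the following sketch matches
`([0-9])+"."([0-9])*' by hand in C.  It is an illustration only and
not how a `flex' scanner is implemented; `match_float' is a
hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hand-written equivalent of ([0-9])+"."([0-9])*.
 * Returns the number of characters matched at the start of s,
 * or 0 if the pattern does not match there.
 */
size_t match_float(const char *s)
{
    size_t i = 0;

    while (isdigit((unsigned char)s[i]))   /* ([0-9])+ */
        i++;
    if (i == 0)
        return 0;                          /* at least one digit needed */
    if (s[i] != '.')                       /* "." */
        return 0;
    i++;
    while (isdigit((unsigned char)s[i]))   /* ([0-9])* */
        i++;
    return i;
}
```

   Note that `42.' matches (the trailing digits are optional) while
`.5' and `42' do not, exactly as the pattern reads.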
502
503   An unindented comment (i.e., a line beginning with `/*') is copied
504verbatim to the output up to the next `*/'.
505
506   Any _indented_ text or text enclosed in `%{' and `%}' is also copied
507verbatim to the output (with the %{ and %} symbols removed).  The %{
508and %} symbols must appear unindented on lines by themselves.
509
510   A `%top' block is similar to a `%{' ... `%}' block, except that the
511code in a `%top' block is relocated to the _top_ of the generated file,
512before any flex definitions (1).  The `%top' block is useful when you
513want certain preprocessor macros to be defined or certain files to be
included before the generated code.  The single characters `{' and `}'
are used to delimit the `%top' block, as shown in the example below:
516
517
518         %top{
519             /* This code goes at the "top" of the generated file. */
520             #include <stdint.h>
521             #include <inttypes.h>
522         }
523
524   Multiple `%top' blocks are allowed, and their order is preserved.
525
526   ---------- Footnotes ----------
527
528   (1) Actually, `yyIN_HEADER' is defined before the `%top' block.
529
530
531File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format
532
5335.2 Format of the Rules Section
534===============================
535
536The "rules" section of the `flex' input contains a series of rules of
537the form:
538
539
540         pattern   action
541
542   where the pattern must be unindented and the action must begin on
543the same line.  *Note Patterns::, for a further description of patterns
544and actions.
545
546   In the rules section, any indented or %{ %} enclosed text appearing
547before the first rule may be used to declare variables which are local
548to the scanning routine and (after the declarations) code which is to be
549executed whenever the scanning routine is entered.  Other indented or
550%{ %} text in the rule section is still copied to the output, but its
551meaning is not well-defined and it may well cause compile-time errors
552(this feature is present for POSIX compliance. *Note Lex and Posix::,
553for other such features).
554
555   Any _indented_ text or text enclosed in `%{' and `%}' is copied
556verbatim to the output (with the %{ and %} symbols removed).  The %{
557and %} symbols must appear unindented on lines by themselves.
558
559
560File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format
561
5625.3 Format of the User Code Section
563===================================
564
565The user code section is simply copied to `lex.yy.c' verbatim.  It is
566used for companion routines which call or are called by the scanner.
567The presence of this section is optional; if it is missing, the second
568`%%' in the input file may be skipped, too.
569
570
571File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format
572
5735.4 Comments in the Input
574=========================
575
576Flex supports C-style comments, that is, anything between `/*' and `*/'
577is considered a comment. Whenever flex encounters a comment, it copies
578the entire comment verbatim to the generated source code. Comments may
579appear just about anywhere, but with the following exceptions:
580
581   * Comments may not appear in the Rules Section wherever flex is
582     expecting a regular expression. This means comments may not appear
583     at the beginning of a line, or immediately following a list of
584     scanner states.
585
586   * Comments may not appear on an `%option' line in the Definitions
587     Section.
588
   If you want to follow a simple rule, then always begin a comment on a
new line, with one or more whitespace characters before the initial
`/*'.  This rule will work anywhere in the input file.
592
593   All the comments in the following example are valid:
594
595
596     %{
597     /* code block */
598     %}
599
600     /* Definitions Section */
601     %x STATE_X
602
603     %%
604         /* Rules Section */
605     ruleA   /* after regex */ { /* code block */ } /* after code block */
606             /* Rules Section (indented) */
607     <STATE_X>{
608     ruleC   ECHO;
609     ruleD   ECHO;
610     %{
611     /* code block */
612     %}
613     }
614     %%
615     /* User Code Section */
616
617
618File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
619
6206 Patterns
621**********
622
623The patterns in the input (see *Note Rules Section::) are written using
624an extended set of regular expressions.  These are:
625
626`x'
627     match the character 'x'
628
629`.'
630     any character (byte) except newline
631
632`[xyz]'
633     a "character class"; in this case, the pattern matches either an
634     'x', a 'y', or a 'z'
635
636`[abj-oZ]'
637     a "character class" with a range in it; matches an 'a', a 'b', any
638     letter from 'j' through 'o', or a 'Z'
639
640`[^A-Z]'
641     a "negated character class", i.e., any character but those in the
642     class.  In this case, any character EXCEPT an uppercase letter.
643
644`[^A-Z\n]'
645     any character EXCEPT an uppercase letter or a newline
646
647`[a-z]{-}[aeiou]'
648     the lowercase consonants
649
650`r*'
651     zero or more r's, where r is any regular expression
652
653`r+'
654     one or more r's
655
656`r?'
657     zero or one r's (that is, "an optional r")
658
659`r{2,5}'
660     anywhere from two to five r's
661
662`r{2,}'
663     two or more r's
664
665`r{4}'
666     exactly 4 r's
667
668`{name}'
669     the expansion of the `name' definition (*note Format::).
670
671`"[xyz]\"foo"'
672     the literal string: `[xyz]"foo'
673
674`\X'
675     if X is `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
676     interpretation of `\x'.  Otherwise, a literal `X' (used to escape
677     operators such as `*')
678
679`\0'
680     a NUL character (ASCII code 0)
681
682`\123'
683     the character with octal value 123
684
685`\x2a'
686     the character with hexadecimal value 2a
687
688`(r)'
689     match an `r'; parentheses are used to override precedence (see
690     below)
691
692`(?r-s:pattern)'
693     apply option `r' and omit option `s' while interpreting pattern.
694     Options may be zero or more of the characters `i', `s', or `x'.
695
696     `i' means case-insensitive. `-i' means case-sensitive.
697
698     `s' alters the meaning of the `.' syntax to match any single byte
699     whatsoever.  `-s' alters the meaning of `.' to match any byte
700     except `\n'.
701
702     `x' ignores comments and whitespace in patterns. Whitespace is
703     ignored unless it is backslash-escaped, contained within `""'s, or
704     appears inside a character class.
705
706     The following are all valid:
707
708
709     (?:foo)         same as  (foo)
710     (?i:ab7)        same as  ([aA][bB]7)
711     (?-i:ab)        same as  (ab)
712     (?s:.)          same as  [\x00-\xFF]
713     (?-s:.)         same as  [^\n]
714     (?ix-s: a . b)  same as  ([Aa][^\n][bB])
715     (?x:a  b)       same as  ("ab")
716     (?x:a\ b)       same as  ("a b")
717     (?x:a" "b)      same as  ("a b")
718     (?x:a[ ]b)      same as  ("a b")
719     (?x:a
720         /* comment */
721         b
722         c)          same as  (abc)
723
724`(?# comment )'
     omit everything within `()'. The first `)' character encountered
     ends the pattern. It is not possible for the comment to contain
     a `)' character. The comment may span lines.
728
729`rs'
730     the regular expression `r' followed by the regular expression `s';
731     called "concatenation"
732
733`r|s'
734     either an `r' or an `s'
735
736`r/s'
737     an `r' but only if it is followed by an `s'.  The text matched by
738     `s' is included when determining whether this rule is the longest
739     match, but is then returned to the input before the action is
740     executed.  So the action only sees the text matched by `r'.  This
741     type of pattern is called "trailing context".  (There are some
742     combinations of `r/s' that flex cannot match correctly. *Note
743     Limitations::, regarding dangerous trailing context.)
744
745`^r'
746     an `r', but only at the beginning of a line (i.e., when just
747     starting to scan, or right after a newline has been scanned).
748
749`r$'
750     an `r', but only at the end of a line (i.e., just before a
751     newline).  Equivalent to `r/\n'.
752
753     Note that `flex''s notion of "newline" is exactly whatever the C
754     compiler used to compile `flex' interprets `\n' as; in particular,
755     on some DOS systems you must either filter out `\r's in the input
756     yourself, or explicitly use `r/\r\n' for `r$'.
757
758`<s>r'
759     an `r', but only in start condition `s' (see *Note Start
760     Conditions:: for discussion of start conditions).
761
762`<s1,s2,s3>r'
763     same, but in any of start conditions `s1', `s2', or `s3'.
764
765`<*>r'
766     an `r' in any start condition, even an exclusive one.
767
768`<<EOF>>'
769     an end-of-file.
770
771`<s1,s2><<EOF>>'
772     an end-of-file when in start condition `s1' or `s2'
773
   Note that inside of a character class, all regular expression
operators lose their special meaning except escape (`\') and the
character class operators, `-', `]', and, at the beginning of the
class, `^'.
778
779   The regular expressions listed above are grouped according to
780precedence, from highest precedence at the top to lowest at the bottom.
781Those grouped together have equal precedence (see special note on the
782precedence of the repeat operator, `{}', under the documentation for
783the `--posix' POSIX compliance option).  For example,
784
785
786         foo|bar*
787
788   is the same as
789
790
791         (foo)|(ba(r*))
792
793   since the `*' operator has higher precedence than concatenation, and
794concatenation higher than alternation (`|').  This pattern therefore
795matches _either_ the string `foo' _or_ the string `ba' followed by
796zero-or-more `r''s.  To match `foo' or zero-or-more repetitions of the
797string `bar', use:
798
799
800         foo|(bar)*
801
802   And to match a sequence of zero or more repetitions of `foo' and
803`bar':
804
805
806         (foo|bar)*
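
   The three groupings above can be checked experimentally.  The sketch
below does not use `flex' at all: it assumes a POSIX system and relies
on `regcomp()' with `REG_EXTENDED', whose extended regular expressions
give `*', concatenation and `|' the same relative precedence as
described here.  The helper name `ere_full_match' is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern fails to compile.
 * The caller anchors the pattern with ^( ... )$ to test whole strings.
 */
int ere_full_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

   With this helper, `^(foo|bar*)$' accepts `barrr' but not `barbar',
`^(foo|(bar)*)$' accepts `barbar' but not `foobar', and
`^((foo|bar)*)$' accepts `foobar'.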
807
   In addition to characters and ranges of characters, character classes
can also contain "character class expressions".  These are expressions
enclosed inside `[:' and `:]' delimiters (which themselves must appear
between the `[' and `]' of the character class; other elements may
occur inside the character class, too).  The valid expressions are:
813
814
815         [:alnum:] [:alpha:] [:blank:]
816         [:cntrl:] [:digit:] [:graph:]
817         [:lower:] [:print:] [:punct:]
818         [:space:] [:upper:] [:xdigit:]
819
820   These expressions all designate a set of characters equivalent to the
821corresponding standard C `isXXX' function.  For example, `[:alnum:]'
822designates those characters for which `isalnum()' returns true - i.e.,
823any alphabetic or numeric character.  Some systems don't provide
824`isblank()', so flex defines `[:blank:]' as a blank or a tab.
825
826   For example, the following character classes are all equivalent:
827
828
829         [[:alnum:]]
830         [[:alpha:][:digit:]]
831         [[:alpha:][0-9]]
832         [a-zA-Z0-9]
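
   The equivalence between `[[:alnum:]]' and `[a-zA-Z0-9]' can be
checked against the C library.  The sketch below assumes the default
"C" locale and an ASCII character set; the function names are invented
for this illustration.

```c
#include <ctype.h>

/* Membership test for the explicit class [a-zA-Z0-9] (ASCII assumed). */
int in_explicit_alnum(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/*
 * Count the byte values 0..255 on which isalnum() and the explicit
 * class disagree.  No setlocale() call is made, so the program runs
 * in the "C" locale.
 */
int alnum_mismatches(void)
{
    int c, mismatches = 0;

    for (c = 0; c < 256; c++)
        if ((isalnum(c) != 0) != in_explicit_alnum(c))
            mismatches++;
    return mismatches;
}
```

   A result of zero means the two classes agree on every byte value.  In
another locale `isalnum()' may accept additional characters, in which
case the two classes would no longer be equivalent.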
833
834   A word of caution. Character classes are expanded immediately when
835seen in the `flex' input.  This means the character classes are
836sensitive to the locale in which `flex' is executed, and the resulting
837scanner will not be sensitive to the runtime locale.  This may or may
838not be desirable.
839
840   * If your scanner is case-insensitive (the `-i' flag), then
841     `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
842
843   * Character classes with ranges, such as `[a-Z]', should be used with
844     caution in a case-insensitive scanner if the range spans upper or
845     lowercase characters. Flex does not know if you want to fold all
846     upper and lowercase characters together, or if you want the
847     literal numeric range specified (with no case folding). When in
848     doubt, flex will assume that you meant the literal numeric range,
849     and will issue a warning. The exception to this rule is a
850     character range such as `[a-z]' or `[S-W]' where it is obvious
851     that you want case-folding to occur. Here are some examples with
852     the `-i' flag enabled:
853
854     Range        Result      Literal Range        Alternate Range
855     `[a-t]'      ok          `[a-tA-T]'
856     `[A-T]'      ok          `[a-tA-T]'
857     `[A-t]'      ambiguous   `[A-Z\[\\\]_`a-t]'   `[a-tA-T]'
858     `[_-{]'      ambiguous   `[_`a-z{]'           `[_`a-zA-Z{]'
859     `[@-C]'      ambiguous   `[@ABC]'             `[@A-Z\[\\\]_`abc]'
860
861   * A negated character class such as the example `[^A-Z]' above
862     _will_ match a newline unless `\n' (or an equivalent escape
863     sequence) is one of the characters explicitly present in the
864     negated character class (e.g., `[^A-Z\n]').  This is unlike how
865     many other regular expression tools treat negated character
866     classes, but unfortunately the inconsistency is historically
867     entrenched.  Matching newlines means that a pattern like `[^"]*'
868     can match the entire input unless there's another quote in the
869     input.
870
871     Flex allows negation of character class expressions by prepending
872     `^' to the POSIX character class name.
873
874
875              [:^alnum:] [:^alpha:] [:^blank:]
876              [:^cntrl:] [:^digit:] [:^graph:]
877              [:^lower:] [:^print:] [:^punct:]
878              [:^space:] [:^upper:] [:^xdigit:]
879
880     Flex will issue a warning if the expressions `[:^upper:]' and
881     `[:^lower:]' appear in a case-insensitive scanner, since their
882     meaning is unclear. The current behavior is to skip them entirely,
883     but this may change without notice in future revisions of flex.
884
885   *  The `{-}' operator computes the difference of two character
886     classes. For example, `[a-c]{-}[b-z]' represents all the
887     characters in the class `[a-c]' that are not in the class `[b-z]'
888     (which in this case, is just the single character `a'). The `{-}'
889     operator is left associative, so `[abc]{-}[b]{-}[c]' is the same
890     as `[a]'. Be careful not to accidentally create an empty set,
891     which will never match.
892
893   *  The `{+}' operator computes the union of two character classes.
894     For example, `[a-z]{+}[0-9]' is the same as `[a-z0-9]'. This
895     operator is useful when preceded by the result of a difference
896     operation, as in, `[[:alpha:]]{-}[[:lower:]]{+}[q]', which is
897     equivalent to `[A-Zq]' in the "C" locale.
898
   * A rule can have at most one instance of trailing context (the `/'
     operator or the `$' operator).  The start condition, `^', and
     `<<EOF>>' patterns can only occur at the beginning of a pattern
     and, like `/' and `$', cannot be grouped inside parentheses.  A
     `^' which does not occur at the beginning of a rule, or a `$'
     which does not occur at the end of a rule, loses its special
     properties and is treated as a normal character.
906
907   * The following are invalid:
908
909
910              foo/bar$
911              <sc1>foo<sc2>bar
912
913     Note that the first of these can be written `foo/bar\n'.
914
915   * The following will result in `$' or `^' being treated as a normal
916     character:
917
918
919              foo|(bar$)
920              foo|^bar
921
922     If the desired meaning is a `foo' or a
923     `bar'-followed-by-a-newline, the following could be used (the
924     special `|' action is explained below, *note Actions::):
925
926
927              foo      |
928              bar$     /* action goes here */
929
930     A similar trick will work for matching a `foo' or a
931     `bar'-at-the-beginning-of-a-line.
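   The set arithmetic behind the `{-}' and `{+}' operators described
above can be sketched in plain C.  This is only an illustration of the
semantics, not how `flex' represents character classes internally; the
`cclass' type and the `cc_*' helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* A character class as a 256-entry membership table
 * (hypothetical helper type, not part of flex). */
typedef struct { bool member[256]; } cclass;

/* Build a class from an inclusive range, like [lo-hi]. */
cclass cc_range(unsigned char lo, unsigned char hi)
{
    cclass c;
    memset(&c, 0, sizeof c);
    for (int ch = lo; ch <= hi; ++ch)
        c.member[ch] = true;
    return c;
}

/* a{-}b: the characters of a that are not in b. */
cclass cc_diff(cclass a, cclass b)
{
    for (int ch = 0; ch < 256; ++ch)
        a.member[ch] = a.member[ch] && !b.member[ch];
    return a;
}

/* a{+}b: the characters that are in either class. */
cclass cc_union(cclass a, cclass b)
{
    for (int ch = 0; ch < 256; ++ch)
        a.member[ch] = a.member[ch] || b.member[ch];
    return a;
}

/* Number of members; 0 means the class can never match. */
int cc_size(cclass c)
{
    int n = 0;
    for (int ch = 0; ch < 256; ++ch)
        n += c.member[ch];
    return n;
}
```

   Because the operators are left associative, a chain such as
`A{-}B{+}C' corresponds to `cc_union(cc_diff(A, B), C)'.  Checking
`cc_size()' on the result is one way to notice an accidentally empty
set.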
932
933
934File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
935
9367 How the Input Is Matched
937**************************
938
939When the generated scanner is run, it analyzes its input looking for
940strings which match any of its patterns.  If it finds more than one
941match, it takes the one matching the most text (for trailing context
942rules, this includes the length of the trailing part, even though it
943will then be returned to the input).  If it finds two or more matches of
944the same length, the rule listed first in the `flex' input file is
945chosen.
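
   The selection rule just described (longest match wins, and the
earliest rule wins a tie) can be written down as a small C function.
This is a sketch of the rule only; the function name and the idea of
passing in precomputed match lengths are assumptions made for the
illustration, and a length of 0 is used to mean "this rule did not
match".

```c
#include <stddef.h>

/*
 * match_len[i] is the number of characters rule i matched at the
 * current input position (0 meaning "no match").  Rules are indexed
 * in the order they appear in the flex input file.  Returns the index
 * of the rule that would be chosen, or -1 if no rule matched (in
 * which case the default rule applies).
 */
int choose_rule(const size_t match_len[], int n_rules)
{
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < n_rules; ++i) {
        /* Strictly greater: on a tie the earlier rule is kept. */
        if (match_len[i] > best_len) {
            best = i;
            best_len = match_len[i];
        }
    }
    return best;
}
```

   For trailing context rules, the length passed in would include the
trailing part, as noted above.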
946
947   Once the match is determined, the text corresponding to the match
948(called the "token") is made available in the global character pointer
949`yytext', and its length in the global integer `yyleng'.  The "action"
950corresponding to the matched pattern is then executed (*note
951Actions::), and then the remaining input is scanned for another match.
952
953   If no match is found, then the "default rule" is executed: the next
954character in the input is considered matched and copied to the standard
955output.  Thus, the simplest valid `flex' input is:
956
957
958         %%
959
960   which generates a scanner that simply copies its input (one
961character at a time) to its output.
962
963   Note that `yytext' can be defined in two different ways: either as a
964character _pointer_ or as a character _array_. You can control which
965definition `flex' uses by including one of the special directives
966`%pointer' or `%array' in the first (definitions) section of your flex
967input.  The default is `%pointer', unless you use the `-l' lex
968compatibility option, in which case `yytext' will be an array.  The
969advantage of using `%pointer' is substantially faster scanning and no
970buffer overflow when matching very large tokens (unless you run out of
dynamic memory).  The disadvantage is that you are restricted in how
your actions can modify `yytext' (*note Actions::), and calls to the
`unput()' function destroy the present contents of `yytext', which can
be a considerable porting headache when moving between different `lex'
versions.
976
977   The advantage of `%array' is that you can then modify `yytext' to
978your heart's content, and calls to `unput()' do not destroy `yytext'
979(*note Actions::).  Furthermore, existing `lex' programs sometimes
980access `yytext' externally using declarations of the form:
981
982
983         extern char yytext[];
984
985   This definition is erroneous when used with `%pointer', but correct
986for `%array'.
987
988   The `%array' declaration defines `yytext' to be an array of `YYLMAX'
989characters, which defaults to a fairly large value.  You can change the
990size by simply #define'ing `YYLMAX' to a different value in the first
991section of your `flex' input.  As mentioned above, with `%pointer'
`yytext' grows dynamically to accommodate large tokens.  While this means
993your `%pointer' scanner can accommodate very large tokens (such as
994matching entire blocks of comments), bear in mind that each time the
995scanner must resize `yytext' it also must rescan the entire token from
996the beginning, so matching such tokens can prove slow.  `yytext'
997presently does _not_ dynamically grow if a call to `unput()' results in
998too much text being pushed back; instead, a run-time error results.
999
1000   Also note that you cannot use `%array' with C++ scanner classes
1001(*note Cxx::).
1002
1003
1004File: flex.info,  Node: Actions,  Next: Generated Scanner,  Prev: Matching,  Up: Top
1005
10068 Actions
1007*********
1008
1009Each pattern in a rule has a corresponding "action", which can be any
1010arbitrary C statement.  The pattern ends at the first non-escaped
1011whitespace character; the remainder of the line is its action.  If the
1012action is empty, then when the pattern is matched the input token is
1013simply discarded.  For example, here is the specification for a program
1014which deletes all occurrences of `zap me' from its input:
1015
1016
1017         %%
1018         "zap me"
1019
1020   This example will copy all other characters in the input to the
1021output since they will be matched by the default rule.
1022
1023   Here is a program which compresses multiple blanks and tabs down to a
1024single blank, and throws away whitespace found at the end of a line:
1025
1026
1027         %%
1028         [ \t]+        putchar( ' ' );
1029         [ \t]+$       /* ignore this token */
1030
1031   If the action contains a `{', then the action spans till the
1032balancing `}' is found, and the action may cross multiple lines.
1033`flex' knows about C strings and comments and won't be fooled by braces
1034found within them, but also allows actions to begin with `%{' and will
1035consider the action to be all the text up to the next `%}' (regardless
1036of ordinary braces inside the action).
1037
1038   An action consisting solely of a vertical bar (`|') means "same as
1039the action for the next rule".  See below for an illustration.
1040
1041   Actions can include arbitrary C code, including `return' statements
1042to return a value to whatever routine called `yylex()'.  Each time
1043`yylex()' is called it continues processing tokens from where it last
1044left off until it either reaches the end of the file or executes a
1045return.
1046
   Actions are free to modify `yytext' except for lengthening it
(adding characters to its end; these will overwrite later characters in
the input stream).  This restriction does not apply when using `%array'
(*note Matching::).  In that case, `yytext' may be freely modified in
any way.
1052
1053   Actions are free to modify `yyleng' except they should not do so if
1054the action also includes use of `yymore()' (see below).
1055
1056   There are a number of special directives which can be included
1057within an action:
1058
1059`ECHO'
     copies `yytext' to the scanner's output.
1061
1062`BEGIN'
1063     followed by the name of a start condition places the scanner in the
1064     corresponding start condition (see below).
1065
1066`REJECT'
1067     directs the scanner to proceed on to the "second best" rule which
1068     matched the input (or a prefix of the input).  The rule is chosen
1069     as described above in *Note Matching::, and `yytext' and `yyleng'
1070     set up appropriately.  It may either be one which matched as much
1071     text as the originally chosen rule but came later in the `flex'
1072     input file, or one which matched less text.  For example, the
1073     following will both count the words in the input and call the
1074     routine `special()' whenever `frob' is seen:
1075
1076
1077                      int word_count = 0;
1078              %%
1079
1080              frob        special(); REJECT;
1081              [^ \t\n]+   ++word_count;
1082
1083     Without the `REJECT', any occurrences of `frob' in the input would
1084     not be counted as words, since the scanner normally executes only
1085     one action per token.  Multiple uses of `REJECT' are allowed, each
1086     one finding the next best choice to the currently active rule.  For
1087     example, when the following scanner scans the token `abcd', it will
1088     write `abcdabcaba' to the output:
1089
1090
1091              %%
1092              a        |
1093              ab       |
1094              abc      |
1095              abcd     ECHO; REJECT;
1096              .|\n     /* eat up any unmatched character */
1097
1098     The first three rules share the fourth's action since they use the
1099     special `|' action.
1100
1101     `REJECT' is a particularly expensive feature in terms of scanner
1102     performance; if it is used in _any_ of the scanner's actions it
1103     will slow down _all_ of the scanner's matching.  Furthermore,
1104     `REJECT' cannot be used with the `-Cf' or `-CF' options (*note
1105     Scanner Options::).
1106
1107     Note also that unlike the other special actions, `REJECT' is a
1108     _branch_.  Code immediately following it in the action will _not_
1109     be executed.
1110
1111`yymore()'
1112     tells the scanner that the next time it matches a rule, the
1113     corresponding token should be _appended_ onto the current value of
1114     `yytext' rather than replacing it.  For example, given the input
1115     `mega-kludge' the following will write `mega-mega-kludge' to the
1116     output:
1117
1118
1119              %%
1120              mega-    ECHO; yymore();
1121              kludge   ECHO;
1122
1123     First `mega-' is matched and echoed to the output.  Then `kludge'
1124     is matched, but the previous `mega-' is still hanging around at the
1125     beginning of `yytext' so the `ECHO' for the `kludge' rule will
1126     actually write `mega-kludge'.
1127
1128   Two notes regarding use of `yymore()'.  First, `yymore()' depends on
1129the value of `yyleng' correctly reflecting the size of the current
1130token, so you must not modify `yyleng' if you are using `yymore()'.
1131Second, the presence of `yymore()' in the scanner's action entails a
1132minor performance penalty in the scanner's matching speed.
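
   Returning to `REJECT': the order in which it walks through the
candidate rules in the `abcd' example above can be simulated in plain
C.  The sketch below hard-codes that example's rules (the four literals
plus the catch-all `.|\n' rule) and is not a general matcher; the
function name and the `out' buffer are hypothetical.

```c
#include <string.h>

/*
 * Simulates the "a | ab | abc | abcd  ECHO; REJECT;" scanner on
 * `input', appending what ECHO would print to `out'.  The caller
 * must supply a buffer that is large enough.
 */
void simulate_reject(const char *input, char *out)
{
    const char *lits[] = { "a", "ab", "abc", "abcd" };  /* rules 0..3 */
    size_t n = strlen(input);

    out[0] = '\0';
    for (size_t pos = 0; pos < n; ++pos) {
        /* Candidates are tried from the longest match down to the
         * shortest.  Each literal that matches runs "ECHO; REJECT;",
         * and REJECT hands control to the next-best candidate. */
        for (int r = 3; r >= 0; --r) {
            size_t len = strlen(lits[r]);
            if (strncmp(input + pos, lits[r], len) == 0)
                strcat(out, lits[r]);
        }
        /* The final candidate is the catch-all rule, which matches
         * one character and discards it; the loop then advances. */
    }
}
```

   The single-character literal `a' is tried before the catch-all rule
because both match one character and `a' is listed first.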
1133
1134   `yyless(n)' returns all but the first `n' characters of the current
1135token back to the input stream, where they will be rescanned when the
1136scanner looks for the next match.  `yytext' and `yyleng' are adjusted
1137appropriately (e.g., `yyleng' will now be equal to `n').  For example,
1138on the input `foobar' the following will write out `foobarbar':
1139
1140
1141         %%
1142         foobar    ECHO; yyless(3);
1143         [a-z]+    ECHO;
1144
1145   An argument of 0 to `yyless()' will cause the entire current input
1146string to be scanned again.  Unless you've changed how the scanner will
1147subsequently process its input (using `BEGIN', for example), this will
1148result in an endless loop.
1149
1150   Note that `yyless()' is a macro and can only be used in the flex
1151input file, not from other source files.
1152
1153   `unput(c)' puts the character `c' back onto the input stream.  It
1154will be the next character scanned.  The following action will take the
1155current token and cause it to be rescanned enclosed in parentheses.
1156
1157
1158         {
1159         int i;
1160         /* Copy yytext because unput() trashes yytext */
1161         char *yycopy = strdup( yytext );
1162         unput( ')' );
1163         for ( i = yyleng - 1; i >= 0; --i )
1164             unput( yycopy[i] );
1165         unput( '(' );
1166         free( yycopy );
1167         }
1168
1169   Note that since each `unput()' puts the given character back at the
1170_beginning_ of the input stream, pushing back strings must be done
1171back-to-front.
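
   The back-to-front requirement is easiest to see with a stand-alone
model of a pushback area.  The following C sketch is not flex's
implementation (flex pushes characters back into its own input
buffer); `sim_unput()', `sim_input()' and `sim_unput_string()' are
hypothetical stand-ins that behave like a simple stack.

```c
#include <string.h>

/* A tiny stand-in for the scanner's pushback area. */
char pushback[256];
int  pushback_len = 0;

/* Like unput(c): the pushed character becomes the next one read. */
void sim_unput(char c)
{
    if (pushback_len < (int) sizeof pushback)
        pushback[pushback_len++] = c;
}

/* Like input(): returns the next character, or -1 when empty. */
int sim_input(void)
{
    return pushback_len > 0 ? (unsigned char) pushback[--pushback_len]
                            : -1;
}

/* Push back a whole string so that it is read in its original
 * order: the last character has to be pushed first. */
void sim_unput_string(const char *s)
{
    for (size_t i = strlen(s); i > 0; --i)
        sim_unput(s[i - 1]);
}
```

   Pushing `)', then the token with `sim_unput_string()', then `('
causes the characters to be read back as the token enclosed in
parentheses, which mirrors the action shown above.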
1172
1173   An important potential problem when using `unput()' is that if you
1174are using `%pointer' (the default), a call to `unput()' _destroys_ the
1175contents of `yytext', starting with its rightmost character and
1176devouring one character to the left with each call.  If you need the
1177value of `yytext' preserved after a call to `unput()' (as in the above
1178example), you must either first copy it elsewhere, or build your
1179scanner using `%array' instead (*note Matching::).
1180
1181   Finally, note that you cannot put back `EOF' to attempt to mark the
1182input stream with an end-of-file.
1183
1184   `input()' reads the next character from the input stream.  For
1185example, the following is one way to eat up C comments:
1186
1187
1188         %%
1189         "/*"        {
1190                     register int c;
1191
1192                     for ( ; ; )
1193                         {
1194                         while ( (c = input()) != '*' &&
1195                                 c != EOF )
1196                             ;    /* eat up text of comment */
1197
1198                         if ( c == '*' )
1199                             {
1200                             while ( (c = input()) == '*' )
1201                                 ;
1202                             if ( c == '/' )
1203                                 break;    /* found the end */
1204                             }
1205
1206                         if ( c == EOF )
1207                             {
1208                             error( "EOF in comment" );
1209                             break;
1210                             }
1211                         }
1212                     }
1213
1214   (Note that if the scanner is compiled using `C++', then `input()' is
instead referred to as `yyinput()', in order to avoid a name clash with
1216the `C++' stream by the name of `input'.)
1217
1218   `YY_FLUSH_BUFFER()' flushes the scanner's internal buffer so that
1219the next time the scanner attempts to match a token, it will first
1220refill the buffer using `YY_INPUT()' (*note Generated Scanner::).  This
1221action is a special case of the more general `yy_flush_buffer()'
function, described below (*note Multiple Input Buffers::).
1223
1224   `yyterminate()' can be used in lieu of a return statement in an
1225action.  It terminates the scanner and returns a 0 to the scanner's
1226caller, indicating "all done".  By default, `yyterminate()' is also
1227called when an end-of-file is encountered.  It is a macro and may be
1228redefined.
1229
1230
1231File: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top
1232
12339 The Generated Scanner
1234***********************
1235
1236The output of `flex' is the file `lex.yy.c', which contains the
1237scanning routine `yylex()', a number of tables used by it for matching
1238tokens, and a number of auxiliary routines and macros.  By default,
1239`yylex()' is declared as follows:
1240
1241
1242         int yylex()
1243             {
1244             ... various definitions and the actions in here ...
1245             }
1246
1247   (If your environment supports function prototypes, then it will be
1248`int yylex( void )'.)  This definition may be changed by defining the
1249`YY_DECL' macro.  For example, you could use:
1250
1251
1252         #define YY_DECL float lexscan( a, b ) float a, b;
1253
1254   to give the scanning routine the name `lexscan', returning a float,
1255and taking two floats as arguments.  Note that if you give arguments to
1256the scanning routine using a K&R-style/non-prototyped function
1257declaration, you must terminate the definition with a semi-colon (;).
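
   With a prototype-style declaration the trailing semi-colon is not
needed.  The sketch below assumes that the generated scanner uses
`YY_DECL' as the header of the scanning routine, immediately followed
by the routine's body; the one-line body here is only a stand-in so
that the fragment compiles on its own.

```c
/* Prototype-style declaration: note that there is no trailing
 * semicolon, unlike the K&R form. */
#define YY_DECL float lexscan(float a, float b)

/* The generated scanner expands the macro roughly like this, with
 * the real scanner body in place of this stand-in. */
YY_DECL
{
    return a + b;
}
```

   In a real `flex' input file only the `#define' line would appear,
inside a `%{ ... %}' block in the definitions section.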
1258
1259   `flex' generates `C99' function definitions by default. However flex
1260does have the ability to generate obsolete, er, `traditional', function
1261definitions. This is to support bootstrapping gcc on old systems.
1262Unfortunately, traditional definitions prevent us from using any
1263standard data types smaller than int (such as short, char, or bool) as
1264function arguments.  For this reason, future versions of `flex' may
1265generate standard C99 code only, leaving K&R-style functions to the
1266historians.  Currently, if you do *not* want `C99' definitions, then
1267you must use `%option noansi-definitions'.
1268
1269   Whenever `yylex()' is called, it scans tokens from the global input
1270file `yyin' (which defaults to stdin).  It continues until it either
1271reaches an end-of-file (at which point it returns the value 0) or one
1272of its actions executes a `return' statement.
1273
   If the scanner reaches an end-of-file, the behavior of subsequent
calls is undefined unless either `yyin' is pointed at a new input file
(in which case
1276scanning continues from that file), or `yyrestart()' is called.
1277`yyrestart()' takes one argument, a `FILE *' pointer (which can be
1278NULL, if you've set up `YY_INPUT' to scan from a source other than
1279`yyin'), and initializes `yyin' for scanning from that file.
1280Essentially there is no difference between just assigning `yyin' to a
1281new input file or using `yyrestart()' to do so; the latter is available
1282for compatibility with previous versions of `flex', and because it can
1283be used to switch input files in the middle of scanning.  It can also
1284be used to throw away the current input buffer, by calling it with an
1285argument of `yyin'; but it would be better to use `YY_FLUSH_BUFFER'
1286(*note Actions::).  Note that `yyrestart()' does _not_ reset the start
1287condition to `INITIAL' (*note Start Conditions::).
1288
1289   If `yylex()' stops scanning due to executing a `return' statement in
1290one of the actions, the scanner may then be called again and it will
1291resume scanning where it left off.
1292
1293   By default (and for purposes of efficiency), the scanner uses
1294block-reads rather than simple `getc()' calls to read characters from
1295`yyin'.  The nature of how it gets its input can be controlled by
1296defining the `YY_INPUT' macro.  The calling sequence for `YY_INPUT()'
1297is `YY_INPUT(buf,result,max_size)'.  Its action is to place up to
1298`max_size' characters in the character array `buf' and return in the
1299integer variable `result' either the number of characters read or the
1300constant `YY_NULL' (0 on Unix systems) to indicate `EOF'.  The default
1301`YY_INPUT' reads from the global file-pointer `yyin'.
1302
1303   Here is a sample definition of `YY_INPUT' (in the definitions
1304section of the input file):
1305
1306
1307         %{
1308         #define YY_INPUT(buf,result,max_size) \
1309             { \
1310             int c = getchar(); \
1311             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
1312             }
1313         %}
1314
1315   This definition will change the input processing to occur one
1316character at a time.
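
   As a further illustration, here is a sketch of a `YY_INPUT' that
hands the scanner blocks of characters from a string held in memory.
The globals `src_text' and `src_pos' are hypothetical names introduced
for the example, and `YY_NULL' is defined here only so the fragment
compiles outside a generated scanner.  The cast of the count to `int'
is also an assumption about the type of `result'.

```c
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* a generated scanner defines this itself */
#endif

/* Hypothetical globals: the text to scan and the read position. */
const char *src_text = "";
size_t      src_pos  = 0;

/* Copy up to max_size characters from src_text into buf and report
 * how many were copied, or YY_NULL once the text is exhausted. */
#define YY_INPUT(buf, result, max_size) \
    { \
    size_t left_ = strlen(src_text) - src_pos; \
    size_t n_ = left_ < (size_t) (max_size) ? left_ : (size_t) (max_size); \
    memcpy((buf), src_text + src_pos, n_); \
    src_pos += n_; \
    (result) = n_ ? (int) n_ : YY_NULL; \
    }
```

   Unlike the one-character version, this definition keeps the benefit
of block reads.  For real string scanning, the facilities described in
*Note Scanning Strings:: are likely a better fit.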
1317
   When the scanner receives an end-of-file indication from `YY_INPUT', it
1319then checks the `yywrap()' function.  If `yywrap()' returns false
1320(zero), then it is assumed that the function has gone ahead and set up
1321`yyin' to point to another input file, and scanning continues.  If it
1322returns true (non-zero), then the scanner terminates, returning 0 to
1323its caller.  Note that in either case, the start condition remains
1324unchanged; it does _not_ revert to `INITIAL'.
1325
1326   If you do not supply your own version of `yywrap()', then you must
1327either use `%option noyywrap' (in which case the scanner behaves as
1328though `yywrap()' returned 1), or you must link with `-lfl' to obtain
1329the default version of the routine, which always returns 1.
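
   A user-supplied `yywrap()' that steps through a list of input files
might look like the following sketch.  The `next_file' list is a
hypothetical name, and `yyin' is defined here only so that the fragment
compiles by itself; in a real scanner the generated code provides it.

```c
#include <stdio.h>

/* In a real scanner `yyin' comes from the code flex generates. */
FILE *yyin;

/* Hypothetical NULL-terminated list of files still to be scanned. */
const char **next_file = NULL;

/* Returns 0 after pointing yyin at the next readable file, or 1
 * when there is nothing left to scan. */
int yywrap(void)
{
    while (next_file != NULL && *next_file != NULL) {
        FILE *f = fopen(*next_file++, "r");
        if (f != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);       /* done with the previous file */
            yyin = f;
            return 0;               /* keep scanning */
        }
        /* Unreadable file: skip it and try the next one. */
    }
    return 1;                       /* all done */
}
```

   Silently skipping unreadable files is a choice made for brevity; a
real program would probably report the failure.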
1330
   For scanning from in-memory buffers (e.g., scanning strings), see
*Note Scanning Strings:: and *Note Multiple Input Buffers::.
1333
1334   The scanner writes its `ECHO' output to the `yyout' global (default,
1335`stdout'), which may be redefined by the user simply by assigning it to
1336some other `FILE' pointer.
1337
1338
1339File: flex.info,  Node: Start Conditions,  Next: Multiple Input Buffers,  Prev: Generated Scanner,  Up: Top
1340
134110 Start Conditions
1342*******************
1343
1344`flex' provides a mechanism for conditionally activating rules.  Any
1345rule whose pattern is prefixed with `<sc>' will only be active when the
1346scanner is in the "start condition" named `sc'.  For example,
1347
1348
1349         <STRING>[^"]*        { /* eat up the string body ... */
1350                     ...
1351                     }
1352
1353   will be active only when the scanner is in the `STRING' start
1354condition, and
1355
1356
1357         <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
1358                     ...
1359                     }
1360
1361   will be active only when the current start condition is either
1362`INITIAL', `STRING', or `QUOTE'.
1363
1364   Start conditions are declared in the definitions (first) section of
1365the input using unindented lines beginning with either `%s' or `%x'
1366followed by a list of names.  The former declares "inclusive" start
1367conditions, the latter "exclusive" start conditions.  A start condition
1368is activated using the `BEGIN' action.  Until the next `BEGIN' action
1369is executed, rules with the given start condition will be active and
1370rules with other start conditions will be inactive.  If the start
1371condition is inclusive, then rules with no start conditions at all will
1372also be active.  If it is exclusive, then _only_ rules qualified with
1373the start condition will be active.  A set of rules contingent on the
1374same exclusive start condition describe a scanner which is independent
1375of any of the other rules in the `flex' input.  Because of this,
1376exclusive start conditions make it easy to specify "mini-scanners"
1377which scan portions of the input that are syntactically different from
1378the rest (e.g., comments).
1379
1380   If the distinction between inclusive and exclusive start conditions
1381is still a little vague, here's a simple example illustrating the
1382connection between the two.  The set of rules:
1383
1384
1385         %s example
1386         %%
1387
1388         <example>foo   do_something();
1389
1390         bar            something_else();
1391
1392   is equivalent to
1393
1394
1395         %x example
1396         %%
1397
1398         <example>foo   do_something();
1399
1400         <INITIAL,example>bar    something_else();
1401
1402   Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
1403second example wouldn't be active (i.e., couldn't match) when in start
1404condition `example'.  If we just used `<example>' to qualify `bar',
1405though, then it would only be active in `example' and not in `INITIAL',
1406while in the first example it's active in both, because in the first
example the `example' start condition is an inclusive (`%s') start
1408condition.
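
   The activation rule for inclusive and exclusive start conditions
can be summarized as a C predicate.  This is a sketch of the rule as
described above, not code taken from `flex'; the function name, its
parameters, and the numbering of `EXAMPLE' are assumptions made for the
illustration.

```c
#include <stdbool.h>

/* INITIAL is 0; the value chosen for EXAMPLE is hypothetical. */
enum { INITIAL = 0, EXAMPLE = 1 };

/*
 * rule_scs, n_scs: the start conditions listed in the rule's <...>
 *     prefix (n_scs == 0 for a rule with no prefix).
 * current: the scanner's current start condition.
 * current_is_exclusive: true if `current' was declared with %x.
 */
bool rule_is_active(const int rule_scs[], int n_scs,
                    int current, bool current_is_exclusive)
{
    if (n_scs == 0)
        /* Unqualified rules are active in INITIAL and in every
         * inclusive (%s) start condition. */
        return current == INITIAL || !current_is_exclusive;

    /* Qualified rules are active only in the listed conditions. */
    for (int i = 0; i < n_scs; ++i)
        if (rule_scs[i] == current)
            return true;
    return false;
}
```

   In these terms, the unqualified `bar' rule is active in `example'
only when `example' is inclusive, while the `<INITIAL,example>bar'
rule is active in both conditions either way.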
1409
1410   Also note that the special start-condition specifier `<*>' matches
1411every start condition.  Thus, the above example could also have been
1412written:
1413
1414
1415         %x example
1416         %%
1417
1418         <example>foo   do_something();
1419
1420         <*>bar    something_else();
1421
1422   The default rule (to `ECHO' any unmatched character) remains active
1423in start conditions.  It is equivalent to:
1424
1425
1426         <*>.|\n     ECHO;
1427
1428   `BEGIN(0)' returns to the original state where only the rules with
1429no start conditions are active.  This state can also be referred to as
1430the start-condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to
1431`BEGIN(0)'.  (The parentheses around the start condition name are not
1432required but are considered good style.)
1433
1434   `BEGIN' actions can also be given as indented code at the beginning
1435of the rules section.  For example, the following will cause the scanner
1436to enter the `SPECIAL' start condition whenever `yylex()' is called and
1437the global variable `enter_special' is true:
1438
1439
1440                 int enter_special;
1441
1442         %x SPECIAL
1443         %%
1444                 if ( enter_special )
1445                     BEGIN(SPECIAL);
1446
1447         <SPECIAL>blahblahblah
1448         ...more rules follow...
1449
1450   To illustrate the uses of start conditions, here is a scanner which
1451provides two different interpretations of a string like `123.456'.  By
1452default it will treat it as three tokens, the integer `123', a dot
1453(`.'), and the integer `456'.  But if the string is preceded earlier in
1454the line by the string `expect-floats' it will treat it as a single
1455token, the floating-point number `123.456':
1456
1457
1458         %{
1459         #include <math.h>
1460         %}
1461         %s expect
1462
1463         %%
1464         expect-floats        BEGIN(expect);
1465
         <expect>[0-9]+"."[0-9]+      {
1467                     printf( "found a float, = %f\n",
1468                             atof( yytext ) );
1469                     }
1470         <expect>\n           {
1471                     /* that's the end of the line, so
                      * we need another "expect-floats"
1473                      * before we'll recognize any more
1474                      * numbers
1475                      */
1476                     BEGIN(INITIAL);
1477                     }
1478
1479         [0-9]+      {
1480                     printf( "found an integer, = %d\n",
1481                             atoi( yytext ) );
1482                     }
1483
1484         "."         printf( "found a dot\n" );
1485
1486   Here is a scanner which recognizes (and discards) C comments while
1487maintaining a count of the current input line.
1488
1489
1490         %x comment
1491         %%
1492                 int line_num = 1;
1493
1494         "/*"         BEGIN(comment);
1495
1496         <comment>[^*\n]*        /* eat anything that's not a '*' */
1497         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1498         <comment>\n             ++line_num;
1499         <comment>"*"+"/"        BEGIN(INITIAL);
1500
1501   This scanner goes to a bit of trouble to match as much text as
1502possible with each rule.  In general, when attempting to write a
high-speed scanner try to match as much as possible in each rule, as
it's a big win.
1505
   Note that start condition names are really integer values and can
1507be stored as such.  Thus, the above could be extended in the following
1508fashion:
1509
1510
1511         %x comment foo
1512         %%
1513                 int line_num = 1;
1514                 int comment_caller;
1515
1516         "/*"         {
1517                      comment_caller = INITIAL;
1518                      BEGIN(comment);
1519                      }
1520
1521         ...
1522
1523         <foo>"/*"    {
1524                      comment_caller = foo;
1525                      BEGIN(comment);
1526                      }
1527
1528         <comment>[^*\n]*        /* eat anything that's not a '*' */
1529         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1530         <comment>\n             ++line_num;
1531         <comment>"*"+"/"        BEGIN(comment_caller);
1532
1533   Furthermore, you can access the current start condition using the
1534integer-valued `YY_START' macro.  For example, the above assignments to
1535`comment_caller' could instead be written
1536
1537
1538         comment_caller = YY_START;
1539
1540   Flex provides `YYSTATE' as an alias for `YY_START' (since that is
1541what's used by AT&T `lex').
1542
1543   For historical reasons, start conditions do not have their own
1544name-space within the generated scanner. The start condition names are
1545unmodified in the generated scanner and generated header.  *Note
1546option-header::. *Note option-prefix::.
1547
1548   Finally, here's an example of how to match C-style quoted strings
1549using exclusive start conditions, including expanded escape sequences
1550(but not including checking for a string that's too long):
1551
1552
1553         %x str
1554
1555         %%
1556                 char string_buf[MAX_STR_CONST];
1557                 char *string_buf_ptr;
1558
1559
1560         \"      string_buf_ptr = string_buf; BEGIN(str);
1561
1562         <str>\"        { /* saw closing quote - all done */
1563                 BEGIN(INITIAL);
1564                 *string_buf_ptr = '\0';
1565                 /* return string constant token type and
1566                  * value to parser
1567                  */
1568                 }
1569
1570         <str>\n        {
1571                 /* error - unterminated string constant */
1572                 /* generate error message */
1573                 }
1574
         <str>\\[0-7]{1,3} {
                 /* octal escape sequence */
                 unsigned int result;   /* %o expects unsigned int */

                 (void) sscanf( yytext + 1, "%o", &result );

                 if ( result > 0xff )
                         {
                         /* error, constant is out-of-bounds */
                         }
                 else
                         *string_buf_ptr++ = result;
                 }
1586
1587         <str>\\[0-9]+ {
1588                 /* generate error - bad escape sequence; something
1589                  * like '\48' or '\0777777'
1590                  */
1591                 }
1592
1593         <str>\\n  *string_buf_ptr++ = '\n';
1594         <str>\\t  *string_buf_ptr++ = '\t';
1595         <str>\\r  *string_buf_ptr++ = '\r';
1596         <str>\\b  *string_buf_ptr++ = '\b';
1597         <str>\\f  *string_buf_ptr++ = '\f';
1598
1599         <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1600
1601         <str>[^\\\n\"]+        {
1602                 char *yptr = yytext;
1603
1604                 while ( *yptr )
1605                         *string_buf_ptr++ = *yptr++;
1606                 }
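
   The octal-escape handling in the example above can be pulled out
into a small helper that is easy to test by itself.  The following is
a minimal sketch; the function name and its return convention are
hypothetical and not part of `flex'.

```c
#include <stdio.h>

/*
 * Decode an octal escape such as "\101" (a backslash followed by
 * one to three octal digits).  Returns 0 and stores the character
 * value in *out on success, or -1 if the text is malformed or the
 * value does not fit in one byte.
 */
int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int value;

    /* Skip the backslash; %o requires a pointer to unsigned int. */
    if (text[0] != '\\' || sscanf(text + 1, "%3o", &value) != 1)
        return -1;
    if (value > 0xff)
        return -1;              /* e.g. "\777" is out of range */

    *out = (unsigned char) value;
    return 0;
}
```

   Inside the `<str>' rule, such a helper would be called with `yytext'
and the result stored through `string_buf_ptr' only on success.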
1607
1608   Often, such as in some of the examples above, you wind up writing a
1609whole bunch of rules all preceded by the same start condition(s).  Flex
1610makes this a little easier and cleaner by introducing a notion of start
1611condition "scope".  A start condition scope is begun with:
1612
1613
1614         <SCs>{
1615
1616   where `SCs' is a list of one or more start conditions.  Inside the
start condition scope, every rule automatically has the prefix `<SCs>'
1618applied to it, until a `}' which matches the initial `{'.  So, for
1619example,
1620
1621
1622         <ESC>{
1623             "\\n"   return '\n';
1624             "\\r"   return '\r';
1625             "\\f"   return '\f';
1626             "\\0"   return '\0';
1627         }
1628
1629   is equivalent to:
1630
1631
1632         <ESC>"\\n"  return '\n';
1633         <ESC>"\\r"  return '\r';
1634         <ESC>"\\f"  return '\f';
1635         <ESC>"\\0"  return '\0';
1636
1637   Start condition scopes may be nested.
1638
1639   The following routines are available for manipulating stacks of
1640start conditions:
1641
1642 -- Function: void yy_push_state ( int `new_state' )
1643     pushes the current start condition onto the top of the start
1644     condition stack and switches to `new_state' as though you had used
1645     `BEGIN new_state' (recall that start condition names are also
1646     integers).
1647
1648 -- Function: void yy_pop_state ()
1649     pops the top of the stack and switches to it via `BEGIN'.
1650
1651 -- Function: int yy_top_state ()
1652     returns the top of the stack without altering the stack's contents.
1653
1654   The start condition stack grows dynamically and so has no built-in
1655size limitation.  If memory is exhausted, program execution aborts.
1656
1657   To use start condition stacks, your scanner must include a `%option
1658stack' directive (*note Scanner Options::).
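
   The behavior of these three routines can be modelled with an
ordinary growable stack.  The sketch below is a stand-alone model, not
the code `flex' generates; the `sc_*' names and the `current_sc'
variable are hypothetical, and returning 0 from `sc_top()' on an empty
stack is a choice made for the illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the scanner's state. */
int  current_sc = 0;        /* 0 plays the role of INITIAL */
int *sc_stack   = NULL;
int  sc_depth   = 0;
int  sc_cap     = 0;

/* Like yy_push_state(): save the current condition, then switch. */
void sc_push(int new_state)
{
    if (sc_depth == sc_cap) {
        int new_cap = sc_cap ? sc_cap * 2 : 8;
        int *p = realloc(sc_stack, new_cap * sizeof *p);
        if (p == NULL)
            abort();        /* out of memory: execution aborts */
        sc_stack = p;
        sc_cap = new_cap;
    }
    sc_stack[sc_depth++] = current_sc;
    current_sc = new_state;
}

/* Like yy_pop_state(): switch back to the most recently saved one. */
void sc_pop(void)
{
    if (sc_depth == 0)
        abort();            /* popping an empty stack is an error */
    current_sc = sc_stack[--sc_depth];
}

/* Like yy_top_state(): peek at the saved condition on top. */
int sc_top(void)
{
    return sc_depth ? sc_stack[sc_depth - 1] : 0;
}
```

   Note that the value on top of the stack is the condition that was
current before the most recent push, not the condition the scanner is
in now.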
1659
1660
1661File: flex.info,  Node: Multiple Input Buffers,  Next: EOF,  Prev: Start Conditions,  Up: Top
1662
166311 Multiple Input Buffers
1664*************************
1665
1666Some scanners (such as those which support "include" files) require
1667reading from several input streams.  As `flex' scanners do a large
1668amount of buffering, one cannot control where the next input will be
1669read from by simply writing a `YY_INPUT()' which is sensitive to the
1670scanning context.  `YY_INPUT()' is only called when the scanner reaches
1671the end of its buffer, which may be a long time after scanning a
1672statement such as an `include' statement which requires switching the
1673input source.
1674
1675   To negotiate these sorts of problems, `flex' provides a mechanism
1676for creating and switching between multiple input buffers.  An input
1677buffer is created by using:
1678
1679 -- Function: YY_BUFFER_STATE yy_create_buffer ( FILE *file, int size )
1680
1681   which takes a `FILE' pointer and a size and creates a buffer
1682associated with the given file and large enough to hold `size'
1683characters (when in doubt, use `YY_BUF_SIZE' for the size).  It returns
1684a `YY_BUFFER_STATE' handle, which may then be passed to other routines
1685(see below).  The `YY_BUFFER_STATE' type is a pointer to an opaque
1686`struct yy_buffer_state' structure, so you may safely initialize
1687`YY_BUFFER_STATE' variables to `((YY_BUFFER_STATE) 0)' if you wish, and
1688also refer to the opaque structure in order to correctly declare input
1689buffers in source files other than that of your scanner.  Note that the
1690`FILE' pointer in the call to `yy_create_buffer' is only used as the
1691value of `yyin' seen by `YY_INPUT'.  If you redefine `YY_INPUT()' so it
1692no longer uses `yyin', then you can safely pass a NULL `FILE' pointer to
1693`yy_create_buffer'.  You select a particular buffer to scan from using:
1694
1695 -- Function: void yy_switch_to_buffer ( YY_BUFFER_STATE new_buffer )
1696
1697   The above function switches the scanner's input buffer so subsequent
1698tokens will come from `new_buffer'.  Note that `yy_switch_to_buffer()'
1699may be used by `yywrap()' to set things up for continued scanning,
1700instead of opening a new file and pointing `yyin' at it. If you are
1701looking for a stack of input buffers, then you want to use
1702`yypush_buffer_state()' instead of this function. Note also that
1703switching input sources via either `yy_switch_to_buffer()' or
1704`yywrap()' does _not_ change the start condition.
1705
1706 -- Function: void yy_delete_buffer ( YY_BUFFER_STATE buffer )
1707
   is used to reclaim the storage associated with a buffer.  (`buffer'
can be NULL, in which case the routine does nothing.)  To discard a
buffer's contents without freeing it, use `yy_flush_buffer()',
described below.
1711
1712 -- Function: void yypush_buffer_state ( YY_BUFFER_STATE buffer )
1713
1714   This function pushes the new buffer state onto an internal stack.
1715The pushed state becomes the new current state. The stack is maintained
1716by flex and will grow as required. This function is intended to be used
1717instead of `yy_switch_to_buffer', when you want to change states, but
1718preserve the current state for later use.
1719
1720 -- Function: void yypop_buffer_state ( )
1721
1722   This function removes the current state from the top of the stack,
1723and deletes it by calling `yy_delete_buffer'.  The next state on the
1724stack, if any, becomes the new current state.
1725
1726 -- Function: void yy_flush_buffer ( YY_BUFFER_STATE buffer )
1727
1728   This function discards the buffer's contents, so the next time the
1729scanner attempts to match a token from the buffer, it will first fill
1730the buffer anew using `YY_INPUT()'.
1731
1732 -- Function: YY_BUFFER_STATE yy_new_buffer ( FILE *file, int size )
1733
1734   is an alias for `yy_create_buffer()', provided for compatibility
1735with the C++ use of `new' and `delete' for creating and destroying
1736dynamic objects.
1737
   The `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE' handle to
the current buffer.  It should not be used as an lvalue.
1740
1741   Here are two examples of using these features for writing a scanner
1742which expands include files (the `<<EOF>>' feature is discussed below).
1743
1744   This first example uses yypush_buffer_state and yypop_buffer_state.
1745Flex maintains the stack internally.
1746
1747
1748         /* the "incl" state is used for picking up the name
1749          * of an include file
1750          */
1751         %x incl
1752         %%
1753         include             BEGIN(incl);
1754
1755         [a-z]+              ECHO;
1756         [^a-z\n]*\n?        ECHO;
1757
1758         <incl>[ \t]*      /* eat the whitespace */
1759         <incl>[^ \t\n]+   { /* got the include file name */
1760                 yyin = fopen( yytext, "r" );
1761
1762                 if ( ! yyin )
1763                     error( ... );
1764
                 yypush_buffer_state( yy_create_buffer( yyin, YY_BUF_SIZE ) );
1766
1767                 BEGIN(INITIAL);
1768                 }
1769
1770         <<EOF>> {
                 yypop_buffer_state();
1772
1773                 if ( !YY_CURRENT_BUFFER )
1774                     {
1775                     yyterminate();
1776                     }
1777                 }
1778
1779   The second example, below, does the same thing as the previous
1780example did, but manages its own input buffer stack manually (instead
1781of letting flex do it).
1782
1783
1784         /* the "incl" state is used for picking up the name
1785          * of an include file
1786          */
1787         %x incl
1788
1789         %{
1790         #define MAX_INCLUDE_DEPTH 10
1791         YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1792         int include_stack_ptr = 0;
1793         %}
1794
1795         %%
1796         include             BEGIN(incl);
1797
1798         [a-z]+              ECHO;
1799         [^a-z\n]*\n?        ECHO;
1800
1801         <incl>[ \t]*      /* eat the whitespace */
1802         <incl>[^ \t\n]+   { /* got the include file name */
1803                 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1804                     {
1805                     fprintf( stderr, "Includes nested too deeply" );
1806                     exit( 1 );
1807                     }
1808
1809                 include_stack[include_stack_ptr++] =
1810                     YY_CURRENT_BUFFER;
1811
1812                 yyin = fopen( yytext, "r" );
1813
1814                 if ( ! yyin )
1815                     error( ... );
1816
1817                 yy_switch_to_buffer(
1818                     yy_create_buffer( yyin, YY_BUF_SIZE ) );
1819
1820                 BEGIN(INITIAL);
1821                 }
1822
1823         <<EOF>> {
                 if ( --include_stack_ptr < 0 )
1825                     {
1826                     yyterminate();
1827                     }
1828
1829                 else
1830                     {
1831                     yy_delete_buffer( YY_CURRENT_BUFFER );
1832                     yy_switch_to_buffer(
1833                          include_stack[include_stack_ptr] );
1834                     }
1835                 }
1836
1837   The following routines are available for setting up input buffers for
1838scanning in-memory strings instead of files.  All of them create a new
1839input buffer for scanning the string, and return a corresponding
1840`YY_BUFFER_STATE' handle (which you should delete with
1841`yy_delete_buffer()' when done with it).  They also switch to the new
1842buffer using `yy_switch_to_buffer()', so the next call to `yylex()'
1843will start scanning the string.
1844
1845 -- Function: YY_BUFFER_STATE yy_scan_string ( const char *str )
1846     scans a NUL-terminated string.
1847
1848 -- Function: YY_BUFFER_STATE yy_scan_bytes ( const char *bytes, int
1849          len )
1850     scans `len' bytes (including possibly `NUL's) starting at location
1851     `bytes'.
1852
1853   Note that both of these functions create and scan a _copy_ of the
1854string or bytes.  (This may be desirable, since `yylex()' modifies the
1855contents of the buffer it is scanning.)  You can avoid the copy by
1856using:
1857
1858 -- Function: YY_BUFFER_STATE yy_scan_buffer (char *base, yy_size_t
1859          size)
     which scans in place the buffer starting at `base', consisting of
     `size' bytes, the last two bytes of which _must_ be
     `YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
     scanned; thus, scanning consists of `base[0]' through
     `base[size-3]', inclusive.
1865
1866   If you fail to set up `base' in this manner (i.e., forget the final
1867two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()' returns a
1868NULL pointer instead of creating a new input buffer.
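
   The layout that `yy_scan_buffer()' expects can be prepared with a
small helper such as the one sketched below.  The helper name
`make_scan_buffer' is hypothetical and not part of `flex'; it only
builds the buffer, assuming the caller then passes the result to
`yy_scan_buffer()' from code that has access to the scanner.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy `len' bytes of `src' into a fresh buffer laid
 * out the way yy_scan_buffer() expects, i.e. followed by two NUL bytes.
 * On success, *size_out is the total size to pass to yy_scan_buffer().
 * Returns NULL if memory cannot be allocated.
 */
static char *make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
    assert(src != NULL && size_out != NULL);

    char *base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, src, len);
    base[len] = '\0';       /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';   /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

   In a scanner, the returned pointer and size would be handed to
`yy_scan_buffer( base, size )', and a NULL result from that call would
indicate that the two terminating bytes were not in place.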
1869
1870 -- Data type: yy_size_t
1871     is an integral type to which you can cast an integer expression
1872     reflecting the size of the buffer.
1873
1874
1875File: flex.info,  Node: EOF,  Next: Misc Macros,  Prev: Multiple Input Buffers,  Up: Top
1876
12 End-of-File Rules
********************
1879
1880The special rule `<<EOF>>' indicates actions which are to be taken when
1881an end-of-file is encountered and `yywrap()' returns non-zero (i.e.,
1882indicates no further files to process).  The action must finish by
1883doing one of the following things:
1884
1885   * assigning `yyin' to a new input file (in previous versions of
1886     `flex', after doing the assignment you had to call the special
1887     action `YY_NEW_FILE'.  This is no longer necessary.)
1888
1889   * executing a `return' statement;
1890
1891   * executing the special `yyterminate()' action.
1892
1893   * or, switching to a new buffer using `yy_switch_to_buffer()' as
1894     shown in the example above.
1895
1896   <<EOF>> rules may not be used with other patterns; they may only be
1897qualified with a list of start conditions.  If an unqualified <<EOF>>
1898rule is given, it applies to _all_ start conditions which do not
1899already have <<EOF>> actions.  To specify an <<EOF>> rule for only the
1900initial start condition, use:
1901
1902
1903         <INITIAL><<EOF>>
1904
1905   These rules are useful for catching things like unclosed comments.
1906An example:
1907
1908
1909         %x quote
1910         %%
1911
1912         ...other rules for dealing with quotes...
1913
1914         <quote><<EOF>>   {
1915                  error( "unterminated quote" );
1916                  yyterminate();
1917                  }
         <<EOF>>  {
                  if ( *++filelist )
                      yyin = fopen( *filelist, "r" );
                  else
                      yyterminate();
1923                  }
1924
1925
1926File: flex.info,  Node: Misc Macros,  Next: User Values,  Prev: EOF,  Up: Top
1927
13 Miscellaneous Macros
***********************
1930
1931The macro `YY_USER_ACTION' can be defined to provide an action which is
1932always executed prior to the matched rule's action.  For example, it
1933could be #define'd to call a routine to convert yytext to lower-case.
1934When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
1935number of the matched rule (rules are numbered starting with 1).
1936Suppose you want to profile how often each of your rules is matched.
1937The following would do the trick:
1938
1939
1940         #define YY_USER_ACTION ++ctr[yy_act]
1941
1942   where `ctr' is an array to hold the counts for the different rules.
Note that the macro `YY_NUM_RULES' gives the total number of rules
(including the default rule), even if you use `-s', so a correct
declaration for `ctr' is:
1946
1947
1948         int ctr[YY_NUM_RULES];
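
   The counting scheme can be tried outside a scanner with the sketch
below.  Everything in it is a stand-in: `YY_NUM_RULES' is given a
made-up value, and `count_match' and `dump_rule_counts' are hypothetical
names.  The array is deliberately one element larger than
`YY_NUM_RULES' so that an index equal to `YY_NUM_RULES' stays in bounds
when rules are numbered from 1; that extra slot is a defensive
assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for names that flex itself provides inside a scanner. */
#define YY_NUM_RULES 4              /* hypothetical rule count */
static int ctr[YY_NUM_RULES + 1];   /* one extra slot: rules count from 1 */

/* What `#define YY_USER_ACTION ++ctr[yy_act]' does on each match. */
static void count_match(int yy_act)
{
    assert(yy_act >= 1 && yy_act <= YY_NUM_RULES);
    ++ctr[yy_act];
}

/* Print one line per rule, e.g. once scanning has finished. */
static void dump_rule_counts(FILE *out)
{
    for (int rule = 1; rule <= YY_NUM_RULES; ++rule)
        fprintf(out, "rule %d matched %d time(s)\n", rule, ctr[rule]);
}
```

   In a real scanner only the array and the reporting routine would be
written by hand; the increment itself comes from the `YY_USER_ACTION'
definition shown above.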
1949
1950   The macro `YY_USER_INIT' may be defined to provide an action which
1951is always executed before the first scan (and before the scanner's
1952internal initializations are done).  For example, it could be used to
1953call a routine to read in a data table or open a logging file.
1954
1955   The macro `yy_set_interactive(is_interactive)' can be used to
1956control whether the current buffer is considered "interactive".  An
1957interactive buffer is processed more slowly, but must be used when the
1958scanner's input source is indeed interactive to avoid problems due to
1959waiting to fill buffers (see the discussion of the `-I' flag in *Note
1960Scanner Options::).  A non-zero value in the macro invocation marks the
1961buffer as interactive, a zero value as non-interactive.  Note that use
1962of this macro overrides `%option always-interactive' or `%option
1963never-interactive' (*note Scanner Options::).  `yy_set_interactive()'
1964must be invoked prior to beginning to scan the buffer that is (or is
1965not) to be considered interactive.
1966
1967   The macro `yy_set_bol(at_bol)' can be used to control whether the
1968current buffer's scanning context for the next token match is done as
1969though at the beginning of a line.  A non-zero macro argument makes
1970rules anchored with `^' active, while a zero argument makes `^' rules
1971inactive.
1972
1973   The macro `YY_AT_BOL()' returns true if the next token scanned from
1974the current buffer will have `^' rules active, false otherwise.
1975
   In the generated scanner, the actions are all gathered in one large
switch statement and separated using `YY_BREAK', which may be
redefined.  By default, it is simply a `break', to separate each rule's
action from the following rule's.  Redefining `YY_BREAK' allows, for
example, C++ users to #define YY_BREAK to do nothing (while being very
careful that every rule ends with a `break' or a `return'!) and so
avoid unreachable-statement warnings: when a rule's action ends with
`return', the `YY_BREAK' that follows it can never be reached.
1984
1985
1986File: flex.info,  Node: User Values,  Next: Yacc,  Prev: Misc Macros,  Up: Top
1987
14 Values Available To the User
*******************************
1990
1991This chapter summarizes the various values available to the user in the
1992rule actions.
1993
1994`char *yytext'
1995     holds the text of the current token.  It may be modified but not
1996     lengthened (you cannot append characters to the end).
1997
1998     If the special directive `%array' appears in the first section of
1999     the scanner description, then `yytext' is instead declared `char
2000     yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
2001     redefine in the first section if you don't like the default value
2002     (generally 8KB).  Using `%array' results in somewhat slower
2003     scanners, but the value of `yytext' becomes immune to calls to
2004     `unput()', which potentially destroy its value when `yytext' is a
2005     character pointer.  The opposite of `%array' is `%pointer', which
2006     is the default.
2007
2008     You cannot use `%array' when generating C++ scanner classes (the
2009     `-+' flag).
2010
2011`int yyleng'
2012     holds the length of the current token.
2013
2014`FILE *yyin'
2015     is the file which by default `flex' reads from.  It may be
2016     redefined but doing so only makes sense before scanning begins or
2017     after an EOF has been encountered.  Changing it in the midst of
2018     scanning will have unexpected results since `flex' buffers its
2019     input; use `yyrestart()' instead.  Once scanning terminates
     because an end-of-file has been seen, you can point `yyin' at the
2021     new input file and then call the scanner again to continue
2022     scanning.
2023
2024`void yyrestart( FILE *new_file )'
2025     may be called to point `yyin' at the new input file.  The
2026     switch-over to the new file is immediate (any previously
2027     buffered-up input is lost).  Note that calling `yyrestart()' with
2028     `yyin' as an argument thus throws away the current input buffer
2029     and continues scanning the same input file.
2030
2031`FILE *yyout'
2032     is the file to which `ECHO' actions are done.  It can be reassigned
2033     by the user.
2034
2035`YY_CURRENT_BUFFER'
2036     returns a `YY_BUFFER_STATE' handle to the current buffer.
2037
2038`YY_START'
2039     returns an integer value corresponding to the current start
2040     condition.  You can subsequently use this value with `BEGIN' to
2041     return to that start condition.
2042
2043
2044File: flex.info,  Node: Yacc,  Next: Scanner Options,  Prev: User Values,  Up: Top
2045
15 Interfacing with Yacc
************************
2048
2049One of the main uses of `flex' is as a companion to the `yacc'
2050parser-generator.  `yacc' parsers expect to call a routine named
2051`yylex()' to find the next input token.  The routine is supposed to
2052return the type of the next token as well as putting any associated
2053value in the global `yylval'.  To use `flex' with `yacc', one specifies
2054the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
containing definitions of all the tokens declared with `%token' in the
`yacc' input.  This file is then included in the `flex' scanner.  For
example, if one of the tokens is `TOK_NUMBER', part of the scanner
might look like:
2059
2060
2061         %{
2062         #include "y.tab.h"
2063         %}
2064
2065         %%
2066
2067         [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
2068
2069
2070File: flex.info,  Node: Scanner Options,  Next: Performance,  Prev: Yacc,  Up: Top
2071
16 Scanner Options
******************
2074
2075The various `flex' options are categorized by function in the following
menu. If you want to look up a particular option by name, *Note Index of
2077Scanner Options::.
2078
2079* Menu:
2080
2081* Options for Specifying Filenames::
2082* Options Affecting Scanner Behavior::
2083* Code-Level And API Options::
2084* Options for Scanner Speed and Size::
2085* Debugging Options::
2086* Miscellaneous Options::
2087
2088   Even though there are many scanner options, a typical scanner might
2089only specify the following options:
2090
2091
2092     %option   8bit reentrant bison-bridge
2093     %option   warn nodefault
2094     %option   yylineno
2095     %option   outfile="scanner.c" header-file="scanner.h"
2096
2097   The first line specifies the general type of scanner we want. The
2098second line specifies that we are being careful. The third line asks
2099flex to track line numbers. The last line tells flex what to name the
2100files. (The options can be specified in any order. We just divided
2101them.)
2102
2103   `flex' also provides a mechanism for controlling options within the
2104scanner specification itself, rather than from the flex command-line.
2105This is done by including `%option' directives in the first section of
2106the scanner specification.  You can specify multiple options with a
2107single `%option' directive, and multiple directives in the first
2108section of your flex input file.
2109
2110   Most options are given simply as names, optionally preceded by the
2111word `no' (with no intervening whitespace) to negate their meaning.
2112The names are the same as their long-option equivalents (but without the
2113leading `--' ).
2114
   `flex' scans your rule actions to determine whether you use the
`REJECT' or `yymore()' features.  The `reject' and `yymore' options are
available to override its decision as to whether you use these
features, either by setting them (e.g., `%option reject') to indicate
the feature is indeed used, or unsetting them (e.g., `%option
noyymore') to indicate it actually is not used.
2121
2122   A number of options are available for lint purists who want to
2123suppress the appearance of unneeded routines in the generated scanner.
2124Each of the following, if unset (e.g., `%option nounput'), results in
2125the corresponding routine not appearing in the generated scanner:
2126
2127
2128         input, unput
2129         yy_push_state, yy_pop_state, yy_top_state
2130         yy_scan_buffer, yy_scan_bytes, yy_scan_string
2131
2132         yyget_extra, yyset_extra, yyget_leng, yyget_text,
2133         yyget_lineno, yyset_lineno, yyget_in, yyset_in,
2134         yyget_out, yyset_out, yyget_lval, yyset_lval,
2135         yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
2136
   (though `yy_push_state()' and friends won't appear anyway unless you
use `%option stack').
2139
2140
2141File: flex.info,  Node: Options for Specifying Filenames,  Next: Options Affecting Scanner Behavior,  Prev: Scanner Options,  Up: Scanner Options
2142
16.1 Options for Specifying Filenames
=====================================
2145
2146`--header-file=FILE, `%option header-file="FILE"''
2147     instructs flex to write a C header to `FILE'. This file contains
2148     function prototypes, extern variables, and types used by the
2149     scanner.  Only the external API is exported by the header file.
2150     Many macros that are usable from within scanner actions are not
2151     exported to the header file. This is due to namespace problems and
2152     the goal of a clean external API.
2153
2154     While in the header, the macro `yyIN_HEADER' is defined, where `yy'
2155     is substituted with the appropriate prefix.
2156
2157     The `--header-file' option is not compatible with the `--c++'
2158     option, since the C++ scanner provides its own header in
2159     `yyFlexLexer.h'.
2160
2161`-oFILE, --outfile=FILE, `%option outfile="FILE"''
2162     directs flex to write the scanner to the file `FILE' instead of
     `lex.yy.c'.  If you combine `--outfile' with the `--stdout' option,
     then the scanner is written to `stdout' but its `#line' directives
     (see the `-L' option below) refer to the file `FILE'.
2166
2167`-t, --stdout, `%option stdout''
2168     instructs `flex' to write the scanner it generates to standard
2169     output instead of `lex.yy.c'.
2170
2171`-SFILE, --skel=FILE'
2172     overrides the default skeleton file from which `flex' constructs
2173     its scanners.  You'll never need this option unless you are doing
2174     `flex' maintenance or development.
2175
2176`--tables-file=FILE'
2177     Write serialized scanner dfa tables to FILE. The generated scanner
2178     will not contain the tables, and requires them to be loaded at
2179     runtime.  *Note serialization::.
2180
2181`--tables-verify'
2182     This option is for flex development. We document it here in case
2183     you stumble upon it by accident or in case you suspect some
2184     inconsistency in the serialized tables.  Flex will serialize the
2185     scanner dfa tables but will also generate the in-code tables as it
2186     normally does. At runtime, the scanner will verify that the
2187     serialized tables match the in-code tables, instead of loading
2188     them.
2189
2190
2191
2192File: flex.info,  Node: Options Affecting Scanner Behavior,  Next: Code-Level And API Options,  Prev: Options for Specifying Filenames,  Up: Scanner Options
2193
16.2 Options Affecting Scanner Behavior
=======================================
2196
2197`-i, --case-insensitive, `%option case-insensitive''
2198     instructs `flex' to generate a "case-insensitive" scanner.  The
2199     case of letters given in the `flex' input patterns will be ignored,
2200     and tokens in the input will be matched regardless of case.  The
2201     matched text given in `yytext' will have the preserved case (i.e.,
2202     it will not be folded).  For tricky behavior, see *Note case and
2203     character ranges::.
2204
2205`-l, --lex-compat, `%option lex-compat''
2206     turns on maximum compatibility with the original AT&T `lex'
2207     implementation.  Note that this does not mean _full_ compatibility.
2208     Use of this option costs a considerable amount of performance, and
2209     it cannot be used with the `--c++', `--full', `--fast', `-Cf', or
2210     `-CF' options.  For details on the compatibilities it provides, see
2211     *Note Lex and Posix::.  This option also results in the name
2212     `YY_FLEX_LEX_COMPAT' being `#define''d in the generated scanner.
2213
2214`-B, --batch, `%option batch''
2215     instructs `flex' to generate a "batch" scanner, the opposite of
2216     _interactive_ scanners generated by `--interactive' (see below).
2217     In general, you use `-B' when you are _certain_ that your scanner
2218     will never be used interactively, and you want to squeeze a
2219     _little_ more performance out of it.  If your goal is instead to
2220     squeeze out a _lot_ more performance, you should be using the
2221     `-Cf' or `-CF' options, which turn on `--batch' automatically
2222     anyway.
2223
2224`-I, --interactive, `%option interactive''
2225     instructs `flex' to generate an interactive scanner.  An
2226     interactive scanner is one that only looks ahead to decide what
2227     token has been matched if it absolutely must.  It turns out that
2228     always looking one extra character ahead, even if the scanner has
2229     already seen enough text to disambiguate the current token, is a
2230     bit faster than only looking ahead when necessary.  But scanners
2231     that always look ahead give dreadful interactive performance; for
2232     example, when a user types a newline, it is not recognized as a
2233     newline token until they enter _another_ token, which often means
2234     typing in another whole line.
2235
2236     `flex' scanners default to `interactive' unless you use the `-Cf'
2237     or `-CF' table-compression options (*note Performance::).  That's
2238     because if you're looking for high-performance you should be using
2239     one of these options, so if you didn't, `flex' assumes you'd
2240     rather trade off a bit of run-time performance for intuitive
2241     interactive behavior.  Note also that you _cannot_ use
2242     `--interactive' in conjunction with `-Cf' or `-CF'.  Thus, this
2243     option is not really needed; it is on by default for all those
2244     cases in which it is allowed.
2245
2246     You can force a scanner to _not_ be interactive by using `--batch'
2247
2248`-7, --7bit, `%option 7bit''
2249     instructs `flex' to generate a 7-bit scanner, i.e., one which can
2250     only recognize 7-bit characters in its input.  The advantage of
2251     using `--7bit' is that the scanner's tables can be up to half the
     size of those generated using the `--8bit' option.  The
     disadvantage is that such scanners often hang or crash if their
     input contains an 8-bit character.
2255
2256     Note, however, that unless you generate your scanner using the
2257     `-Cf' or `-CF' table compression options, use of `--7bit' will
2258     save only a small amount of table space, and make your scanner
2259     considerably less portable.  `Flex''s default behavior is to
     generate an 8-bit scanner unless you use `-Cf' or `-CF', in
2261     which case `flex' defaults to generating 7-bit scanners unless
2262     your site was always configured to generate 8-bit scanners (as will
2263     often be the case with non-USA sites).  You can tell whether flex
2264     generated a 7-bit or an 8-bit scanner by inspecting the flag
2265     summary in the `--verbose' output as described above.
2266
2267     Note that if you use `-Cfe' or `-CFe' `flex' still defaults to
2268     generating an 8-bit scanner, since usually with these compression
2269     options full 8-bit tables are not much more expensive than 7-bit
2270     tables.
2271
2272`-8, --8bit, `%option 8bit''
2273     instructs `flex' to generate an 8-bit scanner, i.e., one which can
2274     recognize 8-bit characters.  This flag is only needed for scanners
2275     generated using `-Cf' or `-CF', as otherwise flex defaults to
2276     generating an 8-bit scanner anyway.
2277
2278     See the discussion of `--7bit' above for `flex''s default behavior
2279     and the tradeoffs between 7-bit and 8-bit scanners.
2280
2281`--default, `%option default''
2282     generate the default rule.
2283
2284`--always-interactive, `%option always-interactive''
2285     instructs flex to generate a scanner which always considers its
2286     input _interactive_.  Normally, on each new input file the scanner
2287     calls `isatty()' in an attempt to determine whether the scanner's
2288     input source is interactive and thus should be read a character at
2289     a time.  When this option is used, however, then no such call is
2290     made.
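
     The kind of check described above can be sketched in a few lines
     of C.  The helper name `looks_interactive' is hypothetical, and
     the sketch only illustrates how `isatty()' classifies a stream; it
     is not taken from the generated scanner.

```c
#define _POSIX_C_SOURCE 200809L     /* for fileno() under strict modes */

#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if the stream is connected to a
 * terminal, 0 otherwise (regular files, pipes, and so on).
 */
static int looks_interactive(FILE *fp)
{
    assert(fp != NULL);
    return isatty(fileno(fp)) > 0;
}
```

     A stream opened on a regular file is reported as non-interactive,
     while `stdin' attached to a terminal is reported as interactive.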
2291
`--never-interactive, `%option never-interactive''
2293     instructs flex to generate a scanner which never considers its
2294     input interactive.  This is the opposite of `always-interactive'.
2295
2296`-X, --posix, `%option posix''
2297     turns on maximum compatibility with the POSIX 1003.2-1992
2298     definition of `lex'.  Since `flex' was originally designed to
2299     implement the POSIX definition of `lex' this generally involves
2300     very few changes in behavior.  At the current writing the known
2301     differences between `flex' and the POSIX standard are:
2302
2303        * In POSIX and AT&T `lex', the repeat operator, `{}', has lower
2304          precedence than concatenation (thus `ab{3}' yields `ababab').
2305          Most POSIX utilities use an Extended Regular Expression (ERE)
2306          precedence that has the precedence of the repeat operator
2307          higher than concatenation (which causes `ab{3}' to yield
2308          `abbb').  By default, `flex' places the precedence of the
2309          repeat operator higher than concatenation which matches the
2310          ERE processing of other POSIX utilities.  When either
          `--posix' or `-l' is specified, `flex' will use the
2312          traditional AT&T and POSIX-compliant precedence for the
2313          repeat operator where concatenation has higher precedence
2314          than the repeat operator.
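
     The ERE precedence mentioned above can be observed with the C
     library's own POSIX regular expression routines.  The sketch below
     does not involve `flex' at all; `ere_matches' is a hypothetical
     helper around `regcomp()' and `regexec()', and the patterns are
     anchored so that only whole-string matches count.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text' matches the POSIX ERE `pattern', 0 otherwise. */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    assert(rc == 0);    /* the patterns used here are assumed valid */
    (void)rc;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

     With this helper, `^ab{3}$' matches `abbb' but not `ababab',
     because the repeat operator binds only to the `b'; grouping as in
     `^(ab){3}$' is needed to match `ababab'.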
2315
2316`--stack, `%option stack''
2317     enables the use of start condition stacks (*note Start
2318     Conditions::).
2319
2320`--stdinit, `%option stdinit''
2321     if set (i.e., %option stdinit) initializes `yyin' and `yyout' to
2322     `stdin' and `stdout', instead of the default of `NULL'.  Some
2323     existing `lex' programs depend on this behavior, even though it is
2324     not compliant with ANSI C, which does not require `stdin' and
2325     `stdout' to be compile-time constant. In a reentrant scanner,
2326     however, this is not a problem since initialization is performed
2327     in `yylex_init' at runtime.
2328
2329`--yylineno, `%option yylineno''
2330     directs `flex' to generate a scanner that maintains the number of
2331     the current line read from its input in the global variable
2332     `yylineno'.  This option is implied by `%option lex-compat'.  In a
     reentrant C scanner, the macro `yylineno' is accessible regardless
2334     of the value of `%option yylineno', however, its value is not
2335     modified by `flex' unless `%option yylineno' is enabled.
2336
2337`--yywrap, `%option yywrap''
     if unset (i.e., `--noyywrap'), makes the scanner not call
2339     `yywrap()' upon an end-of-file, but simply assume that there are no
2340     more files to scan (until the user points `yyin' at a new file and
2341     calls `yylex()' again).
2342
2343
2344
2345File: flex.info,  Node: Code-Level And API Options,  Next: Options for Scanner Speed and Size,  Prev: Options Affecting Scanner Behavior,  Up: Scanner Options
2346
16.3 Code-Level And API Options
===============================
2349
2350`--ansi-definitions, `%option ansi-definitions''
     instructs flex to generate ANSI C99 definitions for functions.
2352     This option is enabled by default.  If `%option
2353     noansi-definitions' is specified, then the obsolete style is
2354     generated.
2355
2356`--ansi-prototypes, `%option ansi-prototypes''
2357     instructs flex to generate ANSI C99 prototypes for functions.
2358     This option is enabled by default.  If `noansi-prototypes' is
2359     specified, then prototypes will have empty parameter lists.
2360
2361`--bison-bridge, `%option bison-bridge''
2362     instructs flex to generate a C scanner that is meant to be called
2363     by a `GNU bison' parser. The scanner has minor API changes for
2364     `bison' compatibility. In particular, the declaration of `yylex'
2365     is modified to take an additional parameter, `yylval'.  *Note
2366     Bison Bridge::.
2367
`--bison-locations, `%option bison-locations''
     instructs flex that `GNU bison' `%locations' are being used.  This
     means `yylex' will be passed an additional parameter, `yylloc'.
     This option implies `%option bison-bridge'.  *Note Bison Bridge::.
2372
2373`-L, --noline, `%option noline''
2374     instructs `flex' not to generate `#line' directives.  Without this
2375     option, `flex' peppers the generated scanner with `#line'
2376     directives so error messages in the actions will be correctly
2377     located with respect to either the original `flex' input file (if
2378     the errors are due to code in the input file), or `lex.yy.c' (if
2379     the errors are `flex''s fault - you should report these sorts of
2380     errors to the email address given in *Note Reporting Bugs::).
2381
`-R, --reentrant, `%option reentrant''
     instructs flex to generate a reentrant C scanner.  The generated
     scanner may safely be used in a multi-threaded environment. The
     API for a reentrant scanner is different than for a non-reentrant
     scanner (*note Reentrant::).  Because of the API difference between
     reentrant and non-reentrant `flex' scanners, non-reentrant flex
     code must be modified before it is suitable for use with this
     option.  This option is not compatible with the `--c++' option.
2390
2391     The option `--reentrant' does not affect the performance of the
2392     scanner.
2393
2394`-+, --c++, `%option c++''
2395     specifies that you want flex to generate a C++ scanner class.
2396     *Note Cxx::, for details.
2397
`--array, `%option array''
     specifies that you want `yytext' to be an array instead of a
     `char *'.

`--pointer, `%option pointer''
     specifies that `yytext' should be a `char *', not an array.  The
     default is `char *'.
2404
2405`-PPREFIX, --prefix=PREFIX, `%option prefix="PREFIX"''
2406     changes the default `yy' prefix used by `flex' for all
2407     globally-visible variable and function names to instead be
2408     `PREFIX'.  For example, `--prefix=foo' changes the name of
2409     `yytext' to `footext'.  It also changes the name of the default
2410     output file from `lex.yy.c' to `lex.foo.c'.  Here is a partial
2411     list of the names affected:
2412
2413
              yy_create_buffer
              yy_delete_buffer
              yy_flex_debug
              yy_init_buffer
              yy_flush_buffer
              yy_load_buffer_state
              yy_switch_to_buffer
              yyin
              yyleng
              yylex
              yylineno
              yyout
              yyrestart
              yytext
              yywrap
              yyalloc
              yyrealloc
              yyfree
2432
2433     (If you are using a C++ scanner, then only `yywrap' and
2434     `yyFlexLexer' are affected.)  Within your scanner itself, you can
2435     still refer to the global variables and functions using either
2436     version of their name; but externally, they have the modified name.
2437
2438     This option lets you easily link together multiple `flex' programs
2439     into the same executable.  Note, though, that using this option
2440     also renames `yywrap()', so you now _must_ either provide your own
2441     (appropriately-named) version of the routine for your scanner, or
2442     use `%option noyywrap', as linking with `-lfl' no longer provides
2443     one for you by default.
2444
2445`--main, `%option main''
2446     directs flex to provide a default `main()' program for the
2447     scanner, which simply calls `yylex()'.  This option implies
2448     `noyywrap' (see below).
2449
`--nounistd, `%option nounistd''
     suppresses inclusion of the non-ANSI header file `unistd.h'. This
     option is meant to target environments in which `unistd.h' does
     not exist. Be aware that certain options may cause flex to
     generate code that relies on functions normally found in
     `unistd.h' (e.g., `isatty()' and `read()').  If you wish to use
     these functions, you will have to inform your compiler where to
     find them.  *Note option-always-interactive::. *Note
     option-read::.
2458
`--yyclass=NAME, `%option yyclass="NAME"''
     only applies when generating a C++ scanner (the `--c++' option).
     It informs `flex' that you have derived `NAME' as a subclass of
     `yyFlexLexer', so `flex' will place your actions in the member
     function `NAME::yylex()' instead of `yyFlexLexer::yylex()'.  It
     also generates a `yyFlexLexer::yylex()' member function that emits
     a run-time error (by invoking `yyFlexLexer::LexerError()') if
     called.  *Note Cxx::.
2467
2468
2469
2470File: flex.info,  Node: Options for Scanner Speed and Size,  Next: Debugging Options,  Prev: Code-Level And API Options,  Up: Scanner Options
2471
16.4 Options for Scanner Speed and Size
=======================================
2474
2475`-C[aefFmr]'
2476     controls the degree of table compression and, more generally,
2477     trade-offs between small scanners and fast scanners.
2478
2479    `-C'
2480          A lone `-C' specifies that the scanner tables should be
2481          compressed but neither equivalence classes nor
2482          meta-equivalence classes should be used.
2483
2484    `-Ca, --align, `%option align''
2485          ("align") instructs flex to trade off larger tables in the
2486          generated scanner for faster performance because the elements
2487          of the tables are better aligned for memory access and
2488          computation.  On some RISC architectures, fetching and
2489          manipulating longwords is more efficient than with
2490          smaller-sized units such as shortwords.  This option can
2491          quadruple the size of the tables used by your scanner.
2492
2493    `-Ce, --ecs, `%option ecs''
2494          directs `flex' to construct "equivalence classes", i.e., sets
2495          of characters which have identical lexical properties (for
2496          example, if the only appearance of digits in the `flex' input
2497          is in the character class "[0-9]" then the digits '0', '1',
2498          ..., '9' will all be put in the same equivalence class).
2499          Equivalence classes usually give dramatic reductions in the
2500          final table/object file sizes (typically a factor of 2-5) and
2501          are pretty cheap performance-wise (one array look-up per
2502          character scanned).
2503
    `-Cf'
          specifies that the "full" scanner tables should be generated -
          `flex' should not compress the tables by taking advantage of
          similar transition functions for different states.

    `-CF'
          specifies that the alternate fast scanner representation
          (described below under the `--fast' flag) should be used.
          This option cannot be used with `--c++'.
2513
2514    `-Cm, --meta-ecs, `%option meta-ecs''
2515          directs `flex' to construct "meta-equivalence classes", which
2516          are sets of equivalence classes (or characters, if equivalence
2517          classes are not being used) that are commonly used together.
2518          Meta-equivalence classes are often a big win when using
2519          compressed tables, but they have a moderate performance
2520          impact (one or two `if' tests and one array look-up per
2521          character scanned).
2522
2523    `-Cr, --read, `%option read''
2524          causes the generated scanner to _bypass_ use of the standard
2525          I/O library (`stdio') for input.  Instead of calling
2526          `fread()' or `getc()', the scanner will use the `read()'
2527          system call, resulting in a performance gain which varies
2528          from system to system, but in general is probably negligible
2529          unless you are also using `-Cf' or `-CF'.  Using `-Cr' can
2530          cause strange behavior if, for example, you read from `yyin'
2531          using `stdio' prior to calling the scanner (because the
2532          scanner will miss whatever text your previous reads left in
2533          the `stdio' input buffer).  `-Cr' has no effect if you define
2534          `YY_INPUT()' (*note Generated Scanner::).
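
     The equivalence-class idea behind `-Ce' can be sketched in a few
     lines of C.  This is only an illustration, not the algorithm
     `flex' itself uses: the function name `build_equivalence_classes'
     and the representation of character classes as plain strings are
     assumptions made for the example.  Two characters share a class
     exactly when they occur in the same set of character classes.

```c
#include <string.h>

/* Assign an equivalence-class number to every character code.
 * `classes' holds `n' character classes spelled out as plain strings
 * (n <= 32).  Two characters get the same number exactly when they
 * appear in the same subset of those classes.
 * Returns the number of equivalence classes found. */
int build_equivalence_classes(const char *const classes[], int n,
                              unsigned char ec[256])
{
    unsigned long signature[256];
    int count = 0;

    /* One bit per character class the character belongs to. */
    for (int c = 0; c < 256; ++c) {
        signature[c] = 0;
        for (int i = 0; i < n; ++i)
            if (c != 0 && strchr(classes[i], c) != NULL)
                signature[c] |= 1UL << i;
    }

    /* Characters with equal signatures share a class number. */
    for (int c = 0; c < 256; ++c) {
        int seen = -1;
        for (int d = 0; d < c; ++d)
            if (signature[d] == signature[c]) { seen = d; break; }
        if (seen >= 0)
            ec[c] = ec[seen];
        else
            ec[c] = (unsigned char)count++;
    }
    return count;
}
```

     With the two classes `0123456789' and `abc', the sketch finds
     three equivalence classes (the digits, the letters a-c, and
     everything else), so a transition table would need 3 columns per
     state instead of 256.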
2535
2536     The options `-Cf' or `-CF' and `-Cm' do not make sense together -
2537     there is no opportunity for meta-equivalence classes if the table
2538     is not being compressed.  Otherwise the options may be freely
2539     mixed, and are cumulative.
2540
2541     The default setting is `-Cem', which specifies that `flex' should
2542     generate equivalence classes and meta-equivalence classes.  This
2543     setting provides the highest degree of table compression.  You can
2544     trade off faster-executing scanners at the cost of larger tables
2545     with the following generally being true:
2546
2547
              slowest & smallest
                    -Cem
                    -Cm
                    -Ce
                    -C
                    -C{f,F}e
                    -C{f,F}
                    -C{f,F}a
              fastest & largest
2557
2558     Note that scanners with the smallest tables are usually generated
2559     and compiled the quickest, so during development you will usually
2560     want to use the default, maximal compression.
2561
2562     `-Cfe' is often a good compromise between speed and size for
2563     production scanners.
2564
`-f, --full, `%option full''
     specifies "fast scanner".  No table compression is done and
     `stdio' is bypassed.  The result is large but fast.  This option
     is equivalent to `-Cfr'.
2569
`-F, --fast, `%option fast''
     specifies that the _fast_ scanner table representation should be
     used (and `stdio' bypassed).  This representation is about as fast
     as the full table representation (`--full'), and for some sets of
     patterns will be considerably smaller (and for others, larger).  In
     general, if the pattern set contains both _keywords_ and a
     catch-all _identifier_ rule, such as in the set:


              "case"    return TOK_CASE;
              "switch"  return TOK_SWITCH;
              ...
              "default" return TOK_DEFAULT;
              [a-z]+    return TOK_ID;

     then you're better off using the full table representation.  If
     only the _identifier_ rule is present and you then use a hash
     table or some such to detect the keywords, you're better off using
     `--fast'.

     This option is equivalent to `-CFr'.  It cannot be used with
     `--c++'.
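
     The second case above (a single identifier rule plus a separate
     keyword lookup) might look like the following C sketch.  A sorted
     table searched with `bsearch()' stands in for the hash table
     mentioned in the text; the token values, the `keywords' table and
     the function `classify_identifier' are hypothetical names chosen
     for the illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; the TOK_* names follow the rules above. */
enum token { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword { const char *name; enum token tok; };

/* Must stay sorted by name so that bsearch() can be used. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* bsearch() passes the search key first and a table element second. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key,
                  ((const struct keyword *)elem)->name);
}

/* Decide whether a matched identifier is really a keyword. */
enum token classify_identifier(const char *text)
{
    const struct keyword *kw =
        bsearch(text, keywords,
                sizeof keywords / sizeof keywords[0],
                sizeof keywords[0], compare_keyword);
    return kw ? kw->tok : TOK_ID;
}
```

     The action of the lone `[a-z]+' rule would then simply be
     `return classify_identifier(yytext);'.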
2592
2593
2594
2595File: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options
2596
16.5 Debugging Options
======================
2599
2600`-b, --backup, `%option backup''
2601     Generate backing-up information to `lex.backup'.  This is a list of
2602     scanner states which require backing up and the input characters on
2603     which they do so.  By adding rules one can remove backing-up
2604     states.  If _all_ backing-up states are eliminated and `-Cf' or
2605     `-CF' is used, the generated scanner will run faster (see the
2606     `--perf-report' flag).  Only users who wish to squeeze every last
2607     cycle out of their scanners need worry about this option.  (*note
2608     Performance::).
2609
2610`-d, --debug, `%option debug''
2611     makes the generated scanner run in "debug" mode.  Whenever a
2612     pattern is recognized and the global variable `yy_flex_debug' is
2613     non-zero (which is the default), the scanner will write to
2614     `stderr' a line of the form:
2615
2616
              --accepting rule at line 53 ("the matched text")
2618
2619     The line number refers to the location of the rule in the file
2620     defining the scanner (i.e., the file that was fed to flex).
2621     Messages are also generated when the scanner backs up, accepts the
2622     default rule, reaches the end of its input buffer (or encounters a
2623     NUL; at this point, the two look the same as far as the scanner's
2624     concerned), or reaches an end-of-file.
2625
2626`-p, --perf-report, `%option perf-report''
2627     generates a performance report to `stderr'.  The report consists of
2628     comments regarding features of the `flex' input file which will
2629     cause a serious loss of performance in the resulting scanner.  If
2630     you give the flag twice, you will also get comments regarding
2631     features that lead to minor performance losses.
2632
2633     Note that the use of `REJECT', and variable trailing context
2634     (*note Limitations::) entails a substantial performance penalty;
2635     use of `yymore()', the `^' operator, and the `--interactive' flag
2636     entail minor performance penalties.
2637
`-s, --nodefault, `%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to `stdout') to be suppressed.  If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.
2643
2644`-T, --trace, `%option trace''
2645     makes `flex' run in "trace" mode.  It will generate a lot of
2646     messages to `stderr' concerning the form of the input and the
2647     resultant non-deterministic and deterministic finite automata.
2648     This option is mostly for use in maintaining `flex'.
2649
2650`-w, --nowarn, `%option nowarn''
2651     suppresses warning messages.
2652
2653`-v, --verbose, `%option verbose''
2654     specifies that `flex' should write to `stderr' a summary of
2655     statistics regarding the scanner it generates.  Most of the
2656     statistics are meaningless to the casual `flex' user, but the
2657     first line identifies the version of `flex' (same as reported by
2658     `--version'), and the next line the flags used when generating the
2659     scanner, including those that are on by default.
2660
`--warn, `%option warn''
     warns about certain things. In particular, if the default rule can
     be matched but no default rule has been given, `flex' will warn
     you.  We recommend always using this option.
2665
2666
2667
2668File: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options
2669
16.6 Miscellaneous Options
==========================
2672
2673`-c'
2674     A do-nothing option included for POSIX compliance.
2675
2676`-h, -?, --help'
2677     generates a "help" summary of `flex''s options to `stdout' and
2678     then exits.
2679
2680`-n'
2681     Another do-nothing option included for POSIX compliance.
2682
2683`-V, --version'
2684     prints the version number to `stdout' and exits.
2685
2686
2687
2688File: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top
2689
17 Performance Considerations
*****************************
2692
2693The main design goal of `flex' is that it generate high-performance
2694scanners.  It has been optimized for dealing well with large sets of
2695rules.  Aside from the effects on scanner speed of the table compression
2696`-C' options outlined above, there are a number of options/actions
2697which degrade performance.  These are, from most expensive to least:
2698
2699
         REJECT
         arbitrary trailing context

         pattern sets that require backing up
         %option yylineno
         %array

         %option interactive
         %option always-interactive

         `^' beginning-of-line operator
         yymore()
2712
   with the first two being quite expensive and the last two being
quite cheap.  Note also that `unput()' is implemented as a routine call
that potentially does quite a bit of work, while `yyless()' is a
quite-cheap macro. So if you are just putting back some excess text you
scanned, use `yyless()'.
2718
2719   `REJECT' should be avoided at all costs when performance is
2720important.  It is a particularly expensive option.
2721
2722   There is one case when `%option yylineno' can be expensive. That is
2723when your patterns match long tokens that could _possibly_ contain a
2724newline character. There is no performance penalty for rules that can
2725not possibly match newlines, since flex does not need to check them for
2726newlines.  In general, you should avoid rules such as `[^f]+', which
2727match very long tokens, including newlines, and may possibly match your
2728entire file! A better approach is to separate `[^f]+' into two rules:
2729
2730
     %option yylineno
     %%
         [^f\n]+
         \n+
2735
2736   The above scanner does not incur a performance penalty.
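
   To see why long tokens that may contain newlines cost more, consider
the kind of check a scanner has to make after such a match.  The
following C sketch is an assumption about the shape of that work, not
code copied from a generated scanner; `count_newlines' is a hypothetical
helper.

```c
#include <stddef.h>

/* Count the newlines in a matched token of `len' characters.
 * The cost grows with the length of the token, which is why a rule
 * such as `[^f]+' is expensive when line numbers are maintained. */
size_t count_newlines(const char *text, size_t len)
{
    size_t lines = 0;

    for (size_t i = 0; i < len; ++i)
        if (text[i] == '\n')
            ++lines;
    return lines;
}
```

   A rule like `[^f\n]+' can never match a newline, so no such pass over
its text is needed; only the short `\n+' tokens have to be examined.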
2737
   Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner.  In principle, one begins by
using the `-b' flag to generate a `lex.backup' file.  For example, on
the input:
2742
2743
         %%
         foo        return TOK_KEYWORD;
         foobar     return TOK_KEYWORD;
2747
2748   the file looks like:
2749
2750
         State #6 is non-accepting -
          associated rule line numbers:
                2       3
          out-transitions: [ o ]
          jam-transitions: EOF [ \001-n  p-\177 ]

         State #8 is non-accepting -
          associated rule line numbers:
                3
          out-transitions: [ a ]
          jam-transitions: EOF [ \001-`  b-\177 ]

         State #9 is non-accepting -
          associated rule line numbers:
                3
          out-transitions: [ r ]
          jam-transitions: EOF [ \001-q  s-\177 ]

         Compressed tables always back up.
2770
2771   The first few lines tell us that there's a scanner state in which it
2772can make a transition on an 'o' but not on any other character, and
2773that in that state the currently scanned text does not match any rule.
2774The state occurs when trying to match the rules found at lines 2 and 3
2775in the input file.  If the scanner is in that state and then reads
2776something other than an 'o', it will have to back up to find a rule
2777which is matched.  With a bit of headscratching one can see that this
2778must be the state it's in when it has seen `fo'.  When this has
2779happened, if anything other than another `o' is seen, the scanner will
2780have to back up to simply match the `f' (by the default rule).
2781
2782   The comment regarding State #8 indicates there's a problem when
2783`foob' has been scanned.  Indeed, on any character other than an `a',
2784the scanner will have to back up to accept "foo".  Similarly, the
2785comment for State #9 concerns when `fooba' has been scanned and an `r'
2786does not follow.
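
   The bookkeeping described above can be imitated by hand.  The C
sketch below is not `flex' output; it is a hypothetical function,
`scan_foo_foobar', that walks the two rules `foo' and `foobar' the way
the text describes, remembers the last accepting position, and reports
how many characters had to be given back.

```c
#include <stddef.h>

/* Result of one scanning step: the length of the match and the number
 * of characters read past the match that had to be given back. */
struct match { size_t length; size_t backed_up; };

struct match scan_foo_foobar(const char *input)
{
    static const char longest[] = "foobar";
    const size_t max = sizeof longest - 1;
    size_t pos = 0;          /* characters consumed so far */
    size_t last_accept = 0;  /* length at the last accepting state */
    struct match m;

    /* Follow the out-transitions for as long as the input allows. */
    while (pos < max && input[pos] == longest[pos]) {
        ++pos;
        if (pos == 3 || pos == 6)   /* "foo" or "foobar" recognized */
            last_accept = pos;
    }

    /* No rule matched: the default rule takes a single character. */
    if (last_accept == 0 && input[0] != '\0')
        last_accept = 1;

    m.length = last_accept;
    m.backed_up = pos > last_accept ? pos - last_accept : 0;
    return m;
}
```

   On the input `fox' the sketch matches one character and gives back
one (the `fo' state); on `foobaz' it matches `foo' and gives back two
(the `fooba' state); on `foobar' nothing is given back.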
2787
2788   The final comment reminds us that there's no point going to all the
2789trouble of removing backing up from the rules unless we're using `-Cf'
2790or `-CF', since there's no performance gain doing so with compressed
2791scanners.
2792
2793   The way to remove the backing up is to add "error" rules:
2794
2795
         %%
         foo         return TOK_KEYWORD;
         foobar      return TOK_KEYWORD;

         fooba       |
         foob        |
         fo          {
                     /* false alarm, not really a keyword */
                     return TOK_ID;
                     }
2806
2807   Eliminating backing up among a list of keywords can also be done
2808using a "catch-all" rule:
2809
2810
         %%
         foo         return TOK_KEYWORD;
         foobar      return TOK_KEYWORD;

         [a-z]+      return TOK_ID;
2816
2817   This is usually the best solution when appropriate.
2818
2819   Backing up messages tend to cascade.  With a complicated set of rules
2820it's not uncommon to get hundreds of messages.  If one can decipher
2821them, though, it often only takes a dozen or so rules to eliminate the
2822backing up (though it's easy to make a mistake and have an error rule
2823accidentally match a valid token.  A possible future `flex' feature
2824will be to automatically add rules to eliminate backing up).
2825
2826   It's important to keep in mind that you gain the benefits of
2827eliminating backing up only if you eliminate _every_ instance of
2828backing up.  Leaving just one means you gain nothing.
2829
2830   _Variable_ trailing context (where both the leading and trailing
2831parts do not have a fixed length) entails almost the same performance
2832loss as `REJECT' (i.e., substantial).  So when possible a rule like:
2833
2834
         %%
         mouse|rat/(cat|dog)   run();

   is better written:


         %%
         mouse/cat|dog         run();
         rat/cat|dog           run();

   or as


         %%
         mouse|rat/cat         run();
         mouse|rat/dog         run();
2851
2852   Note that here the special '|' action does _not_ provide any
2853savings, and can even make things worse (*note Limitations::).
2854
2855   Another area where the user can increase a scanner's performance (and
2856one that's easier to implement) arises from the fact that the longer the
2857tokens matched, the faster the scanner will run.  This is because with
2858long tokens the processing of most input characters takes place in the
2859(short) inner scanning loop, and does not often have to go through the
2860additional work of setting up the scanning environment (e.g., `yytext')
2861for the action.  Recall the scanner for C comments:
2862
2863
         %x comment
         %%
                 int line_num = 1;

         "/*"         BEGIN(comment);

         <comment>[^*\n]*
         <comment>"*"+[^*/\n]*
         <comment>\n             ++line_num;
         <comment>"*"+"/"        BEGIN(INITIAL);
2874
2875   This could be sped up by writing it as:
2876
2877
         %x comment
         %%
                 int line_num = 1;

         "/*"         BEGIN(comment);

         <comment>[^*\n]*
         <comment>[^*\n]*\n      ++line_num;
         <comment>"*"+[^*/\n]*
         <comment>"*"+[^*/\n]*\n ++line_num;
         <comment>"*"+"/"        BEGIN(INITIAL);
2889
2890   Now instead of each newline requiring the processing of another
2891action, recognizing the newlines is distributed over the other rules to
2892keep the matched text as long as possible.  Note that _adding_ rules
2893does _not_ slow down the scanner!  The speed of the scanner is
2894independent of the number of rules or (modulo the considerations given
2895at the beginning of this section) how complicated the rules are with
2896regard to operators such as `*' and `|'.
2897
2898   A final example in speeding up a scanner: suppose you want to scan
2899through a file containing identifiers and keywords, one per line and
2900with no other extraneous characters, and recognize all the keywords.  A
2901natural first approach is:
2902
2903
         %%
         asm      |
         auto     |
         break    |
         ... etc ...
         volatile |
         while    /* it's a keyword */

         .|\n     /* it's not a keyword */
2913
2914   To eliminate the back-tracking, introduce a catch-all rule:
2915
2916
         %%
         asm      |
         auto     |
         break    |
         ... etc ...
         volatile |
         while    /* it's a keyword */

         [a-z]+   |
         .|\n     /* it's not a keyword */
2927
2928   Now, if it's guaranteed that there's exactly one word per line, then
2929we can reduce the total number of matches by a half by merging in the
2930recognition of newlines with that of the other tokens:
2931
2932
         %%
         asm\n    |
         auto\n   |
         break\n  |
         ... etc ...
         volatile\n |
         while\n  /* it's a keyword */

         [a-z]+\n |
         .|\n     /* it's not a keyword */
2943
   One has to be careful here, as we have now reintroduced backing up
into the scanner.  In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
`flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like `auto' and then the next
character is something other than a newline or a letter.  Previously it
would then just match the `auto' rule and be done, but now it has no
`auto' rule, only an `auto\n' rule.  To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:
2956
2957
         %%
         asm\n    |
         auto\n   |
         break\n  |
         ... etc ...
         volatile\n |
         while\n  /* it's a keyword */

         [a-z]+\n |
         [a-z]+   |
         .|\n     /* it's not a keyword */
2969
2970   Compiled with `-Cf', this is about as fast as one can get a `flex'
2971scanner to go for this particular problem.
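
   The claim that folding the newline into each word halves the number
of matches can be checked with a small counting experiment.  The two C
functions below are hypothetical stand-ins for the rule sets, not
generated scanners: `matches_separate' treats words and newlines as
separate tokens, while `matches_merged' lets a word absorb the newline
that follows it.

```c
#include <ctype.h>
#include <stddef.h>

/* Matches made when words and newlines are separate tokens
 * (rules `[a-z]+' and `.|\n'). */
size_t matches_separate(const char *s)
{
    size_t n = 0;

    while (*s) {
        if (islower((unsigned char)*s)) {
            while (islower((unsigned char)*s))
                ++s;
        } else {
            ++s;                /* `.|\n' matches one character */
        }
        ++n;
    }
    return n;
}

/* Matches made when the newline is folded into the word
 * (rules `[a-z]+\n', `[a-z]+' and `.|\n'). */
size_t matches_merged(const char *s)
{
    size_t n = 0;

    while (*s) {
        if (islower((unsigned char)*s)) {
            while (islower((unsigned char)*s))
                ++s;
            if (*s == '\n')     /* the word takes its newline along */
                ++s;
        } else {
            ++s;
        }
        ++n;
    }
    return n;
}
```

   For a three-line input such as `auto', `foo', `while' (each followed
by a newline), the first function counts six matches and the second
counts three.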
2972
2973   A final note: `flex' is slow when matching `NUL's, particularly when
2974a token contains multiple `NUL's.  It's best to write rules which match
2975_short_ amounts of text if it's anticipated that the text will often
2976include `NUL's.
2977
2978   Another final note regarding performance: as mentioned in *Note
2979Matching::, dynamically resizing `yytext' to accommodate huge tokens is
2980a slow process because it presently requires that the (huge) token be
2981rescanned from the beginning.  Thus if performance is vital, you should
2982attempt to match "large" quantities of text but not "huge" quantities,
2983where the cutoff between the two is at about 8K characters per token.
2984
2985
2986File: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top
2987
18 Generating C++ Scanners
**************************
2990
2991*IMPORTANT*: the present form of the scanning class is _experimental_
2992and may change considerably between major releases.
2993
2994   `flex' provides two different ways to generate scanners for use with
2995C++.  The first way is to simply compile a scanner generated by `flex'
2996using a C++ compiler instead of a C compiler.  You should not encounter
2997any compilation errors (*note Reporting Bugs::).  You can then use C++
2998code in your rule actions instead of C code.  Note that the default
2999input source for your scanner remains `yyin', and default echoing is
3000still done to `yyout'.  Both of these remain `FILE *' variables and not
3001C++ _streams_.
3002
   You can also use `flex' to generate a C++ scanner class, using the
`-+' option (or, equivalently, `%option c++'), which is automatically
specified if the name of the `flex' executable ends in a '+', such as
`flex++'.  When using this option, `flex' defaults to generating the
scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
scanner includes the header file `FlexLexer.h', which defines the
interface to two C++ classes.
3010
3011   The first class, `FlexLexer', provides an abstract base class
3012defining the general scanner class interface.  It provides the
3013following member functions:
3014
3015`const char* YYText()'
3016     returns the text of the most recently matched token, the
3017     equivalent of `yytext'.
3018
3019`int YYLeng()'
3020     returns the length of the most recently matched token, the
3021     equivalent of `yyleng'.
3022
`int lineno() const'
     returns the current input line number (see `%option yylineno'), or
     `1' if `%option yylineno' was not used.
3026
3027`void set_debug( int flag )'
3028     sets the debugging flag for the scanner, equivalent to assigning to
3029     `yy_flex_debug' (*note Scanner Options::).  Note that you must
3030     build the scanner using `%option debug' to include debugging
3031     information in it.
3032
3033`int debug() const'
3034     returns the current setting of the debugging flag.
3035
   Also provided are member functions equivalent to
`yy_switch_to_buffer()', `yy_create_buffer()' (though the first
argument is an `istream*' object pointer and not a `FILE*'),
`yy_flush_buffer()', `yy_delete_buffer()', and `yyrestart()' (again,
the first argument is an `istream*' object pointer).
3041
3042   The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
3043derived from `FlexLexer'.  It defines the following additional member
3044functions:
3045
3046`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
3047     constructs a `yyFlexLexer' object using the given streams for input
3048     and output.  If not specified, the streams default to `cin' and
3049     `cout', respectively.
3050
`virtual int yylex()'
     performs the same role as `yylex()' does for ordinary `flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value.  If you derive a subclass `S' from
     `yyFlexLexer' and want to access the member functions and variables
     of `S' inside `yylex()', then you need to use `%option
     yyclass="S"' to inform `flex' that you will be using that subclass
     instead of `yyFlexLexer'.  In this case, rather than generating
     `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
     generates a dummy `yyFlexLexer::yylex()' that calls
     `yyFlexLexer::LexerError()' if called).
3062
3063`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
3064     reassigns `yyin' to `new_in' (if non-null) and `yyout' to
3065     `new_out' (if non-null), deleting the previous input buffer if
3066     `yyin' is reassigned.
3067
3068`int yylex( istream* new_in, ostream* new_out = 0 )'
3069     first switches the input streams via `switch_streams( new_in,
3070     new_out )' and then returns the value of `yylex()'.
3071
3072   In addition, `yyFlexLexer' defines the following protected virtual
3073functions which you can redefine in derived classes to tailor the
3074scanner:
3075
3076`virtual int LexerInput( char* buf, int max_size )'
3077     reads up to `max_size' characters into `buf' and returns the
3078     number of characters read.  To indicate end-of-input, return 0
3079     characters.  Note that `interactive' scanners (see the `-B' and
3080     `-I' flags in *Note Scanner Options::) define the macro
3081     `YY_INTERACTIVE'.  If you redefine `LexerInput()' and need to take
3082     different actions depending on whether or not the scanner might be
3083     scanning an interactive input source, you can test for the
3084     presence of this name via `#ifdef' statements.
3085
3086`virtual void LexerOutput( const char* buf, int size )'
3087     writes out `size' characters from the buffer `buf', which, while
3088     `NUL'-terminated, may also contain internal `NUL's if the
3089     scanner's rules can match text with `NUL's in them.
3090
3091`virtual void LexerError( const char* msg )'
3092     reports a fatal error message.  The default version of this
3093     function writes the message to the stream `cerr' and exits.
3094
3095   Note that a `yyFlexLexer' object contains its _entire_ scanning
3096state.  Thus you can use such objects to create reentrant scanners, but
3097see also *Note Reentrant::.  You can instantiate multiple instances of
3098the same `yyFlexLexer' class, and you can also combine multiple C++
3099scanner classes together in the same program using the `-P' option
3100discussed above.
3101
3102   Finally, note that the `%array' feature is not available to C++
3103scanner classes; you must use `%pointer' (the default).
3104
3105   Here is an example of a simple C++ scanner:
3106
3107
             // An example of using the flex C++ scanner class.

         %{
         #include <iostream>
         using namespace std;
         int mylineno = 0;
         %}

         string  \"[^\n"]+\"

         ws      [ \t]+

         alpha   [A-Za-z]
         dig     [0-9]
         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
         number  {num1}|{num2}

         %%

         {ws}    /* skip blanks and tabs */

         "/*"    {
                 int c;

                 while((c = yyinput()) != 0)
                     {
                     if(c == '\n')
                         ++mylineno;

                     else if(c == '*')
                         {
                         if((c = yyinput()) == '/')
                             break;
                         else
                             unput(c);
                         }
                     }
                 }

         {number}  cout << "number " << YYText() << '\n';

         \n        mylineno++;

         {name}    cout << "name " << YYText() << '\n';

         {string}  cout << "string " << YYText() << '\n';

         %%

         int main( int /* argc */, char** /* argv */ )
             {
             FlexLexer* lexer = new yyFlexLexer;
             while(lexer->yylex() != 0)
                 ;
             return 0;
             }
3164
3165   If you want to create multiple (different) lexer classes, you use the
3166`-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
3167some other `xxFlexLexer'.  You then can include `<FlexLexer.h>' in your
3168other sources once per lexer class, first renaming `yyFlexLexer' as
3169follows:
3170
3171
3172         #undef yyFlexLexer
3173         #define yyFlexLexer xxFlexLexer
3174         #include <FlexLexer.h>
3175
3176         #undef yyFlexLexer
3177         #define yyFlexLexer zzFlexLexer
3178         #include <FlexLexer.h>
3179
3180   if, for example, you used `%option prefix="xx"' for one of your
3181scanners and `%option prefix="zz"' for the other.
3182
3183
3184File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top
3185
19 Reentrant C Scanners
***********************

`flex' has the ability to generate a reentrant C scanner.  This is
accomplished by specifying `%option reentrant' (`-R').  The generated
scanner is both portable and safe to use in one or more separate
threads of control.  The most common use for reentrant scanners is from
within multi-threaded applications.  Any thread may create and execute
a reentrant `flex' scanner without the need for synchronization with
other threads.
3196
3197* Menu:
3198
3199* Reentrant Uses::
3200* Reentrant Overview::
3201* Reentrant Example::
3202* Reentrant Detail::
3203* Reentrant Functions::
3204
3205
3206File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant
3207
19.1 Uses for Reentrant Scanners
================================

Multi-threading is not the only use for a reentrant scanner.  For
example, you could scan two or more files simultaneously to implement a
`diff' at the token level (i.e., instead of at the character level):


         /* Example of maintaining more than one active scanner. */

         int tok1, tok2;   /* declared outside the loop so the
                              `while' condition can see them */

         do {
             tok1 = yylex( scanner_1 );
             tok2 = yylex( scanner_2 );

             if( tok1 != tok2 )
                 printf("Files are different.\n");

         } while ( tok1 && tok2 );
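
   The idea of keeping several scanners active at once can be
illustrated without `flex' itself.  The following is a minimal sketch in
plain C, assuming a hypothetical hand-written tokenizer whose entire
state lives in a caller-owned structure, much as a reentrant scanner
keeps its state behind `yyscan_t'.  The names `mini_scanner',
`mini_init', `mini_lex' and `same_tokens' are illustrative only and are
not part of the `flex' API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a reentrant scanner: all state lives here. */
struct mini_scanner {
    const char *pos;   /* next unread character */
    const char *tok;   /* start of the most recent token */
    size_t      len;   /* length of the most recent token */
};

static void mini_init(struct mini_scanner *s, const char *input)
{
    s->pos = input;
    s->tok = input;
    s->len = 0;
}

/* Returns 1 after reading a whitespace-delimited token, 0 at end of input. */
static int mini_lex(struct mini_scanner *s)
{
    while (*s->pos != '\0' && isspace((unsigned char)*s->pos))
        s->pos++;
    if (*s->pos == '\0')
        return 0;
    s->tok = s->pos;
    while (*s->pos != '\0' && !isspace((unsigned char)*s->pos))
        s->pos++;
    s->len = (size_t)(s->pos - s->tok);
    return 1;
}

/* Returns 1 if both inputs yield the same token sequence, else 0. */
int same_tokens(const char *a, const char *b)
{
    struct mini_scanner s1, s2;
    int t1, t2;

    assert(a != NULL && b != NULL);
    mini_init(&s1, a);
    mini_init(&s2, b);
    do {
        t1 = mini_lex(&s1);
        t2 = mini_lex(&s2);
        if (t1 != t2)
            return 0;   /* one input ran out of tokens first */
        if (t1 && (s1.len != s2.len || memcmp(s1.tok, s2.tok, s1.len) != 0))
            return 0;   /* token text differs */
    } while (t1 && t2);
    return 1;
}
```

   Because neither tokenizer touches global state, the two can be
advanced in lock step.  For instance, `same_tokens("a  b", "a b")'
returns 1, since the inputs differ only in whitespace.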
3228
3229   Another use for a reentrant scanner is recursion.  (Note that a
3230recursive scanner can also be created using a non-reentrant scanner and
3231buffer states. *Note Multiple Input Buffers::.)
3232
3233   The following crude scanner supports the `eval' command by invoking
3234another instance of itself.
3235
3236
3237         /* Example of recursive invocation. */
3238
3239         %option reentrant
3240
3241         %%
3242         "eval(".+")"  {
3243                           yyscan_t scanner;
3244                           YY_BUFFER_STATE buf;
3245
3246                           yylex_init( &scanner );
3247                           yytext[yyleng-1] = ' ';
3248
3249                           buf = yy_scan_string( yytext + 5, scanner );
3250                           yylex( scanner );
3251
3252                           yy_delete_buffer(buf,scanner);
3253                           yylex_destroy( scanner );
3254                      }
3255         ...
3256         %%
3257
3258
3259File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant
3260
19.2 An Overview of the Reentrant API
=====================================

The API for reentrant scanners is different from that of non-reentrant
scanners.  Here is a quick overview of the API:

   * `%option reentrant' must be specified.

   * All functions take one additional argument: `yyscanner'.
3270
3271   * All global variables are replaced by their macro equivalents.  (We
3272     tell you this because it may be important to you during debugging.)
3273
3274   * `yylex_init' and `yylex_destroy' must be called before and after
3275     `yylex', respectively.
3276
3277   * Accessor methods (get/set functions) provide access to common
3278     `flex' variables.
3279
3280   * User-specific data can be stored in `yyextra'.
3281
3282
3283File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant
3284
19.3 Reentrant Example
======================

First, an example of a reentrant scanner:

         /* This scanner prints "//" comments. */

         %option reentrant stack noyywrap
         %x COMMENT

         %%

         "//"                 yy_push_state( COMMENT, yyscanner);
         .|\n

         <COMMENT>\n          yy_pop_state( yyscanner );
         <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);

         %%

         int main ( int argc, char * argv[] )
         {
             yyscan_t scanner;

             yylex_init ( &scanner );
             yylex ( scanner );
             yylex_destroy ( scanner );
             return 0;
         }
3314
3315
3316File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant
3317
19.4 The Reentrant API in Detail
================================
3320
3321Here are the things you need to do or know to use the reentrant C API of
3322`flex'.
3323
3324* Menu:
3325
3326* Specify Reentrant::
3327* Extra Reentrant Argument::
3328* Global Replacement::
3329* Init and Destroy Functions::
3330* Accessor Methods::
3331* Extra Data::
3332* About yyscan_t::
3333
3334
3335File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail
3336
19.4.1 Declaring a Scanner As Reentrant
---------------------------------------

`%option reentrant' (`--reentrant') must be specified.

   Notice that `%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, `flex'
would have happily generated a non-reentrant scanner without
complaining.  You may explicitly specify `%option noreentrant' if you
do _not_ want a reentrant scanner, although it is not necessary.  The
default is to generate a non-reentrant scanner.
3348
3349
3350File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail
3351
19.4.2 The Extra Argument
-------------------------

All functions take one additional argument: `yyscanner'.

   Notice that the calls to `yy_push_state' and `yy_pop_state' both
have an argument, `yyscanner', that is not present in a non-reentrant
scanner.  Here are the declarations of `yy_push_state' and
`yy_pop_state' in the reentrant scanner:
3361
3362
3363         static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
3364         static void yy_pop_state  ( yyscan_t yyscanner  ) ;
3365
3366   Notice that the argument `yyscanner' appears in the declaration of
3367both functions.  In fact, all `flex' functions in a reentrant scanner
3368have this additional argument.  It is always the last argument in the
3369argument list, it is always of type `yyscan_t' (which is typedef'd to
3370`void *') and it is always named `yyscanner'.  As you may have guessed,
3371`yyscanner' is a pointer to an opaque data structure encapsulating the
3372current state of the scanner.  For a list of function declarations, see
3373*Note Reentrant Functions::. Note that preprocessor macros, such as
3374`BEGIN', `ECHO', and `REJECT', do not take this additional argument.
3375
3376
3377File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail
3378
19.4.3 Global Variables Replaced By Macros
------------------------------------------

All global variables in traditional flex have been replaced by macro
equivalents.

   Note that in the above example, `yyout' and `yytext' are not plain
variables.  These are macros that expand to their equivalent lvalues.
All of the familiar `flex' globals have been replaced by their macro
equivalents.  In particular, `yytext', `yyleng', `yylineno', `yyin',
`yyout', `yyextra', `yylval', and `yylloc' are macros.  You may safely
use these macros in actions as if they were plain variables.  We only
tell you this so you don't expect to link to these variables
externally.  Currently, each macro expands to a member of an internal
struct, e.g.,


     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

   One important thing to remember about `yytext' and friends is that
they are not global variables in a reentrant scanner, so you cannot
access them directly from outside an action or from other functions.
You must use an accessor method, e.g., `yyget_text', to accomplish
this.  (See below.)
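
   To make the distinction between the macros and the accessor
functions concrete, here is a small self-contained sketch.  It does not
use `flex''s real `struct yyguts_t'; `demo_guts', `demo_scan_t',
`demo_text', `demo_action' and the other `demo_' names are hypothetical
stand-ins that only mimic the layout described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the reentrant layout; not flex's real struct. */
struct demo_guts {
    char *text_r;   /* current token text */
    int   leng_r;   /* current token length */
};
typedef void *demo_scan_t;

/* Like the `yytext' macro shown above, these only compile where a
   variable named `yyscanner' is in scope. */
#define demo_text (((struct demo_guts *)yyscanner)->text_r)
#define demo_leng (((struct demo_guts *)yyscanner)->leng_r)

/* Plays the role of an action: `yyscanner' is the last parameter,
   so the macros can be used as if they were plain variables. */
static void demo_action(char *matched, int len, demo_scan_t yyscanner)
{
    assert(yyscanner != NULL);
    demo_text = matched;   /* the macro expands to an lvalue */
    demo_leng = len;
}

/* Accessors for code that holds the handle under some other name. */
char *demo_get_text(demo_scan_t scanner)
{
    return ((struct demo_guts *)scanner)->text_r;
}

int demo_get_leng(demo_scan_t scanner)
{
    return ((struct demo_guts *)scanner)->leng_r;
}

/* Runs one "action" and reads the result back through the accessors.
   Returns 1 if both views agree. */
int demo_roundtrip(void)
{
    static char matched[] = "abc";
    struct demo_guts guts = { NULL, 0 };

    demo_action(matched, 3, &guts);
    return demo_get_text(&guts) == matched && demo_get_leng(&guts) == 3;
}
```

   Code outside `demo_action' has no `yyscanner' variable in scope, so
the macros are unusable there and the accessors are the only route to
the same fields.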
3403
3404
3405File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail
3406
19.4.4 Init and Destroy Functions
---------------------------------
3409
3410`yylex_init' and `yylex_destroy' must be called before and after
3411`yylex', respectively.
3412
3413
3414         int yylex_init ( yyscan_t * ptr_yy_globals ) ;
3415         int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
3416         int yylex ( yyscan_t yyscanner ) ;
3417         int yylex_destroy ( yyscan_t yyscanner ) ;
3418
3419   The function `yylex_init' must be called before calling any other
3420function. The argument to `yylex_init' is the address of an
3421uninitialized pointer to be filled in by `yylex_init', overwriting any
3422previous contents. The function `yylex_init_extra' may be used instead,
3423taking as its first argument a variable of type `YY_EXTRA_TYPE'.  See
3424the section on yyextra, below, for more details.
3425
3426   The value stored in `ptr_yy_globals' should thereafter be passed to
3427`yylex' and `yylex_destroy'.  Flex does not save the argument passed to
3428`yylex_init', so it is safe to pass the address of a local pointer to
3429`yylex_init' so long as it remains in scope for the duration of all
3430calls to the scanner, up to and including the call to `yylex_destroy'.
3431
3432   The function `yylex' should be familiar to you by now. The reentrant
3433version takes one argument, which is the value returned (via an
3434argument) by `yylex_init'.  Otherwise, it behaves the same as the
3435non-reentrant version of `yylex'.
3436
   Both `yylex_init' and `yylex_init_extra' return 0 (zero) on success,
or non-zero on failure, in which case `errno' is set to one of the
following values:
3440
3441   * ENOMEM Memory allocation error. *Note memory-management::.
3442
3443   * EINVAL Invalid argument.
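
   The error-reporting convention described above can be sketched as
follows.  This is not the code that `flex' generates; `demo_init',
`demo_destroy', `demo_scan_t' and `demo_guts' are hypothetical names,
and the only behavior modeled is the documented one: a zero return on
success, and a non-zero return with `errno' set to `EINVAL' or `ENOMEM'
on failure.

```c
#include <errno.h>
#include <stdlib.h>

typedef void *demo_scan_t;            /* hypothetical opaque handle */
struct demo_guts { int lineno_r; };   /* hypothetical internal state */

/* Returns 0 on success; non-zero with errno set on failure. */
int demo_init(demo_scan_t *ptr)
{
    if (ptr == NULL) {
        errno = EINVAL;               /* nowhere to store the handle */
        return 1;
    }
    *ptr = calloc(1, sizeof(struct demo_guts));
    if (*ptr == NULL) {
        errno = ENOMEM;               /* allocation failed */
        return 1;
    }
    return 0;
}

/* Releases the state created by demo_init; always returns 0. */
int demo_destroy(demo_scan_t scanner)
{
    free(scanner);
    return 0;
}
```

   A caller would test the return value before using the handle, and
consult `errno' only when that value is non-zero.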
3444
3445   The function `yylex_destroy' should be called to free resources used
3446by the scanner. After `yylex_destroy' is called, the contents of
3447`yyscanner' should not be used.  Of course, there is no need to destroy
3448a scanner if you plan to reuse it.  A `flex' scanner (both reentrant
3449and non-reentrant) may be restarted by calling `yyrestart'.
3450
3451   Below is an example of a program that creates a scanner, uses it,
3452then destroys it when done:
3453
3454
         int main ()
         {
             yyscan_t scanner;
             int tok;

             yylex_init(&scanner);

             while ((tok=yylex(scanner)) > 0)
                 printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));

             yylex_destroy(scanner);
             return 0;
         }
3468
3469
3470File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail
3471
19.4.5 Accessing Variables with Reentrant Scanners
--------------------------------------------------

Accessor methods (get/set functions) provide access to common `flex'
variables.

   Many scanners that you build will be part of a larger project.
Portions of your project will need access to `flex' values, such as
`yytext'.  In a non-reentrant scanner, these values are global, so
there is no problem accessing them.  However, in a reentrant scanner,
there are no global `flex' values, so you cannot access them directly.
Instead, you must access `flex' values using accessor methods (get/set
functions).  Each accessor method is named `yyget_NAME' or `yyset_NAME',
where `NAME' is the name of the `flex' variable you want.  For example:


         /* Replace the last character of yytext with a NUL. */
         void chop ( yyscan_t scanner )
         {
             int len = yyget_leng( scanner );
             yyget_text( scanner )[len - 1] = '\0';
         }
3494
3495   The above code may be called from within an action like this:
3496
3497
3498         %%
3499         .+\n    { chop( yyscanner );}
3500
3501   You may find that `%option header-file' is particularly useful for
3502generating prototypes of all the accessor functions. *Note
3503option-header::.
3504
3505
3506File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail
3507
19.4.6 Extra Data
-----------------
3510
3511User-specific data can be stored in `yyextra'.
3512
3513   In a reentrant scanner, it is unwise to use global variables to
3514communicate with or maintain state between different pieces of your
3515program.  However, you may need access to external data or invoke
3516external functions from within the scanner actions.  Likewise, you may
3517need to pass information to your scanner (e.g., open file descriptors,
3518or database connections).  In a non-reentrant scanner, the only way to
3519do this would be through the use of global variables.  `Flex' allows
3520you to store arbitrary, "extra" data in a scanner.  This data is
3521accessible through the accessor methods `yyget_extra' and `yyset_extra'
3522from outside the scanner, and through the shortcut macro `yyextra' from
3523within the scanner itself. They are defined as follows:
3524
3525
3526         #define YY_EXTRA_TYPE  void*
3527         YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
3528         void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);
3529
3530   In addition, an extra form of `yylex_init' is provided,
3531`yylex_init_extra'. This function is provided so that the yyextra value
3532can be accessed from within the very first yyalloc, used to allocate
3533the scanner itself.
3534
3535   By default, `YY_EXTRA_TYPE' is defined as type `void *'.  You may
3536redefine this type using `%option extra-type="your_type"' in the
3537scanner:
3538
3539
         /* An example of overriding YY_EXTRA_TYPE. */
         %{
         #include <sys/stat.h>
         #include <unistd.h>
         %}
         %option reentrant
         %option extra-type="struct stat *"
         %%

         __filesize__     printf( "%ld", (long) yyextra->st_size  );
         __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
         %%
         void scan_file( char* filename )
         {
             yyscan_t scanner;
             struct stat buf;
             FILE *in;

             if( stat( filename, &buf ) != 0 )
                 return;
             in = fopen( filename, "r" );
             if( in == NULL )
                 return;

             /* yyextra has type `struct stat *', so pass an address. */
             yylex_init_extra( &buf, &scanner );
             yyset_in( in, scanner );
             yylex( scanner );
             yylex_destroy( scanner );

             fclose( in );
         }
3568
3569
3570File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail
3571
19.4.7 About yyscan_t
---------------------

`yyscan_t' is defined as:


         typedef void* yyscan_t;

   It is initialized by `yylex_init()' to point to an internal
structure.  You should never access this value directly.  In particular,
you should never attempt to free it (use `yylex_destroy()' instead).
3583
3584
3585File: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant
3586
19.5 Functions and Macros Available in Reentrant C Scanners
===========================================================

The following functions are available in a reentrant scanner:
3591
3592
3593         char *yyget_text ( yyscan_t scanner );
3594         int yyget_leng ( yyscan_t scanner );
3595         FILE *yyget_in ( yyscan_t scanner );
3596         FILE *yyget_out ( yyscan_t scanner );
3597         int yyget_lineno ( yyscan_t scanner );
3598         YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
3599         int  yyget_debug ( yyscan_t scanner );
3600
3601         void yyset_debug ( int flag, yyscan_t scanner );
3602         void yyset_in  ( FILE * in_str , yyscan_t scanner );
3603         void yyset_out  ( FILE * out_str , yyscan_t scanner );
3604         void yyset_lineno ( int line_number , yyscan_t scanner );
3605         void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );
3606
3607   There are no "set" functions for yytext and yyleng. This is
3608intentional.
3609
3610   The following Macro shortcuts are available in actions in a reentrant
3611scanner:
3612
3613
3614         yytext
3615         yyleng
3616         yyin
3617         yyout
3618         yylineno
3619         yyextra
3620         yy_flex_debug
3621
3622   In a reentrant C scanner, support for yylineno is always present
3623(i.e., you may access yylineno), but the value is never modified by
3624`flex' unless `%option yylineno' is enabled. This is to allow the user
3625to maintain the line count independently of `flex'.
3626
3627   The following functions and macros are made available when `%option
3628bison-bridge' (`--bison-bridge') is specified:
3629
3630
3631         YYSTYPE * yyget_lval ( yyscan_t scanner );
3632         void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
3633         yylval
3634
3635   The following functions and macros are made available when `%option
3636bison-locations' (`--bison-locations') is specified:
3637
3638
3639         YYLTYPE *yyget_lloc ( yyscan_t scanner );
3640         void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
3641         yylloc
3642
   Support for yylval assumes that `YYSTYPE' is a valid type.  Support
for yylloc assumes that `YYLTYPE' is a valid type.  Typically, these
types are generated by `bison', and are included in section 1 of the
`flex' input.
3647
3648
3649File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top
3650
20 Incompatibilities with Lex and Posix
***************************************
3653
3654`flex' is a rewrite of the AT&T Unix _lex_ tool (the two
3655implementations do not share any code, though), with some extensions and
3656incompatibilities, both of which are of concern to those who wish to
3657write scanners acceptable to both implementations.  `flex' is fully
3658compliant with the POSIX `lex' specification, except that when using
3659`%pointer' (the default), a call to `unput()' destroys the contents of
3660`yytext', which is counter to the POSIX specification.  In this section
3661we discuss all of the known areas of incompatibility between `flex',
3662AT&T `lex', and the POSIX specification.  `flex''s `-l' option turns on
3663maximum compatibility with the original AT&T `lex' implementation, at
3664the cost of a major loss in the generated scanner's performance.  We
3665note below which incompatibilities can be overcome using the `-l'
3666option.  `flex' is fully compatible with `lex' with the following
3667exceptions:
3668
3669   * The undocumented `lex' scanner internal variable `yylineno' is not
3670     supported unless `-l' or `%option yylineno' is used.
3671
3672   * `yylineno' should be maintained on a per-buffer basis, rather than
3673     a per-scanner (single global variable) basis.
3674
3675   * `yylineno' is not part of the POSIX specification.
3676
3677   * The `input()' routine is not redefinable, though it may be called
3678     to read characters following whatever has been matched by a rule.
3679     If `input()' encounters an end-of-file the normal `yywrap()'
3680     processing is done.  A "real" end-of-file is returned by `input()'
3681     as `EOF'.
3682
3683   * Input is instead controlled by defining the `YY_INPUT()' macro.
3684
3685   * The `flex' restriction that `input()' cannot be redefined is in
3686     accordance with the POSIX specification, which simply does not
3687     specify any way of controlling the scanner's input other than by
3688     making an initial assignment to `yyin'.
3689
3690   * The `unput()' routine is not redefinable.  This restriction is in
3691     accordance with POSIX.
3692
3693   * `flex' scanners are not as reentrant as `lex' scanners.  In
3694     particular, if you have an interactive scanner and an interrupt
3695     handler which long-jumps out of the scanner, and the scanner is
3696     subsequently called again, you may get the following message:
3697
3698
              fatal flex scanner internal error--end of buffer missed
3700
3701     To reenter the scanner, first use:
3702
3703
3704              yyrestart( yyin );
3705
3706     Note that this call will throw away any buffered input; usually
3707     this isn't a problem with an interactive scanner. *Note
3708     Reentrant::, for `flex''s reentrant API.
3709
3710   * Also note that `flex' C++ scanner classes _are_ reentrant, so if
3711     using C++ is an option for you, you should use them instead.
3712     *Note Cxx::, and *Note Reentrant::  for details.
3713
   * `output()' is not supported.  Output from the `ECHO' macro is done
     to the file-pointer `yyout' (default `stdout').
3716
3717   * `output()' is not part of the POSIX specification.
3718
3719   * `lex' does not support exclusive start conditions (%x), though they
3720     are in the POSIX specification.
3721
3722   * When definitions are expanded, `flex' encloses them in parentheses.
3723     With `lex', the following:
3724
3725
3726              NAME    [A-Z][A-Z0-9]*
3727              %%
3728              foo{NAME}?      printf( "Found it\n" );
3729              %%
3730
3731     will not match the string `foo' because when the macro is expanded
3732     the rule is equivalent to `foo[A-Z][A-Z0-9]*?'  and the precedence
3733     is such that the `?' is associated with `[A-Z0-9]*'.  With `flex',
3734     the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
3735     string `foo' will match.
3736
3737   * Note that if the definition begins with `^' or ends with `$' then
3738     it is _not_ expanded with parentheses, to allow these operators to
3739     appear in definitions without losing their special meanings.  But
3740     the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
3741     definition.
3742
3743   * Using `-l' results in the `lex' behavior of no parentheses around
3744     the definition.
3745
3746   * The POSIX specification is that the definition be enclosed in
3747     parentheses.
3748
3749   * Some implementations of `lex' allow a rule's action to begin on a
3750     separate line, if the rule's pattern has trailing whitespace:
3751
3752
3753              %%
3754              foo|bar<space here>
3755                { foobar_action();}
3756
3757     `flex' does not support this feature.
3758
3759   * The `lex' `%r' (generate a Ratfor scanner) option is not
3760     supported.  It is not part of the POSIX specification.
3761
3762   * After a call to `unput()', _yytext_ is undefined until the next
3763     token is matched, unless the scanner was built using `%array'.
3764     This is not the case with `lex' or the POSIX specification.  The
3765     `-l' option does away with this incompatibility.
3766
   * The precedence of the `{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of `lex' interpret `abc{1,3}'
     as "match one, two, or three occurrences of `abc'", whereas `flex'
     interprets it as "match `ab' followed by one, two, or three
     occurrences of `c'".  The `-l' and `--posix' options do away with
     this incompatibility.
3773
   * The precedence of the `^' operator is different.  `lex' interprets
     `^foo|bar' as "match either `foo' at the beginning of a line, or
     `bar' anywhere", whereas `flex' interprets it as "match either
     `foo' or `bar' if they come at the beginning of a line".  The
     latter is in agreement with the POSIX specification.
3779
   * The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners.  `flex' ignores them.
3782
3783   * The name `FLEX_SCANNER' is `#define''d so scanners may be written
3784     for use with either `flex' or `lex'.  Scanners also include
3785     `YY_FLEX_MAJOR_VERSION',  `YY_FLEX_MINOR_VERSION' and
3786     `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
3787     generated the scanner. For example, for the 2.5.22 release, these
3788     defines would be 2,  5 and 22 respectively. If the version of
3789     `flex' being used is a beta version, then the symbol `FLEX_BETA'
3790     is defined.
3791
3792   * The symbols `[[' and `]]' in the code sections of the input may
3793     conflict with the m4 delimiters. *Note M4 Dependency::.
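
   The two grouping differences listed above (definition expansion and
the `{,}' operator) can be tried out with the POSIX `<regex.h>'
interface.  This sketch involves neither `flex' nor `lex': each reading
is spelled out with explicit parentheses so that the grouping is
unambiguous, and `matches_whole' is a hypothetical helper, not part of
any scanner API.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if all of `text' matches the POSIX extended regular
   expression `pattern', 0 if it does not, and -1 if the pattern is too
   long for the local buffer or fails to compile. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Anchor at both ends so that partial matches do not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

   With this helper, the parenthesized expansion `foo([A-Z][A-Z0-9]*)?'
matches the string `foo', while the unparenthesized reading, written
explicitly as `foo[A-Z]([A-Z0-9]*)?', does not.  Likewise `ab(c){1,3}'
matches `abcc' but not `abcabc', and `(abc){1,3}' does the opposite.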
3794
3795
3796   The following `flex' features are not included in `lex' or the POSIX
3797specification:
3798
3799   * C++ scanners
3800
3801   * %option
3802
3803   * start condition scopes
3804
3805   * start condition stacks
3806
3807   * interactive/non-interactive scanners
3808
3809   * yy_scan_string() and friends
3810
3811   * yyterminate()
3812
3813   * yy_set_interactive()
3814
3815   * yy_set_bol()
3816
   * YY_AT_BOL()

   * <<EOF>>
3818
3819   * <*>
3820
3821   * YY_DECL
3822
3823   * YY_START
3824
3825   * YY_USER_ACTION
3826
3827   * YY_USER_INIT
3828
3829   * #line directives
3830
3831   * %{}'s around actions
3832
3833   * reentrant C API
3834
3835   * multiple actions on a line
3836
3837   * almost all of the `flex' command-line options
3838
3839   The feature "multiple actions on a line" refers to the fact that
3840with `flex' you can put multiple actions on the same line, separated
3841with semi-colons, while with `lex', the following:
3842
3843
3844         foo    handle_foo(); ++num_foos_seen;
3845
3846   is (rather surprisingly) truncated to
3847
3848
3849         foo    handle_foo();
3850
3851   `flex' does not truncate the action.  Actions that are not enclosed
3852in braces are simply terminated at the end of the line.
3853
3854
3855File: flex.info,  Node: Memory Management,  Next: Serialized Tables,  Prev: Lex and Posix,  Up: Top
3856
21 Memory Management
********************
3859
3860This chapter describes how flex handles dynamic memory, and how you can
3861override the default behavior.
3862
3863* Menu:
3864
3865* The Default Memory Management::
3866* Overriding The Default Memory Management::
3867* A Note About yytext And Memory::
3868
3869
3870File: flex.info,  Node: The Default Memory Management,  Next: Overriding The Default Memory Management,  Prev: Memory Management,  Up: Memory Management
3871
21.1 The Default Memory Management
==================================

Flex allocates dynamic memory during initialization, and once in a
while from within a call to yylex().  Initialization takes place during
the first call to yylex().  Thereafter, flex may reallocate more memory
if it needs to enlarge a buffer.  As of version 2.5.9, Flex will clean
up all memory when you call `yylex_destroy'.  *Note faq-memory-leak::.

   Flex allocates dynamic memory for five purposes, listed below (1):
3882
16kB for the input buffer.
3884     Flex allocates memory for the character buffer used to perform
3885     pattern matching.  Flex must read ahead from the input stream and
3886     store it in a large character buffer.  This buffer is typically
3887     the largest chunk of dynamic memory flex consumes. This buffer
3888     will grow if necessary, doubling the size each time.  Flex frees
3889     this memory when you call yylex_destroy().  The default size of
3890     this buffer (16384 bytes) is almost always too large.  The ideal
3891     size for this buffer is the length of the longest token expected,
3892     in bytes, plus a little more.  Flex will allocate a few extra
3893     bytes for housekeeping. Currently, to override the size of the
3894     input buffer you must `#define YY_BUF_SIZE' to whatever number of
3895     bytes you want. We don't plan to change this in the near future,
3896     but we reserve the right to do so if we ever add a more robust
3897     memory management API.
3898
64kB for the REJECT state.  This will only be allocated if you use REJECT.
     The size is large enough to hold the same number of states as
     characters in the input buffer.  If you override the size of the
     input buffer (via `YY_BUF_SIZE'), then you automatically override
     the size of this buffer as well.
3904
100 bytes for the start condition stack.
3906     Flex allocates memory for the start condition stack. This is the
3907     stack used for pushing start states, i.e., with yy_push_state().
3908     It will grow if necessary.  Since the states are simply integers,
3909     this stack doesn't consume much memory.  This stack is not present
3910     if `%option stack' is not specified.  You will rarely need to tune
3911     this buffer. The ideal size for this stack is the maximum depth
3912     expected.  The memory for this stack is automatically destroyed
3913     when you call yylex_destroy(). *Note option-stack::.
3914
391540 bytes for each YY_BUFFER_STATE.
3916     Flex allocates memory for each YY_BUFFER_STATE. The buffer state
3917     itself is about 40 bytes, plus an additional large character
3918     buffer (described above.)  The initial buffer state is created
3919     during initialization, and with each call to yy_create_buffer().
3920     You can't tune the size of this, but you can tune the character
3921     buffer as described above. Any buffer state that you explicitly
3922     create by calling yy_create_buffer() is _NOT_ destroyed
3923     automatically. You must call yy_delete_buffer() to free the
3924     memory. The exception to this rule is that flex will delete the
3925     current buffer automatically when you call yylex_destroy(). If you
3926     delete the current buffer, be sure to set it to NULL.  That way,
3927     flex will not try to delete the buffer a second time (possibly
3928     crashing your program!) At the time of this writing, flex does not
3929     provide a growable stack for the buffer states.  You have to
3930     manage that yourself.  *Note Multiple Input Buffers::.
3931
393284 bytes for the reentrant scanner guts
3933     Flex allocates about 84 bytes for the reentrant scanner structure
3934     when you call yylex_init(). It is destroyed when the user calls
3935     yylex_destroy().
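
   The doubling policy described for the input buffer can be modelled
in a few lines of C.  This is only a sketch of the arithmetic, not
flex's actual code; the function name `grown_size' is made up for this
illustration.  It starts at an initial size and doubles until the
requested number of bytes fits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns the capacity reached
   by starting at `initial' bytes and doubling until `needed' bytes fit. */
size_t grown_size(size_t initial, size_t needed)
{
    size_t size = initial;

    assert(initial > 0);        /* doubling zero would never terminate */
    while (size < needed)
        size *= 2;
    return size;
}
```

   With the default of 16384 bytes, a 100000-byte token ends up in a
131072-byte buffer after three doublings, which is one reason to choose
`YY_BUF_SIZE' close to the longest token you expect.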
3936
3937
3938   ---------- Footnotes ----------
3939
3940   (1) The quantities given here are approximate, and may vary due to
3941host architecture, compiler configuration, or due to future
3942enhancements to flex.
3943
3944
3945File: flex.info,  Node: Overriding The Default Memory Management,  Next: A Note About yytext And Memory,  Prev: The Default Memory Management,  Up: Memory Management
3946
394721.2 Overriding The Default Memory Management
3948=============================================
3949
3950Flex calls the functions `yyalloc', `yyrealloc', and `yyfree' when it
3951needs to allocate or free memory. By default, these functions are
3952wrappers around the standard C functions, `malloc', `realloc', and
3953`free', respectively. You can override the default implementations by
3954telling flex that you will provide your own implementations.
3955
3956   To override the default implementations, you must do two things:
3957
3958  1. Suppress the default implementations by specifying one or more of
3959     the following options:
3960
3961        * `%option noyyalloc'
3962
3963        * `%option noyyrealloc'
3964
3965        * `%option noyyfree'.
3966
3967  2. Provide your own implementation of the following functions: (1)
3968
3969
3970          // For a non-reentrant scanner
3971          void * yyalloc (size_t bytes);
3972          void * yyrealloc (void * ptr, size_t bytes);
3973          void   yyfree (void * ptr);
3974
3975          // For a reentrant scanner
3976          void * yyalloc (size_t bytes, void * yyscanner);
3977          void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
3978          void   yyfree (void * ptr, void * yyscanner);
3979
3980
3981   In the following example, we will override all three memory
3982routines. We assume that there is a custom allocator with garbage
3983collection. In order to make this example interesting, we will use a
3984reentrant scanner, passing a pointer to the custom allocator through
3985`yyextra'.
3986
3987
3988     %{
3989     #include "some_allocator.h"
3990     %}
3991
3992     /* Suppress the default implementations. */
3993     %option noyyalloc noyyrealloc noyyfree
3994     %option reentrant
3995
3996     /* Initialize the allocator. */
3997     #define YY_EXTRA_TYPE  struct allocator*
3998     #define YY_USER_INIT  yyextra = allocator_create();
3999
4000     %%
4001     .|\n   ;
4002     %%
4003
     /* Provide our own implementations.  yyget_extra() returns the
        allocator that was stored in yyextra for this scanner. */
     void * yyalloc (size_t bytes, void* yyscanner) {
         return allocator_alloc (yyget_extra (yyscanner), bytes);
     }

     void * yyrealloc (void * ptr, size_t bytes, void* yyscanner) {
         /* Pass the old block along so its contents can be kept. */
         return allocator_realloc (yyget_extra (yyscanner), ptr, bytes);
     }

     void yyfree (void * ptr, void * yyscanner) {
         /* Do nothing -- we leave it to the garbage collector. */
     }
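
   For a non-reentrant scanner, the replacement routines take no
scanner argument.  The following is a minimal sketch, assuming all
three `%option noyy...' lines are given.  It forwards to the standard C
functions and merely counts live blocks, which can help when hunting
leaks.  The counter and `yy_live_blocks' are names invented for this
example.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks handed out and not yet freed (hypothetical). */
static size_t live_blocks = 0;

void *yyalloc(size_t bytes)
{
    void *p = malloc(bytes);

    if (p != NULL)
        live_blocks++;
    return p;
}

void *yyrealloc(void *ptr, size_t bytes)
{
    void *p;

    assert(bytes > 0);          /* realloc(ptr, 0) is not portable */
    p = realloc(ptr, bytes);
    /* Growing from NULL creates a new block; otherwise the count
       stays the same, whether or not the block moved. */
    if (p != NULL && ptr == NULL)
        live_blocks++;
    return p;
}

void yyfree(void *ptr)
{
    if (ptr != NULL) {
        live_blocks--;
        free(ptr);
    }
}

/* Hypothetical query function for leak checks. */
size_t yy_live_blocks(void)
{
    return live_blocks;
}
```

   Checking that `yy_live_blocks()' returns zero after `yylex_destroy'
is one way to confirm that nothing allocated through these routines
was left behind.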
4016
4017   ---------- Footnotes ----------
4018
4019   (1) It is not necessary to override all (or any) of the memory
4020management routines.  You may, for example, override `yyrealloc', but
4021not `yyfree' or `yyalloc'.
4022
4023
4024File: flex.info,  Node: A Note About yytext And Memory,  Prev: Overriding The Default Memory Management,  Up: Memory Management
4025
402621.3 A Note About yytext And Memory
4027===================================
4028
4029When flex finds a match, `yytext' points to the first character of the
4030match in the input buffer. The string itself is part of the input
4031buffer, and is _NOT_ allocated separately. The value of yytext will be
4032overwritten the next time yylex() is called. In short, the value of
4033yytext is only valid from within the matched rule's action.
4034
   Often, you want the value of yytext to persist for later processing,
e.g., by a parser with non-zero lookahead. In order to preserve yytext,
you will have to copy it with strdup() or a similar function. But this
introduces some headache because your parser is now responsible for
freeing the copy of yytext. If you use a yacc or bison parser
(commonly used with flex), you will discover that the error recovery
mechanisms can cause memory to be leaked.
4042
4043   To prevent memory leaks from strdup'd yytext, you will have to track
4044the memory somehow. Our experience has shown that a garbage collection
4045mechanism or a pooled memory mechanism will save you a lot of grief
4046when writing parsers.
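
   A pooled approach can be sketched as follows.  This is an
illustration under assumptions, not something flex provides: the
`str_pool' type and the functions `pool_copy' and `pool_release' are
hypothetical names.  Every copy is recorded in the pool, so all copies
can be freed in one call, no matter which of them the parser dropped
during error recovery.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool that remembers every copied token. */
struct str_pool {
    char  **items;
    size_t  count;
    size_t  capacity;
};

/* Copy `len' bytes of `text' into a new NUL-terminated string owned
   by the pool.  Returns NULL if memory runs out. */
char *pool_copy(struct str_pool *pool, const char *text, size_t len)
{
    char *copy;

    assert(pool != NULL && text != NULL);
    if (pool->count == pool->capacity) {
        size_t new_cap = pool->capacity ? pool->capacity * 2 : 8;
        char **grown = realloc(pool->items, new_cap * sizeof *grown);

        if (grown == NULL)
            return NULL;
        pool->items = grown;
        pool->capacity = new_cap;
    }
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    pool->items[pool->count++] = copy;
    return copy;
}

/* Free every string in the pool and reset it to the empty state. */
void pool_release(struct str_pool *pool)
{
    size_t i;

    assert(pool != NULL);
    for (i = 0; i < pool->count; i++)
        free(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->count = 0;
    pool->capacity = 0;
}
```

   In a rule action this might be called as `pool_copy (&pool, yytext,
yyleng)', with `pool' being a variable you define yourself.  Calling
`pool_release' once parsing is finished frees all of the copies.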
4047
4048
4049File: flex.info,  Node: Serialized Tables,  Next: Diagnostics,  Prev: Memory Management,  Up: Top
4050
405122 Serialized Tables
4052********************
4053
4054A `flex' scanner has the ability to save the DFA tables to a file, and
4055load them at runtime when needed.  The motivation for this feature is
4056to reduce the runtime memory footprint.  Traditionally, these tables
4057have been compiled into the scanner as C arrays, and are sometimes
4058quite large.  Since the tables are compiled into the scanner, the
4059memory used by the tables can never be freed.  This is a waste of
4060memory, especially if an application uses several scanners, but none of
4061them at the same time.
4062
4063   The serialization feature allows the tables to be loaded at runtime,
4064before scanning begins. The tables may be discarded when scanning is
4065finished.
4066
4067* Menu:
4068
4069* Creating Serialized Tables::
4070* Loading and Unloading Serialized Tables::
4071* Tables File Format::
4072
4073
4074File: flex.info,  Node: Creating Serialized Tables,  Next: Loading and Unloading Serialized Tables,  Prev: Serialized Tables,  Up: Serialized Tables
4075
407622.1 Creating Serialized Tables
4077===============================
4078
4079You may create a scanner with serialized tables by specifying:
4080
4081
4082         %option tables-file=FILE
4083     or
4084         --tables-file=FILE
4085
4086   These options instruct flex to save the DFA tables to the file FILE.
4087The tables will _not_ be embedded in the generated scanner. The scanner
4088will not function on its own. The scanner will be dependent upon the
4089serialized tables. You must load the tables from this file at runtime
4090before you can scan anything.
4091
4092   If you do not specify a filename to `--tables-file', the tables will
4093be saved to `lex.yy.tables', where `yy' is the appropriate prefix.
4094
4095   If your project uses several different scanners, you can concatenate
4096the serialized tables into one file, and flex will find the correct set
4097of tables, using the scanner prefix as part of the lookup key. An
4098example follows:
4099
4100
4101     $ flex --tables-file --prefix=cpp cpp.l
4102     $ flex --tables-file --prefix=c   c.l
4103     $ cat lex.cpp.tables lex.c.tables  >  all.tables
4104
   The above example created two scanners, `cpp' and `c'. Since we did
not specify a filename, the tables were serialized to `lex.cpp.tables'
and `lex.c.tables', respectively. Then, we concatenated the two files
4108together into `all.tables', which we will distribute with our project.
4109At runtime, we will open the file and tell flex to load the tables from
4110it.  Flex will find the correct tables automatically. (See next
4111section).
4112
4113
4114File: flex.info,  Node: Loading and Unloading Serialized Tables,  Next: Tables File Format,  Prev: Creating Serialized Tables,  Up: Serialized Tables
4115
411622.2 Loading and Unloading Serialized Tables
4117============================================
4118
4119If you've built your scanner with `%option tables-file', then you must
4120load the scanner tables at runtime. This can be accomplished with the
4121following function:
4122
4123 -- Function: int yytables_fload (FILE* FP [, yyscan_t SCANNER])
4124     Locates scanner tables in the stream pointed to by FP and loads
4125     them.  Memory for the tables is allocated via `yyalloc'.  You must
4126     call this function before the first call to `yylex'. The argument
4127     SCANNER only appears in the reentrant scanner.  This function
4128     returns `0' (zero) on success, or non-zero on error.
4129
4130   The loaded tables are *not* automatically destroyed (unloaded) when
4131you call `yylex_destroy'. The reason is that you may create several
4132scanners of the same type (in a reentrant scanner), each of which needs
4133access to these tables.  To avoid a nasty memory leak, you must call
4134the following function:
4135
4136 -- Function: int yytables_destroy ([yyscan_t SCANNER])
4137     Unloads the scanner tables. The tables must be loaded again before
4138     you can scan any more data.  The argument SCANNER only appears in
4139     the reentrant scanner.  This function returns `0' (zero) on
4140     success, or non-zero on error.
4141
4142   *The functions `yytables_fload' and `yytables_destroy' are not
4143thread-safe.* You must ensure that these functions are called exactly
4144once (for each scanner type) in a threaded program, before any thread
4145calls `yylex'.  After the tables are loaded, they are never written to,
4146and no thread protection is required thereafter - until you destroy
4147them.
4148
4149
4150File: flex.info,  Node: Tables File Format,  Prev: Loading and Unloading Serialized Tables,  Up: Serialized Tables
4151
415222.3 Tables File Format
4153=======================
4154
4155This section defines the file format of serialized `flex' tables.
4156
4157   The tables format allows for one or more sets of tables to be
4158specified, where each set corresponds to a given scanner. Scanners are
4159indexed by name, as described below. The file format is as follows:
4160
4161
4162                      TABLE SET 1
4163                     +-------------------------------+
4164             Header  | uint32          th_magic;     |
4165                     | uint32          th_hsize;     |
4166                     | uint32          th_ssize;     |
4167                     | uint16          th_flags;     |
4168                     | char            th_version[]; |
4169                     | char            th_name[];    |
4170                     | uint8           th_pad64[];   |
4171                     +-------------------------------+
4172             Table 1 | uint16          td_id;        |
4173                     | uint16          td_flags;     |
4174                     | uint32          td_lolen;     |
4175                     | uint32          td_hilen;     |
4176                     | void            td_data[];    |
4177                     | uint8           td_pad64[];   |
4178                     +-------------------------------+
4179             Table 2 |                               |
4180                .    .                               .
4181                .    .                               .
4182                .    .                               .
4183                .    .                               .
4184             Table n |                               |
4185                     +-------------------------------+
4186                      TABLE SET 2
4187                           .
4188                           .
4189                           .
4190                      TABLE SET N
4191
4192   The above diagram shows that a complete set of tables consists of a
4193header followed by multiple individual tables. Furthermore, multiple
4194complete sets may be present in the same file, each set with its own
4195header and tables. The sets are contiguous in the file. The only way to
4196know if another set follows is to check the next four bytes for the
4197magic number (or check for EOF). The header and tables sections are
4198padded to 64-bit boundaries. Below we describe each field in detail.
4199This format does not specify how the scanner will expand the given
data; for example, data may be serialized as int8 but expanded to an int32
4201array at runtime. This is to reduce the size of the serialized data
4202where possible.  Remember, _all integer values are in network byte
4203order_.
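
   Two details of this layout, network byte order and 64-bit padding,
can be sketched in C.  The helpers below are hypothetical and are not
part of the flex API.  They decode big-endian integers from a byte
buffer, test for the magic number documented for `th_magic', and
compute how many pad bytes bring a length up to the next 8-byte
boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLES_MAGIC 0xF13C57B1u   /* value documented for th_magic */

/* Decode a 32-bit integer stored in network (big-endian) byte order. */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode a 16-bit integer stored in network byte order. */
uint16_t read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Return 1 if at least four bytes remain and they hold the magic
   number, i.e., another table set starts here; otherwise 0. */
int starts_table_set(const unsigned char *buf, size_t len)
{
    return len >= 4 && read_be32(buf) == TABLES_MAGIC;
}

/* Number of pad bytes needed after `length' bytes to reach the next
   64-bit (8-byte) boundary. */
size_t pad64_bytes(size_t length)
{
    return (8 - length % 8) % 8;
}
```

   A reader that has consumed one complete set could call
`starts_table_set' on the remaining bytes to decide whether another
set follows.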
4204
4205Fields of a table header:
4206
4207`th_magic'
4208     Magic number, always 0xF13C57B1.
4209
4210`th_hsize'
4211     Size of this entire header, in bytes, including all fields plus
4212     any padding.
4213
4214`th_ssize'
4215     Size of this entire set, in bytes, including the header, all
4216     tables, plus any padding.
4217
4218`th_flags'
4219     Bit flags for this table set. Currently unused.
4220
4221`th_version[]'
     Flex version in NULL-terminated string format, e.g., `2.5.13a'.
4223     This is the version of flex that was used to create the serialized
4224     tables.
4225
4226`th_name[]'
4227     Contains the name of this table set. The default is `yytables',
4228     and is prefixed accordingly, e.g., `footables'. Must be
4229     NULL-terminated.
4230
4231`th_pad64[]'
4232     Zero or more NULL bytes, padding the entire header to the next
4233     64-bit boundary as calculated from the beginning of the header.
4234
4235Fields of a table:
4236
4237`td_id'
4238     Specifies the table identifier. Possible values are:
4239    `YYTD_ID_ACCEPT (0x01)'
4240          `yy_accept'
4241
4242    `YYTD_ID_BASE   (0x02)'
4243          `yy_base'
4244
4245    `YYTD_ID_CHK    (0x03)'
4246          `yy_chk'
4247
4248    `YYTD_ID_DEF    (0x04)'
4249          `yy_def'
4250
4251    `YYTD_ID_EC     (0x05)'
          `yy_ec'
4253
4254    `YYTD_ID_META   (0x06)'
4255          `yy_meta'
4256
4257    `YYTD_ID_NUL_TRANS (0x07)'
4258          `yy_NUL_trans'
4259
4260    `YYTD_ID_NXT (0x08)'
4261          `yy_nxt'. This array may be two dimensional. See the
4262          `td_hilen' field below.
4263
4264    `YYTD_ID_RULE_CAN_MATCH_EOL (0x09)'
4265          `yy_rule_can_match_eol'
4266
4267    `YYTD_ID_START_STATE_LIST (0x0A)'
4268          `yy_start_state_list'. This array is handled specially
4269          because it is an array of pointers to structs. See the
4270          `td_flags' field below.
4271
4272    `YYTD_ID_TRANSITION (0x0B)'
4273          `yy_transition'. This array is handled specially because it
4274          is an array of structs. See the `td_lolen' field below.
4275
4276    `YYTD_ID_ACCLIST (0x0C)'
4277          `yy_acclist'
4278
4279`td_flags'
4280     Bit flags describing how to interpret the data in `td_data'.  The
4281     data arrays are one-dimensional by default, but may be two
4282     dimensional as specified in the `td_hilen' field.
4283
4284    `YYTD_DATA8 (0x01)'
4285          The data is serialized as an array of type int8.
4286
4287    `YYTD_DATA16 (0x02)'
4288          The data is serialized as an array of type int16.
4289
4290    `YYTD_DATA32 (0x04)'
4291          The data is serialized as an array of type int32.
4292
4293    `YYTD_PTRANS (0x08)'
4294          The data is a list of indexes of entries in the expanded
4295          `yy_transition' array.  Each index should be expanded to a
4296          pointer to the corresponding entry in the `yy_transition'
4297          array. We count on the fact that the `yy_transition' array
4298          has already been seen.
4299
4300    `YYTD_STRUCT (0x10)'
4301          The data is a list of yy_trans_info structs, each of which
4302          consists of two integers. There is no padding between struct
4303          elements or between structs.  The type of each member is
4304          determined by the `YYTD_DATA*' bits.
4305
4306`td_lolen'
4307     Specifies the number of elements in the lowest dimension array. If
4308     this is a one-dimensional array, then it is simply the number of
4309     elements in this array.  The element size is determined by the
4310     `td_flags' field.
4311
4312`td_hilen'
4313     If `td_hilen' is non-zero, then the data is a two-dimensional
4314     array.  Otherwise, the data is a one-dimensional array. `td_hilen'
4315     contains the number of elements in the higher dimensional array,
4316     and `td_lolen' contains the number of elements in the lowest
4317     dimension.
4318
4319     Conceptually, `td_data' is either `sometype td_data[td_lolen]', or
4320     `sometype td_data[td_hilen][td_lolen]', where `sometype' is
4321     specified by the `td_flags' field.  It is possible for both
4322     `td_lolen' and `td_hilen' to be zero, in which case `td_data' is a
4323     zero length array, and no data is loaded, i.e., this table is
4324     simply skipped. Flex does not currently generate tables of zero
4325     length.
4326
4327`td_data[]'
4328     The table data. This array may be a one- or two-dimensional array,
4329     of type `int8', `int16', `int32', `struct yy_trans_info', or
4330     `struct yy_trans_info*',  depending upon the values in the
4331     `td_flags', `td_lolen', and `td_hilen' fields.
4332
4333`td_pad64[]'
4334     Zero or more NULL bytes, padding the entire table to the next
4335     64-bit boundary as calculated from the beginning of this table.
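
   The interplay of `td_flags', `td_lolen', and `td_hilen' can be
summarized by computing the size of `td_data' in bytes.  The following
is a sketch under one assumption that the text does not spell out:
when `YYTD_STRUCT' is set, `td_lolen' is taken to count structs, each
holding two integers.  The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed for the td_flags field. */
#define YYTD_DATA8   0x01
#define YYTD_DATA16  0x02
#define YYTD_DATA32  0x04
#define YYTD_STRUCT  0x10

/* Size in bytes of one serialized integer, or 0 if no size bit is set. */
size_t td_element_size(uint16_t td_flags)
{
    if (td_flags & YYTD_DATA8)
        return 1;
    if (td_flags & YYTD_DATA16)
        return 2;
    if (td_flags & YYTD_DATA32)
        return 4;
    return 0;
}

/* Size in bytes of td_data, excluding td_pad64. */
size_t td_data_bytes(uint16_t td_flags, uint32_t td_lolen, uint32_t td_hilen)
{
    size_t elem = td_element_size(td_flags);
    size_t count = td_lolen;

    assert(elem != 0);          /* one of the YYTD_DATA* bits must be set */
    if (td_hilen != 0)
        count *= td_hilen;      /* two-dimensional array */
    if (td_flags & YYTD_STRUCT)
        count *= 2;             /* assumed: two integers per struct */
    return count * elem;
}
```

   Under these assumptions, a two-dimensional `int32' table with
`td_hilen' of 3 and `td_lolen' of 4 occupies 48 bytes, and a table with
both lengths zero occupies none.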
4336
4337
4338File: flex.info,  Node: Diagnostics,  Next: Limitations,  Prev: Serialized Tables,  Up: Top
4339
434023 Diagnostics
4341**************
4342
4343The following is a list of `flex' diagnostic messages:
4344
4345   * `warning, rule cannot be matched' indicates that the given rule
4346     cannot be matched because it follows other rules that will always
4347     match the same text as it.  For example, in the following `foo'
4348     cannot be matched because it comes after an identifier "catch-all"
4349     rule:
4350
4351
4352              [a-z]+    got_identifier();
4353              foo       got_foo();
4354
4355     Using `REJECT' in a scanner suppresses this warning.
4356
4357   * `warning, -s option given but default rule can be matched' means
4358     that it is possible (perhaps only in a particular start condition)
4359     that the default rule (match any single character) is the only one
4360     that will match a particular input.  Since `-s' was given,
4361     presumably this is not intended.
4362
4363   * `reject_used_but_not_detected undefined' or
4364     `yymore_used_but_not_detected undefined'. These errors can occur
4365     at compile time.  They indicate that the scanner uses `REJECT' or
4366     `yymore()' but that `flex' failed to notice the fact, meaning that
4367     `flex' scanned the first two sections looking for occurrences of
4368     these actions and failed to find any, but somehow you snuck some in
4369     (via a #include file, for example).  Use `%option reject' or
4370     `%option yymore' to indicate to `flex' that you really do use
4371     these features.
4372
   * `flex scanner jammed'. A scanner compiled with `-s' has
4374     encountered an input string which wasn't matched by any of its
4375     rules.  This error can also occur due to internal problems.
4376
   * `token too large, exceeds YYLMAX'. Your scanner uses `%array' and
4378     one of its rules matched a string longer than the `YYLMAX'
4379     constant (8K bytes by default).  You can increase the value by
4380     #define'ing `YYLMAX' in the definitions section of your `flex'
4381     input.
4382
4383   * `scanner requires -8 flag to use the character 'x''. Your scanner
4384     specification includes recognizing the 8-bit character `'x'' and
4385     you did not specify the -8 flag, and your scanner defaulted to
4386     7-bit because you used the `-Cf' or `-CF' table compression
4387     options.  See the discussion of the `-7' flag, *Note Scanner
4388     Options::, for details.
4389
   * `flex scanner push-back overflow'. You used `unput()' to push back
4391     so much text that the scanner's buffer could not hold both the
4392     pushed-back text and the current token in `yytext'.  Ideally the
4393     scanner should dynamically resize the buffer in this case, but at
4394     present it does not.
4395
4396   * `input buffer overflow, can't enlarge buffer because scanner uses
     REJECT'.  The scanner was working on matching an extremely large
4398     token and needed to expand the input buffer.  This doesn't work
4399     with scanners that use `REJECT'.
4400
4401   * `fatal flex scanner internal error--end of buffer missed'. This can
     occur in a scanner which is reentered after a long-jump has jumped
     out of (or over) the scanner's activation frame.  Before reentering
4404     the scanner, use:
4405
4406              yyrestart( yyin );
4407     or, as noted above, switch to using the C++ scanner class.
4408
   * `too many start conditions in <> construct!'  You listed more start
4410     conditions in a <> construct than exist (so you must have listed at
4411     least one of them twice).
4412
4413
4414File: flex.info,  Node: Limitations,  Next: Bibliography,  Prev: Diagnostics,  Up: Top
4415
441624 Limitations
4417**************
4418
Some trailing context patterns cannot be properly matched and generate
warning messages (`dangerous trailing context').  These are patterns
where the ending of the first part of the rule matches the beginning of
the second part, such as `zx*/xy*', where the 'x*' matches the 'x' at
the beginning of the trailing context.  (Note that the POSIX draft
states that the text matched by such patterns is undefined.)

   For some trailing context rules, parts which are actually
fixed-length are not recognized as such, leading to the abovementioned
performance loss.  In particular, parts using `|' or `{n}' (such as
`foo{3}') are always considered variable-length.

   Combining trailing context with the special `|' action can result in
_fixed_ trailing context being turned into the more expensive
_variable_ trailing context.  For example, in the following:
4432
4433
4434         %%
4435         abc      |
4436         xyz/def
4437
   Use of `unput()' invalidates yytext and yyleng, unless the `%array'
directive or the `-l' option has been used.

   Pattern-matching of `NUL's is substantially slower than matching
other characters.

   Dynamic resizing of the input buffer is slow, as it entails
rescanning all the text matched so far by the current (generally huge)
token.

   Due to both buffering of input and read-ahead, you cannot intermix
calls to `<stdio.h>' routines, such as getchar(), with `flex' rules and
expect it to work.  Call `input()' instead.

   The total number of table entries listed by the `-v' flag excludes
the table entries needed to determine what rule has been matched.  The
number of such entries is equal to the number of DFA states if the
scanner does not use `REJECT', and somewhat greater than the number of
states if it does.

   `REJECT' cannot be used with the `-f' or `-F' options.
4451
4452   The `flex' internal algorithms need documentation.
4453
4454
4455File: flex.info,  Node: Bibliography,  Next: FAQ,  Prev: Limitations,  Up: Top
4456
445725 Additional Reading
4458*********************
4459
4460You may wish to read more about the following programs:
4461   * lex
4462
4463   * yacc
4464
4465   * sed
4466
4467   * awk
4468
4469   The following books may contain material of interest:
4470
4471   John Levine, Tony Mason, and Doug Brown, _Lex & Yacc_, O'Reilly and
4472Associates.  Be sure to get the 2nd edition.
4473
4474   M. E. Lesk and E. Schmidt, _LEX - Lexical Analyzer Generator_
4475
4476   Alfred Aho, Ravi Sethi and Jeffrey Ullman, _Compilers: Principles,
4477Techniques and Tools_, Addison-Wesley (1986).  Describes the
4478pattern-matching techniques used by `flex' (deterministic finite
4479automata).
4480
4481
4482File: flex.info,  Node: FAQ,  Next: Appendices,  Prev: Bibliography,  Up: Top
4483
4484FAQ
4485***
4486
4487From time to time, the `flex' maintainer receives certain questions.
4488Rather than repeat answers to well-understood problems, we publish them
4489here.
4490
4491* Menu:
4492
4493* When was flex born?::
4494* How do I expand backslash-escape sequences in C-style quoted strings?::
4495* Why do flex scanners call fileno if it is not ANSI compatible?::
4496* Does flex support recursive pattern definitions?::
4497* How do I skip huge chunks of input (tens of megabytes) while using flex?::
4498* Flex is not matching my patterns in the same order that I defined them.::
4499* My actions are executing out of order or sometimes not at all.::
4500* How can I have multiple input sources feed into the same scanner at the same time?::
4501* Can I build nested parsers that work with the same input file?::
4502* How can I match text only at the end of a file?::
4503* How can I make REJECT cascade across start condition boundaries?::
4504* Why cant I use fast or full tables with interactive mode?::
4505* How much faster is -F or -f than -C?::
4506* If I have a simple grammar cant I just parse it with flex?::
4507* Why doesn't yyrestart() set the start state back to INITIAL?::
4508* How can I match C-style comments?::
4509* The period isn't working the way I expected.::
4510* Can I get the flex manual in another format?::
4511* Does there exist a "faster" NDFA->DFA algorithm?::
4512* How does flex compile the DFA so quickly?::
4513* How can I use more than 8192 rules?::
4514* How do I abandon a file in the middle of a scan and switch to a new file?::
4515* How do I execute code only during initialization (only before the first scan)?::
4516* How do I execute code at termination?::
4517* Where else can I find help?::
4518* Can I include comments in the "rules" section of the file?::
4519* I get an error about undefined yywrap().::
4520* How can I change the matching pattern at run time?::
4521* How can I expand macros in the input?::
4522* How can I build a two-pass scanner?::
4523* How do I match any string not matched in the preceding rules?::
4524* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
4525* Is there a way to make flex treat NULL like a regular character?::
4526* Whenever flex can not match the input it says "flex scanner jammed".::
4527* Why doesn't flex have non-greedy operators like perl does?::
4528* Memory leak - 16386 bytes allocated by malloc.::
4529* How do I track the byte offset for lseek()?::
4530* How do I use my own I/O classes in a C++ scanner?::
4531* How do I skip as many chars as possible?::
4532* deleteme00::
4533* Are certain equivalent patterns faster than others?::
4534* Is backing up a big deal?::
4535* Can I fake multi-byte character support?::
4536* deleteme01::
4537* Can you discuss some flex internals?::
4538* unput() messes up yy_at_bol::
4539* The | operator is not doing what I want::
4540* Why can't flex understand this variable trailing context pattern?::
4541* The ^ operator isn't working::
4542* Trailing context is getting confused with trailing optional patterns::
4543* Is flex GNU or not?::
4544* ERASEME53::
4545* I need to scan if-then-else blocks and while loops::
4546* ERASEME55::
4547* ERASEME56::
4548* ERASEME57::
4549* Is there a repository for flex scanners?::
4550* How can I conditionally compile or preprocess my flex input file?::
4551* Where can I find grammars for lex and yacc?::
4552* I get an end-of-buffer message for each character scanned.::
4553* unnamed-faq-62::
4554* unnamed-faq-63::
4555* unnamed-faq-64::
4556* unnamed-faq-65::
4557* unnamed-faq-66::
4558* unnamed-faq-67::
4559* unnamed-faq-68::
4560* unnamed-faq-69::
4561* unnamed-faq-70::
4562* unnamed-faq-71::
4563* unnamed-faq-72::
4564* unnamed-faq-73::
4565* unnamed-faq-74::
4566* unnamed-faq-75::
4567* unnamed-faq-76::
4568* unnamed-faq-77::
4569* unnamed-faq-78::
4570* unnamed-faq-79::
4571* unnamed-faq-80::
4572* unnamed-faq-81::
4573* unnamed-faq-82::
4574* unnamed-faq-83::
4575* unnamed-faq-84::
4576* unnamed-faq-85::
4577* unnamed-faq-86::
4578* unnamed-faq-87::
4579* unnamed-faq-88::
4580* unnamed-faq-90::
4581* unnamed-faq-91::
4582* unnamed-faq-92::
4583* unnamed-faq-93::
4584* unnamed-faq-94::
4585* unnamed-faq-95::
4586* unnamed-faq-96::
4587* unnamed-faq-97::
4588* unnamed-faq-98::
4589* unnamed-faq-99::
4590* unnamed-faq-100::
4591* unnamed-faq-101::
4592* What is the difference between YYLEX_PARAM and YY_DECL?::
4593* Why do I get "conflicting types for yylex" error?::
4594* How do I access the values set in a Flex action from within a Bison action?::
4595
4596
4597File: flex.info,  Node: When was flex born?,  Next: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ
4598
4599When was flex born?
4600===================
4601
4602Vern Paxson took over the `Software Tools' lex project from Jef
4603Poskanzer in 1982.  At that point it was written in Ratfor.  Around
46041987 or so, Paxson translated it into C, and a legend was born :-).
4605
4606
4607File: flex.info,  Node: How do I expand backslash-escape sequences in C-style quoted strings?,  Next: Why do flex scanners call fileno if it is not ANSI compatible?,  Prev: When was flex born?,  Up: FAQ
4608
4609How do I expand backslash-escape sequences in C-style quoted strings?
4610=====================================================================
4611
4612A key point when scanning quoted strings is that you cannot (easily)
4613write a single rule that will precisely match the string if you allow
4614things like embedded escape sequences and newlines.  If you try to
4615match strings with a single rule then you'll wind up having to rescan
4616the string anyway to find any escape sequences.
4617
4618   Instead you can use exclusive start conditions and a set of rules,
4619one for matching non-escaped text, one for matching a single escape,
4620one for matching an embedded newline, and one for recognizing the end
4621of the string.  Each of these rules is then faced with the question of
4622where to put its intermediary results.  The best solution is for the
4623rules to append their local value of `yytext' to the end of a "string
4624literal" buffer.  A rule like the escape-matcher will append to the
4625buffer the meaning of the escape sequence rather than the literal text
4626in `yytext'.  In this way, `yytext' does not need to be modified at all.
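
   The buffer-appending approach can be sketched in C.  The names
`literal_buf', `literal_append', `escape_meaning', and
`literal_append_escape' are hypothetical, the fixed buffer size is an
arbitrary choice, and only a few escapes are handled.  Each rule's
action would call one of the append functions with its own `yytext'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LITERAL_MAX 1024        /* arbitrary capacity for this sketch */

/* Accumulates the decoded contents of one string literal. */
struct literal_buf {
    char   text[LITERAL_MAX];
    size_t len;
};

/* Append `n' bytes of ordinary text; returns -1 if they do not fit. */
int literal_append(struct literal_buf *buf, const char *text, size_t n)
{
    assert(buf != NULL && text != NULL);
    if (n >= LITERAL_MAX - buf->len)    /* keep room for the NUL */
        return -1;
    memcpy(buf->text + buf->len, text, n);
    buf->len += n;
    buf->text[buf->len] = '\0';
    return 0;
}

/* Translate the character following a backslash into its meaning. */
char escape_meaning(char c)
{
    switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    default:  return c;         /* \" and \\ stand for themselves */
    }
}

/* Append the meaning of a two-character escape such as "\\n",
   leaving the matched text itself untouched. */
int literal_append_escape(struct literal_buf *buf, const char *match)
{
    char c;

    assert(match != NULL && match[0] == '\\' && match[1] != '\0');
    c = escape_meaning(match[1]);
    return literal_append(buf, &c, 1);
}
```

   The rule that recognizes the closing quote can then hand `buf.text'
to the parser and reset `buf.len' to zero for the next literal.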
4627
4628
4629File: flex.info,  Node: Why do flex scanners call fileno if it is not ANSI compatible?,  Next: Does flex support recursive pattern definitions?,  Prev: How do I expand backslash-escape sequences in C-style quoted strings?,  Up: FAQ
4630
4631Why do flex scanners call fileno if it is not ANSI compatible?
4632==============================================================
4633
4634Flex scanners call `fileno()' in order to get the file descriptor
4635corresponding to `yyin'. The file descriptor may be passed to
4636`isatty()' or `read()', depending upon which `%options' you specified.
4637If your system does not have `fileno()' support, to get rid of the
4638`read()' call, do not specify `%option read'. To get rid of the
4639`isatty()' call, you must specify one of `%option always-interactive' or
4640`%option never-interactive'.
4641
4642
4643File: flex.info,  Node: Does flex support recursive pattern definitions?,  Next: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Prev: Why do flex scanners call fileno if it is not ANSI compatible?,  Up: FAQ
4644
4645Does flex support recursive pattern definitions?
4646================================================
4647
4648e.g.,
4649
4650
4651     %%
4652     block   "{"({block}|{statement})*"}"
4653
4654   No. You cannot have recursive definitions.  The pattern-matching
4655power of regular expressions in general (and therefore flex scanners,
4656too) is limited.  In particular, regular expressions cannot "balance"
4657parentheses to an arbitrary degree.  For example, it's impossible to
4658write a regular expression that matches all strings containing the same
4659number of '{'s as '}'s.  For more powerful pattern matching, you need a
4660parser, such as `GNU bison'.
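
   The limitation comes down to counting: balancing braces needs a
counter that can grow without bound, which a finite automaton does not
have.  A minimal C sketch of the kind of check that has to live in
action code or in a parser (the helper name `braces_balanced' is
hypothetical, not part of `flex'):

```c
#include <stdbool.h>

/* Returns true when every '{' in s is closed by a later '}'. */
static bool braces_balanced(const char *s)
{
    long depth = 0;     /* the unbounded counter a DFA cannot provide */

    for (; *s != '\0'; s++) {
        if (*s == '{') {
            depth++;
        } else if (*s == '}') {
            if (--depth < 0)
                return false;   /* closing brace with no opener */
        }
    }
    return depth == 0;
}
```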
4661
4662
4663File: flex.info,  Node: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Next: Flex is not matching my patterns in the same order that I defined them.,  Prev: Does flex support recursive pattern definitions?,  Up: FAQ
4664
4665How do I skip huge chunks of input (tens of megabytes) while using flex?
4666========================================================================
4667
Use `fseek()' (or `lseek()') to position `yyin', then call `yyrestart()'.
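
   A minimal sketch of the seek step in C.  The helper name
`reposition_input' is hypothetical.  It assumes you know the absolute
offset you want: because `flex' reads its input in blocks, the stdio
position of `yyin' usually runs ahead of what the scanner has consumed,
so an absolute seek is easier to reason about than a relative one.

```c
#include <stdio.h>

/*
 * Move fp to an absolute byte offset.
 * Returns 0 on success, -1 if the stream cannot be repositioned.
 *
 * Intended use from scanner code (not shown here):
 *     if (reposition_input(yyin, offset) == 0)
 *         yyrestart(yyin);
 */
static int reposition_input(FILE *fp, long absolute_offset)
{
    return fseek(fp, absolute_offset, SEEK_SET) == 0 ? 0 : -1;
}
```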
4669
4670
4671File: flex.info,  Node: Flex is not matching my patterns in the same order that I defined them.,  Next: My actions are executing out of order or sometimes not at all.,  Prev: How do I skip huge chunks of input (tens of megabytes) while using flex?,  Up: FAQ
4672
4673Flex is not matching my patterns in the same order that I defined them.
4674=======================================================================
4675
`flex' does not try the rules one at a time in the order you listed
them.  It compiles all of the patterns into a single deterministic
finite automaton (DFA) that matches every rule simultaneously, in
parallel.  (This seems impossible, but it's actually a fairly simple
technique once you understand the principles.)

   A side-effect of this parallel matching is that when the input
matches more than one rule, `flex' scanners pick the rule that matched
the _most_ text.  Rule order only breaks ties between rules that match
the same amount of text.  This is explained further in the manual, in
the section *Note Matching::.
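
   The selection policy can be pictured with the following C sketch.
It is only a model: a real `flex' scanner does not loop over rules, it
gets the same answer from its DFA tables.  The function name
`pick_rule' and the array of per-rule match lengths are hypothetical.

```c
#include <stddef.h>

/*
 * match_len[i] is how many characters rule i can match at the current
 * input position (0 means rule i does not match).  Returns the index
 * of the winning rule, or -1 if no rule matches.
 */
static int pick_rule(const size_t *match_len, size_t nrules)
{
    int best = -1;
    size_t i;

    for (i = 0; i < nrules; i++) {
        if (match_len[i] == 0)
            continue;
        /* Strict '>' keeps the earliest rule when lengths tie. */
        if (best < 0 || match_len[i] > match_len[(size_t)best])
            best = (int)i;
    }
    return best;
}
```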
4687
4688   If you want `flex' to choose a shorter match, then you can work
4689around this behavior by expanding your short rule to match more text,
4690then put back the extra:
4691
4692
4693     data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;
4694
4695   Another fix would be to make the second rule active only during the
4696`<BLOCKIDSTATE>' start condition, and make that start condition
4697exclusive by declaring it with `%x' instead of `%s'.
4698
4699   A final fix is to change the input language so that the ambiguity for
4700`data_' is removed, by adding characters to it that don't match the
4701identifier rule, or by removing characters (such as `_') from the
4702identifier rule so it no longer matches `data_'.  (Of course, you might
4703also not have the option of changing the input language.)
4704
4705
4706File: flex.info,  Node: My actions are executing out of order or sometimes not at all.,  Next: How can I have multiple input sources feed into the same scanner at the same time?,  Prev: Flex is not matching my patterns in the same order that I defined them.,  Up: FAQ
4707
4708My actions are executing out of order or sometimes not at all.
4709==============================================================
4710
4711Most likely, you have (in error) placed the opening `{' of the action
4712block on a different line than the rule, e.g.,
4713
4714
4715     ^(foo|bar)
4716     {  <<<--- WRONG!
4717
4718     }
4719
4720   `flex' requires that the opening `{' of an action associated with a
4721rule begin on the same line as does the rule.  You need instead to
4722write your rules as follows:
4723
4724
4725     ^(foo|bar)   {  // CORRECT!
4726
4727     }
4728
4729
4730File: flex.info,  Node: How can I have multiple input sources feed into the same scanner at the same time?,  Next: Can I build nested parsers that work with the same input file?,  Prev: My actions are executing out of order or sometimes not at all.,  Up: FAQ
4731
4732How can I have multiple input sources feed into the same scanner at the same time?
4733==================================================================================
4734
4735If ...
4736   * your scanner is free of backtracking (verified using `flex''s `-b'
4737     flag),
4738
4739   * AND you run your scanner interactively (`-I' option; default
4740     unless using special table compression options),
4741
4742   * AND you feed it one character at a time by redefining `YY_INPUT'
4743     to do so,
4744
   then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking).  This means you
can safely use `select()' at that point and only call `yylex()' for
another token if `select()' indicates there's data available.
4749
4750   That is, move the `select()' out from the input function to a point
4751where it determines whether `yylex()' gets called for the next token.
4752
4753   With this approach, you will still have problems if your input can
4754arrive piecemeal; `select()' could inform you that the beginning of a
4755token is available, you call `yylex()' to get it, but it winds up
4756blocking waiting for the later characters in the token.
4757
4758   Here's another way:  Move your input multiplexing inside of
4759`YY_INPUT'.  That is, whenever `YY_INPUT' is called, it `select()''s to
4760see where input is available.  If input is available for the scanner,
4761it reads and returns the next byte.  If input is available from another
4762source, it calls whatever function is responsible for reading from that
4763source.  (If no input is available, it blocks until some input is
4764available.)  I've used this technique in an interpreter I wrote that
4765both reads keyboard input using a `flex' scanner and IPC traffic from
4766sockets, and it works fine.
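
   A sketch of that second approach in C, assuming a POSIX system.  The
function names and the `service_other' callback are hypothetical.  A
`YY_INPUT' replacement would call `next_scanner_byte()' and report
either one character or end of input.  Error handling is deliberately
thin: an interrupted `select()' is treated as end of input here.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Block until one of two descriptors is readable.
 * Returns 0 if scanner_fd is ready, 1 if only other_fd is ready,
 * and -1 on error.
 */
static int wait_for_input(int scanner_fd, int other_fd)
{
    fd_set readable;
    int maxfd = scanner_fd > other_fd ? scanner_fd : other_fd;

    FD_ZERO(&readable);
    FD_SET(scanner_fd, &readable);
    FD_SET(other_fd, &readable);

    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;
    return FD_ISSET(scanner_fd, &readable) ? 0 : 1;
}

/*
 * Fetch one byte for the scanner, servicing the other source whenever
 * it has data and the scanner's source does not.
 * Returns 1 with *c set, or 0 at end of input or on error.
 */
static int next_scanner_byte(int scanner_fd, int other_fd,
                             void (*service_other)(int fd), char *c)
{
    for (;;) {
        int which = wait_for_input(scanner_fd, other_fd);

        if (which < 0)
            return 0;
        if (which == 0)
            return read(scanner_fd, c, 1) == 1 ? 1 : 0;
        /* The callback must consume the pending data, or this loops. */
        service_other(other_fd);
    }
}
```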
4767
4768
4769File: flex.info,  Node: Can I build nested parsers that work with the same input file?,  Next: How can I match text only at the end of a file?,  Prev: How can I have multiple input sources feed into the same scanner at the same time?,  Up: FAQ
4770
4771Can I build nested parsers that work with the same input file?
4772==============================================================
4773
4774This is not going to work without some additional effort.  The reason is
4775that `flex' block-buffers the input it reads from `yyin'.  This means
4776that the "outermost" `yylex()', when called, will automatically slurp
4777up the first 8K of input available on yyin, and subsequent calls to
4778other `yylex()''s won't see that input.  You might be tempted to work
4779around this problem by redefining `YY_INPUT' to only return a small
4780amount of text, but it turns out that that approach is quite difficult.
4781Instead, the best solution is to combine all of your scanners into one
4782large scanner, using a different exclusive start condition for each.
4783
4784
4785File: flex.info,  Node: How can I match text only at the end of a file?,  Next: How can I make REJECT cascade across start condition boundaries?,  Prev: Can I build nested parsers that work with the same input file?,  Up: FAQ
4786
4787How can I match text only at the end of a file?
4788===============================================
4789
4790There is no way to write a rule which is "match this text, but only if
4791it comes at the end of the file".  You can fake it, though, if you
4792happen to have a character lying around that you don't allow in your
4793input.  Then you redefine `YY_INPUT' to call your own routine which, if
4794it sees an `EOF', returns the magic character first (and remembers to
4795return a real `EOF' next time it's called).  Then you could write:
4796
4797
4798     <COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */
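
   The input routine described above might look like the following C
sketch.  The names `EOF_MARKER' and `read_with_eof_marker' are
hypothetical, and the byte value 1 is an assumption: use whatever
character your input is guaranteed not to contain, and make the
`EOF_CHAR' definition in the scanner name the same character.  A read
error is treated as end of file here.

```c
#include <stdio.h>

#define EOF_MARKER '\001'   /* assumed never to occur in real input */

static int eof_marker_sent = 0;

/*
 * Read up to max_size bytes from fp into buf.  At end of file, hand
 * back EOF_MARKER once, then report a real end of input (0) on the
 * next call.  A YY_INPUT replacement would assign the return value to
 * its result argument.
 */
static size_t read_with_eof_marker(FILE *fp, char *buf, size_t max_size)
{
    size_t n = fread(buf, 1, max_size, fp);

    if (n > 0 || max_size == 0)
        return n;
    if (!eof_marker_sent) {
        eof_marker_sent = 1;
        buf[0] = EOF_MARKER;    /* the magic character comes first */
        return 1;
    }
    return 0;                   /* the real EOF comes second */
}
```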
4799
4800
4801File: flex.info,  Node: How can I make REJECT cascade across start condition boundaries?,  Next: Why cant I use fast or full tables with interactive mode?,  Prev: How can I match text only at the end of a file?,  Up: FAQ
4802
4803How can I make REJECT cascade across start condition boundaries?
4804================================================================
4805
4806You can do this as follows.  Suppose you have a start condition `A', and
4807after exhausting all of the possible matches in `<A>', you want to try
4808matches in `<INITIAL>'.  Then you could use the following:
4809
4810
     %x A
     %%
     <A>rule_that_is_long    ...; REJECT;
     <A>rule                 ...; REJECT; /* shorter rule */
     <A>etc.
     ...
     <A>.|\n  {
         /* Shortest and last rule in <A>, so
          * cascaded REJECTs will eventually
          * wind up matching this rule.  We want
          * to now switch to the initial state
          * and try matching from there instead.
          */
         yyless(0);    /* put back matched text */
         BEGIN(INITIAL);
     }
4827
4828
4829File: flex.info,  Node: Why cant I use fast or full tables with interactive mode?,  Next: How much faster is -F or -f than -C?,  Prev: How can I make REJECT cascade across start condition boundaries?,  Up: FAQ
4830
4831Why can't I use fast or full tables with interactive mode?
4832==========================================================
4833
4834One of the assumptions flex makes is that interactive applications are
4835inherently slow (they're waiting on a human after all).  It has to do
4836with how the scanner detects that it must be finished scanning a token.
4837For interactive scanners, after scanning each character the current
4838state is looked up in a table (essentially) to see whether there's a
4839chance of another input character possibly extending the length of the
4840match.  If not, the scanner halts.  For non-interactive scanners, the
4841end-of-token test is much simpler, basically a compare with 0, so no
4842memory bus cycles.  Since the test occurs in the innermost scanning
4843loop, one would like to make it go as fast as possible.
4844
4845   Still, it seems reasonable to allow the user to choose to trade off
4846a bit of performance in this area to gain the corresponding
4847flexibility.  There might be another reason, though, why fast scanners
4848don't support the interactive option.
4849
4850
4851File: flex.info,  Node: How much faster is -F or -f than -C?,  Next: If I have a simple grammar cant I just parse it with flex?,  Prev: Why cant I use fast or full tables with interactive mode?,  Up: FAQ
4852
4853How much faster is -F or -f than -C?
4854====================================
4855
4856Much faster (factor of 2-3).
4857
4858
4859File: flex.info,  Node: If I have a simple grammar cant I just parse it with flex?,  Next: Why doesn't yyrestart() set the start state back to INITIAL?,  Prev: How much faster is -F or -f than -C?,  Up: FAQ
4860
4861If I have a simple grammar can't I just parse it with flex?
4862===========================================================
4863
4864Is your grammar recursive? That's almost always a sign that you're
4865better off using a parser/scanner rather than just trying to use a
4866scanner alone.
4867
4868
4869File: flex.info,  Node: Why doesn't yyrestart() set the start state back to INITIAL?,  Next: How can I match C-style comments?,  Prev: If I have a simple grammar cant I just parse it with flex?,  Up: FAQ
4870
4871Why doesn't yyrestart() set the start state back to INITIAL?
4872============================================================
4873
4874There are two reasons.  The first is that there might be programs that
4875rely on the start state not changing across file changes.  The second
4876is that beginning with `flex' version 2.4, use of `yyrestart()' is no
4877longer required, so fixing the problem there doesn't solve the more
4878general problem.
4879
4880
4881File: flex.info,  Node: How can I match C-style comments?,  Next: The period isn't working the way I expected.,  Prev: Why doesn't yyrestart() set the start state back to INITIAL?,  Up: FAQ
4882
4883How can I match C-style comments?
4884=================================
4885
4886You might be tempted to try something like this:
4887
4888
4889     "/*".*"*/"       // WRONG!
4890
4891   or, worse, this:
4892
4893
     "/*"(.|\n)*"*/"  // WRONG!
4895
4896   The above rules will eat too much input, and blow up on things like:
4897
4898
4899     /* a comment */ do_my_thing( "oops */" );
4900
4901   Here is one way which allows you to track line information:
4902
4903
4904     <INITIAL>{
4905     "/*"              BEGIN(IN_COMMENT);
4906     }
4907     <IN_COMMENT>{
4908     "*/"      BEGIN(INITIAL);
4909     [^*\n]+   // eat comment in chunks
4910     "*"       // eat the lone star
4911     \n        yylineno++;
4912     }
4913
4914
4915File: flex.info,  Node: The period isn't working the way I expected.,  Next: Can I get the flex manual in another format?,  Prev: How can I match C-style comments?,  Up: FAQ
4916
4917The '.' isn't working the way I expected.
4918=========================================
4919
4920Here are some tips for using `.':
4921
   * A common mistake is to place the grouping parenthesis AFTER an
     operator, when you really meant to place the parenthesis BEFORE
     the operator, e.g., you probably want this `(foo|bar)+' and NOT
     this `(foo|bar+)'.

     The first pattern matches the words `foo' or `bar' any number of
     times, e.g., it matches the text `barfoofoobarfoo'.  The second
     pattern matches a single instance of `foo', or a single `ba'
     followed by one or more `r's, e.g., it matches the text `barrrr'.

   * A `.' inside `[]''s just means a literal `.' (period), and NOT "any
     character except newline".

   * Remember that `.' matches any character EXCEPT `\n' (and `EOF').
     If you really want to match ANY character, including newlines,
     then use `(.|\n)'.  Beware that the regex `(.|\n)+' will match your
     entire input!

   * Finally, if you want to match a literal `.' (a period), then use
     `[.]' or `"."'.
4943
4944
4945File: flex.info,  Node: Can I get the flex manual in another format?,  Next: Does there exist a "faster" NDFA->DFA algorithm?,  Prev: The period isn't working the way I expected.,  Up: FAQ
4946
4947Can I get the flex manual in another format?
4948============================================
4949
The `flex' source distribution includes a texinfo manual. You are free
4951to convert that texinfo into whatever format you desire. The `texinfo'
4952package includes tools for conversion to a number of formats.
4953
4954
4955File: flex.info,  Node: Does there exist a "faster" NDFA->DFA algorithm?,  Next: How does flex compile the DFA so quickly?,  Prev: Can I get the flex manual in another format?,  Up: FAQ
4956
4957Does there exist a "faster" NDFA->DFA algorithm?
4958================================================
4959
4960There's no way around the potential exponential running time - it can
4961take you exponential time just to enumerate all of the DFA states.  In
4962practice, though, the running time is closer to linear, or sometimes
4963quadratic.
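
   A standard textbook illustration of the exponential case (not taken
from the `flex' sources) is the language "the n-th character from the
end is `a'".  Its NFA has n+1 states, but subset construction reaches
2^n DFA states.  The C sketch below counts them, storing each set of
NFA states as a bitmask; the function name `count_dfa_states' is
hypothetical.

```c
#include <stdlib.h>

/*
 * NFA over {a, b} with states 0..n: state 0 loops on both letters and
 * also moves to state 1 on 'a'; state i (1 <= i < n) moves to i+1 on
 * either letter; state n accepts.  Returns the number of DFA states
 * reachable from the start set {0}, or 0 if n is outside 1..20 or
 * memory runs out.
 */
static unsigned long count_dfa_states(unsigned n)
{
    unsigned long nsets, inner, top = 0, count = 0;
    unsigned char *seen;
    unsigned long *work;

    if (n < 1 || n > 20)
        return 0;
    nsets = 1UL << (n + 1);
    inner = ((1UL << n) - 1) & ~1UL;    /* bits for states 1..n-1 */
    seen = calloc(nsets, 1);
    work = malloc(nsets * sizeof *work);
    if (seen == NULL || work == NULL) {
        free(seen);
        free(work);
        return 0;
    }

    seen[1] = 1;                        /* start set {0} */
    work[top++] = 1;
    while (top > 0) {
        unsigned long set = work[--top];
        unsigned long shifted = (set & inner) << 1;   /* i -> i+1 */
        unsigned long next[2];
        int k;

        count++;
        next[0] = shifted | 1UL | 2UL;  /* on 'a': state 0 -> {0, 1} */
        next[1] = shifted | 1UL;        /* on 'b': state 0 -> {0}    */
        for (k = 0; k < 2; k++) {
            if (!seen[next[k]]) {
                seen[next[k]] = 1;
                work[top++] = next[k];
            }
        }
    }
    free(seen);
    free(work);
    return count;
}
```

   Each extra NFA state doubles the count, which is why no conversion
algorithm can avoid the worst case: the output itself is that large.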
4964
4965
4966File: flex.info,  Node: How does flex compile the DFA so quickly?,  Next: How can I use more than 8192 rules?,  Prev: Does there exist a "faster" NDFA->DFA algorithm?,  Up: FAQ
4967
4968How does flex compile the DFA so quickly?
4969=========================================
4970
4971There are two big speed wins that `flex' uses:
4972
4973  1. It analyzes the input rules to construct equivalence classes for
4974     those characters that always make the same transitions.  It then
4975     rewrites the NFA using equivalence classes for transitions instead
4976     of characters.  This cuts down the NFA->DFA computation time
4977     dramatically, to the point where, for uncompressed DFA tables, the
4978     DFA generation is often I/O bound in writing out the tables.
4979
4980  2. It maintains hash values for previously computed DFA states, so
4981     testing whether a newly constructed DFA state is equivalent to a
4982     previously constructed state can be done very quickly, by first
4983     comparing hash values.
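
   The second point can be sketched in C as follows.  This is not
`flex''s actual data structure or hash function; the struct layout and
the names `hash_states' and `same_dfa_state' are assumptions chosen to
show the idea of rejecting most candidates with one integer comparison.

```c
#include <stddef.h>
#include <string.h>

struct dfa_state {
    const int    *nfa_states;   /* sorted NFA state numbers */
    size_t        count;        /* how many NFA states in the set */
    unsigned long hash;         /* precomputed by hash_states() */
};

/* A simple multiplicative hash over the NFA state numbers. */
static unsigned long hash_states(const int *states, size_t count)
{
    unsigned long h = 5381;
    size_t i;

    for (i = 0; i < count; i++)
        h = h * 33 + (unsigned long)states[i];
    return h;
}

/* Returns nonzero when a and b contain exactly the same NFA states. */
static int same_dfa_state(const struct dfa_state *a,
                          const struct dfa_state *b)
{
    if (a->hash != b->hash || a->count != b->count)
        return 0;               /* cheap rejection, the common case */
    /* Hashes can collide, so confirm with a full comparison. */
    return memcmp(a->nfa_states, b->nfa_states,
                  a->count * sizeof a->nfa_states[0]) == 0;
}
```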
4984
4985
4986File: flex.info,  Node: How can I use more than 8192 rules?,  Next: How do I abandon a file in the middle of a scan and switch to a new file?,  Prev: How does flex compile the DFA so quickly?,  Up: FAQ
4987
4988How can I use more than 8192 rules?
4989===================================
4990
4991`Flex' is compiled with an upper limit of 8192 rules per scanner.  If
4992you need more than 8192 rules in your scanner, you'll have to recompile
4993`flex' with the following changes in `flexdef.h':
4994
4995
4996     <    #define YY_TRAILING_MASK 0x2000
4997     <    #define YY_TRAILING_HEAD_MASK 0x4000
4998     --
4999     >    #define YY_TRAILING_MASK 0x20000000
5000     >    #define YY_TRAILING_HEAD_MASK 0x40000000
5001
5002   This should work okay as long as your C compiler uses 32 bit
5003integers.  But you might want to think about whether using such a huge
5004number of rules is the best way to solve your problem.
5005
5006   The following may also be relevant:
5007
   With luck, you should be able to increase the definitions in
`flexdef.h' for:
5010
5011
5012     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5013     #define MAXIMUM_MNS 31999
5014     #define BAD_SUBSCRIPT -32767
5015
5016   recompile everything, and it'll all work.  Flex only has these
501716-bit-like values built into it because a long time ago it was
5018developed on a machine with 16-bit ints.  I've given this advice to
5019others in the past but haven't heard back from them whether it worked
5020okay or not...
5021
5022
5023File: flex.info,  Node: How do I abandon a file in the middle of a scan and switch to a new file?,  Next: How do I execute code only during initialization (only before the first scan)?,  Prev: How can I use more than 8192 rules?,  Up: FAQ
5024
5025How do I abandon a file in the middle of a scan and switch to a new file?
5026=========================================================================
5027
Just call `yyrestart(newfile)'. Be sure to reset the start state if you
want a "fresh start", since `yyrestart' does NOT reset the start state
back to `INITIAL'.
5031
5032
5033File: flex.info,  Node: How do I execute code only during initialization (only before the first scan)?,  Next: How do I execute code at termination?,  Prev: How do I abandon a file in the middle of a scan and switch to a new file?,  Up: FAQ
5034
5035How do I execute code only during initialization (only before the first scan)?
5036==============================================================================
5037
5038You can specify an initial action by defining the macro `YY_USER_INIT'
5039(though note that `yyout' may not be available at the time this macro
5040is executed).  Or you can add to the beginning of your rules section:
5041
5042
     %%
         /* Must be indented! */
         static int did_init = 0;

         if ( ! did_init ){
             do_my_init();
             did_init = 1;
         }
5051
5052
5053File: flex.info,  Node: How do I execute code at termination?,  Next: Where else can I find help?,  Prev: How do I execute code only during initialization (only before the first scan)?,  Up: FAQ
5054
5055How do I execute code at termination?
5056=====================================
5057
5058You can specify an action for the `<<EOF>>' rule.
5059
5060
5061File: flex.info,  Node: Where else can I find help?,  Next: Can I include comments in the "rules" section of the file?,  Prev: How do I execute code at termination?,  Up: FAQ
5062
5063Where else can I find help?
5064===========================
5065
5066You can find the flex homepage on the web at
5067`http://flex.sourceforge.net/'. See that page for details about flex
5068mailing lists as well.
5069
5070
5071File: flex.info,  Node: Can I include comments in the "rules" section of the file?,  Next: I get an error about undefined yywrap().,  Prev: Where else can I find help?,  Up: FAQ
5072
5073Can I include comments in the "rules" section of the file?
5074==========================================================
5075
5076Yes, just about anywhere you want to. See the manual for the specific
5077syntax.
5078
5079
5080File: flex.info,  Node: I get an error about undefined yywrap().,  Next: How can I change the matching pattern at run time?,  Prev: Can I include comments in the "rules" section of the file?,  Up: FAQ
5081
5082I get an error about undefined yywrap().
5083========================================
5084
5085You must supply a `yywrap()' function of your own, or link to `libfl.a'
5086(which provides one), or use
5087
5088
5089     %option noyywrap
5090
5091   in your source to say you don't want a `yywrap()' function.
5092
5093
5094File: flex.info,  Node: How can I change the matching pattern at run time?,  Next: How can I expand macros in the input?,  Prev: I get an error about undefined yywrap().,  Up: FAQ
5095
5096How can I change the matching pattern at run time?
5097==================================================
5098
You can't.  The patterns are compiled into static tables when `flex'
builds the scanner.
5101
5102
5103File: flex.info,  Node: How can I expand macros in the input?,  Next: How can I build a two-pass scanner?,  Prev: How can I change the matching pattern at run time?,  Up: FAQ
5104
5105How can I expand macros in the input?
5106=====================================
5107
5108The best way to approach this problem is at a higher level, e.g., in
5109the parser.
5110
5111   However, you can do this using multiple input buffers.
5112
5113
     %%
     macro/[a-z]+    {
         /* Saw the macro "macro" followed by extra stuff. */
         main_buffer = YY_CURRENT_BUFFER;
         expansion_buffer = yy_scan_string(expand(yytext));
         yy_switch_to_buffer(expansion_buffer);
     }

     <<EOF>>    {
         if ( expansion_buffer )
         {
             // We were doing an expansion, return to where
             // we were.
             yy_switch_to_buffer(main_buffer);
             yy_delete_buffer(expansion_buffer);
             expansion_buffer = 0;
         }
         else
             yyterminate();
     }
5134
   You will probably want a stack of expansion buffers to allow nested
macros.  Hopefully the example above makes the idea clear.
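
   One possible shape for such a stack, sketched in C.  Everything here
is hypothetical: `buffer_handle' stands in for `YY_BUFFER_STATE' so the
fragment compiles on its own, and the depth limit is arbitrary.  The
macro rule would push the current buffer before switching to the
expansion, and the `<<EOF>>' rule would pop and switch back, calling
`yyterminate()' once the stack is empty.

```c
#include <stddef.h>

#define MAX_EXPANSION_DEPTH 16      /* arbitrary nesting limit */

typedef void *buffer_handle;        /* stand-in for YY_BUFFER_STATE */

static buffer_handle saved_buffers[MAX_EXPANSION_DEPTH];
static size_t        saved_depth = 0;

/* Remember the buffer being left.  Returns 0, or -1 if nested too deep. */
static int push_buffer(buffer_handle current)
{
    if (saved_depth == MAX_EXPANSION_DEPTH)
        return -1;
    saved_buffers[saved_depth++] = current;
    return 0;
}

/* Return the most recently saved buffer, or NULL if none is saved. */
static buffer_handle pop_buffer(void)
{
    return saved_depth > 0 ? saved_buffers[--saved_depth] : NULL;
}
```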
5137
5138
5139File: flex.info,  Node: How can I build a two-pass scanner?,  Next: How do I match any string not matched in the preceding rules?,  Prev: How can I expand macros in the input?,  Up: FAQ
5140
5141How can I build a two-pass scanner?
5142===================================
5143
5144One way to do it is to filter the first pass to a temporary file, then
5145process the temporary file on the second pass. You will probably see a
5146performance hit, due to all the disk I/O.
5147
5148   When you need to look ahead far forward like this, it almost always
5149means that the right solution is to build a parse tree of the entire
5150input, then walk it after the parse in order to generate the output.
5151In a sense, this is a two-pass approach, once through the text and once
5152through the parse tree, but the performance hit for the latter is
5153usually an order of magnitude smaller, since everything is already
5154classified, in binary format, and residing in memory.
5155
5156
5157File: flex.info,  Node: How do I match any string not matched in the preceding rules?,  Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Prev: How can I build a two-pass scanner?,  Up: FAQ
5158
5159How do I match any string not matched in the preceding rules?
5160=============================================================
5161
One way to assign precedence is to place the more specific rules
first.  If two rules would match the same input (same sequence of
characters) then the first rule listed in the `flex' input wins, e.g.,
5165
5166
5167     %%
5168     foo[a-zA-Z_]+    return FOO_ID;
5169     bar[a-zA-Z_]+    return BAR_ID;
5170     [a-zA-Z_]+       return GENERIC_ID;
5171
5172   Note that the rule `[a-zA-Z_]+' must come *after* the others.  It
5173will match the same amount of text as the more specific rules, and in
5174that case the `flex' scanner will pick the first rule listed in your
5175scanner as the one to match.
5176
5177
5178File: flex.info,  Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Next: Is there a way to make flex treat NULL like a regular character?,  Prev: How do I match any string not matched in the preceding rules?,  Up: FAQ
5179
5180I am trying to port code from AT&T lex that uses yysptr and yysbuf.
5181===================================================================
5182
5183Those are internal variables pointing into the AT&T scanner's input
5184buffer.  I imagine they're being manipulated in user versions of the
5185`input()' and `unput()' functions.  If so, what you need to do is
5186analyze those functions to figure out what they're doing, and then
5187replace `input()' with an appropriate definition of `YY_INPUT'.  You
5188shouldn't need to (and must not) replace `flex''s `unput()' function.
5189
5190
5191File: flex.info,  Node: Is there a way to make flex treat NULL like a regular character?,  Next: Whenever flex can not match the input it says "flex scanner jammed".,  Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Up: FAQ
5192
5193Is there a way to make flex treat NULL like a regular character?
5194================================================================
5195
5196Yes, `\0' and `\x00' should both do the trick.  Perhaps you have an
5197ancient version of `flex'.  The latest release is version 2.5.35.
5198
5199
5200File: flex.info,  Node: Whenever flex can not match the input it says "flex scanner jammed".,  Next: Why doesn't flex have non-greedy operators like perl does?,  Prev: Is there a way to make flex treat NULL like a regular character?,  Up: FAQ
5201
5202Whenever flex can not match the input it says "flex scanner jammed".
5203====================================================================
5204
5205You need to add a rule that matches the otherwise-unmatched text, e.g.,
5206
5207
5208     %option yylineno
5209     %%
5210     [[a bunch of rules here]]
5211
5212     .	printf("bad input character '%s' at line %d\n", yytext, yylineno);
5213
5214   See `%option default' for more information.
5215
5216
5217File: flex.info,  Node: Why doesn't flex have non-greedy operators like perl does?,  Next: Memory leak - 16386 bytes allocated by malloc.,  Prev: Whenever flex can not match the input it says "flex scanner jammed".,  Up: FAQ
5218
5219Why doesn't flex have non-greedy operators like perl does?
5220==========================================================
5221
5222A DFA can do a non-greedy match by stopping the first time it enters an
5223accepting state, instead of consuming input until it determines that no
5224further matching is possible (a "jam" state).  This is actually easier
5225to implement than longest leftmost match (which flex does).
5226
5227   But it's also much less useful than longest leftmost match.  In
5228general, when you find yourself wishing for non-greedy matching, that's
5229usually a sign that you're trying to make the scanner do some parsing.
5230That's generally the wrong approach, since it lacks the power to do a
5231decent job.  Better is to either introduce a separate parser, or to
5232split the scanner into multiple scanners using (exclusive) start
5233conditions.
5234
   For example, to scan a chunk delimited by `BEGIN' and `END', you
might enter a separate start state once you've seen the `BEGIN'.  In
that state, you might then have a regex that will match `END' (to kick
you out of the state), and perhaps `(.|\n)' to get a single character
within the chunk ...
5239
5240   This approach also has much better error-reporting properties.
5241
5242
5243File: flex.info,  Node: Memory leak - 16386 bytes allocated by malloc.,  Next: How do I track the byte offset for lseek()?,  Prev: Why doesn't flex have non-greedy operators like perl does?,  Up: FAQ
5244
5245Memory leak - 16386 bytes allocated by malloc.
5246==============================================
5247
5248UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
5249you did not call `yylex_destroy()'. If you are using an earlier version
5250of `flex', then read on.
5251
5252   The leak is about 16426 bytes.  That is, (8192 * 2 + 2) for the
5253read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
5254alignment). The leak is in the non-reentrant C scanner only (NOT in the
5255reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
5256when you are done, the buffer is never freed.
5257
5258   However, the leak won't multiply since the buffer is reused no
5259matter how many times you call `yylex()'.
5260
5261   If you want to reclaim the memory when you are completely done
5262scanning, then you might try this:
5263
5264
5265     /* For non-reentrant C scanner only. */
5266     yy_delete_buffer(YY_CURRENT_BUFFER);
5267     yy_init = 1;
5268
5269   Note: `yy_init' is an "internal variable", and hasn't been tested in
5270this situation. It is possible that some other globals may need
5271resetting as well.
5272
5273
5274File: flex.info,  Node: How do I track the byte offset for lseek()?,  Next: How do I use my own I/O classes in a C++ scanner?,  Prev: Memory leak - 16386 bytes allocated by malloc.,  Up: FAQ
5275
5276How do I track the byte offset for lseek()?
5277===========================================
5278
5279
5280     >   We thought that it would be possible to have this number through the
5281     >   evaluation of the following expression:
5282     >
5283     >   seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
5284
5285   While this is the right idea, it has two problems.  The first is that
5286it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
5287during an invocation of `YY_INPUT' (or that your input source will
5288return less even though `YY_READ_BUF_SIZE' bytes were requested).  The
5289second problem is that when refilling its internal buffer, `flex' keeps
5290some characters from the previous buffer (because usually it's in the
5291middle of a match, and needs those characters to construct `yytext' for
5292the match once it's done).  Because of this, `yy_c_buf_p -
5293YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
5294already read from the current buffer.
5295
5296   An alternative solution is to count the number of characters you've
5297matched since starting to scan.  This can be done by using
5298`YY_USER_ACTION'.  For example,
5299
5300
5301     #define YY_USER_ACTION num_chars += yyleng;
5302
   (You need to be careful to update your bookkeeping if you use
`yymore()', `yyless()', `unput()', or `input()'.)
5305
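   The bookkeeping mentioned above can be sketched in plain C.  This is
only an illustration under stated assumptions: `struct offset_tracker'
and the `track_*' functions are hypothetical names, not part of flex,
and the sketch assumes that every character passed to `unput()' was
previously read from the input.

```c
/* Hypothetical bookkeeping helper; none of these names come from flex. */
struct offset_tracker {
    long offset;   /* bytes consumed from the input so far */
    long carried;  /* prefix already counted before a yymore() call */
};

/* Call from YY_USER_ACTION with the current yyleng.  After yymore(),
   yyleng includes the previous token again, so subtract that prefix. */
static void track_match(struct offset_tracker *t, long yyleng_now)
{
    t->offset += yyleng_now - t->carried;
    t->carried = 0;
}

/* Call next to yymore(): remember how much of the next yytext is old. */
static void track_yymore(struct offset_tracker *t, long yyleng_now)
{
    t->carried = yyleng_now;
}

/* Call next to yyless(n): all but the first n characters are rescanned. */
static void track_yyless(struct offset_tracker *t, long yyleng_now, long n)
{
    t->offset -= yyleng_now - n;
}

/* Call next to unput(c): one character goes back to the input. */
static void track_unput(struct offset_tracker *t) { t->offset -= 1; }

/* Call after a successful input(): one character bypassed the rules. */
static void track_input(struct offset_tracker *t) { t->offset += 1; }
```

   In a scanner one might then write `#define YY_USER_ACTION
track_match(&tracker, yyleng);' (with `tracker' a hypothetical global)
and call the other helpers next to the corresponding flex calls.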
5306
5307File: flex.info,  Node: How do I use my own I/O classes in a C++ scanner?,  Next: How do I skip as many chars as possible?,  Prev: How do I track the byte offset for lseek()?,  Up: FAQ
5308
5309How do I use my own I/O classes in a C++ scanner?
5310=================================================
5311
5312When the flex C++ scanning class rewrite finally happens, then this
5313sort of thing should become much easier.
5314
   You can do this by passing the various functions (such as
`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then
dealing with your own I/O classes surreptitiously (i.e., stashing them
in special member variables).  This works because the only assumption
the lexer makes about the iostreams is that they're ultimately passed
to `LexerInput()' and `LexerOutput()', which then do whatever is
necessary with them.
5322
5323
5324File: flex.info,  Node: How do I skip as many chars as possible?,  Next: deleteme00,  Prev: How do I use my own I/O classes in a C++ scanner?,  Up: FAQ
5325
5326How do I skip as many chars as possible?
5327========================================
5328
5329How do I skip as many chars as possible - without interfering with the
5330other patterns?
5331
5332   In the example below, we want to skip over characters until we see
5333the phrase "endskip". The following will _NOT_ work correctly (do you
5334see why not?)
5335
5336
     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip   BEGIN(SKIP);
     ...
     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.*             ;
5344
   The problem is that the pattern `.*' will eat up the word "endskip",
because flex always prefers the longest match.  The simplest (but slow)
fix is to skip one character at a time:


     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.              ;

   A faster fix involves making the second rule match more, without
making it match "endskip" plus something else.  So for example:


     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>[^e]+          ;
     <SKIP>.              ;  /* so you eat up e's, too */
5359
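   To see why the second fix is faster, it can help to count how often
a skip rule fires before "endskip" is reached.  The following plain C
sketch imitates the two rule sets by hand; it is not generated by flex,
the function names are hypothetical, and it assumes the input contains
no newlines (which `.' would not match).

```c
#include <string.h>

/* Imitates  <SKIP>.  : one rule match per skipped character.
   Returns the number of matches before "endskip", or -1 if absent. */
static int matches_one_char_rule(const char *s)
{
    int count = 0;
    while (*s != '\0') {
        if (strncmp(s, "endskip", 7) == 0)
            return count;
        s++;
        count++;
    }
    return -1;
}

/* Imitates  <SKIP>[^e]+  plus  <SKIP>.  : a whole run of non-'e'
   characters is one match, and each lone 'e' is one match. */
static int matches_chunked_rules(const char *s)
{
    int count = 0;
    while (*s != '\0') {
        if (strncmp(s, "endskip", 7) == 0)
            return count;
        if (*s == 'e')
            s++;                    /* the "." rule eats a lone 'e' */
        else
            s += strcspn(s, "e");   /* [^e]+ eats the run up to an 'e' */
        count++;
    }
    return -1;
}
```

   For the input "0123456789endskip" the one-character rule fires ten
times, while the chunked rules fire once.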
5360
5361File: flex.info,  Node: deleteme00,  Next: Are certain equivalent patterns faster than others?,  Prev: How do I skip as many chars as possible?,  Up: FAQ
5362
5363deleteme00
5364==========
5365
5366
5367     QUESTION:
5368     When was flex born?
5369
5370     Vern Paxson took over
5371     the Software Tools lex project from Jef Poskanzer in 1982.  At that point it
5372     was written in Ratfor.  Around 1987 or so, Paxson translated it into C, and
5373     a legend was born :-).
5374
5375
5376File: flex.info,  Node: Are certain equivalent patterns faster than others?,  Next: Is backing up a big deal?,  Prev: deleteme00,  Up: FAQ
5377
5378Are certain equivalent patterns faster than others?
5379===================================================
5380
5381
5382     To: Adoram Rogel <adoram@orna.hybridge.com>
5383     Subject: Re: Flex 2.5.2 performance questions
5384     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
5385     Date: Wed, 18 Sep 96 10:51:02 PDT
5386     From: Vern Paxson <vern>
5387
5388     [Note, the most recent flex release is 2.5.4, which you can get from
5389     ftp.ee.lbl.gov.  It has bug fixes over 2.5.2 and 2.5.3.]
5390
5391     > 1. Using the pattern
5392     >    ([Ff](oot)?)?[Nn](ote)?(\.)?
5393     >    instead of
5394     >    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
5395     >    (in a very complicated flex program) caused the program to slow from
5396     >    300K+/min to 100K/min (no other changes were done).
5397
     These two are not equivalent.  For example, the first can match "footnote."
     but the second can only match "footnote".  This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.
5402
5403     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
5404
5405     From a performance point of view, they're equivalent (modulo presumably
5406     minor effects such as memory cache hit rates; and the presence of trailing
5407     context, see below).  From a space point of view, the first is slightly
5408     preferable.
5409
5410     > 3. I have a pattern that look like this:
5411     >    pats {p1}|{p2}|{p3}|...|{p50}     (50 patterns ORd)
5412     >
5413     >    running yet another complicated program that includes the following rule:
5414     >    <snext>{and}/{no4}{bb}{pats}
5415     >
5416     >    gets me to "too complicated - over 32,000 states"...
5417
5418     I can't tell from this example whether the trailing context is variable-length
5419     or fixed-length (it could be the latter if {and} is fixed-length).  If it's
5420     variable length, which flex -p will tell you, then this reflects a basic
5421     performance problem, and if you can eliminate it by restructuring your
5422     scanner, you will see significant improvement.
5423
5424     >    so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
5425     >    10 patterns and changed the rule to be 5 rules.
5426     >    This did compile, but what is the rule of thumb here ?
5427
     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern has a fixed length.  Use
     of the '|' operator automatically makes the pattern variable length, so in
     this case '[Ff]oot' is preferred to '(F|f)oot'.
5432
5433     > 4. I changed a rule that looked like this:
5434     >    <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
5435     >
5436     >    to the next 2 rules:
5437     >    <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
5438     >    <snext8>{and}{bb}/{ROMAN}         { BEGIN...
5439     >
5440     >    Again, I understand the using [^...] will cause a great performance loss
5441
5442     Actually, it doesn't cause any sort of performance loss.  It's a surprising
5443     fact about regular expressions that they always match in linear time
5444     regardless of how complex they are.
5445
5446     >    but are there any specific rules about it ?
5447
5448     See the "Performance Considerations" section of the man page, and also
5449     the example in MISC/fastwc/.
5450
5451     		Vern
5452
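   The claim in the message above that the two patterns from the first
question are not equivalent can be checked outside flex.  The sketch
below uses the POSIX `<regex.h>' functions, which are not the engine
flex uses, and anchors both patterns so that the whole string must
match; `matches_whole' and the two array names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* The two patterns from the question, wrapped in ^( ... )$. */
static const char first_pattern[] =
    "^(([Ff](oot)?)?[Nn](ote)?(\\.)?)$";
static const char second_pattern[] =
    "^((((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\\.)|((F|f)(N|n)(\\.))))$";

/* Returns 1 if all of `text' matches the POSIX extended regular
   expression `pattern', 0 if not, -1 if the pattern does not compile. */
static int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

   With these definitions, "footnote." and "fn" are accepted only by the
first pattern, while "footnote" and "fn." are accepted by both.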
5453
5454File: flex.info,  Node: Is backing up a big deal?,  Next: Can I fake multi-byte character support?,  Prev: Are certain equivalent patterns faster than others?,  Up: FAQ
5455
5456Is backing up a big deal?
5457=========================
5458
5459
5460     To: Adoram Rogel <adoram@hybridge.com>
5461     Subject: Re: Flex 2.5.2 performance questions
5462     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
5463     Date: Thu, 19 Sep 96 09:58:00 PDT
5464     From: Vern Paxson <vern>
5465
5466     > a lot about the backing up problem.
5467     > I believe that there lies my biggest problem, and I'll try to improve
5468     > it.
5469
5470     Since you have variable trailing context, this is a bigger performance
5471     problem.  Fixing it is usually easier than fixing backing up, which in a
5472     complicated scanner (yours seems to fit the bill) can be extremely
5473     difficult to do correctly.
5474
5475     You also don't mention what flags you are using for your scanner.
5476     -f makes a large speed difference, and -Cfe buys you nearly as much
5477     speed but the resulting scanner is considerably smaller.
5478
5479     > I have an | operator in {and} and in {pats} so both of them are variable
5480     > length.
5481
5482     -p should have reported this.
5483
5484     > Is changing one of them to fixed-length is enough ?
5485
5486     Yes.
5487
5488     > Is it possible to change the 32,000 states limit ?
5489
5490     Yes.  I've appended instructions on how.  Before you make this change,
5491     though, you should think about whether there are ways to fundamentally
5492     simplify your scanner - those are certainly preferable!
5493
5494     		Vern
5495
5496     To increase the 32K limit (on a machine with 32 bit integers), you increase
5497     the magnitude of the following in flexdef.h:
5498
5499     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
5500     #define MAXIMUM_MNS 31999
5501     #define BAD_SUBSCRIPT -32767
5502     #define MAX_SHORT 32700
5503
5504     Adding a 0 or two after each should do the trick.
5505
5506
5507File: flex.info,  Node: Can I fake multi-byte character support?,  Next: deleteme01,  Prev: Is backing up a big deal?,  Up: FAQ
5508
5509Can I fake multi-byte character support?
5510========================================
5511
5512
5513     To: Heeman_Lee@hp.com
5514     Subject: Re: flex - multi-byte support?
5515     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
5516     Date: Fri, 04 Oct 1996 11:42:18 PDT
5517     From: Vern Paxson <vern>
5518
5519     >      I assume as long as my *.l file defines the
5520     >      range of expected character code values (in octal format), flex will
5521     >      scan the file and read multi-byte characters correctly. But I have no
5522     >      confidence in this assumption.
5523
5524     Your lack of confidence is justified - this won't work.
5525
5526     Flex has in it a widespread assumption that the input is processed
5527     one byte at a time.  Fixing this is on the to-do list, but is involved,
5528     so it won't happen any time soon.  In the interim, the best I can suggest
5529     (unless you want to try fixing it yourself) is to write your rules in
5530     terms of pairs of bytes, using definitions in the first section:
5531
5532     	X	\xfe\xc2
5533     	...
5534     	%%
5535     	foo{X}bar	found_foo_fe_c2_bar();
5536
5537     etc.  Definitely a pain - sorry about that.
5538
5539     By the way, the email address you used for me is ancient, indicating you
5540     have a very old version of flex.  You can get the most recent, 2.5.4, from
5541     ftp.ee.lbl.gov.
5542
5543     		Vern
5544
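   The byte-pair idea from the message above can be illustrated in
plain C: a two-byte character is simply two consecutive byte values.
This is a sketch, not flex output; `needle' and `find_foo_x_bar' are
hypothetical names, and the byte values 0xFE 0xC2 are taken from the
example definition of `X'.

```c
#include <stddef.h>
#include <string.h>

/* foo{X}bar as raw bytes.  The literal is split before "bar" on
   purpose: in "\xc2bar" C would read 'b' and 'a' as more hex digits. */
static const char needle[] = "foo\xfe\xc2" "bar";

/* Returns the offset of the first foo{X}bar in buf, or -1 if none. */
static long find_foo_x_bar(const char *buf, size_t len)
{
    size_t n = sizeof needle - 1;   /* 8 bytes, without the final NUL */
    size_t i;

    if (len < n)
        return -1;
    for (i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return (long)i;
    return -1;
}
```

   A buffer that contains only the first byte of the pair (0xFE) between
"foo" and "bar" is not matched, just as the flex rule would not match.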
5545
5546File: flex.info,  Node: deleteme01,  Next: Can you discuss some flex internals?,  Prev: Can I fake multi-byte character support?,  Up: FAQ
5547
5548deleteme01
5549==========
5550
5551
5552     To: moleary@primus.com
5553     Subject: Re: Flex / Unicode compatibility question
5554     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
5555     Date: Tue, 22 Oct 1996 11:06:13 PDT
5556     From: Vern Paxson <vern>
5557
5558     Unfortunately flex at the moment has a widespread assumption within it
5559     that characters are processed 8 bits at a time.  I don't see any easy
5560     fix for this (other than writing your rules in terms of double characters -
5561     a pain).  I also don't know of a wider lex, though you might try surfing
5562     the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
5563     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
5564     Toolkit").
5565
5566     Fixing flex to handle wider characters is on the long-term to-do list.
5567     But since flex is a strictly spare-time project these days, this probably
5568     won't happen for quite a while, unless someone else does it first.
5569
5570     		Vern
5571
5572
5573File: flex.info,  Node: Can you discuss some flex internals?,  Next: unput() messes up yy_at_bol,  Prev: deleteme01,  Up: FAQ
5574
5575Can you discuss some flex internals?
5576====================================
5577
5578
5579     To: Johan Linde <jl@theophys.kth.se>
5580     Subject: Re: translation of flex
5581     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
5582     Date: Mon, 11 Nov 1996 10:33:50 PST
5583     From: Vern Paxson <vern>
5584
5585     > I'm working for the Swedish team translating GNU program, and I'm currently
5586     > working with flex. I have a few questions about some of the messages which
5587     > I hope you can answer.
5588
5589     All of the things you're wondering about, by the way, concerning flex
5590     internals - probably the only person who understands what they mean in
5591     English is me!  So I wouldn't worry too much about getting them right.
5592     That said ...
5593
5594     > #: main.c:545
5595     > msgid "  %d protos created\n"
5596     >
5597     > Does proto mean prototype?
5598
5599     Yes - prototypes of state compression tables.
5600
5601     > #: main.c:539
5602     > msgid "  %d/%d (peak %d) template nxt-chk entries created\n"
5603     >
5604     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
5605     > However, 'template next-check entries' doesn't make much sense to me. To be
5606     > able to find a good translation I need to know a little bit more about it.
5607
5608     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
5609     scanner tables.  It involves creating two pairs of tables.  The first has
5610     "base" and "default" entries, the second has "next" and "check" entries.
5611     The "base" entry is indexed by the current state and yields an index into
5612     the next/check table.  The "default" entry gives what to do if the state
5613     transition isn't found in next/check.  The "next" entry gives the next
5614     state to enter, but only if the "check" entry verifies that this entry is
5615     correct for the current state.  Flex creates templates of series of
5616     next/check entries and then encodes differences from these templates as a
5617     way to compress the tables.
5618
5619     > #: main.c:533
5620     > msgid "  %d/%d base-def entries created\n"
5621     >
5622     > The same problem here for 'base-def'.
5623
5624     See above.
5625
5626     		Vern
5627
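   The base/default/next/check lookup described in the message above
can be sketched as follows.  The tables are hand-built for a tiny
three-state automaton over two symbols and are purely illustrative;
the names and the use of -1 for "no transition" are assumptions, and
flex's generated tables differ in detail.

```c
/* Uncompressed transitions (symbol 0 = 'a', symbol 1 = 'b'):
     state 0: a -> 1, b -> 0
     state 1: a -> 1, b -> 2   (differs from state 0 only on 'b')
     state 2: a -> 1, b -> 0   (identical to state 0)              */
static const int base_tbl[]  = {  0, 1, 2 };
static const int def_tbl[]   = { -1, 0, 0 };
static const int next_tbl[]  = {  1, 0, 2, -1 };
static const int check_tbl[] = {  0, 0, 1, -1 };

/* Follow "default" links until the "check" entry confirms that the
   slot really belongs to the state being examined. */
static int next_state(int state, int symbol)
{
    while (state >= 0) {
        int i = base_tbl[state] + symbol;
        if (check_tbl[i] == state)
            return next_tbl[i];
        state = def_tbl[state];
    }
    return -1;   /* no transition found */
}
```

   Here state 2 stores no entries of its own and state 1 stores only
one, which is where the space saving comes from.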
5628
5629File: flex.info,  Node: unput() messes up yy_at_bol,  Next: The | operator is not doing what I want,  Prev: Can you discuss some flex internals?,  Up: FAQ
5630
5631unput() messes up yy_at_bol
5632===========================
5633
5634
5635     To: Xinying Li <xli@npac.syr.edu>
5636     Subject: Re: FLEX ?
5637     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
5638     Date: Wed, 13 Nov 1996 19:51:54 PST
5639     From: Vern Paxson <vern>
5640
5641     > "unput()" them to input flow, question occurs. If I do this after I scan
5642     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
5643     > means the carriage flag has gone.
5644
5645     You can control this by calling yy_set_bol().  It's described in the manual.
5646
     >      And if in pre-reading it goes to the end of file, is anything done
     > to control the end of current buffer and end of file?
5649
5650     No, there's no way to put back an end-of-file.
5651
5652     >      By the way I am using flex 2.5.2 and using the "-l".
5653
5654     The latest release is 2.5.4, by the way.  It fixes some bugs in 2.5.2 and
5655     2.5.3.  You can get it from ftp.ee.lbl.gov.
5656
5657     		Vern
5658
5659
5660File: flex.info,  Node: The | operator is not doing what I want,  Next: Why can't flex understand this variable trailing context pattern?,  Prev: unput() messes up yy_at_bol,  Up: FAQ
5661
5662The | operator is not doing what I want
5663=======================================
5664
5665
5666     To: Alain.ISSARD@st.com
5667     Subject: Re: Start condition with FLEX
5668     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
5669     Date: Mon, 18 Nov 1996 10:41:34 PST
5670     From: Vern Paxson <vern>
5671
5672     > I am not able to use the start condition scope and to use the | (OR) with
5673     > rules having start conditions.
5674
5675     The problem is that if you use '|' as a regular expression operator, for
5676     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
5677     any blanks around it.  If you instead want the special '|' *action* (which
5678     from your scanner appears to be the case), which is a way of giving two
5679     different rules the same action:
5680
5681     	foo	|
5682     	bar	matched_foo_or_bar();
5683
5684     then '|' *must* be separated from the first rule by whitespace and *must*
5685     be followed by a new line.  You *cannot* write it as:
5686
5687     	foo | bar	matched_foo_or_bar();
5688
     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.
5692
5693     Your problems with start condition scope are simply due to syntax errors
5694     from your use of '|' later confusing flex.
5695
5696     Let me know if you still have problems.
5697
5698     		Vern
5699
5700
5701File: flex.info,  Node: Why can't flex understand this variable trailing context pattern?,  Next: The ^ operator isn't working,  Prev: The | operator is not doing what I want,  Up: FAQ
5702
5703Why can't flex understand this variable trailing context pattern?
5704=================================================================
5705
5706
5707     To: Gregory Margo <gmargo@newton.vip.best.com>
5708     Subject: Re: flex-2.5.3 bug report
5709     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
5710     Date: Sat, 23 Nov 1996 17:07:32 PST
5711     From: Vern Paxson <vern>
5712
5713     > Enclosed is a lex file that "real" lex will process, but I cannot get
5714     > flex to process it.  Could you try it and maybe point me in the right direction?
5715
5716     Your problem is that some of the definitions in the scanner use the '/'
5717     trailing context operator, and have it enclosed in ()'s.  Flex does not
5718     allow this operator to be enclosed in ()'s because doing so allows undefined
5719     regular expressions such as "(a/b)+".  So the solution is to remove the
5720     parentheses.  Note that you must also be building the scanner with the -l
5721     option for AT&T lex compatibility.  Without this option, flex automatically
5722     encloses the definitions in parentheses.
5723
5724     		Vern
5725
5726
5727File: flex.info,  Node: The ^ operator isn't working,  Next: Trailing context is getting confused with trailing optional patterns,  Prev: Why can't flex understand this variable trailing context pattern?,  Up: FAQ
5728
5729The ^ operator isn't working
5730============================
5731
5732
5733     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
5734     Subject: Re: Flex Bug ?
5735     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
5736     Date: Tue, 26 Nov 1996 11:15:05 PST
5737     From: Vern Paxson <vern>
5738
5739     > In my lexer code, i have the line :
5740     > ^\*.*          { }
5741     >
     > Thus all lines starting with an asterisk (*) are comment lines.
5743     > This does not work !
5744
5745     I can't get this problem to reproduce - it works fine for me.  Note
5746     though that if what you have is slightly different:
5747
5748     	COMMENT	^\*.*
5749     	%%
5750     	{COMMENT}	{ }
5751
5752     then it won't work, because flex pushes back macro definitions enclosed
5753     in ()'s, so the rule becomes
5754
5755     	(^\*.*)		{ }
5756
5757     and now that the '^' operator is not at the immediate beginning of the
5758     line, it's interpreted as just a regular character.  You can avoid this
5759     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
5760
5761     		Vern
5762
5763
5764File: flex.info,  Node: Trailing context is getting confused with trailing optional patterns,  Next: Is flex GNU or not?,  Prev: The ^ operator isn't working,  Up: FAQ
5765
5766Trailing context is getting confused with trailing optional patterns
5767====================================================================
5768
5769
5770     To: Adoram Rogel <adoram@hybridge.com>
5771     Subject: Re: Flex 2.5.4 BOF ???
5772     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
5773     Date: Wed, 27 Nov 1996 10:56:25 PST
5774     From: Vern Paxson <vern>
5775
5776     >     Organization(s)?/[a-z]
5777     >
     > This matched "Organizations" (looking in debug mode, the trailing s
     > was matched with trailing context instead of the optional (s) in the
     > end of the word).
5781
5782     That should only happen with lex.  Flex can properly match this pattern.
5783     (That might be what you're saying, I'm just not sure.)
5784
5785     > Is there a way to avoid this dangerous trailing context problem ?
5786
5787     Unfortunately, there's no easy way.  On the other hand, I don't see why
5788     it should be a problem.  Lex's matching is clearly wrong, and I'd hope
5789     that usually the intent remains the same as expressed with the pattern,
5790     so flex's matching will be correct.
5791
5792     		Vern
5793
5794
5795File: flex.info,  Node: Is flex GNU or not?,  Next: ERASEME53,  Prev: Trailing context is getting confused with trailing optional patterns,  Up: FAQ
5796
5797Is flex GNU or not?
5798===================
5799
5800
5801     To: Cameron MacKinnon <mackin@interlog.com>
5802     Subject: Re: Flex documentation bug
5803     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
5804     Date: Sun, 01 Dec 1996 22:29:39 PST
5805     From: Vern Paxson <vern>
5806
5807     > I'm not sure how or where to submit bug reports (documentation or
5808     > otherwise) for the GNU project stuff ...
5809
5810     Well, strictly speaking flex isn't part of the GNU project.  They just
5811     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me.  Those sent to the GNU folks
     sometimes find their way to me, but some may drop between the cracks.
5814
5815     > In GNU Info, under the section 'Start Conditions', and also in the man
5816     > page (mine's dated April '95) is a nice little snippet showing how to
5817     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
5818     > size. Unfortunately, no overflow checking is ever done ...
5819
5820     This is already mentioned in the manual:
5821
5822     Finally, here's an example of how to  match  C-style  quoted
5823     strings using exclusive start conditions, including expanded
5824     escape sequences (but not including checking  for  a  string
5825     that's too long):
5826
5827     The reason for not doing the overflow checking is that it will needlessly
5828     clutter up an example whose main purpose is just to demonstrate how to
5829     use flex.
5830
5831     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
5832
5833     		Vern
5834
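   For readers who do want the overflow check discussed in the message
above, one possible shape is a small append helper that refuses to
write past the end of the buffer.  This is a sketch, not the manual's
example: `struct str_buf', `str_begin' and `str_append' are
hypothetical names, and `MAX_STR_CONST' is set to a deliberately small
value here.

```c
#include <stddef.h>

#define MAX_STR_CONST 16   /* deliberately small for the example */

struct str_buf {
    char   data[MAX_STR_CONST];
    size_t len;            /* characters stored, excluding the NUL */
};

/* Start collecting a new string constant. */
static void str_begin(struct str_buf *b)
{
    b->len = 0;
    b->data[0] = '\0';
}

/* Append one character, always keeping room for the final NUL.
   Returns 1 on success, 0 if the character would not fit. */
static int str_append(struct str_buf *b, char c)
{
    if (b->len + 1 >= MAX_STR_CONST)
        return 0;
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 1;
}
```

   A scanner action could call `str_append()' for each character and
report a "string too long" error when it returns 0.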
5835
5836File: flex.info,  Node: ERASEME53,  Next: I need to scan if-then-else blocks and while loops,  Prev: Is flex GNU or not?,  Up: FAQ
5837
5838ERASEME53
5839=========
5840
5841
5842     To: tsv@cs.UManitoba.CA
5843     Subject: Re: Flex (reg)..
5844     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
5845     Date: Thu, 06 Mar 1997 15:54:19 PST
5846     From: Vern Paxson <vern>
5847
5848     > [:alpha:] ([:alnum:] | \\_)*
5849
5850     If your rule really has embedded blanks as shown above, then it won't
5851     work, as the first blank delimits the rule from the action.  (It wouldn't
5852     even compile ...)  You need instead:
5853
5854     [:alpha:]([:alnum:]|\\_)*
5855
5856     and that should work fine - there's no restriction on what can go inside
5857     of ()'s except for the trailing context operator, '/'.
5858
5859     		Vern
5860
5861
5862File: flex.info,  Node: I need to scan if-then-else blocks and while loops,  Next: ERASEME55,  Prev: ERASEME53,  Up: FAQ
5863
5864I need to scan if-then-else blocks and while loops
5865==================================================
5866
5867
5868     To: "Mike Stolnicki" <mstolnic@ford.com>
5869     Subject: Re: FLEX help
5870     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
5871     Date: Fri, 30 May 1997 10:46:35 PDT
5872     From: Vern Paxson <vern>
5873
5874     > We'd like to add "if-then-else", "while", and "for" statements to our
5875     > language ...
5876     > We've investigated many possible solutions.  The one solution that seems
5877     > the most reasonable involves knowing the position of a TOKEN in yyin.
5878
5879     I strongly advise you to instead build a parse tree (abstract syntax tree)
5880     and loop over that instead.  You'll find this has major benefits in keeping
5881     your interpreter simple and extensible.
5882
5883     That said, the functionality you mention for get_position and set_position
5884     have been on the to-do list for a while.  As flex is a purely spare-time
5885     project for me, no guarantees when this will be added (in particular, it
5886     for sure won't be for many months to come).
5887
5888     		Vern
5889
5890
5891File: flex.info,  Node: ERASEME55,  Next: ERASEME56,  Prev: I need to scan if-then-else blocks and while loops,  Up: FAQ
5892
5893ERASEME55
5894=========
5895
5896
5897     To: Colin Paul Adams <colin@colina.demon.co.uk>
5898     Subject: Re: Flex C++ classes and Bison
5899     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
5900     Date: Fri, 15 Aug 1997 10:48:19 PDT
5901     From: Vern Paxson <vern>
5902
5903     > #define YY_DECL   int yylex (YYSTYPE *lvalp, struct parser_control
5904     > *parm)
5905     >
5906     > I have been trying  to get this to work as a C++ scanner, but it does
5907     > not appear to be possible (warning that it matches no declarations in
5908     > yyFlexLexer, or something like that).
5909     >
5910     > Is this supposed to be possible, or is it being worked on (I DID
5911     > notice the comment that scanner classes are still experimental, so I'm
5912     > not too hopeful)?
5913
5914     What you need to do is derive a subclass from yyFlexLexer that provides
5915     the above yylex() method, squirrels away lvalp and parm into member
5916     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
5917
5918     		Vern
5919
5920
5921File: flex.info,  Node: ERASEME56,  Next: ERASEME57,  Prev: ERASEME55,  Up: FAQ
5922
5923ERASEME56
5924=========
5925
5926
5927     To: Mikael.Latvala@lmf.ericsson.se
5928     Subject: Re: Possible mistake in Flex v2.5 document
5929     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
5930     Date: Fri, 05 Sep 1997 10:01:54 PDT
5931     From: Vern Paxson <vern>
5932
5933     > In that example you show how to count comment lines when using
5934     > C style /* ... */ comments. My question is, shouldn't you take into
5935     > account a scenario where end of a comment marker occurs inside
5936     > character or string literals?
5937
5938     The scanner certainly needs to also scan character and string literals.
5939     However it does that (there's an example in the man page for strings), the
5940     lexer will recognize the beginning of the literal before it runs across the
5941     embedded "/*".  Consequently, it will finish scanning the literal before it
5942     even considers the possibility of matching "/*".
5943
5944     Example:
5945
5946     	'([^']*|{ESCAPE_SEQUENCE})'
5947
5948     will match all the text between the ''s (inclusive).  So the lexer
5949     considers this as a token beginning at the first ', and doesn't even
5950     attempt to match other tokens inside it.
5951
     I think this subtlety is not worth putting in the manual, as I suspect
     it would confuse more people than it would enlighten.
5954
5955     		Vern
5956
5957
5958File: flex.info,  Node: ERASEME57,  Next: Is there a repository for flex scanners?,  Prev: ERASEME56,  Up: FAQ
5959
5960ERASEME57
5961=========
5962
5963
5964     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
5965     Subject: Re: flex limitations
5966     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
5967     Date: Mon, 08 Sep 1997 11:38:08 PDT
5968     From: Vern Paxson <vern>
5969
5970     > %%
5971     > [a-zA-Z]+       /* skip a line */
5972     >                 {  printf("got %s\n", yytext); }
5973     > %%
5974
5975     What version of flex are you using?  If I feed this to 2.5.4, it complains:
5976
5977     	"bug.l", line 5: EOF encountered inside an action
5978     	"bug.l", line 5: unrecognized rule
5979     	"bug.l", line 5: fatal parse error
5980
5981     Not the world's greatest error message, but it manages to flag the problem.
5982
5983     (With the introduction of start condition scopes, flex can't accommodate
5984     an action on a separate line, since it's ambiguous with an indented rule.)
5985
5986     You can get 2.5.4 from ftp.ee.lbl.gov.
5987
5988     		Vern
5989
5990
5991File: flex.info,  Node: Is there a repository for flex scanners?,  Next: How can I conditionally compile or preprocess my flex input file?,  Prev: ERASEME57,  Up: FAQ
5992
5993Is there a repository for flex scanners?
5994========================================
5995
5996Not that we know of. You might try asking on comp.compilers.
5997
5998
5999File: flex.info,  Node: How can I conditionally compile or preprocess my flex input file?,  Next: Where can I find grammars for lex and yacc?,  Prev: Is there a repository for flex scanners?,  Up: FAQ
6000
6001How can I conditionally compile or preprocess my flex input file?
6002=================================================================
6003
6004Flex doesn't have a preprocessor like C does.  You might try using m4,
6005or the C preprocessor plus a sed script to clean up the result.
6006
6007
6008File: flex.info,  Node: Where can I find grammars for lex and yacc?,  Next: I get an end-of-buffer message for each character scanned.,  Prev: How can I conditionally compile or preprocess my flex input file?,  Up: FAQ
6009
6010Where can I find grammars for lex and yacc?
6011===========================================
6012
6013In the sources for flex and bison.
6014
6015
6016File: flex.info,  Node: I get an end-of-buffer message for each character scanned.,  Next: unnamed-faq-62,  Prev: Where can I find grammars for lex and yacc?,  Up: FAQ
6017
6018I get an end-of-buffer message for each character scanned.
6019==========================================================
6020
This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
"interactive", or if the streams library on your platform always
returns 1 for yyin->gcount().
6025
6026   Solution: override LexerInput() with a version that returns whole
6027buffers.
6028
6029
6030File: flex.info,  Node: unnamed-faq-62,  Next: unnamed-faq-63,  Prev: I get an end-of-buffer message for each character scanned.,  Up: FAQ
6031
6032unnamed-faq-62
6033==============
6034
6035
6036     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6037     Subject: Re: Flex maximums
6038     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
6039     Date: Mon, 17 Nov 1997 17:16:15 PST
6040     From: Vern Paxson <vern>
6041
6042     > I took a quick look into the flex-sources and altered some #defines in
6043     > flexdefs.h:
6044     >
6045     > 	#define INITIAL_MNS 64000
6046     > 	#define MNS_INCREMENT 1024000
6047     > 	#define MAXIMUM_MNS 64000
6048
6049     The things to fix are to add a couple of zeroes to:
6050
6051     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
6052     #define MAXIMUM_MNS 31999
6053     #define BAD_SUBSCRIPT -32767
6054     #define MAX_SHORT 32700
6055
6056     and, if you get complaints about too many rules, make the following change too:
6057
6058     	#define YY_TRAILING_MASK 0x200000
6059     	#define YY_TRAILING_HEAD_MASK 0x400000
6060
6061     - Vern
6062
6063
6064File: flex.info,  Node: unnamed-faq-63,  Next: unnamed-faq-64,  Prev: unnamed-faq-62,  Up: FAQ
6065
6066unnamed-faq-63
6067==============
6068
6069
6070     To: jimmey@lexis-nexis.com (Jimmey Todd)
6071     Subject: Re: FLEX question regarding istream vs ifstream
6072     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
6073     Date: Mon, 15 Dec 1997 13:21:35 PST
6074     From: Vern Paxson <vern>
6075
6076     >         stdin_handle = YY_CURRENT_BUFFER;
6077     >         ifstream fin( "aFile" );
6078     >         yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
6079     >
6080     > What I'm wanting to do, is pass the contents of a file thru one set
6081     > of rules and then pass stdin thru another set... It works great if, I
6082     > don't use the C++ classes. But since everything else that I'm doing is
6083     > in C++, I thought I'd be consistent.
6084     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as its
6086     > first argument (as stated in the man page). However, fin is a ifstream
6087     > object. Any ideas on what I might be doing wrong? Any help would be
6088     > appreciated. Thanks!!
6089
6090     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
6091     Then its type will be compatible with the expected istream*, because ifstream
6092     is derived from istream.
6093
6094     		Vern
6095
6096
6097File: flex.info,  Node: unnamed-faq-64,  Next: unnamed-faq-65,  Prev: unnamed-faq-63,  Up: FAQ
6098
6099unnamed-faq-64
6100==============
6101
6102
6103     To: Enda Fadian <fadiane@piercom.ie>
6104     Subject: Re: Question related to Flex man page?
6105     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
6106     Date: Tue, 16 Dec 1997 14:17:09 PST
6107     From: Vern Paxson <vern>
6108
     > Can you explain to me what is meant by a long-jump in relation to flex?
6110
6111     Using the longjmp() function while inside yylex() or a routine called by it.
6112
6113     > what is the flex activation frame.
6114
6115     Just yylex()'s stack frame.
6116
     > As far as I can see yyrestart will bring me back to the start of the input
     > file and using flex++ is not really an option!
6119
6120     No, yyrestart() doesn't imply a rewind, even though its name might sound
6121     like it does.  It tells the scanner to flush its internal buffers and
6122     start reading from the given file at its present location.
6123
6124     		Vern
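
To make the "long-jump" case concrete, here is a minimal C sketch of control
leaving a scanner's stack frame through longjmp().  It is only an
illustration: fake_yylex(), helper_called_by_scanner() and
scan_with_recovery() are hypothetical stand-ins, not flex-generated code.
With a real scanner, the flex documentation says to call yyrestart( yyin )
before re-entering yylex() after such a jump.

```c
#include <setjmp.h>

static jmp_buf recover_point;  /* recovery point set by the caller */
static int tokens_seen = 0;

/* Hypothetical helper called from inside a scanner action. */
static void helper_called_by_scanner(int fatal)
{
    if (fatal)
        longjmp(recover_point, 1);  /* unwinds past fake_yylex()'s frame */
}

/* Stand-in for yylex(); this is NOT a real flex scanner. */
static int fake_yylex(int fail_at)
{
    for (;;) {
        ++tokens_seen;
        helper_called_by_scanner(tokens_seen == fail_at);
        if (tokens_seen >= 5)
            return 0;  /* pretend end of input after five tokens */
    }
}

/* Returns 1 if the scan was abandoned via longjmp(), 0 if it completed. */
int scan_with_recovery(int fail_at)
{
    tokens_seen = 0;
    if (setjmp(recover_point) != 0) {
        /* With a real flex scanner, call yyrestart(yyin) here
           before calling yylex() again. */
        return 1;
    }
    fake_yylex(fail_at);
    return 0;
}
```

Calling scan_with_recovery(3) abandons the scan at the third pretend token,
while scan_with_recovery(99) runs to the pretend end of input.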
6125
6126
6127File: flex.info,  Node: unnamed-faq-65,  Next: unnamed-faq-66,  Prev: unnamed-faq-64,  Up: FAQ
6128
6129unnamed-faq-65
6130==============
6131
6132
6133     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
6134     Subject: Re: Need urgent Help
6135     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
6136     Date: Sun, 21 Dec 1997 21:30:46 PST
6137     From: Vern Paxson <vern>
6138
6139     > /usr/lib/yaccpar: In function `int yyparse()':
6140     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
6141     >
6142     > ld: Undefined symbol
6143     >    _yylex
6144     >    _yyparse
6145     >    _yyin
6146
6147     This is a known problem with Solaris C++ (and/or Solaris yacc).  I believe
6148     the fix is to explicitly insert some 'extern "C"' statements for the
6149     corresponding routines/symbols.
6150
6151     		Vern
6152
6153
6154File: flex.info,  Node: unnamed-faq-66,  Next: unnamed-faq-67,  Prev: unnamed-faq-65,  Up: FAQ
6155
6156unnamed-faq-66
6157==============
6158
6159
6160     To: mc0307@mclink.it
6161     Cc: gnu@prep.ai.mit.edu
6162     Subject: Re: [mc0307@mclink.it: Help request]
6163     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
6164     Date: Sun, 21 Dec 1997 22:33:37 PST
6165     From: Vern Paxson <vern>
6166
6167     > This is my definition for float and integer types:
6168     > . . .
6169     > NZD          [1-9]
6170     > ...
6171     > I've tested my program on other lex version (on UNIX Sun Solaris an HP
6172     > UNIX) and it work well, so I think that my definitions are correct.
6173     > There are any differences between Lex and Flex?
6174
6175     There are indeed differences, as discussed in the man page.  The one
6176     you are probably running into is that when flex expands a name definition,
6177     it puts parentheses around the expansion, while lex does not.  There's
6178     an example in the man page of how this can lead to different matching.
6179     Flex's behavior complies with the POSIX standard (or at least with the
6180     last POSIX draft I saw).
6181
6182     		Vern
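
The effect of the added parentheses can be sketched outside of lex and flex
with POSIX extended regular expressions.  The example below is an assumption
made for illustration, not the example from the man page: it imagines a
definition "AB a|b" used in a rule "x{AB}y", and compares the parenthesized
expansion with a purely textual one.  ere_matches(), FLEX_STYLE and
LEX_STYLE are hypothetical names, and regcomp()/regexec() stand in for the
scanner's own matching.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended regex pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical definition "AB a|b" used in the rule "x{AB}y". */
static const char *const FLEX_STYLE = "^(x(a|b)y)$"; /* expansion parenthesized */
static const char *const LEX_STYLE  = "^(xa|by)$";   /* plain textual substitution */
```

Under these assumptions "xay" matches only the parenthesized form, while
"xa" matches only the textually substituted form, which is the kind of
difference in matching the reply describes.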
6183
6184
6185File: flex.info,  Node: unnamed-faq-67,  Next: unnamed-faq-68,  Prev: unnamed-faq-66,  Up: FAQ
6186
6187unnamed-faq-67
6188==============
6189
6190
6191     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
6192     Subject: Re: Thanks
6193     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
6194     Date: Mon, 22 Dec 1997 14:35:05 PST
6195     From: Vern Paxson <vern>
6196
6197     > Thank you very much for your help. I compile and link well with C++ while
6198     > declaring 'yylex ...' extern, But a little problem remains. I get a
6199     > segmentation default when executing ( I linked with lfl library) while it
6200     > works well when using LEX instead of flex. Do you have some ideas about the
6201     > reason for this ?
6202
6203     The one possible reason for this that comes to mind is if you've defined
6204     yytext as "extern char yytext[]" (which is what lex uses) instead of
6205     "extern char *yytext" (which is what flex uses).  If it's not that, then
6206     I'm afraid I don't know what the problem might be.
6207
6208     		Vern
6209
6210
6211File: flex.info,  Node: unnamed-faq-68,  Next: unnamed-faq-69,  Prev: unnamed-faq-67,  Up: FAQ
6212
6213unnamed-faq-68
6214==============
6215
6216
6217     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
6218     Subject: Re: flex 2.5: c++ scanners & start conditions
6219     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
6220     Date: Tue, 06 Jan 1998 19:19:30 PST
6221     From: Vern Paxson <vern>
6222
6223     > The problem is that when I do this (using %option c++) start
6224     > conditions seem to not apply.
6225
6226     The BEGIN macro modifies the yy_start variable.  For C scanners, this
6227     is a static with scope visible through the whole file.  For C++ scanners,
6228     it's a member variable, so it only has visible scope within a member
6229     function.  Your lexbegin() routine is not a member function when you
6230     build a C++ scanner, so it's not modifying the correct yy_start.  The
6231     diagnostic that indicates this is that you found you needed to add
6232     a declaration of yy_start in order to get your scanner to compile when
6233     using C++; instead, the correct fix is to make lexbegin() a member
6234     function (by deriving from yyFlexLexer).
6235
6236     		Vern
6237
6238
6239File: flex.info,  Node: unnamed-faq-69,  Next: unnamed-faq-70,  Prev: unnamed-faq-68,  Up: FAQ
6240
6241unnamed-faq-69
6242==============
6243
6244
6245     To: "Boris Zinin" <boris@ippe.rssi.ru>
6246     Subject: Re: current position in flex buffer
6247     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
6248     Date: Mon, 12 Jan 1998 12:03:15 PST
6249     From: Vern Paxson <vern>
6250
6251     > The problem is how to determine the current position in flex active
6252     > buffer when a rule is matched....
6253
6254     You will need to keep track of this explicitly, such as by redefining
6255     YY_USER_ACTION to count the number of characters matched.
6256
6257     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
6258
6259     		Vern
6260
6261
6262File: flex.info,  Node: unnamed-faq-70,  Next: unnamed-faq-71,  Prev: unnamed-faq-69,  Up: FAQ
6263
6264unnamed-faq-70
6265==============
6266
6267
6268     To: Bik.Dhaliwal@bis.org
6269     Subject: Re: Flex question
6270     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
6271     Date: Tue, 27 Jan 1998 22:41:52 PST
6272     From: Vern Paxson <vern>
6273
6274     > That requirement involves knowing
6275     > the character position at which a particular token was matched
6276     > in the lexer.
6277
6278     The way you have to do this is by explicitly keeping track of where
6279     you are in the file, by counting the number of characters scanned
6280     for each token (available in yyleng).  It may prove convenient to
6281     do this by redefining YY_USER_ACTION, as described in the manual.
6282
6283     		Vern
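
A minimal sketch of the bookkeeping described above, written as plain C so
that it can be read on its own.  The YY_USER_ACTION definition is the part
one might place in the definitions section of a .l file; token_offset,
next_offset and simulate_match() are hypothetical names, and the local
yyleng merely stands in for the variable a flex scanner maintains.

```c
#include <stddef.h>
#include <string.h>

static size_t token_offset = 0;  /* offset of the token just matched */
static size_t next_offset = 0;   /* offset just past that token */
static int yyleng = 0;           /* stand-in for flex's yyleng */

/* Runs before each rule's action; note that it carries its own
   semicolons, because the scanner expands it without adding one. */
#define YY_USER_ACTION \
    token_offset = next_offset; next_offset += (size_t)yyleng;

/* Stand-in for the scanner matching one token (not flex-generated):
   sets yyleng as flex would, runs the user action, and reports the
   offset at which the token started. */
size_t simulate_match(const char *text)
{
    yyleng = (int)strlen(text);
    YY_USER_ACTION
    return token_offset;
}
```

Feeding it the pretend tokens "int", " " and "x" in turn yields start
offsets 0, 3 and 4.  Text that is pushed back with yyless() or unput()
would need the counters adjusted by hand; this sketch does not attempt
that.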
6284
6285
6286File: flex.info,  Node: unnamed-faq-71,  Next: unnamed-faq-72,  Prev: unnamed-faq-70,  Up: FAQ
6287
6288unnamed-faq-71
6289==============
6290
6291
6292     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
6293     Subject: Re: flex: how to control start condition from parser?
6294     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
6295     Date: Tue, 27 Jan 1998 22:45:37 PST
6296     From: Vern Paxson <vern>
6297
6298     > It seems useful for the parser to be able to tell the lexer about such
6299     > context dependencies, because then they don't have to be limited to
6300     > local or sequential context.
6301
     One way to do this is to have the parser call a stub routine that's
     included in the scanner's .l file, and consequently that has access to
     BEGIN.  The only ugliness is that the parser can't pass in the state
     it wants, because those aren't visible - but if you don't have many
     such states, then using a different set of names doesn't seem like
     too much of a burden.
6308
     While generating a .h file like you suggest is certainly cleaner,
6310     flex development has come to a virtual stand-still :-(, so a workaround
6311     like the above is much more pragmatic than waiting for a new feature.
6312
6313     		Vern
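
The stub-routine workaround might look roughly like the following sketch.
All names here are assumptions: scanner_set_mode() and the MODE_ constants
are a hypothetical parser-facing interface, TYPENAME is a hypothetical
start condition, and the three stand-in definitions exist only so the
fragment compiles by itself.  In a real .l file flex supplies BEGIN,
INITIAL and the start-condition names, so the stand-ins would be removed.

```c
/* Parser-visible names, e.g. declared in a header shared with the parser. */
enum scanner_mode { MODE_NORMAL, MODE_TYPENAME };

/* Stand-ins so the sketch compiles alone; flex provides the real ones. */
static int current_start = 0;
#define INITIAL 0
#define TYPENAME 1
#define BEGIN(s) (current_start = (s))

/* Stub placed in the user-code section of the .l file, where BEGIN and
   the start-condition names are visible.  It translates the parser's
   own names into the scanner's start conditions. */
void scanner_set_mode(enum scanner_mode mode)
{
    switch (mode) {
    case MODE_TYPENAME:
        BEGIN(TYPENAME);
        break;
    case MODE_NORMAL:
    default:
        BEGIN(INITIAL);
        break;
    }
}
```

The parser then calls scanner_set_mode(MODE_TYPENAME) from an action and
never needs to see the scanner's own start-condition names.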
6314
6315
6316File: flex.info,  Node: unnamed-faq-72,  Next: unnamed-faq-73,  Prev: unnamed-faq-71,  Up: FAQ
6317
6318unnamed-faq-72
6319==============
6320
6321
6322     To: Barbara Denny <denny@3com.com>
6323     Subject: Re: freebsd flex bug?
6324     In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
6325     Date: Fri, 30 Jan 1998 12:42:32 PST
6326     From: Vern Paxson <vern>
6327
6328     > lex.yy.c:1996: parse error before `='
6329
6330     This is the key, identifying this error.  (It may help to pinpoint
6331     it by using flex -L, so it doesn't generate #line directives in its
6332     output.)  I will bet you heavy money that you have a start condition
6333     name that is also a variable name, or something like that; flex spits
6334     out #define's for each start condition name, mapping them to a number,
6335     so you can wind up with:
6336
6337     	%x foo
6338     	%%
6339     		...
6340     	%%
6341     	void bar()
6342     		{
6343     		int foo = 3;
6344     		}
6345
     and the penultimate line will turn into "int 1 = 3" after C preprocessing,
6347     since flex will put "#define foo 1" in the generated scanner.
6348
6349     		Vern
6350
6351
6352File: flex.info,  Node: unnamed-faq-73,  Next: unnamed-faq-74,  Prev: unnamed-faq-72,  Up: FAQ
6353
6354unnamed-faq-73
6355==============
6356
6357
6358     To: Maurice Petrie <mpetrie@infoscigroup.com>
6359     Subject: Re: Lost flex .l file
6360     In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
6361     Date: Mon, 02 Feb 1998 11:15:12 PST
6362     From: Vern Paxson <vern>
6363
6364     > I am curious as to
6365     > whether there is a simple way to backtrack from the generated source to
6366     > reproduce the lost list of tokens we are searching on.
6367
6368     In theory, it's straight-forward to go from the DFA representation
6369     back to a regular-expression representation - the two are isomorphic.
6370     In practice, a huge headache, because you have to unpack all the tables
6371     back into a single DFA representation, and then write a program to munch
6372     on that and translate it into an RE.
6373
6374     Sorry for the less-than-happy news ...
6375
6376     		Vern
6377
6378
6379File: flex.info,  Node: unnamed-faq-74,  Next: unnamed-faq-75,  Prev: unnamed-faq-73,  Up: FAQ
6380
6381unnamed-faq-74
6382==============
6383
6384
6385     To: jimmey@lexis-nexis.com (Jimmey Todd)
6386     Subject: Re: Flex performance question
6387     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6388     Date: Thu, 19 Feb 1998 08:48:51 PST
6389     From: Vern Paxson <vern>
6390
6391     > What I have found, is that the smaller the data chunk, the faster the
6392     > program executes. This is the opposite of what I expected. Should this be
6393     > happening this way?
6394
6395     This is exactly what will happen if your input file has embedded NULs.
6396     From the man page:
6397
6398     A final note: flex is slow when matching NUL's, particularly
6399     when  a  token  contains multiple NUL's.  It's best to write
6400     rules which match short amounts of text if it's  anticipated
6401     that the text will often include NUL's.
6402
6403     So that's the first thing to look for.
6404
6405     		Vern
6406
6407
6408File: flex.info,  Node: unnamed-faq-75,  Next: unnamed-faq-76,  Prev: unnamed-faq-74,  Up: FAQ
6409
6410unnamed-faq-75
6411==============
6412
6413
6414     To: jimmey@lexis-nexis.com (Jimmey Todd)
6415     Subject: Re: Flex performance question
6416     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
6417     Date: Thu, 19 Feb 1998 15:42:25 PST
6418     From: Vern Paxson <vern>
6419
6420     So there are several problems.
6421
6422     First, to go fast, you want to match as much text as possible, which
6423     your scanners don't in the case that what they're scanning is *not*
6424     a <RN> tag.  So you want a rule like:
6425
6426     	[^<]+
6427
6428     Second, C++ scanners are particularly slow if they're interactive,
6429     which they are by default.  Using -B speeds it up by a factor of 3-4
6430     on my workstation.
6431
6432     Third, C++ scanners that use the istream interface are slow, because
6433     of how poorly implemented istream's are.  I built two versions of
6434     the following scanner:
6435
6436     	%%
6437     	.*\n
6438     	.*
6439     	%%
6440
6441     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
6442     The C++ istream version, using -B, takes 3.8 seconds.
6443
6444     		Vern
6445
6446
6447File: flex.info,  Node: unnamed-faq-76,  Next: unnamed-faq-77,  Prev: unnamed-faq-75,  Up: FAQ
6448
6449unnamed-faq-76
6450==============
6451
6452
6453     To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
6454     Subject: Re: FLEX 2.5 & THE YEAR 2000
6455     In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
6456     Date: Wed, 03 Jun 1998 10:22:26 PDT
6457     From: Vern Paxson <vern>
6458
6459     > I am researching the Y2K problem with General Electric R&D
6460     > and need to know if there are any known issues concerning
6461     > the above mentioned software and Y2K regardless of version.
6462
     There shouldn't be; all it ever does with the date is ask the system
6464     for it and then print it out.
6465
6466     		Vern
6467
6468
6469File: flex.info,  Node: unnamed-faq-77,  Next: unnamed-faq-78,  Prev: unnamed-faq-76,  Up: FAQ
6470
6471unnamed-faq-77
6472==============
6473
6474
6475     To: "Hans Dermot Doran" <htd@ibhdoran.com>
6476     Subject: Re: flex problem
6477     In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
6478     Date: Tue, 21 Jul 1998 14:23:34 PDT
6479     From: Vern Paxson <vern>
6480
6481     > To overcome this, I gets() the stdin into a string and lex the string. The
6482     > string is lexed OK except that the end of string isn't lexed properly
     > (yy_scan_string()), that is the lexer doesn't recognise the end of string.
6484
6485     Flex doesn't contain mechanisms for recognizing buffer endpoints.  But if
6486     you use fgets instead (which you should anyway, to protect against buffer
6487     overflows), then the final \n will be preserved in the string, and you can
6488     scan that in order to find the end of the string.
6489
6490     		Vern
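
A small sketch of the fgets() suggestion, assuming a hypothetical helper
named read_line().  It shows that fgets() bounds the read and keeps the
trailing newline, so a scanner fed the buffer (for example through
yy_scan_string()) can use a "\n" rule to recognize the end of the line.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from in into buf, which holds cap bytes.
   Returns  1 if a complete line ending in '\n' was stored,
            0 if the text was truncated or the last line had no newline,
           -1 on end of file or error. */
int read_line(FILE *in, char *buf, size_t cap)
{
    size_t len;

    if (cap == 0 || fgets(buf, (int)cap, in) == NULL)
        return -1;
    len = strlen(buf);
    return (len > 0 && buf[len - 1] == '\n') ? 1 : 0;
}
```

A return value of 0 tells the caller that no newline is present, so a
scanner rule keyed on "\n" would not fire for that buffer.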
6491
6492
6493File: flex.info,  Node: unnamed-faq-78,  Next: unnamed-faq-79,  Prev: unnamed-faq-77,  Up: FAQ
6494
6495unnamed-faq-78
6496==============
6497
6498
6499     To: soumen@almaden.ibm.com
6500     Subject: Re: Flex++ 2.5.3 instance member vs. static member
6501     In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
6502     Date: Tue, 28 Jul 1998 01:10:34 PDT
6503     From: Vern Paxson <vern>
6504
6505     > %{
6506     > int mylineno = 0;
6507     > %}
6508     > ws      [ \t]+
6509     > alpha   [A-Za-z]
6510     > dig     [0-9]
6511     > %%
6512     >
6513     > Now you'd expect mylineno to be a member of each instance of class
6514     > yyFlexLexer, but is this the case?  A look at the lex.yy.cc file seems to
6515     > indicate otherwise; unless I am missing something the declaration of
6516     > mylineno seems to be outside any class scope.
6517     >
6518     > How will this work if I want to run a multi-threaded application with each
6519     > thread creating a FlexLexer instance?
6520
6521     Derive your own subclass and make mylineno a member variable of it.
6522
6523     		Vern
6524
6525
6526File: flex.info,  Node: unnamed-faq-79,  Next: unnamed-faq-80,  Prev: unnamed-faq-78,  Up: FAQ
6527
6528unnamed-faq-79
6529==============
6530
6531
6532     To: Adoram Rogel <adoram@hybridge.com>
6533     Subject: Re: More than 32K states change hangs
6534     In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
6535     Date: Tue, 04 Aug 1998 22:28:45 PDT
6536     From: Vern Paxson <vern>
6537
6538     > Vern Paxson,
     > I followed your advice, posted on Usenet by you, and emailed to me
6540     > I followed your advice, posted on Usenet bu you, and emailed to me
6541     > personally by you, on how to overcome the 32K states limit. I'm running
6542     > on Linux machines.
6543     > I took the full source of version 2.5.4 and did the following changes in
6544     > flexdef.h:
6545     > #define JAMSTATE -327660
6546     > #define MAXIMUM_MNS 319990
6547     > #define BAD_SUBSCRIPT -327670
6548     > #define MAX_SHORT 327000
6549     >
6550     > and compiled.
6551     > All looked fine, including check and bigcheck, so I installed.
6552
6553     Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
6554     archives I see that I did indeed recommend doing so.  Try setting it back
     to 32700; that should be enough that you no longer need -Ca.  If it still
6556     hangs, then the interesting question is - where?
6557
6558     > Compiling the same hanged program with a out-of-the-box (RedHat 4.2
6559     > distribution of Linux)
6560     > flex 2.5.4 binary works.
6561
6562     Since Linux comes with source code, you should diff it against what
6563     you have to see what problems they missed.
6564
6565     > Should I always compile with the -Ca option now ? even short and simple
6566     > filters ?
6567
6568     No, definitely not.  It's meant to be for those situations where you
6569     absolutely must squeeze every last cycle out of your scanner.
6570
6571     		Vern
6572
6573
6574File: flex.info,  Node: unnamed-faq-80,  Next: unnamed-faq-81,  Prev: unnamed-faq-79,  Up: FAQ
6575
6576unnamed-faq-80
6577==============
6578
6579
6580     To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
6581     Subject: Re: flex output for static code portion
6582     In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
6583     Date: Mon, 17 Aug 1998 23:57:42 PDT
6584     From: Vern Paxson <vern>
6585
6586     > I would like to use flex under the hood to generate a binary file
6587     > containing the data structures that control the parse.
6588
6589     This has been on the wish-list for a long time.  In principle it's
6590     straight-forward - you redirect mkdata() et al's I/O to another file,
6591     and modify the skeleton to have a start-up function that slurps these
6592     into dynamic arrays.  The concerns are (1) the scanner generation code
6593     is hairy and full of corner cases, so it's easy to get surprised when
6594     going down this path :-( ; and (2) being careful about buffering so
6595     that when the tables change you make sure the scanner starts in the
6596     correct state and reading at the right point in the input file.
6597
6598     > I was wondering if you know of anyone who has used flex in this way.
6599
6600     I don't - but it seems like a reasonable project to undertake (unlike
6601     numerous other flex tweaks :-).
6602
6603     		Vern
6604
6605
6606File: flex.info,  Node: unnamed-faq-81,  Next: unnamed-faq-82,  Prev: unnamed-faq-80,  Up: FAQ
6607
6608unnamed-faq-81
6609==============
6610
6611
6619     From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
6620     Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
6621     Subject: "flex scanner push-back overflow"
6622     To: vern@ee.lbl.gov
6623     Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
6624     Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6631
6632     Hi Vern,
6633
6634     Yesterday, I encountered a strange problem: I use the macro processor m4
6635     to include some lengthy lists into a .l file. Following is a flex macro
6636     definition that causes some serious pain in my neck:
6637
6638     AUTHOR           ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])
6639
6640     The complete list contains about 10kB. When I try to "flex" this file
6641     (on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
6642     some of the predefined values in flexdefs.h) I get the error:
6643
6644     myflex/flex -8  sentag.tmp.l
6645     flex scanner push-back overflow
6646
6647     When I remove the slashes in the macro definition everything works fine.
6648     As I understand it, the double quotes escape the slash-character so it
6649     really means "/" and not "trailing context". Furthermore, I tried to
6650     escape the slashes with backslashes, but with no use, the same error message
6651     appeared when flexing the code.
6652
6653     Do you have an idea what's going on here?
6654
6655     Greetings from Germany,
6656     	Georg
6657     --
6658     Georg Rehm                                     georg@cl-ki.uni-osnabrueck.de
6659     Institute for Semantic Information Processing, University of Osnabrueck, FRG
6660
6661
6662File: flex.info,  Node: unnamed-faq-82,  Next: unnamed-faq-83,  Prev: unnamed-faq-81,  Up: FAQ
6663
6664unnamed-faq-82
6665==============
6666
6667
6668     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
6669     Subject: Re: "flex scanner push-back overflow"
6670     In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
6671     Date: Thu, 20 Aug 1998 07:05:35 PDT
6672     From: Vern Paxson <vern>
6673
6674     > myflex/flex -8  sentag.tmp.l
6675     > flex scanner push-back overflow
6676
6677     Flex itself uses a flex scanner.  That scanner is running out of buffer
6678     space when it tries to unput() the humongous macro you've defined.  When
6679     you remove the '/'s, you make it small enough so that it fits in the buffer;
6680     removing spaces would do the same thing.
6681
     The fix is either to rethink why you're using such a big macro
     (perhaps there's a better way to do it), or to rebuild flex's own
     scan.c with a larger value for
6685
6686     	#define YY_BUF_SIZE 16384
6687
6688     - Vern
6689
6690
6691File: flex.info,  Node: unnamed-faq-83,  Next: unnamed-faq-84,  Prev: unnamed-faq-82,  Up: FAQ
6692
6693unnamed-faq-83
6694==============
6695
6696
6697     To: Jan Kort <jan@research.techforce.nl>
6698     Subject: Re: Flex
6699     In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
6700     Date: Sat, 05 Sep 1998 00:59:49 PDT
6701     From: Vern Paxson <vern>
6702
6703     > %%
6704     >
6705     > "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
6706     > ^\n             { fprintf(stderr, "empty line\n"); }
6707     > .               { }
6708     > \n              { fprintf(stderr, "new line\n"); }
6709     >
6710     > %%
6711     > -- input ---------------------------------------
6712     > TEST1
6713     > -- output --------------------------------------
6714     > TEST1
6715     > empty line
6716     > ------------------------------------------------
6717
6718     IMHO, it's not clear whether or not this is in fact a bug.  It depends
6719     on whether you view yyless() as backing up in the input stream, or as
6720     pushing new characters onto the beginning of the input stream.  Flex
6721     interprets it as the latter (for implementation convenience, I'll admit),
6722     and so considers the newline as in fact matching at the beginning of a
6723     line, as after all the last token scanned an entire line and so the
6724     scanner is now at the beginning of a new line.
6725
6726     I agree that this is counter-intuitive for yyless(), given its
6727     functional description (it's less so for unput(), depending on whether
6728     you're unput()'ing new text or scanned text).  But I don't plan to
6729     change it any time soon, as it's a pain to do so.  Consequently,
6730     you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
6731     your scanner into the behavior you desire.
6732
6733     Sorry for the less-than-completely-satisfactory answer.
6734
6735     		Vern
6736
6737
6738File: flex.info,  Node: unnamed-faq-84,  Next: unnamed-faq-85,  Prev: unnamed-faq-83,  Up: FAQ
6739
6740unnamed-faq-84
6741==============
6742
6743
6744     To: Patrick Krusenotto <krusenot@mac-info-link.de>
6745     Subject: Re: Problems with restarting flex-2.5.2-generated scanner
6746     In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
6747     Date: Thu, 24 Sep 1998 23:28:43 PDT
6748     From: Vern Paxson <vern>
6749
6750     > I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
6751     > trying to make my scanner restart with a new file after my parser stops
6752     > with a parse error. When my compiler restarts, the parser always
6753     > receives the token after the token (in the old file!) that caused the
6754     > parser error.
6755
6756     I suspect the problem is that your parser has read ahead in order
6757     to attempt to resolve an ambiguity, and when it's restarted it picks
6758     up with that token rather than reading a fresh one.  If you're using
6759     yacc, then the special "error" production can sometimes be used to
6760     consume tokens in an attempt to get the parser into a consistent state.
6761
6762     		Vern
6763
6764
6765File: flex.info,  Node: unnamed-faq-85,  Next: unnamed-faq-86,  Prev: unnamed-faq-84,  Up: FAQ
6766
6767unnamed-faq-85
6768==============
6769
6770
6771     To: Henric Jungheim <junghelh@pe-nelson.com>
6772     Subject: Re: flex 2.5.4a
6773     In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
6774     Date: Tue, 27 Oct 1998 16:50:14 PST
6775     From: Vern Paxson <vern>
6776
6777     > This brings up a feature request:  How about a command line
6778     > option to specify the filename when reading from stdin?  That way one
6779     > doesn't need to create a temporary file in order to get the "#line"
6780     > directives to make sense.
6781
6782     Use -o combined with -t (per the man page description of -o).
6783
6784     > P.S., Is there any simple way to use non-blocking IO to parse multiple
6785     > streams?
6786
6787     Simple, no.
6788
6789     One approach might be to return a magic character on EWOULDBLOCK and
6790     have a rule
6791
6792     	.*<magic-character>	// put back .*, eat magic character
6793
6794     This is off the top of my head, not sure it'll work.
6795
6796     		Vern
6797
6798
6799File: flex.info,  Node: unnamed-faq-86,  Next: unnamed-faq-87,  Prev: unnamed-faq-85,  Up: FAQ
6800
6801unnamed-faq-86
6802==============
6803
6804
6805     To: "Repko, Billy D" <billy.d.repko@intel.com>
6806     Subject: Re: Compiling scanners
6807     In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
6808     Date: Thu, 14 Jan 1999 00:25:30 PST
6809     From: Vern Paxson <vern>
6810
6811     > It appears that maybe it cannot find the lfl library.
6812
6813     The Makefile in the distribution builds it, so you should have it.
6814     It's exceedingly trivial, just a main() that calls yylex() and
     a yywrap() that always returns 1.
6816
6817     > %%
6818     >       \n      ++num_lines; ++num_chars;
6819     >       .       ++num_chars;
6820
6821     You can't indent your rules like this - that's where the errors are coming
6822     from.  Flex copies indented text to the output file, it's how you do things
6823     like
6824
6825     	int num_lines_seen = 0;
6826
6827     to declare local variables.
6828
6829     		Vern
6830
6831
6832File: flex.info,  Node: unnamed-faq-87,  Next: unnamed-faq-88,  Prev: unnamed-faq-86,  Up: FAQ
6833
6834unnamed-faq-87
6835==============
6836
6837
6838     To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
6839     Subject: Re: flex input buffer
6840     In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
6841     Date: Tue, 09 Feb 1999 21:03:37 PST
6842     From: Vern Paxson <vern>
6843
6844     > In the flex.skl file the size of the default input buffers is set.  Can you
6845     > explain why this size is set and why it is such a high number.
6846
6847     It's large to optimize performance when scanning large files.  You can
6848     safely make it a lot lower if needed.
6849
6850     		Vern
6851
6852
6853File: flex.info,  Node: unnamed-faq-88,  Next: unnamed-faq-90,  Prev: unnamed-faq-87,  Up: FAQ
6854
6855unnamed-faq-88
6856==============
6857
6858
6859     To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
6860     Subject: Re: Flex error message
6861     In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
6862     Date: Thu, 25 Feb 1999 00:11:31 PST
6863     From: Vern Paxson <vern>
6864
6865     > I'm extending a larger scanner written in Flex and I keep running into
6866     > problems. More specifically, I get the error message:
6867     > "flex: input rules are too complicated (>= 32000 NFA states)"
6868
6869     Increase the definitions in flexdef.h for:
6870
     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
6873     #define MAXIMUM_MNS 31999
6874     #define BAD_SUBSCRIPT -32767
6875
6876     recompile everything, and it should all work.
6877
6878     		Vern
6879
6880
6881File: flex.info,  Node: unnamed-faq-90,  Next: unnamed-faq-91,  Prev: unnamed-faq-88,  Up: FAQ
6882
6883unnamed-faq-90
6884==============
6885
6886
6887     To: "Dmitriy Goldobin" <gold@ems.chel.su>
6888     Subject: Re: FLEX trouble
6889     In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
6890     Date: Tue, 01 Jun 1999 00:15:07 PDT
6891     From: Vern Paxson <vern>
6892
     >   I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
6894     > but rule "/*"(.|\n)*"*/" don't work ?
6895
6896     The second of these will have to scan the entire input stream (because
6897     "(.|\n)*" matches an arbitrary amount of any text) in order to see if
6898     it ends with "*/", terminating the comment.  That potentially will overflow
6899     the input buffer.
6900
6901     >   More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
6902     > 'unrecognized rule'.
6903
6904     You can't use the '/' operator inside parentheses.  It's not clear
6905     what "(a/b)*" actually means.
6906
6907     >   I now use workaround with state <comment>, but single-rule is
6908     > better, i think.
6909
6910     Single-rule is nice but will always have the problem of either setting
6911     restrictions on comments (like not allowing multi-line comments) and/or
6912     running the risk of consuming the entire input stream, as noted above.
6913
6914     		Vern
6915
6916
6917File: flex.info,  Node: unnamed-faq-91,  Next: unnamed-faq-92,  Prev: unnamed-faq-90,  Up: FAQ
6918
6919unnamed-faq-91
6920==============
6921
6922
6927     To: vern@ee.lbl.gov
6928     Date: Tue, 15 Jun 1999 08:55:43 -0700
6929     From: "Aki Niimura" <neko@my-deja.com>
6936     Subject: A question on flex C++ scanner
6941
     Dear Dr. Paxson,

     I have been using flex for years.
     It works very well on many projects.
     In most cases, I used it to generate a scanner in the C language.
     However, for one project I needed to generate a scanner
     in the C++ language. Thanks to your enhancement, flex did
     the job.

     Currently, I'm working on enhancing my previous project.
     I need to deal with multiple input streams (recursive
     inclusion) in this scanner (C++).
     I did a similar thing for another scanner (C) as you
     explained in your documentation.
6956
6957     The generated scanner (C++) has necessary methods:
6958     - switch_to_buffer(struct yy_buffer_state *b)
6959     - yy_create_buffer(istream *is, int sz)
6960     - yy_delete_buffer(struct yy_buffer_state *b)
6961
6962     However, I couldn't figure out how to access current
6963     buffer (yy_current_buffer).
6964
6965     yy_current_buffer is a protected member of yyFlexLexer.
6966     I can't access it directly.
6967     Then, I thought yy_create_buffer() with is = 0 might
6968     return current stream buffer. But it seems not as far
6969     as I checked the source. (flex 2.5.4)
6970
6971     I went through the Web in addition to Flex documentation.
6972     However, it hasn't been successful, so far.
6973
6974     It is not my intention to bother you, but, can you
6975     comment about how to obtain the current stream buffer?
6976
6977     Your response would be highly appreciated.
6978
6979     Best regards,
6980     Aki Niimura
6981
6984
6985
6986File: flex.info,  Node: unnamed-faq-92,  Next: unnamed-faq-93,  Prev: unnamed-faq-91,  Up: FAQ
6987
6988unnamed-faq-92
6989==============
6990
6991
6992     To: neko@my-deja.com
6993     Subject: Re: A question on flex C++ scanner
6994     In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
6995     Date: Tue, 15 Jun 1999 09:04:24 PDT
6996     From: Vern Paxson <vern>
6997
6998     > However, I couldn't figure out how to access current
6999     > buffer (yy_current_buffer).
7000
7001     Derive your own subclass from yyFlexLexer.
7002
7003     		Vern
7004
7005
7006File: flex.info,  Node: unnamed-faq-93,  Next: unnamed-faq-94,  Prev: unnamed-faq-92,  Up: FAQ
7007
7008unnamed-faq-93
7009==============
7010
7011
7012     To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
7013     Subject: Re: You're the man to see?
7014     In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
7015     Date: Wed, 23 Jun 1999 09:01:40 PDT
7016     From: Vern Paxson <vern>
7017
7018     > I hope you can help me.  I am using Flex and Bison to produce an interpreted
7019     > language.  However all goes well until I try to implement an IF statement or
7020     > a WHILE.  I cannot get this to work as the parser parses all the conditions
     > eg. the TRUE and FALSE conditions to check for a rule match.  So I cannot
7022     > make a decision!!
7023
     You need to use the parser to build a parse tree (= abstract syntax tree),
7025     and when that's all done you recursively evaluate the tree, binding variables
7026     to values at that time.
7027
7028     		Vern
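
   As a rough illustration of the advice above (not code from the
original exchange), the following C sketch builds a small tree and
evaluates it recursively.  The node kinds and the helpers `mk_node' and
`eval' are hypothetical names chosen for this example; a real Bison
grammar would call something like `mk_node' from its actions instead of
executing statements while parsing.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a tiny interpreted language. */
enum node_kind { NODE_NUM, NODE_ADD, NODE_LESS, NODE_IF };

struct node {
    enum node_kind kind;
    int value;                /* used by NODE_NUM only */
    struct node *a, *b, *c;   /* operands; NODE_IF: a=cond, b=then, c=else */
};

/* Parser actions would call this to record structure, not to compute. */
struct node *mk_node(enum node_kind kind, int value,
                     struct node *a, struct node *b, struct node *c)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->value = value;
    n->a = a;
    n->b = b;
    n->c = c;
    return n;
}

/* Recursive evaluation: only the selected arm of an IF is visited. */
int eval(const struct node *n)
{
    switch (n->kind) {
    case NODE_NUM:  return n->value;
    case NODE_ADD:  return eval(n->a) + eval(n->b);
    case NODE_LESS: return eval(n->a) < eval(n->b);
    case NODE_IF:   return eval(n->a) ? eval(n->b) : eval(n->c);
    }
    return 0;
}
```

   Because `eval' descends into only one arm of a `NODE_IF', the arm
that was not selected is parsed but never executed, which is the
decision the parser alone cannot make.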
7029
7030
7031File: flex.info,  Node: unnamed-faq-94,  Next: unnamed-faq-95,  Prev: unnamed-faq-93,  Up: FAQ
7032
7033unnamed-faq-94
7034==============
7035
7036
7037     To: Petr Danecek <petr@ics.cas.cz>
7038     Subject: Re: flex - question
7039     In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
7040     Date: Fri, 02 Jul 1999 16:52:13 PDT
7041     From: Vern Paxson <vern>
7042
7043     > file, it takes an enormous amount of time. It is funny, because the
     > source code has only 12 rules!!! I think it looks like an exponential
7045     > growth.
7046
     Right, that's the problem - some patterns (those with a lot of
     ambiguity, which yours has because at any given time the scanner can
     be in the middle of all sorts of combinations of the different
     rules) blow up exponentially.
7051
     For your rules, there is an easy fix.  Change the ".*" that comes after
7053     the directory name to "[^ ]*".  With that in place, the rules are no
7054     longer nearly so ambiguous, because then once one of the directories
7055     has been matched, no other can be matched (since they all require a
7056     leading blank).
7057
7058     If that's not an acceptable solution, then you can enter a start state
7059     to pick up the .*\n after each directory is matched.
7060
     Also note that for speed, you'll want to add a ".*" rule at the end,
     otherwise lines that don't match any of the patterns will be matched
     very slowly, a character at a time.
7064
7065     		Vern
7066
7067
7068File: flex.info,  Node: unnamed-faq-95,  Next: unnamed-faq-96,  Prev: unnamed-faq-94,  Up: FAQ
7069
7070unnamed-faq-95
7071==============
7072
7073
7074     To: Tielman Koekemoer <tielman@spi.co.za>
7075     Subject: Re: Please help.
7076     In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
7077     Date: Thu, 08 Jul 1999 08:20:39 PDT
7078     From: Vern Paxson <vern>
7079
7080     > I was hoping you could help me with my problem.
7081     >
7082     > I tried compiling (gnu)flex on a Solaris 2.4 machine
7083     > but when I ran make (after configure) I got an error.
7084     >
7085     > --------------------------------------------------------------
7086     > gcc -c -I. -I. -g -O parse.c
7087     > ./flex -t -p  ./scan.l >scan.c
7088     > sh: ./flex: not found
7089     > *** Error code 1
7090     > make: Fatal error: Command failed for target `scan.c'
7091     > -------------------------------------------------------------
7092     >
7093     > What's strange to me is that I'm only
     > trying to install flex now. I then edited the Makefile
     > and changed where it says "FLEX = flex" to "FLEX = lex"
7096     > ( lex: the native Solaris one ) but then it complains about
7097     > the "-p" option. Is there any way I can compile flex without
7098     > using flex or lex?
7099     >
7100     > Thanks so much for your time.
7101
7102     You managed to step on the bootstrap sequence, which first copies
7103     initscan.c to scan.c in order to build flex.  Try fetching a fresh
7104     distribution from ftp.ee.lbl.gov.  (Or you can first try removing
7105     ".bootstrap" and doing a make again.)
7106
7107     		Vern
7108
7109
7110File: flex.info,  Node: unnamed-faq-96,  Next: unnamed-faq-97,  Prev: unnamed-faq-95,  Up: FAQ
7111
7112unnamed-faq-96
7113==============
7114
7115
7116     To: Tielman Koekemoer <tielman@spi.co.za>
7117     Subject: Re: Please help.
7118     In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
7119     Date: Fri, 09 Jul 1999 00:27:20 PDT
7120     From: Vern Paxson <vern>
7121
7122     > First I removed .bootstrap (and ran make) - no luck. I downloaded the
7123     > software but I still have the same problem. Is there anything else I
7124     > could try.
7125
7126     Try:
7127
7128     	cp initscan.c scan.c
7129     	touch scan.c
7130     	make scan.o
7131
7132     If this last tries to first build scan.c from scan.l using ./flex, then
7133     your "make" is broken, in which case compile scan.c to scan.o by hand.
7134
7135     		Vern
7136
7137
7138File: flex.info,  Node: unnamed-faq-97,  Next: unnamed-faq-98,  Prev: unnamed-faq-96,  Up: FAQ
7139
7140unnamed-faq-97
7141==============
7142
7143
7144     To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
7145     Subject: Re: Error
7146     In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
7147     Date: Tue, 20 Jul 1999 00:18:26 PDT
7148     From: Vern Paxson <vern>
7149
7150     > I am getting a compilation error. The error is given as "unknown symbol- yylex".
7151
     The parser relies on calling yylex(), but you're instead using the C++ scanning
     class, so you need to supply a yylex() "glue" function that calls yylex() on an
     instance of the scanner (e.g., "scanner->yylex()").
7155
7156     		Vern
7157
7158
7159File: flex.info,  Node: unnamed-faq-98,  Next: unnamed-faq-99,  Prev: unnamed-faq-97,  Up: FAQ
7160
7161unnamed-faq-98
7162==============
7163
7164
7165     To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
7166     Subject: Re: lex
7167     In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
7168     Date: Tue, 23 Nov 1999 15:54:30 PST
7169     From: Vern Paxson <vern>
7170
7171     Well, your problem is the
7172
7173     switch (yybgin-yysvec-1) {      /* witchcraft */
7174
7175     at the beginning of lex rules.  "witchcraft" == "non-portable".  It's
7176     assuming knowledge of the AT&T lex's internal variables.
7177
7178     For flex, you can probably do the equivalent using a switch on YYSTATE.
7179
7180     		Vern
7181
7182
7183File: flex.info,  Node: unnamed-faq-99,  Next: unnamed-faq-100,  Prev: unnamed-faq-98,  Up: FAQ
7184
7185unnamed-faq-99
7186==============
7187
7188
7189     To: archow@hss.hns.com
7190     Subject: Re: Regarding distribution of flex and yacc based grammars
7191     In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
7192     Date: Wed, 22 Dec 1999 01:56:24 PST
7193     From: Vern Paxson <vern>
7194
7195     > When we provide the customer with an object code distribution, is it
7196     > necessary for us to provide source
7197     > for the generated C files from flex and bison since they are generated by
7198     > flex and bison ?
7199
7200     For flex, no.  I don't know what the current state of this is for bison.
7201
     > Also, is there any requirement for us to necessarily provide source for
     > the grammar files which are fed into flex and bison ?
7204
7205     Again, for flex, no.
7206
7207     See the file "COPYING" in the flex distribution for the legalese.
7208
7209     		Vern
7210
7211
7212File: flex.info,  Node: unnamed-faq-100,  Next: unnamed-faq-101,  Prev: unnamed-faq-99,  Up: FAQ
7213
7214unnamed-faq-100
7215===============
7216
7217
7218     To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
7219     Subject: Re: Flex, and self referencing rules
7220     In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
7221     Date: Sat, 19 Feb 2000 18:33:16 PST
7222     From: Vern Paxson <vern>
7223
7224     > However, I do not use unput anywhere. I do use self-referencing
7225     > rules like this:
7226     >
7227     > UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})
7228
7229     You can't do this - flex is *not* a parser like yacc (which does indeed
7230     allow recursion), it is a scanner that's confined to regular expressions.
7231
7232     		Vern
7233
7234
7235File: flex.info,  Node: unnamed-faq-101,  Next: What is the difference between YYLEX_PARAM and YY_DECL?,  Prev: unnamed-faq-100,  Up: FAQ
7236
7237unnamed-faq-101
7238===============
7239
7240
7241     To: slg3@lehigh.edu (SAMUEL L. GULDEN)
7242     Subject: Re: Flex problem
7243     In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
7244     Date: Thu, 02 Mar 2000 23:00:46 PST
7245     From: Vern Paxson <vern>
7246
7247     If this is exactly your program:
7248
7249     > digit [0-9]
7250     > digits {digit}+
7251     > whitespace [ \t\n]+
7252     >
7253     > %%
7254     > "[" { printf("open_brac\n");}
7255     > "]" { printf("close_brac\n");}
7256     > "+" { printf("addop\n");}
7257     > "*" { printf("multop\n");}
7258     > {digits} { printf("NUMBER = %s\n", yytext);}
7259     > whitespace ;
7260
7261     then the problem is that the last rule needs to be "{whitespace}" !
7262
7263     		Vern
7264
7265
7266File: flex.info,  Node: What is the difference between YYLEX_PARAM and YY_DECL?,  Next: Why do I get "conflicting types for yylex" error?,  Prev: unnamed-faq-101,  Up: FAQ
7267
7268What is the difference between YYLEX_PARAM and YY_DECL?
7269=======================================================
7270
YYLEX_PARAM is not a flex symbol; it belongs to Bison. It tells Bison to
pass extra parameters when it calls yylex() from the parser.

   YY_DECL is the Flex declaration of yylex. The default is similar to
this:


     #define YY_DECL int yylex ()
7279
7280
7281File: flex.info,  Node: Why do I get "conflicting types for yylex" error?,  Next: How do I access the values set in a Flex action from within a Bison action?,  Prev: What is the difference between YYLEX_PARAM and YY_DECL?,  Up: FAQ
7282
7283Why do I get "conflicting types for yylex" error?
7284=================================================
7285
This is a compiler error regarding a generated Bison parser, not a Flex
scanner.  It means you need a prototype of yylex() at the top of the
Bison file.  Be sure the prototype matches YY_DECL.
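
   One way to keep the prototype and the definition from drifting
apart is to write the prototype with the same macro.  The sketch below
is only an illustration: the signature with a single `lvalp' argument
is an assumption, and the function body stands in for the code that
flex would generate.

```c
/* Assumed custom signature.  In a real project this #define would sit in
   the scanner's definitions section or in a header shared with the parser. */
#define YY_DECL int yylex(int *lvalp)

/* Prototype for the parser file: reusing the macro keeps both in sync. */
YY_DECL;

/* Stand-in for the flex-generated definition, which also begins with
   YY_DECL followed by the function body. */
YY_DECL
{
    *lvalp = 42;   /* pretend semantic value */
    return 0;      /* 0 tells the caller that input is exhausted */
}
```

   If the prototype were instead written by hand as `int yylex(void);',
the compiler would report the conflicting declarations as soon as both
were visible in one translation unit.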
7289
7290
7291File: flex.info,  Node: How do I access the values set in a Flex action from within a Bison action?,  Prev: Why do I get "conflicting types for yylex" error?,  Up: FAQ
7292
7293How do I access the values set in a Flex action from within a Bison action?
7294===========================================================================
7295
7296With $1, $2, $3, etc. These are called "Semantic Values" in the Bison
7297manual.  See *Note Top: (bison)Top.
7298
7299
7300File: flex.info,  Node: Appendices,  Next: Indices,  Prev: FAQ,  Up: Top
7301
7302Appendix A Appendices
7303*********************
7304
7305* Menu:
7306
7307* Makefiles and Flex::
7308* Bison Bridge::
7309* M4 Dependency::
7310* Common Patterns::
7311
7312
7313File: flex.info,  Node: Makefiles and Flex,  Next: Bison Bridge,  Prev: Appendices,  Up: Appendices
7314
7315A.1 Makefiles and Flex
7316======================
7317
7318In this appendix, we provide tips for writing Makefiles to build your
7319scanners.
7320
7321   In a traditional build environment, we say that the `.c' files are
7322the sources, and the `.o' files are the intermediate files. When using
7323`flex', however, the `.l' files are the sources, and the generated `.c'
7324files (along with the `.o' files) are the intermediate files.  This
7325requires you to carefully plan your Makefile.
7326
   Modern `make' programs understand that `foo.l' is intended to
generate `lex.yy.c' or `foo.c', and will behave accordingly(1)(2).  The
following Makefile does not explicitly instruct `make' how to build
`scan.c' from `scan.l'. Instead, it relies on the implicit rules of the
`make' program to build the intermediate file, `scan.c':
7332
7333
7334         # Basic Makefile -- relies on implicit rules
7335         # Creates "myprogram" from "scan.l" and "myprogram.c"
7336         #
7337         LEX=flex
7338         myprogram: scan.o myprogram.o
7339         scan.o: scan.l
7340
7341   For simple cases, the above may be sufficient. For other cases, you
7342may have to explicitly instruct `make' how to build your scanner.  The
7343following is an example of a Makefile containing explicit rules:
7344
7345
7346         # Basic Makefile -- provides explicit rules
7347         # Creates "myprogram" from "scan.l" and "myprogram.c"
7348         #
7349         LEX=flex
7350         myprogram: scan.o myprogram.o
7351                 $(CC) -o $@  $(LDFLAGS) $^
7352
7353         myprogram.o: myprogram.c
7354                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
7355
7356         scan.o: scan.c
7357                 $(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^
7358
7359         scan.c: scan.l
7360                 $(LEX) $(LFLAGS) -o $@ $^
7361
7362         clean:
7363                 $(RM) *.o scan.c
7364
7365   Notice in the above example that `scan.c' is in the `clean' target.
7366This is because we consider the file `scan.c' to be an intermediate
7367file.
7368
7369   Finally, we provide a realistic example of a `flex' scanner used
7370with a `bison' parser(3).  There is a tricky problem we have to deal
7371with. Since a `flex' scanner will typically include a header file
7372(e.g., `y.tab.h') generated by the parser, we need to be sure that the
7373header file is generated BEFORE the scanner is compiled. We handle this
7374case in the following example:
7375
7376
7377         # Makefile example -- scanner and parser.
7378         # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
7379         #
7380         LEX     = flex
7381         YACC    = bison -y
7382         YFLAGS  = -d
7383         objects = scan.o parse.o myprogram.o
7384
7385         myprogram: $(objects)
7386         scan.o: scan.l parse.c
7387         parse.o: parse.y
7388         myprogram.o: myprogram.c
7389
   In the above example, notice the line


         scan.o: scan.l parse.c

   which lists the file `parse.c' (the generated parser) as a
dependency of `scan.o'. Because `YFLAGS = -d' makes `bison' write the
header file alongside the parser, building `parse.c' first ensures that
the header exists before the scanner is compiled, and the above line
seems to do the trick. Feel free to experiment with your specific
implementation of `make'.
7400
7401   For more details on writing Makefiles, see *Note Top: (make)Top.
7402
7403   ---------- Footnotes ----------
7404
7405   (1) GNU `make' and GNU `automake' are two such programs that provide
7406implicit rules for flex-generated scanners.
7407
   (2) GNU `automake' may generate code to execute flex in
lex-compatible mode, or to stdout. If this is not what you want, then
you should provide an explicit rule in your Makefile.am.
7411
7412   (3) This example also applies to yacc parsers.
7413
7414
7415File: flex.info,  Node: Bison Bridge,  Next: M4 Dependency,  Prev: Makefiles and Flex,  Up: Appendices
7416
7417A.2 C Scanners with Bison Parsers
7418=================================
7419
7420This section describes the `flex' features useful when integrating
7421`flex' with `GNU bison'(1).  Skip this section if you are not using
7422`bison' with your scanner.  Here we discuss only the `flex' half of the
7423`flex' and `bison' pair.  We do not discuss `bison' in any detail.  For
7424more information about generating `bison' parsers, see *Note Top:
7425(bison)Top.
7426
7427   A compatible `bison' scanner is generated by declaring `%option
7428bison-bridge' or by supplying `--bison-bridge' when invoking `flex'
7429from the command line.  This instructs `flex' that the macro `yylval'
7430may be used. The data type for `yylval', `YYSTYPE', is typically
7431defined in a header file, included in section 1 of the `flex' input
7432file.  For a list of functions and macros available, *Note
7433bison-functions::.
7434
7435   The declaration of yylex becomes,
7436
7437
7438           int yylex ( YYSTYPE * lvalp, yyscan_t scanner );
7439
7440   If `%option bison-locations' is specified, then the declaration
7441becomes,
7442
7443
7444           int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );
7445
7446   Note that the macros `yylval' and `yylloc' evaluate to pointers.
7447Support for `yylloc' is optional in `bison', so it is optional in
7448`flex' as well. The following is an example of a `flex' scanner that is
7449compatible with `bison'.
7450
7451
         /* Scanner for "C" assignment statements... sort of. */
         %{
         #include <stdlib.h>  /* atoi */
         #include <string.h>  /* strdup */
         #include "y.tab.h"  /* Generated by bison. */
         %}

         %option bison-bridge bison-locations
         %%

         [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
         [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
         "="|";"       { return yytext[0];}
         .  {}
         %%
7465
7466   As you can see, there really is no magic here. We just use `yylval'
7467as we would any other variable. The data type of `yylval' is generated
7468by `bison', and included in the file `y.tab.h'. Here is the
7469corresponding `bison' parser:
7470
7471
7472         /* Parser to convert "C" assignments to lisp. */
7473         %{
7474         /* Pass the argument to yyparse through to yylex. */
7475         #define YYPARSE_PARAM scanner
7476         #define YYLEX_PARAM   scanner
7477         %}
7478         %locations
7479         %pure_parser
7480         %union {
7481             int num;
7482             char* str;
7483         }
7484         %token <str> STRING
7485         %token <num> NUMBER
7486         %%
7487         assignment:
7488             STRING '=' NUMBER ';' {
7489                 printf( "(setf %s %d)", $1, $3 );
7490            }
7491         ;
7492
7493   ---------- Footnotes ----------
7494
7495   (1) The features described here are purely optional, and are by no
7496means the only way to use flex with bison.  We merely provide some glue
7497to ease development of your parser-scanner pair.
7498
7499
7500File: flex.info,  Node: M4 Dependency,  Next: Common Patterns,  Prev: Bison Bridge,  Up: Appendices
7501
7502A.3 M4 Dependency
7503=================
7504
7505The macro processor `m4'(1) must be installed wherever flex is
7506installed.  `flex' invokes `m4', found by searching the directories in
7507the `PATH' environment variable. Any code you place in section 1 or in
7508the actions will be sent through m4. Please follow these rules to
7509protect your code from unwanted `m4' processing.
7510
   * Do not use symbols that begin with `m4_', such as `m4_define'
     or `m4_include', since those are reserved for `m4' macro names. If
     for some reason you need m4_ as a prefix, use a preprocessor
     #define to get your symbol past m4 unmangled.
7515
7516   * Do not use the strings `[[' or `]]' anywhere in your code. The
7517     former is not valid in C, except within comments and strings, but
7518     the latter is valid in code such as `x[y[z]]'. The solution is
7519     simple. To get the literal string `"]]"', use `"]""]"'. To get the
7520     array notation `x[y[z]]', use `x[y[z] ]'. Flex will attempt to
7521     detect these sequences in user code, and escape them. However,
7522     it's best to avoid this complexity where possible, by removing
7523     such sequences from your code.
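
   The two rules above might be followed in user code along these
lines.  This is a sketch under assumptions, not something taken from
the flex sources: the macro `M4_ID' and the function names are invented
for the example, and token pasting is only one plausible reading of
"use a preprocessor #define".

```c
/* Token pasting builds the identifier so that the reserved prefix never
   appears as a single word in the text that m4 sees. */
#define M4_ID(suffix) m4##_##suffix

int M4_ID(total) = 7;   /* declares a variable whose name is m4 + _total */

int read_total(void)
{
    return M4_ID(total);
}

/* Adjacent string literals are joined by the compiler, so the two closing
   brackets never sit next to each other in the source. */
const char *close_pair(void)
{
    return "]" "]";
}

/* A space between the closing brackets is harmless to the C compiler. */
int nested_lookup(const int *x, const int *y, int z)
{
    return x[y[z] ];
}
```

   Both workarounds cost nothing at run time: the pasted identifier
and the concatenated literal are resolved entirely by the C compiler.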
7524
7525
7526   `m4' is only required at the time you run `flex'. The generated
7527scanner is ordinary C or C++, and does _not_ require `m4'.
7528
7529   ---------- Footnotes ----------
7530
7531   (1) The use of m4 is subject to change in future revisions of flex.
7532It is not part of the public API of flex. Do not depend on it.
7533
7534
7535File: flex.info,  Node: Common Patterns,  Prev: M4 Dependency,  Up: Appendices
7536
7537A.4 Common Patterns
7538===================
7539
7540This appendix provides examples of common regular expressions you might
7541use in your scanner.
7542
7543* Menu:
7544
7545* Numbers::
7546* Identifiers::
7547* Quoted Constructs::
7548* Addresses::
7549
7550
7551File: flex.info,  Node: Numbers,  Next: Identifiers,  Up: Common Patterns
7552
7553A.4.1 Numbers
7554-------------
7555
7556C99 decimal constant
7557     `([[:digit:]]{-}[0])[[:digit:]]*'
7558
7559C99 hexadecimal constant
7560     `0[xX][[:xdigit:]]+'
7561
7562C99 octal constant
     `0[01234567]*'
7564
7565C99 floating point constant
7566
7567      {dseq}      ([[:digit:]]+)
7568      {dseq_opt}  ([[:digit:]]*)
7569      {frac}      (({dseq_opt}"."{dseq})|{dseq}".")
7570      {exp}       ([eE][+-]?{dseq})
7571      {exp_opt}   ({exp}?)
7572      {fsuff}     [flFL]
7573      {fsuff_opt} ({fsuff}?)
7574      {hpref}     (0[xX])
7575      {hdseq}     ([[:xdigit:]]+)
7576      {hdseq_opt} ([[:xdigit:]]*)
7577      {hfrac}     (({hdseq_opt}"."{hdseq})|({hdseq}"."))
7578      {bexp}      ([pP][+-]?{dseq})
7579      {dfc}       (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
7580      {hfc}       (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))
7581
7582      {c99_floating_point_constant}  ({dfc}|{hfc})
7583
7584     See C99 section 6.4.4.2 for the gory details.
7585
7586
7587
7588File: flex.info,  Node: Identifiers,  Next: Quoted Constructs,  Prev: Numbers,  Up: Common Patterns
7589
7590A.4.2 Identifiers
7591-----------------
7592
7593C99 Identifier
7594
7595     ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
7596     nondigit    [_[:alpha:]]
7597     c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*
7598
7599     Technically, the above pattern does not encompass all possible C99
7600     identifiers, since C99 allows for "implementation-defined"
7601     characters. In practice, C compilers follow the above pattern,
7602     with the addition of the `$' character.
7603
7604UTF-8 Encoded Unicode Code Point
7605
7606     [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
7607
7608
7609
7610File: flex.info,  Node: Quoted Constructs,  Next: Addresses,  Prev: Identifiers,  Up: Common Patterns
7611
7612A.4.3 Quoted Constructs
7613-----------------------
7614
7615C99 String Literal
     `L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([01234567]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"'
7617
7618C99 Comment
     `("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"[^\n]*)'
7620
7621     Note that in C99, a `//'-style comment may be split across lines,
7622     and, contrary to popular belief, does not include the trailing
7623     `\n' character.
7624
7625     A better way to scan `/* */' comments is by line, rather than
7626     matching possibly huge comments all at once. This will allow you
7627     to scan comments of unlimited length, as long as line breaks
7628     appear at sane intervals. This is also more efficient when used
7629     with automatic line number processing. *Note option-yylineno::.
7630
7631
7632     <INITIAL>{
7633         "/*"      BEGIN(COMMENT);
7634     }
7635     <COMMENT>{
7636         "*/"      BEGIN(0);
         "*"       ;
7638         "*"[^/]   ;
7639         \n        ;
7640     }
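
   For readers who want to see the two-state idea outside of flex, here
is a hand-written C analogue.  It is a sketch only: `strip_comments' and
the state names are made up for the example, and, like the rules above,
it makes no attempt to recognize string literals.

```c
#include <stddef.h>

enum scan_state { ST_INITIAL, ST_COMMENT };

/* Copies in to out while dropping comment text.
   out must have room for strlen(in) + 1 bytes. */
size_t strip_comments(const char *in, char *out)
{
    enum scan_state state = ST_INITIAL;
    size_t n = 0;

    while (*in != '\0') {
        if (state == ST_INITIAL && in[0] == '/' && in[1] == '*') {
            state = ST_COMMENT;      /* plays the role of BEGIN(COMMENT) */
            in += 2;
        } else if (state == ST_COMMENT && in[0] == '*' && in[1] == '/') {
            state = ST_INITIAL;      /* plays the role of BEGIN(0) */
            in += 2;
        } else if (state == ST_COMMENT) {
            in++;                    /* discard one comment character */
        } else {
            out[n++] = *in++;        /* ordinary text is kept */
        }
    }
    out[n] = '\0';
    return n;
}
```

   As in the flex version, only a small amount of input is examined at
each step, so the length of a comment does not matter.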
7641
7642
7643
7644File: flex.info,  Node: Addresses,  Prev: Quoted Constructs,  Up: Common Patterns
7645
7646A.4.4 Addresses
7647---------------
7648
7649IPv4 Address
7650     `(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))'
7651
7652IPv6 Address
7653
7654     hex4         ([[:xdigit:]]{1,4})
     hexseq       ({hex4}(:{hex4})*)
7656     hexpart      ({hexseq}|({hexseq}::({hexseq}?))|::{hexseq})
7657     IPv6address  ({hexpart}(":"{IPv4address})?)
7658
7659     See RFC2373 for details.
7660
7661URI
7662     `(([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
7663
7664     This pattern is nearly useless, since it allows just about any
7665     character to appear in a URI, including spaces and control
7666     characters.  See RFC2396 for details.
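
   To try the IPv4 pattern from this section outside of a scanner, one
option is the POSIX `regcomp' interface, as sketched below.  The helper
name `looks_like_ipv4' is invented for this example, and the pattern is
respelled for POSIX extended syntax: the quoted "." becomes an escaped
dot, and anchors are added because `regexec' searches for a match
anywhere in the string.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the IPv4 pattern in full, 0 otherwise. */
int looks_like_ipv4(const char *s)
{
    /* Extended-regex spelling of the flex pattern, anchored at both ends. */
    static const char pattern[] =
        "^([[:digit:]]{1,3}\\.){3}[[:digit:]]{1,3}$";
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                    /* treat a compile failure as no match */
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

   Note that the pattern checks shape only: a string such as
`999.999.999.999' is accepted, so a scanner action still has to verify
that each component is at most 255.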
7667
7668
7669
7670File: flex.info,  Node: Indices,  Prev: Appendices,  Up: Top
7671
7672Indices
7673*******
7674
7675* Menu:
7676
7677* Concept Index::
7678* Index of Functions and Macros::
7679* Index of Variables::
7680* Index of Data Types::
7681* Index of Hooks::
7682* Index of Scanner Options::
7683
7684