16. Scanner Options

The various flex options are categorized by function in the following menu. If you want to look up a particular option by name, see section Index of Scanner Options.

Even though there are many scanner options, a typical scanner might only specify the following options:

%option   8bit reentrant bison-bridge
%option   warn nodefault
%option   yylineno
%option   outfile="scanner.c" header-file="scanner.h"

The first line specifies the general type of scanner we want. The second line specifies that we are being careful. The third line asks flex to track line numbers. The last line tells flex what to name the files. (The options can be specified in any order; we grouped them by purpose only for readability.)

flex also provides a mechanism for controlling options within the scanner specification itself, rather than from the flex command-line. This is done by including %option directives in the first section of the scanner specification. You can specify multiple options with a single %option directive, and multiple directives in the first section of your flex input file.

Most options are given simply as names, optionally preceded by the word ‘no’ (with no intervening whitespace) to negate their meaning. The names are the same as their long-option equivalents (but without the leading ‘--’ ).
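The negation rule can be pictured with a small C sketch. The helper parse_option_word and the struct option_setting are hypothetical names invented for this illustration; they are not part of flex, and the sketch assumes that no option's own name begins with ‘no’.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of flex: split an %option word such as
 * "nodefault" into its base name ("default") and the value it sets. */
struct option_setting {
    const char *name;   /* points into the caller's string, past any "no" */
    bool enabled;       /* false when the word carried a "no" prefix */
};

static struct option_setting parse_option_word(const char *word)
{
    assert(word != NULL);
    struct option_setting s = { word, true };
    /* A leading "no", with no intervening whitespace, negates the option. */
    if (strncmp(word, "no", 2) == 0 && word[2] != '\0') {
        s.name = word + 2;
        s.enabled = false;
    }
    return s;
}
```

Under these assumptions, parse_option_word("nodefault") yields the name "default" with enabled set to false, while parse_option_word("yylineno") leaves the name unchanged and enabled set to true.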

flex scans your rule actions to determine whether you use the REJECT or yymore() features. The REJECT and yymore options are available to override its decision as to whether you use these features, either by setting them (e.g., %option reject) to indicate the feature is indeed used, or unsetting them to indicate it actually is not used (e.g., %option noyymore).

A number of options are available for lint purists who want to suppress the appearance of unneeded routines in the generated scanner. Each of the following, if unset (e.g., %option nounput), results in the corresponding routine not appearing in the generated scanner:

    input, unput
    yy_push_state, yy_pop_state, yy_top_state
    yy_scan_buffer, yy_scan_bytes, yy_scan_string

    yyget_extra, yyset_extra, yyget_leng, yyget_text,
    yyget_lineno, yyset_lineno, yyget_in, yyset_in,
    yyget_out, yyset_out, yyget_lval, yyset_lval,
    yyget_lloc, yyset_lloc, yyget_debug, yyset_debug

(though yy_push_state() and friends won’t appear anyway unless you use %option stack).

16.1 Options for Specifying Filenames

‘--header-file=FILE, %option header-file="FILE"’

instructs flex to write a C header to ‘FILE’. This file contains function prototypes, extern variables, and types used by the scanner. Only the external API is exported by the header file. Many macros that are usable from within scanner actions are not exported to the header file. This is due to namespace problems and the goal of a clean external API.

While in the header, the macro yyIN_HEADER is defined, where ‘yy’ is substituted with the appropriate prefix.

The ‘--header-file’ option is not compatible with the ‘--c++’ option, since the C++ scanner provides its own header in ‘yyFlexLexer.h’.

‘-oFILE, --outfile=FILE, %option outfile="FILE"’

directs flex to write the scanner to the file ‘FILE’ instead of ‘lex.yy.c’. If you combine ‘--outfile’ with the ‘--stdout’ option, then the scanner is written to ‘stdout’ but its #line directives (see the ‘-L’ option below) refer to the file ‘FILE’.

‘-t, --stdout, %option stdout’

instructs flex to write the scanner it generates to standard output instead of ‘lex.yy.c’.

‘-SFILE, --skel=FILE’

overrides the default skeleton file from which flex constructs its scanners. You’ll never need this option unless you are doing flex maintenance or development.

‘--tables-file=FILE’

writes serialized scanner DFA tables to FILE. The generated scanner will not contain the tables, and requires them to be loaded at runtime. See serialization.

‘--tables-verify’

This option is for flex development. We document it here in case you stumble upon it by accident or in case you suspect some inconsistency in the serialized tables. Flex will serialize the scanner DFA tables but will also generate the in-code tables as it normally does. At runtime, the scanner will verify that the serialized tables match the in-code tables, instead of loading them.

16.2 Options Affecting Scanner Behavior

‘-i, --case-insensitive, %option case-insensitive’

instructs flex to generate a case-insensitive scanner. The case of letters given in the flex input patterns will be ignored, and tokens in the input will be matched regardless of case. The matched text given in yytext retains its original case (i.e., it will not be folded). For tricky behavior, see case and character ranges.

‘-l, --lex-compat, %option lex-compat’

turns on maximum compatibility with the original AT&T lex implementation. Note that this does not mean full compatibility. Use of this option costs a considerable amount of performance, and it cannot be used with the ‘--c++’, ‘--full’, ‘--fast’, ‘-Cf’, or ‘-CF’ options. For details on the compatibilities it provides, see Incompatibilities with Lex and Posix. This option also results in the name YY_FLEX_LEX_COMPAT being #define’d in the generated scanner.

‘-B, --batch, %option batch’

instructs flex to generate a batch scanner, the opposite of interactive scanners generated by ‘--interactive’ (see below). In general, you use ‘-B’ when you are certain that your scanner will never be used interactively, and you want to squeeze a little more performance out of it. If your goal is instead to squeeze out a lot more performance, you should be using the ‘-Cf’ or ‘-CF’ options, which turn on ‘--batch’ automatically anyway.

‘-I, --interactive, %option interactive’

instructs flex to generate an interactive scanner. An interactive scanner is one that only looks ahead to decide what token has been matched if it absolutely must. It turns out that always looking one extra character ahead, even if the scanner has already seen enough text to disambiguate the current token, is a bit faster than only looking ahead when necessary. But scanners that always look ahead give dreadful interactive performance; for example, when a user types a newline, it is not recognized as a newline token until they enter another token, which often means typing in another whole line.

flex scanners default to interactive unless you use the ‘-Cf’ or ‘-CF’ table-compression options (see section Performance Considerations). That’s because if you’re looking for high-performance you should be using one of these options, so if you didn’t, flex assumes you’d rather trade off a bit of run-time performance for intuitive interactive behavior. Note also that you cannot use ‘--interactive’ in conjunction with ‘-Cf’ or ‘-CF’. Thus, this option is not really needed; it is on by default for all those cases in which it is allowed.

You can force a scanner to not be interactive by using ‘--batch’.

‘-7, --7bit, %option 7bit’

instructs flex to generate a 7-bit scanner, i.e., one which can only recognize 7-bit characters in its input. The advantage of using ‘--7bit’ is that the scanner’s tables can be up to half the size of those generated using ‘--8bit’. The disadvantage is that such scanners often hang or crash if their input contains an 8-bit character.

Note, however, that unless you generate your scanner using the ‘-Cf’ or ‘-CF’ table compression options, use of ‘--7bit’ will save only a small amount of table space, and make your scanner considerably less portable. Flex’s default behavior is to generate an 8-bit scanner unless you use the ‘-Cf’ or ‘-CF’ options, in which case flex defaults to generating 7-bit scanners unless your site was always configured to generate 8-bit scanners (as will often be the case with non-USA sites). You can tell whether flex generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the ‘--verbose’ output (see section Debugging Options).

Note that if you use ‘-Cfe’ or ‘-CFe’ flex still defaults to generating an 8-bit scanner, since usually with these compression options full 8-bit tables are not much more expensive than 7-bit tables.
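The "up to half the size" figure for uncompressed tables follows from simple arithmetic, sketched below in C. The helper full_table_entries is a hypothetical name for this illustration, and the sketch assumes that a full table holds one entry per (state, input character) pair; real tables generated by flex carry additional bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of entries in an uncompressed transition
 * table, assuming one entry per (state, input character) pair. */
static size_t full_table_entries(size_t nstates, int char_bits)
{
    assert(char_bits == 7 || char_bits == 8);
    return nstates * ((size_t)1 << char_bits);   /* 128 or 256 columns */
}
```

For a scanner with 100 states, this gives 12800 entries for a 7-bit scanner against 25600 for an 8-bit one. Once equivalence classes are in use the number of columns depends on the number of classes rather than on the character width, which is consistent with the remark above about ‘-Cfe’ and ‘-CFe’.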

‘-8, --8bit, %option 8bit’

instructs flex to generate an 8-bit scanner, i.e., one which can recognize 8-bit characters. This flag is only needed for scanners generated using ‘-Cf’ or ‘-CF’, as otherwise flex defaults to generating an 8-bit scanner anyway.

See the discussion of ‘--7bit’ above for flex’s default behavior and the tradeoffs between 7-bit and 8-bit scanners.

‘--default, %option default’

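instructs flex to generate the default rule, which echoes unmatched scanner input to ‘stdout’ (see ‘--nodefault’ for the opposite).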

‘--always-interactive, %option always-interactive’

instructs flex to generate a scanner which always considers its input interactive. Normally, on each new input file the scanner calls isatty() in an attempt to determine whether the scanner’s input source is interactive and thus should be read a character at a time. When this option is used, however, then no such call is made.

‘--never-interactive, %option never-interactive’

instructs flex to generate a scanner which never considers its input interactive. This is the opposite of always-interactive.
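The three behaviors (detect with isatty(), always interactive, never interactive) can be summarized in a short C sketch. The enum and the function input_is_interactive are hypothetical names chosen for this illustration; the code flex actually generates is organized differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* isatty(); POSIX, not ISO C */

/* Hypothetical names for illustration only. */
enum interactive_mode {
    DETECT_WITH_ISATTY,   /* the normal behavior */
    ALWAYS_INTERACTIVE,   /* as with %option always-interactive */
    NEVER_INTERACTIVE     /* as with %option never-interactive */
};

/* Returns 1 if input should be read a character at a time, else 0. */
static int input_is_interactive(FILE *in, enum interactive_mode mode)
{
    assert(in != NULL);
    switch (mode) {
    case ALWAYS_INTERACTIVE:
        return 1;                          /* isatty() is never called */
    case NEVER_INTERACTIVE:
        return 0;                          /* isatty() is never called */
    default:
        return isatty(fileno(in)) > 0;     /* ask the operating system */
    }
}
```

Because the two override modes never call isatty(), they are also the natural choice on systems where that function is unavailable.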

‘-X, --posix, %option posix’

turns on maximum compatibility with the POSIX 1003.2-1992 definition of lex. Since flex was originally designed to implement the POSIX definition of lex this generally involves very few changes in behavior. At the current writing the known differences between flex and the POSIX standard are:

In POSIX and AT&T lex, the repeat operator, ‘{}’, has lower precedence than concatenation (thus ‘ab{3}’ yields ‘ababab’). Most POSIX utilities use an Extended Regular Expression (ERE) precedence that has the precedence of the repeat operator higher than concatenation (which causes ‘ab{3}’ to yield ‘abbb’). By default, flex places the precedence of the repeat operator higher than concatenation, which matches the ERE processing of other POSIX utilities. When either ‘--posix’ or ‘-l’ is specified, flex will use the traditional AT&T and POSIX-compliant precedence for the repeat operator, where concatenation has higher precedence than the repeat operator.
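The two readings of ‘ab{3}’ can be made concrete with a small C sketch. The function expand_repeat is a hypothetical helper written for this illustration; it handles only a literal string followed by a single repeat count, and is not how flex parses patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the exact string matched by LITERAL{count}.
 * posix_precedence != 0: the repeat applies to the whole literal, as with
 *   --posix or -l (ab{3} gives ababab).
 * posix_precedence == 0: the repeat applies to the last character only,
 *   flex's default (ab{3} gives abbb).
 * Returns 0 on success, -1 if out is too small. */
static int expand_repeat(const char *literal, int count, int posix_precedence,
                         char *out, size_t out_size)
{
    assert(literal != NULL && out != NULL && count >= 1);
    size_t len = strlen(literal);
    assert(len >= 1);
    size_t need = posix_precedence ? len * (size_t)count
                                   : (len - 1) + (size_t)count;
    if (need + 1 > out_size)
        return -1;
    size_t pos = 0;
    if (posix_precedence) {
        for (int i = 0; i < count; i++) {       /* repeat the whole literal */
            memcpy(out + pos, literal, len);
            pos += len;
        }
    } else {
        memcpy(out, literal, len - 1);          /* prefix appears once */
        pos = len - 1;
        for (int i = 0; i < count; i++)         /* repeat the last character */
            out[pos++] = literal[len - 1];
    }
    out[pos] = '\0';
    return 0;
}
```

Calling expand_repeat("ab", 3, 1, buf, sizeof buf) fills buf with "ababab", and the same call with posix_precedence set to 0 fills it with "abbb".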

‘--stack, %option stack’

enables the use of start condition stacks (see section Start Conditions).

‘--stdinit, %option stdinit’

if set (i.e., %option stdinit) initializes yyin and yyout to ‘stdin’ and ‘stdout’, instead of the default of ‘NULL’. Some existing lex programs depend on this behavior, even though it is not compliant with ANSI C, which does not require ‘stdin’ and ‘stdout’ to be compile-time constants. In a reentrant scanner, however, this is not a problem since initialization is performed in yylex_init at runtime.
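The portability concern can be sketched in C as follows. The variable my_in and the function effective_input are hypothetical stand-ins for yyin and for the scanner's start-up code; the sketch only illustrates the general technique of deferring the assignment to run time.

```c
#include <assert.h>
#include <stdio.h>

/* Portable: a null pointer is always a valid constant initializer. */
static FILE *my_in = NULL;   /* hypothetical stand-in for yyin */

/* Writing 'static FILE *my_in = stdin;' at file scope is valid only where
 * stdin expands to a constant expression, which ISO C does not guarantee. */

/* Resolve the stream at run time instead, on first use. */
static FILE *effective_input(void)
{
    if (my_in == NULL)
        my_in = stdin;
    assert(my_in != NULL);
    return my_in;
}
```

A program that sets my_in before the first call keeps its own stream; otherwise the first call falls back to stdin.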

‘--yylineno, %option yylineno’

directs flex to generate a scanner that maintains the number of the current line read from its input in the global variable yylineno. This option is implied by %option lex-compat. In a reentrant C scanner, the macro yylineno is accessible regardless of the value of %option yylineno; however, its value is not modified by flex unless %option yylineno is enabled.

‘--yywrap, %option yywrap’

if unset (i.e., --noyywrap), makes the scanner not call yywrap() upon an end-of-file, but simply assume that there are no more files to scan (until the user points ‘yyin’ at a new file and calls yylex() again).

16.3 Code-Level And API Options

‘--ansi-definitions, %option ansi-definitions’

instructs flex to generate ANSI C99 definitions for functions. This option is enabled by default. If %option noansi-definitions is specified, then the obsolete style is generated.

‘--ansi-prototypes, %option ansi-prototypes’

instructs flex to generate ANSI C99 prototypes for functions. This option is enabled by default. If noansi-prototypes is specified, then prototypes will have empty parameter lists.

‘--bison-bridge, %option bison-bridge’

instructs flex to generate a C scanner that is meant to be called by a GNU bison parser. The scanner has minor API changes for bison compatibility. In particular, the declaration of yylex is modified to take an additional parameter, yylval. See section C Scanners with Bison Parsers.

‘--bison-locations, %option bison-locations’

instructs flex that GNU bison %locations are being used. This means yylex will be passed an additional parameter, yylloc. This option implies %option bison-bridge. See section C Scanners with Bison Parsers.

‘-L, --noline, %option noline’

instructs flex not to generate #line directives. Without this option, flex peppers the generated scanner with #line directives so error messages in the actions will be correctly located with respect to either the original flex input file (if the errors are due to code in the input file), or ‘lex.yy.c’ (if the errors are flex’s fault – you should report these sorts of errors to the email address given in Reporting Bugs).

‘-R, --reentrant, %option reentrant’

instructs flex to generate a reentrant C scanner. The generated scanner may safely be used in a multi-threaded environment. The API for a reentrant scanner is different than for a non-reentrant scanner (see section Reentrant C Scanners). Because of the API difference between reentrant and non-reentrant flex scanners, non-reentrant flex code must be modified before it is suitable for use with this option. This option is not compatible with the ‘--c++’ option.

The option ‘--reentrant’ does not affect the performance of the scanner.

‘-+, --c++, %option c++’

specifies that you want flex to generate a C++ scanner class. See section Generating C++ Scanners, for details.

‘--array, %option array’

specifies that you want yytext to be an array instead of a char *.

‘--pointer, %option pointer’

specifies that yytext should be a char *, not an array. The default is char *.

‘-PPREFIX, --prefix=PREFIX, %option prefix="PREFIX"’

changes the default ‘yy’ prefix used by flex for all globally-visible variable and function names to instead be ‘PREFIX’. For example, ‘--prefix=foo’ changes the name of yytext to footext. It also changes the name of the default output file from ‘lex.yy.c’ to ‘lex.foo.c’. Here is a partial list of the names affected:

    yy_create_buffer
    yy_delete_buffer
    yy_flex_debug
    yy_init_buffer
    yy_flush_buffer
    yy_load_buffer_state
    yy_switch_to_buffer
    yyin
    yyleng
    yylex
    yylineno
    yyout
    yyrestart
    yytext
    yywrap
    yyalloc
    yyrealloc
    yyfree

(If you are using a C++ scanner, then only yywrap and yyFlexLexer are affected.) Within your scanner itself, you can still refer to the global variables and functions using either version of their name; but externally, they have the modified name.

This option lets you easily link together multiple flex programs into the same executable. Note, though, that using this option also renames yywrap(), so you now must either provide your own (appropriately-named) version of the routine for your scanner, or use %option noyywrap, as linking with ‘-lfl’ no longer provides one for you by default.
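As a sketch of what the renaming means in practice, the C fragment below computes the external name of a symbol for a given prefix and supplies a wrap routine for the ‘foo’ prefix used in the example above. The helper prefixed_name is hypothetical and not part of flex; foowrap assumes a non-reentrant C scanner, whose wrap routine takes no arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of flex: compute the external name that a
 * "yy" symbol takes under --prefix=PREFIX. Returns 0 on success, -1 if the
 * name does not begin with "yy" or does not fit in out. */
static int prefixed_name(const char *yy_name, const char *prefix,
                         char *out, size_t out_size)
{
    assert(yy_name != NULL && prefix != NULL && out != NULL);
    if (strncmp(yy_name, "yy", 2) != 0)
        return -1;
    int n = snprintf(out, out_size, "%s%s", prefix, yy_name + 2);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* With --prefix=foo the scanner calls foowrap() at end-of-file, so the
 * program must supply it. Returning 1 reports that there is no more input.
 * Assumption: non-reentrant C scanner (no scanner argument). */
int foowrap(void)
{
    return 1;
}
```

With this helper, "yytext" maps to "footext" and "yy_create_buffer" maps to "foo_create_buffer". Using %option noyywrap instead removes the need for foowrap() altogether.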

‘--main, %option main’

directs flex to provide a default main() program for the scanner, which simply calls yylex(). This option implies noyywrap (see ‘--yywrap’ above).

‘--nounistd, %option nounistd’

suppresses inclusion of the non-ANSI header file ‘unistd.h’. This option is meant to target environments in which ‘unistd.h’ does not exist. Be aware that certain options may cause flex to generate code that relies on functions normally found in ‘unistd.h’ (e.g., isatty() and read()). If you wish to use these functions, you will have to inform your compiler where to find them. See the ‘--always-interactive’ and ‘--read’ options.

‘--yyclass=NAME, %option yyclass="NAME"’

only applies when generating a C++ scanner (the ‘--c++’ option). It informs flex that you have derived NAME as a subclass of yyFlexLexer, so flex will place your actions in the member function NAME::yylex() instead of yyFlexLexer::yylex(). It also generates a yyFlexLexer::yylex() member function that emits a run-time error (by invoking yyFlexLexer::LexerError()) if called. See section Generating C++ Scanners.

16.4 Options for Scanner Speed and Size

‘-C[aefFmr]’

controls the degree of table compression and, more generally, trade-offs between small scanners and fast scanners.

‘-C’

A lone ‘-C’ specifies that the scanner tables should be compressed but neither equivalence classes nor meta-equivalence classes should be used.

‘-Ca, --align, %option align’

(“align”) instructs flex to trade off larger tables in the generated scanner for faster performance because the elements of the tables are better aligned for memory access and computation. On some RISC architectures, fetching and manipulating longwords is more efficient than with smaller-sized units such as shortwords. This option can quadruple the size of the tables used by your scanner.

‘-Ce, --ecs, %option ecs’

directs flex to construct equivalence classes, i.e., sets of characters which have identical lexical properties (for example, if the only appearance of digits in the flex input is in the character class “[0-9]” then the digits ’0’, ’1’, ..., ’9’ will all be put in the same equivalence class). Equivalence classes usually give dramatic reductions in the final table/object file sizes (typically a factor of 2-5) and are pretty cheap performance-wise (one array look-up per character scanned).
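The idea behind equivalence classes can be sketched in C: two characters belong to the same class exactly when they are members of the same character sets. The function build_ecs below is a hypothetical, deliberately simple illustration (quadratic in the alphabet size) and is not the algorithm flex itself uses.

```c
#include <assert.h>
#include <string.h>

#define NCHARS 256

/* Sketch only: give each character a class number such that two characters
 * share a number exactly when they belong to the same sets. sets[i] is a
 * NUL-terminated string listing the members of set i, so NUL itself can
 * never be listed. Returns the number of classes found. */
static int build_ecs(const char *const sets[], int nsets, int ec[NCHARS])
{
    assert(sets != NULL && ec != NULL && nsets >= 0 && nsets <= 32);
    unsigned long signature[NCHARS];   /* bit i set: member of set i */
    for (int c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (int i = 0; i < nsets; i++)
            if (c != 0 && strchr(sets[i], c) != NULL)
                signature[c] |= 1UL << i;
    }
    int nclasses = 0;
    for (int c = 0; c < NCHARS; c++) {
        ec[c] = -1;
        for (int p = 0; p < c; p++)    /* reuse the class of an earlier twin */
            if (signature[p] == signature[c]) {
                ec[c] = ec[p];
                break;
            }
        if (ec[c] < 0)
            ec[c] = nclasses++;
    }
    return nclasses;
}
```

With the single set "0123456789", all ten digits land in one class and every other character in a second, so a table indexed by class needs two columns instead of 256. The ec[] array is the one look-up per scanned character mentioned above.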

‘-Cf’

specifies that the full scanner tables should be generated - flex should not compress the tables by taking advantage of similar transition functions for different states.

‘-CF’

specifies that the alternate fast scanner representation (described below under the ‘--fast’ flag) should be used. This option cannot be used with ‘--c++’.

‘-Cm, --meta-ecs, %option meta-ecs’

directs flex to construct meta-equivalence classes, which are sets of equivalence classes (or characters, if equivalence classes are not being used) that are commonly used together. Meta-equivalence classes are often a big win when using compressed tables, but they have a moderate performance impact (one or two if tests and one array look-up per character scanned).

‘-Cr, --read, %option read’

causes the generated scanner to bypass use of the standard I/O library (stdio) for input. Instead of calling fread() or getc(), the scanner will use the read() system call, resulting in a performance gain which varies from system to system, but in general is probably negligible unless you are also using ‘-Cf’ or ‘-CF’. Using ‘-Cr’ can cause strange behavior if, for example, you read from ‘yyin’ using stdio prior to calling the scanner (because the scanner will miss whatever text your previous reads left in the stdio input buffer). ‘-Cr’ has no effect if you define YY_INPUT() (see section The Generated Scanner).
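The "strange behavior" can be reproduced without flex at all. The C sketch below writes a short text into a pipe, reads a single character through stdio, and then asks read() for whatever is left, which is what a ‘-Cr’ scanner would do. The function name bytes_left_for_read is hypothetical, and the result depends on the stdio implementation buffering more than one byte at a time, which common implementations do for pipes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* pipe(), read(), write(), close() */

/* Returns how many bytes read() still finds after stdio has fetched one
 * character from the same descriptor, or -1 on error. */
static long bytes_left_for_read(const char *text, size_t len)
{
    assert(text != NULL && len > 0 && len <= 256);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    ssize_t written = write(fds[1], text, len);
    close(fds[1]);                      /* let the reader see end-of-file */
    FILE *in = (written == (ssize_t)len) ? fdopen(fds[0], "r") : NULL;
    if (in == NULL) {
        close(fds[0]);
        return -1;
    }
    (void)fgetc(in);                    /* stdio buffers far more than one byte */
    char buf[256];
    ssize_t n = read(fds[0], buf, sizeof buf);  /* what a -Cr scanner would see */
    fclose(in);                         /* also closes fds[0] */
    return (long)n;
}
```

On typical systems bytes_left_for_read("hello\n", 6) returns 0 rather than 5: the remaining five bytes are sitting in the stdio buffer, out of reach of read().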

The options ‘-Cf’ or ‘-CF’ and ‘-Cm’ do not make sense together - there is no opportunity for meta-equivalence classes if the table is not being compressed. Otherwise the options may be freely mixed, and are cumulative.

The default setting is ‘-Cem’, which specifies that flex should generate equivalence classes and meta-equivalence classes. This setting provides the highest degree of table compression. You can obtain faster-executing scanners at the cost of larger tables, with the following generally being true:

    slowest & smallest
          -Cem
          -Cm
          -Ce
          -C
          -C{f,F}e
          -C{f,F}
          -C{f,F}a
    fastest & largest

Note that scanners with the smallest tables are usually generated and compiled the quickest, so during development you will usually want to use the default, maximal compression.

‘-Cfe’ is often a good compromise between speed and size for production scanners.

‘-f, --full, %option full’

specifies a fast scanner that uses full tables. No table compression is done and stdio is bypassed. The result is large but fast. This option is equivalent to ‘-Cfr’.

‘-F, --fast, %option fast’

specifies that the fast scanner table representation should be used (and stdio bypassed). This representation is about as fast as the full table representation ‘--full’, and for some sets of patterns will be considerably smaller (and for others, larger). In general, if the pattern set contains both keywords and a catch-all identifier rule, such as in the set:

    "case"    return TOK_CASE;
    "switch"  return TOK_SWITCH;
    ...
    "default" return TOK_DEFAULT;
    [a-z]+    return TOK_ID;

then you’re better off using the full table representation. If only the identifier rule is present and you then use a hash table or some such to detect the keywords, you’re better off using ‘--fast’.

This option is equivalent to ‘-CFr’. It cannot be used with ‘--c++’.

16.5 Debugging Options

‘-b, --backup, %option backup’

generates backing-up information to ‘lex.backup’. This is a list of scanner states which require backing up and the input characters on which they do so. By adding rules one can remove backing-up states. If all backing-up states are eliminated and ‘-Cf’ or ‘-CF’ is used, the generated scanner will run faster (see the ‘--perf-report’ flag). Only users who wish to squeeze every last cycle out of their scanners need worry about this option (see section Performance Considerations).

‘-d, --debug, %option debug’

makes the generated scanner run in debug mode. Whenever a pattern is recognized and the global variable yy_flex_debug is non-zero (which is the default), the scanner will write to ‘stderr’ a line of the form:

    --accepting rule at line 53 ("the matched text")

The line number refers to the location of the rule in the file defining the scanner (i.e., the file that was fed to flex). Messages are also generated when the scanner backs up, accepts the default rule, reaches the end of its input buffer (or encounters a NUL; at this point, the two look the same as far as the scanner’s concerned), or reaches an end-of-file.

‘-p, --perf-report, %option perf-report’

generates a performance report to ‘stderr’. The report consists of comments regarding features of the flex input file which will cause a serious loss of performance in the resulting scanner. If you give the flag twice, you will also get comments regarding features that lead to minor performance losses.

Note that the use of REJECT and variable trailing context (see section Limitations) entails a substantial performance penalty; use of yymore(), the ‘^’ operator, and the ‘--interactive’ flag entails minor performance penalties.

‘-s, --nodefault, %option nodefault’

causes the default rule (that unmatched scanner input is echoed to ‘stdout’) to be suppressed. If the scanner encounters input that does not match any of its rules, it aborts with an error. This option is useful for finding holes in a scanner’s rule set.

‘-T, --trace, %option trace’

makes flex run in trace mode. It will generate a lot of messages to ‘stderr’ concerning the form of the input and the resultant non-deterministic and deterministic finite automata. This option is mostly for use in maintaining flex.

‘-w, --nowarn, %option nowarn’

suppresses warning messages.

‘-v, --verbose, %option verbose’

specifies that flex should write to ‘stderr’ a summary of statistics regarding the scanner it generates. Most of the statistics are meaningless to the casual flex user, but the first line identifies the version of flex (same as reported by ‘--version’), and the next line the flags used when generating the scanner, including those that are on by default.

‘--warn, %option warn’

warns about certain things. In particular, if the default rule can be matched but no default rule has been given, flex will warn you. We recommend using this option always.

16.6 Miscellaneous Options

‘-c’: A do-nothing option included for POSIX compliance.
‘-h, -?, --help’: generates a “help” summary of flex’s options to ‘stdout’ and then exits.
‘-n’: Another do-nothing option included for POSIX compliance.
‘-V, --version’: prints the version number to ‘stdout’ and exits.

This document was generated by Rick Perry on January 7, 2013 using texi2html 1.82.