The various flex options are categorized by function in the following menu. If you want to look up a particular option by name, see section Index of Scanner Options.
16.1 Options for Specifying Filenames
16.2 Options Affecting Scanner Behavior
16.3 Code-Level And API Options
16.4 Options for Scanner Speed and Size
16.5 Debugging Options
16.6 Miscellaneous Options
Even though there are many scanner options, a typical scanner might only specify the following options:
    %option 8bit reentrant bison-bridge
    %option warn nodefault
    %option yylineno
    %option outfile="scanner.c" header-file="scanner.h"
The first line specifies the general type of scanner we want. The second line specifies that we are being careful. The third line asks flex to track line numbers. The last line tells flex what to name the files. (The options can be specified in any order. We just divided them.)
flex also provides a mechanism for controlling options within the scanner specification itself, rather than from the flex command line. This is done by including %option directives in the first section of the scanner specification. You can specify multiple options with a single %option directive, and multiple directives in the first section of your flex input file.
Most options are given simply as names, optionally preceded by the word ‘no’ (with no intervening whitespace) to negate their meaning. The names are the same as their long-option equivalents (but without the leading ‘--’ ).
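As a small illustration of the naming rule, the following sketch shows a complete scanner specification that sets one negated option and groups two others in a single directive. The file layout and the rules are invented for this example; only the option names come from this chapter.

```lex
/* "noyywrap" is "yywrap" negated with the "no" prefix (same as --noyywrap). */
%option noyywrap
/* Several options may share one %option directive. */
%option warn nodefault
%%
[a-z]+    { printf("word: %s\n", yytext); }
.|\n      { /* consume everything else so that nodefault never triggers */ }
%%
int main(void) { return yylex(); }
```

Because the directives live in the specification, running flex on this file with no command-line options should give the same scanner as passing the equivalent long options.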
flex scans your rule actions to determine whether you use the REJECT or yymore() features. The reject and yymore options are available to override its decision as to whether you use these features, either by setting them (e.g., %option reject) to indicate the feature is indeed used, or unsetting them to indicate it actually is not used (e.g., %option noyymore).
A number of options are available for lint purists who want to suppress the appearance of unneeded routines in the generated scanner. Each of the following, if unset (e.g., %option nounput), results in the corresponding routine not appearing in the generated scanner:
    input, unput
    yy_push_state, yy_pop_state, yy_top_state
    yy_scan_buffer, yy_scan_bytes, yy_scan_string
    yyget_extra, yyset_extra, yyget_leng, yyget_text,
    yyget_lineno, yyset_lineno, yyget_in, yyset_in,
    yyget_out, yyset_out, yyget_lval, yyset_lval,
    yyget_lloc, yyset_lloc, yyget_debug, yyset_debug
(though yy_push_state() and friends won’t appear anyway unless you use %option stack).
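A minimal sketch of this in practice, assuming a scanner whose actions call neither unput() nor input(); the rules are invented for the example:

```lex
/* Leave unput() and input() out of the generated scanner; no action uses them. */
%option nounput noinput
%option noyywrap
%%
[0-9]+   { printf("number: %s\n", yytext); }
.|\n     { /* ignore everything else */ }
%%
int main(void) { return yylex(); }
```

This is typically done to silence compiler warnings about defined but unused static functions.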
16.1 Options for Specifying Filenames
%option header-file="FILE"
instructs flex to write a C header to ‘FILE’. This file contains function prototypes, extern variables, and types used by the scanner. Only the external API is exported by the header file. Many macros that are usable from within scanner actions are not exported to the header file. This is due to namespace problems and the goal of a clean external API.
While in the header, the macro yyIN_HEADER is defined, where ‘yy’ is substituted with the appropriate prefix.
The ‘--header-file’ option is not compatible with the ‘--c++’ option, since the C++ scanner provides its own header in ‘yyFlexLexer.h’.
%option outfile="FILE"
directs flex to write the scanner to the file ‘FILE’ instead of ‘lex.yy.c’. If you combine ‘--outfile’ with the ‘--stdout’ option, then the scanner is written to ‘stdout’ but its #line directives (see %option noline) refer to the file ‘FILE’.
%option stdout
instructs flex to write the scanner it generates to standard output instead of ‘lex.yy.c’.
-SFILE, --skel=FILE
overrides the default skeleton file from which flex constructs its scanners. You’ll never need this option unless you are doing flex maintenance or development.
--tables-file=FILE
writes serialized scanner DFA tables to FILE. The generated scanner will not contain the tables, and requires them to be loaded at runtime. See serialization.
--tables-verify
This option is for flex development. We document it here in case you stumble upon it by accident or in case you suspect some inconsistency in the serialized tables. flex will serialize the scanner DFA tables but will also generate the in-code tables as it normally does. At runtime, the scanner will verify that the serialized tables match the in-code tables, instead of loading them.
16.2 Options Affecting Scanner Behavior
%option case-insensitive
instructs flex to generate a case-insensitive scanner. The case of letters given in the flex input patterns will be ignored, and tokens in the input will be matched regardless of case. The matched text given in yytext will have the preserved case (i.e., it will not be folded). For tricky behavior, see case and character ranges.
%option lex-compat
turns on maximum compatibility with the original AT&T lex implementation. Note that this does not mean full compatibility. Use of this option costs a considerable amount of performance, and it cannot be used with the ‘--c++’, ‘--full’, ‘--fast’, ‘-Cf’, or ‘-CF’ options. For details on the compatibilities it provides, see Incompatibilities with Lex and Posix. This option also results in the name YY_FLEX_LEX_COMPAT being #define’d in the generated scanner.
%option batch
instructs flex to generate a batch scanner, the opposite of interactive scanners generated by ‘--interactive’ (see below). In general, you use ‘-B’ (the short form of ‘--batch’) when you are certain that your scanner will never be used interactively, and you want to squeeze a little more performance out of it. If your goal is instead to squeeze out a lot more performance, you should be using the ‘-Cf’ or ‘-CF’ options, which turn on ‘--batch’ automatically anyway.
%option interactive
instructs flex to generate an interactive scanner. An
interactive scanner is one that only looks ahead to decide what token
has been matched if it absolutely must. It turns out that always
looking one extra character ahead, even if the scanner has already seen
enough text to disambiguate the current token, is a bit faster than only
looking ahead when necessary. But scanners that always look ahead give
dreadful interactive performance; for example, when a user types a
newline, it is not recognized as a newline token until they enter
another token, which often means typing in another whole line.
flex scanners default to interactive unless you use the ‘-Cf’ or ‘-CF’ table-compression options (see section Performance Considerations). That’s because if you’re looking for high performance you should be using one of these options, so if you didn’t, flex assumes you’d rather trade off a bit of run-time performance for intuitive interactive behavior. Note also that you cannot use ‘--interactive’ in conjunction with ‘-Cf’ or ‘-CF’. Thus, this option is not really needed; it is on by default for all those cases in which it is allowed.
You can force a scanner to not be interactive by using ‘--batch’.
%option 7bit
instructs flex to generate a 7-bit scanner, i.e., one which can only recognize 7-bit characters in its input. The advantage of using ‘--7bit’ is that the scanner’s tables can be up to half the size of those generated using ‘--8bit’. The disadvantage is that such scanners often hang or crash if their input contains an 8-bit character.
Note, however, that unless you generate your scanner using the ‘-Cf’ or ‘-CF’ table compression options, use of ‘--7bit’ will save only a small amount of table space, and make your scanner considerably less portable. flex’s default behavior is to generate an 8-bit scanner unless you use ‘-Cf’ or ‘-CF’, in which case flex defaults to generating 7-bit scanners unless your site was always configured to generate 8-bit scanners (as will often be the case with non-USA sites). You can tell whether flex generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the ‘--verbose’ output.
Note that if you use ‘-Cfe’ or ‘-CFe’, flex still defaults to generating an 8-bit scanner, since usually with these compression options full 8-bit tables are not much more expensive than 7-bit tables.
%option 8bit
instructs flex to generate an 8-bit scanner, i.e., one which can recognize 8-bit characters. This flag is only needed for scanners generated using ‘-Cf’ or ‘-CF’, as otherwise flex defaults to generating an 8-bit scanner anyway.
See the discussion of ‘--7bit’ above for flex’s default behavior and the tradeoffs between 7-bit and 8-bit scanners.
%option default
generates the default rule.
%option always-interactive
instructs flex to generate a scanner which always considers its input interactive. Normally, on each new input file the scanner calls isatty() in an attempt to determine whether the scanner’s input source is interactive and thus should be read a character at a time. When this option is used, however, no such call is made.
--never-interactive
instructs flex to generate a scanner which never considers its input interactive. This is the opposite of always-interactive.
%option posix
turns on maximum compatibility with the POSIX 1003.2-1992 definition of lex. Since flex was originally designed to implement the POSIX definition of lex, this generally involves very few changes in behavior. At the current writing the known differences between flex and the POSIX standard are:
In POSIX and AT&T lex, the repeat operator, ‘{}’, has lower precedence than concatenation (thus ‘ab{3}’ yields ‘ababab’). Most POSIX utilities use an Extended Regular Expression (ERE) precedence that has the precedence of the repeat operator higher than concatenation (which causes ‘ab{3}’ to yield ‘abbb’). By default, flex places the precedence of the repeat operator higher than concatenation, which matches the ERE processing of other POSIX utilities. When either ‘--posix’ or ‘-l’ is specified, flex will use the traditional AT&T and POSIX-compliant precedence for the repeat operator, where concatenation has higher precedence than the repeat operator.
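The precedence difference can be observed with a one-rule scanner such as the sketch below; the rules and the main() function are invented for this illustration.

```lex
%option noyywrap
%%
ab{3}   { printf("matched: %s\n", yytext); }
.|\n    { /* skip anything that is not part of a match */ }
%%
int main(void) { return yylex(); }
```

According to the description above, a scanner generated from this file with default settings should treat the pattern as ‘abbb’, while one generated with ‘--posix’ or ‘-l’ should treat it as ‘ababab’.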
%option stack
enables the use of start condition stacks (see section Start Conditions).
%option stdinit
if set (i.e., %option stdinit) initializes yyin and yyout to ‘stdin’ and ‘stdout’, instead of the default of ‘NULL’. Some existing lex programs depend on this behavior, even though it is not compliant with ANSI C, which does not require ‘stdin’ and ‘stdout’ to be compile-time constant. In a reentrant scanner, however, this is not a problem since initialization is performed in yylex_init at runtime.
%option yylineno
directs flex to generate a scanner that maintains the number of the current line read from its input in the global variable yylineno. This option is implied by %option lex-compat. In a reentrant C scanner, the macro yylineno is accessible regardless of the value of %option yylineno; however, its value is not modified by flex unless %option yylineno is enabled.
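A minimal sketch of a non-reentrant scanner that reports the line on which each identifier appears; the identifier pattern and output format are choices made for this example only.

```lex
%option noyywrap yylineno
%%
[A-Za-z_][A-Za-z0-9_]*   { printf("%d: %s\n", yylineno, yytext); }
.|\n                     { /* skip; flex itself advances yylineno on newlines */ }
%%
int main(void) { return yylex(); }
```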
%option yywrap
if unset (i.e., --noyywrap), makes the scanner not call yywrap() upon an end-of-file, but simply assume that there are no more files to scan (until the user points ‘yyin’ at a new file and calls yylex() again).
16.3 Code-Level And API Options
%option ansi-definitions
instructs flex to generate ANSI C99 definitions for functions. This option is enabled by default. If %option noansi-definitions is specified, then the obsolete style is generated.
%option ansi-prototypes
instructs flex to generate ANSI C99 prototypes for functions. This option is enabled by default. If noansi-prototypes is specified, then prototypes will have empty parameter lists.
%option bison-bridge
instructs flex to generate a C scanner that is meant to be called by a GNU bison parser. The scanner has minor API changes for bison compatibility. In particular, the declaration of yylex is modified to take an additional parameter, yylval. See section C Scanners with Bison Parsers.
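The sketch below illustrates the changed calling convention without a real parser. The YYSTYPE union and the NUMBER token are stand-ins, invented here, for what a bison-generated header would normally supply; the driver in main() plays the role of the parser. The reentrant option is included because the typical option set shown earlier in this chapter combines the two.

```lex
%{
/* Stand-ins for definitions that a bison-generated header would provide. */
typedef union { int ival; } YYSTYPE;
enum { NUMBER = 258 };
#include <stdlib.h>
%}
%option reentrant bison-bridge noyywrap
%%
[0-9]+   { yylval->ival = atoi(yytext); return NUMBER; }
.|\n     { /* skip */ }
%%
int main(void)
{
    yyscan_t scanner;   /* opaque scanner state */
    YYSTYPE value;      /* filled in through the extra yylval parameter */

    if (yylex_init(&scanner) != 0)
        return 1;
    while (yylex(&value, scanner) == NUMBER)
        printf("NUMBER %d\n", value.ival);
    yylex_destroy(scanner);
    return 0;
}
```

Within actions, yylval is a pointer, so semantic values are stored with ‘->’ rather than ‘.’.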
%option bison-locations
instructs flex that GNU bison %locations are being used. This means yylex will be passed an additional parameter, yylloc. This option implies %option bison-bridge. See section C Scanners with Bison Parsers.
%option noline
instructs flex not to generate #line directives. Without this option, flex peppers the generated scanner with #line directives so error messages in the actions will be correctly located with respect to either the original flex input file (if the errors are due to code in the input file), or ‘lex.yy.c’ (if the errors are flex’s fault; you should report these sorts of errors to the email address given in Reporting Bugs).
%option reentrant
instructs flex to generate a reentrant C scanner. The generated scanner may safely be used in a multi-threaded environment. The API for a reentrant scanner is different than for a non-reentrant scanner (see section Reentrant C Scanners). Because of the API difference between reentrant and non-reentrant flex scanners, non-reentrant flex code must be modified before it is suitable for use with this option.
This option is not compatible with the ‘--c++’ option.
The option ‘--reentrant’ does not affect the performance of the scanner.
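To give a feel for the API difference, here is a minimal sketch of a reentrant scanner and a driver for it. The rules are invented for the example; the functions used (yylex_init, yyset_in, yylex_destroy) belong to the reentrant API described in section Reentrant C Scanners.

```lex
%option reentrant noyywrap
%%
[0-9]+   { printf("number: %s\n", yytext); }
.|\n     { /* skip */ }
%%
int main(void)
{
    yyscan_t scanner;               /* opaque per-instance scanner state */

    if (yylex_init(&scanner) != 0)
        return 1;
    yyset_in(stdin, scanner);       /* set the input stream explicitly */
    yylex(scanner);                 /* every API call takes the scanner handle */
    yylex_destroy(scanner);
    return 0;
}
```

Each thread would create and destroy its own yyscan_t; no scanner state is kept in global variables.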
%option c++
specifies that you want flex to generate a C++ scanner class. See section Generating C++ Scanners, for details.
%option array
specifies that you want yytext to be an array instead of a char *.
%option pointer
specifies that yytext should be a char *, not an array. The default is char *.
%option prefix="PREFIX"
changes the default ‘yy’ prefix used by flex for all globally-visible variable and function names to instead be ‘PREFIX’. For example, ‘--prefix=foo’ changes the name of yytext to footext. It also changes the name of the default output file from ‘lex.yy.c’ to ‘lex.foo.c’. Here is a partial list of the names affected:
    yy_create_buffer
    yy_delete_buffer
    yy_flex_debug
    yy_init_buffer
    yy_flush_buffer
    yy_load_buffer_state
    yy_switch_to_buffer
    yyin
    yyleng
    yylex
    yylineno
    yyout
    yyrestart
    yytext
    yywrap
    yyalloc
    yyrealloc
    yyfree
(If you are using a C++ scanner, then only yywrap and yyFlexLexer are affected.) Within your scanner itself, you can still refer to the global variables and functions using either version of their name; but externally, they have the modified name.
This option lets you easily link together multiple flex programs into the same executable. Note, though, that using this option also renames yywrap(), so you now must either provide your own (appropriately-named) version of the routine for your scanner, or use %option noyywrap, as linking with ‘-lfl’ no longer provides one for you by default.
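A minimal sketch using the prefix ‘foo’ from the example above; the rules are invented, and noyywrap is set so that no foowrap() has to be supplied.

```lex
/* With this prefix the externally visible entry point is foolex(). */
%option prefix="foo" noyywrap
%%
[a-z]+   { printf("word: %s\n", yytext); }
.|\n     { /* skip */ }
%%
int main(void) { return foolex(); }
```

Inside the specification the action still refers to yytext, which is allowed as described above; other object files would have to use footext and foolex.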
%option main
directs flex to provide a default main() program for the scanner, which simply calls yylex(). This option implies noyywrap (see %option yywrap).
%option nounistd
suppresses inclusion of the non-ANSI header file ‘unistd.h’. This option is meant to target environments in which ‘unistd.h’ does not exist. Be aware that certain options may cause flex to generate code that relies on functions normally found in ‘unistd.h’ (e.g., isatty(), read()). If you wish to use these functions, you will have to inform your compiler where to find them. See %option always-interactive and %option read.
%option yyclass="NAME"
only applies when generating a C++ scanner (the ‘--c++’ option). It informs flex that you have derived NAME as a subclass of yyFlexLexer, so flex will place your actions in the member function NAME::yylex() instead of yyFlexLexer::yylex(). It also generates a yyFlexLexer::yylex() member function that emits a run-time error (by invoking yyFlexLexer::LexerError()) if called. See section Generating C++ Scanners.
16.4 Options for Scanner Speed and Size
-C[aefFmr]
controls the degree of table compression and, more generally, trade-offs between small scanners and fast scanners.
-C
A lone ‘-C’ specifies that the scanner tables should be compressed but neither equivalence classes nor meta-equivalence classes should be used.
%option align
(“align”) instructs flex to trade off larger tables in the generated scanner for faster performance because the elements of the tables are better aligned for memory access and computation. On some RISC architectures, fetching and manipulating longwords is more efficient than with smaller-sized units such as shortwords. This option can quadruple the size of the tables used by your scanner.
%option ecs
directs flex to construct equivalence classes, i.e., sets of characters which have identical lexical properties (for example, if the only appearance of digits in the flex input is in the character class “[0-9]” then the digits ’0’, ’1’, ..., ’9’ will all be put in the same equivalence class). Equivalence classes usually give dramatic reductions in the final table/object file sizes (typically a factor of 2-5) and are pretty cheap performance-wise (one array look-up per character scanned).
-Cf
specifies that the full scanner tables should be generated; flex should not compress the tables by taking advantage of similar transition functions for different states.
-CF
specifies that the alternate fast scanner representation (described under the ‘--fast’ option) should be used. This option cannot be used with ‘--c++’.
%option meta-ecs
directs flex to construct meta-equivalence classes, which are sets of equivalence classes (or characters, if equivalence classes are not being used) that are commonly used together. Meta-equivalence classes are often a big win when using compressed tables, but they have a moderate performance impact (one or two if tests and one array look-up per character scanned).
%option read
causes the generated scanner to bypass use of the standard I/O library (stdio) for input. Instead of calling fread() or getc(), the scanner will use the read() system call, resulting in a performance gain which varies from system to system, but in general is probably negligible unless you are also using ‘-Cf’ or ‘-CF’. Using ‘-Cr’ (the short form of this option) can cause strange behavior if, for example, you read from ‘yyin’ using stdio prior to calling the scanner (because the scanner will miss whatever text your previous reads left in the stdio input buffer). ‘-Cr’ has no effect if you define YY_INPUT() (see section The Generated Scanner).
The options ‘-Cf’ or ‘-CF’ and ‘-Cm’ do not make sense together - there is no opportunity for meta-equivalence classes if the table is not being compressed. Otherwise the options may be freely mixed, and are cumulative.
The default setting is ‘-Cem’, which specifies that flex should generate equivalence classes and meta-equivalence classes. This setting provides the highest degree of table compression. You can trade off faster-executing scanners at the cost of larger tables with the following generally being true:
    slowest & smallest
          -Cem
          -Cm
          -Ce
          -C
          -C{f,F}e
          -C{f,F}
          -C{f,F}a
    fastest & largest
Note that scanners with the smallest tables are usually generated and compiled the quickest, so during development you will usually want to use the default, maximal compression.
‘-Cfe’ is often a good compromise between speed and size for production scanners.
%option full
specifies a fast scanner. No table compression is done and stdio is bypassed. The result is large but fast. This option is equivalent to ‘-Cfr’.
%option fast
specifies that the fast scanner table representation should be used (and stdio bypassed). This representation is about as fast as the full table representation ‘--full’, and for some sets of patterns will be considerably smaller (and for others, larger). In general, if the pattern set contains both keywords and a catch-all identifier rule, such as in the set:
    "case"    return TOK_CASE;
    "switch"  return TOK_SWITCH;
    ...
    "default" return TOK_DEFAULT;
    [a-z]+    return TOK_ID;
then you’re better off using the full table representation. If only the identifier rule is present and you then use a hash table or some such to detect the keywords, you’re better off using ‘--fast’.
This option is equivalent to ‘-CFr’. It cannot be used with ‘--c++’.
16.5 Debugging Options
%option backup
generates backing-up information to ‘lex.backup’. This is a list of scanner states which require backing up and the input characters on which they do so. By adding rules one can remove backing-up states. If all backing-up states are eliminated and ‘-Cf’ or ‘-CF’ is used, the generated scanner will run faster (see the ‘--perf-report’ flag). Only users who wish to squeeze every last cycle out of their scanners need worry about this option (see section Performance Considerations).
%option debug
makes the generated scanner run in debug mode. Whenever a pattern is recognized and the global variable yy_flex_debug is non-zero (which is the default), the scanner will write to ‘stderr’ a line of the form:
    --accepting rule at line 53 ("the matched text")
The line number refers to the location of the rule in the file defining the scanner (i.e., the file that was fed to flex). Messages are also generated when the scanner backs up, accepts the default rule, reaches the end of its input buffer (or encounters a NUL; at this point, the two look the same as far as the scanner’s concerned), or reaches an end-of-file.
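Since yy_flex_debug is an ordinary global variable in a non-reentrant scanner, tracing can be switched off at run time without regenerating the scanner. The sketch below assumes a ‘-q’ command-line flag, which is an invention of this example and not a flex feature.

```lex
%{
#include <string.h>
%}
%option debug noyywrap
%%
[0-9]+   { /* a number */ }
.|\n     { /* skip */ }
%%
int main(int argc, char **argv)
{
    /* Tracing is on by default with %option debug; "-q" (hypothetical) turns it off. */
    if (argc > 1 && strcmp(argv[1], "-q") == 0)
        yy_flex_debug = 0;
    return yylex();
}
```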
%option perf-report
generates a performance report to ‘stderr’. The report consists of comments regarding features of the flex input file which will cause a serious loss of performance in the resulting scanner. If you give the flag twice, you will also get comments regarding features that lead to minor performance losses.
Note that the use of REJECT and variable trailing context (see section Limitations) entails a substantial performance penalty; use of yymore(), the ‘^’ operator, and the ‘--interactive’ flag entail minor performance penalties.
%option nodefault
causes the default rule (that unmatched scanner input is echoed to ‘stdout’) to be suppressed. If the scanner encounters input that does not match any of its rules, it aborts with an error. This option is useful for finding holes in a scanner’s rule set.
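One common way to work with this option is to add an explicit catch-all rule that reports unexpected input, so the rule set has no holes left for the suppressed default rule to cover. A minimal sketch, with rules invented for the example:

```lex
%option nodefault warn noyywrap
%%
[ \t\n]+   { /* skip whitespace */ }
[0-9]+     { printf("number: %s\n", yytext); }
.          { fprintf(stderr, "unexpected character '%s'\n", yytext); }
%%
int main(void) { return yylex(); }
```

If the last rule were removed, any input character other than a digit or whitespace would match no rule, which is the situation in which the scanner aborts.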
%option trace
makes flex run in trace mode. It will generate a lot of messages to ‘stderr’ concerning the form of the input and the resultant non-deterministic and deterministic finite automata. This option is mostly for use in maintaining flex.
%option nowarn
suppresses warning messages.
%option verbose
specifies that flex should write to ‘stderr’ a summary of statistics regarding the scanner it generates. Most of the statistics are meaningless to the casual flex user, but the first line identifies the version of flex (same as reported by ‘--version’), and the next line the flags used when generating the scanner, including those that are on by default.
%option warn
warns about certain things. In particular, if the default rule can be matched but no default rule has been given, flex will warn you. We recommend using this option always.
16.6 Miscellaneous Options
-c
A do-nothing option included for POSIX compliance.
-h, -?, --help
generates a “help” summary of flex’s options to ‘stdout’ and then exits.
-n
Another do-nothing option included for POSIX compliance.
-V, --version
prints the version number to ‘stdout’ and exits.
This document was generated by Rick Perry on January 7, 2013 using texi2html 1.82.