The patterns in the input (see Rules Section) are written using an extended set of regular expressions. These are:
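The option letters below apply within the ‘(?r-s:pattern)’ form, which applies option ‘r’ and omits option ‘s’ while interpreting pattern. Options may be zero or more of the characters ‘i’, ‘s’, or ‘x’.
‘i’ means case-insensitive. ‘-i’ means case-sensitive.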
‘s’ alters the meaning of the ‘.’ syntax to match any single byte whatsoever. ‘-s’ alters the meaning of ‘.’ to match any byte except ‘\n’.
‘x’ ignores comments and whitespace in patterns. Whitespace is ignored unless it is backslash-escaped, contained within ‘""’s, or appears inside a character class.
The following are all valid:
(?:foo) same as (foo)
(?i:ab7) same as ([aA][bB]7)
(?-i:ab) same as (ab)
(?s:.) same as [\x00-\xFF]
(?-s:.) same as [^\n]
(?ix-s: a . b) same as ([Aa][^\n][bB])
(?x:a b) same as ("ab")
(?x:a\ b) same as ("a b")
(?x:a" "b) same as ("a b")
(?x:a[ ]b) same as ("a b")
(?x:a
/* comment */
b
c) same as (abc)
Note that flex's notion of “newline” is exactly
whatever the C compiler used to compile flex
interprets ‘\n’ as; in particular, on some DOS
systems you must either filter out ‘\r’s in the
input yourself, or explicitly use ‘r/\r\n’ for ‘r$’.
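‘<s>r’ matches an ‘r’, but only in start condition s (see Start Conditions for discussion of start conditions).
‘<s1,s2,s3>r’ is the same, but in any of start conditions s1, s2, or s3.
‘<s1,s2><<EOF>>’ matches an end-of-file when in start condition s1 or s2.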
Note that inside a character class, all regular expression operators lose their special meaning except escape (‘\’) and the character class operators, ‘-’, ‘]’, and, at the beginning of the class, ‘^’.
The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence (see special note on the precedence of the repeat operator, ‘{}’, under the documentation for the ‘--posix’ POSIX compliance option). For example,
foo|bar*
is the same as
(foo)|(ba(r*))
since the ‘*’ operator has higher precedence than concatenation, and concatenation higher than alternation (‘|’). This pattern therefore matches either the string ‘foo’ or the string ‘ba’ followed by zero or more ‘r’s. To match ‘foo’ or zero or more repetitions of the string ‘bar’, use:
foo|(bar)*
And to match a sequence of zero or more repetitions of ‘foo’ and ‘bar’:
(foo|bar)*
In addition to characters and ranges of characters, character classes can also contain character class expressions. These are expressions enclosed inside ‘[:’ and ‘:]’ delimiters (which themselves must appear between the ‘[’ and ‘]’ of the character class; other elements may occur inside the character class, too). The valid expressions are:
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
These expressions all designate a set of characters equivalent to the
corresponding standard C isXXX function. For example,
‘[:alnum:]’ designates those characters for which isalnum()
returns true, i.e., any alphabetic or numeric character. Some systems
don't provide isblank(), so flex defines ‘[:blank:]’ as a
blank or a tab.
For example, the following character classes are all equivalent:
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:][0-9]]
[a-zA-Z0-9]
A word of caution. Character classes are expanded immediately when seen in the flex input.
This means the character classes are sensitive to the locale in which flex
is executed, and the resulting scanner will not be sensitive to the runtime locale.
This may or may not be desirable.
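With the ‘-i’ (case-insensitive) flag enabled, a range that spans both upper- and lowercase characters is ambiguous: flex cannot tell whether the literal numeric range or a case-folded range is intended. When in doubt, flex assumes the literal range and issues a warning. Some examples:

| Range | Result | Literal Range | Alternate Range |
|---|---|---|---|
| ‘[a-t]’ | ok | ‘[a-tA-T]’ | |
| ‘[A-T]’ | ok | ‘[a-tA-T]’ | |
| ‘[A-t]’ | ambiguous | ‘[A-Z\[\\\]^_`a-t]’ | ‘[a-tA-T]’ |
| ‘[_-{]’ | ambiguous | ‘[_`a-z{]’ | ‘[_`a-zA-Z{]’ |
| ‘[@-C]’ | ambiguous | ‘[@ABC]’ | ‘[@A-Z\[\\\]_`abc]’ |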
Flex allows negation of character class expressions by prepending ‘^’ to the POSIX character class name.
[:^alnum:] [:^alpha:] [:^blank:]
[:^cntrl:] [:^digit:] [:^graph:]
[:^lower:] [:^print:] [:^punct:]
[:^space:] [:^upper:] [:^xdigit:]
Flex will issue a warning if the expressions ‘[:^upper:]’ and ‘[:^lower:]’ appear in a case-insensitive scanner, since their meaning is unclear. The current behavior is to skip them entirely, but this may change without notice in future revisions of flex.
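Rules can have at most one instance of trailing context (the ‘/’ operator or the ‘$’ operator), and a start condition can only occur at the beginning of a pattern. The following are invalid:
foo/bar$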
<sc1>foo<sc2>bar
Note that the first of these can be written ‘foo/bar\n’.
The following will result in ‘$’ or ‘^’ being treated as a normal character:
foo|(bar$)
foo|^bar
If the desired meaning is a ‘foo’ or a
‘bar’-followed-by-a-newline, the following could be used (the
special | action is explained below, see Actions):
foo |
bar$ /* action goes here */
A similar trick will work for matching a ‘foo’ or a ‘bar’-at-the-beginning-of-a-line.