A.1 Makefiles and Flex
A.2 C Scanners with Bison Parsers
A.3 M4 Dependency
A.4 Common Patterns
A.1 Makefiles and Flex
In this appendix, we provide tips for writing Makefiles to build your scanners.
In a traditional build environment, we say that the ‘.c’ files are the sources, and the ‘.o’ files are the intermediate files. When using flex, however, the ‘.l’ files are the sources, and the generated ‘.c’ files (along with the ‘.o’ files) are the intermediate files. This requires you to carefully plan your Makefile.
Modern make programs understand that ‘foo.l’ is intended to generate ‘lex.yy.c’ or ‘foo.c’, and will behave accordingly(4)(5). The following Makefile does not explicitly instruct make how to build ‘scan.c’ from ‘scan.l’. Instead, it relies on the implicit rules of the make program to build the intermediate file, ‘scan.c’:
    # Basic Makefile -- relies on implicit rules
    # Creates "myprogram" from "scan.l" and "myprogram.c"
    #
    LEX=flex
    myprogram: scan.o myprogram.o
    scan.o: scan.l
For simple cases, the above may be sufficient. For other cases, you may have to explicitly instruct make how to build your scanner. The following is an example of a Makefile containing explicit rules:
    # Basic Makefile -- provides explicit rules
    # Creates "myprogram" from "scan.l" and "myprogram.c"
    #
    LEX=flex
    myprogram: scan.o myprogram.o
    	$(CC) -o $@ $(LDFLAGS) $^

    myprogram.o: myprogram.c
    	$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

    scan.o: scan.c
    	$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ -c $^

    scan.c: scan.l
    	$(LEX) $(LFLAGS) -o $@ $^

    clean:
    	$(RM) *.o scan.c
Notice in the above example that ‘scan.c’ is in the clean target. This is because we consider the file ‘scan.c’ to be an intermediate file.

Finally, we provide a realistic example of a flex scanner used with a bison parser(6). There is a tricky problem we have to deal with. Since a flex scanner will typically include a header file (e.g., ‘y.tab.h’) generated by the parser, we need to be sure that the header file is generated before the scanner is compiled. We handle this case in the following example:
    # Makefile example -- scanner and parser.
    # Creates "myprogram" from "scan.l", "parse.y", and "myprogram.c"
    #
    LEX     = flex
    YACC    = bison -y
    YFLAGS  = -d
    objects = scan.o parse.o myprogram.o

    myprogram: $(objects)
    scan.o: scan.l parse.c
    parse.o: parse.y
    myprogram.o: myprogram.c
In the above example, notice the line

    scan.o: scan.l parse.c

which lists the file ‘parse.c’ (the generated parser) as a dependency of ‘scan.o’. We want to ensure that the parser is created before the scanner is compiled. Because ‘YFLAGS = -d’ makes bison write ‘y.tab.h’ whenever it generates the parser, building ‘parse.c’ first guarantees that the header exists when ‘scan.o’ is compiled. Feel free to experiment with your specific implementation of make.
For more details on writing Makefiles, see The GNU Make Manual.
A.2 C Scanners with Bison Parsers
This section describes the flex features useful when integrating flex with GNU bison(7). Skip this section if you are not using bison with your scanner. Here we discuss only the flex half of the flex and bison pair. We do not discuss bison in any detail. For more information about generating bison parsers, see the GNU Bison Manual.
A bison-compatible scanner is generated by declaring ‘%option bison-bridge’ or by supplying ‘--bison-bridge’ when invoking flex from the command line. This instructs flex that the macro yylval may be used. The data type for yylval, YYSTYPE, is typically defined in a header file, included in section 1 of the flex input file. For a list of functions and macros available, see bison-functions.
The declaration of yylex becomes,

    int yylex ( YYSTYPE * lvalp, yyscan_t scanner );

If %option bison-locations is specified, then the declaration becomes,

    int yylex ( YYSTYPE * lvalp, YYLTYPE * llocp, yyscan_t scanner );
Note that the macros yylval and yylloc evaluate to pointers. Support for yylloc is optional in bison, so it is optional in flex as well. The following is an example of a flex scanner that is compatible with bison.
    /* Scanner for "C" assignment statements... sort of. */
    %{
    #include "y.tab.h"  /* Generated by bison. */
    %}

    %option bison-bridge bison-locations
    %%

    [[:digit:]]+  { yylval->num = atoi(yytext);   return NUMBER;}
    [[:alnum:]]+  { yylval->str = strdup(yytext); return STRING;}
    "="|";"       { return yytext[0];}
    .             {}
    %%
As you can see, there really is no magic here. We just use yylval as we would any other variable. The data type of yylval is generated by bison, and included in the file ‘y.tab.h’. Here is the corresponding bison parser:
    /* Parser to convert "C" assignments to lisp. */
    %{
    /* Pass the argument to yyparse through to yylex. */
    #define YYPARSE_PARAM scanner
    #define YYLEX_PARAM   scanner
    %}
    %locations
    %pure_parser
    %union {
        int num;
        char* str;
    }
    %token <str> STRING
    %token <num> NUMBER
    %%
    assignment:
        STRING '=' NUMBER ';' {
            printf( "(setf %s %d)", $1, $3 );
        }
    ;
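
Since yylval evaluates to a pointer in a bison-bridge scanner, the actions in the scanner above store token values with `->` rather than `.`. The following plain C sketch imitates that behavior outside of flex. The `YYSTYPE` union and token codes are hypothetical stand-ins for what bison would write to ‘y.tab.h’, and `demo_lex` and `copy_text` are illustrative names, not functions generated by flex or bison.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the contents of a bison-generated y.tab.h. */
typedef union { int num; char *str; } YYSTYPE;
enum { NUMBER = 258, STRING = 259 };

/* Heap copy of text (a portable substitute for strdup). */
static char *copy_text(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Imitates the two value-carrying actions of the scanner: the caller
 * passes a pointer to the semantic value, and the "action" fills it in
 * through that pointer before returning the token code. */
int demo_lex(YYSTYPE *yylval, const char *yytext)
{
    const char *p = yytext;

    while (isdigit((unsigned char)*p))
        p++;
    if (*yytext != '\0' && *p == '\0') {   /* all digits: a NUMBER */
        yylval->num = atoi(yytext);
        return NUMBER;
    }
    yylval->str = copy_text(yytext);       /* anything else: a STRING */
    return STRING;
}
```

A caller would declare a `YYSTYPE` variable, pass its address to `demo_lex`, and read the `num` or `str` member according to the returned token code, much as a pure bison parser does with the real yylex.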
A.3 M4 Dependency
The macro processor m4(8) must be installed wherever flex is installed. flex invokes ‘m4’, found by searching the directories in the PATH environment variable. Any code you place in section 1 or in the actions will be sent through m4. Please follow these rules to protect your code from unwanted m4 processing.
* Do not use symbols that begin with ‘m4_’, since those are reserved for m4 macro names. If for some reason you need m4_ as a prefix, use a preprocessor #define to get your symbol past m4 unmangled.

* Do not use the strings ‘[[’ or ‘]]’ anywhere in your code. The latter can arise in valid code such as x[y[z]]. The solution is simple. To get the literal string "]]", use "]""]". To get the array notation x[y[z]], use x[y[z] ]. Flex will attempt to detect these sequences in user code, and escape them. However, it’s best to avoid this complexity where possible, by removing such sequences from your code.
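
The two bracket workarounds rely only on ordinary C rules, which the following small sketch demonstrates. The names `close_pair` and `nested_lookup` are made up for illustration.

```c
#include <string.h>

/* Adjacent string literals are concatenated by the C compiler, so this
 * array holds two closing brackets even though they are never adjacent
 * in the source text that m4 sees. */
static const char close_pair[] = "]" "]";

/* Nested subscripts written with a space between the closing brackets.
 * The space is insignificant to the C compiler. */
int nested_lookup(const int *x, const int *y, int z)
{
    return x[y[z] ];
}
```

Both forms compile to exactly what the unspaced versions would, so the change is purely lexical.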
m4 is only required at the time you run flex. The generated scanner is ordinary C or C++, and does not require m4.
A.4 Common Patterns
This appendix provides examples of common regular expressions you might use in your scanner.
A.4.1 Numbers
A.4.2 Identifiers
A.4.3 Quoted Constructs
A.4.4 Addresses
A.4.1 Numbers
C99 decimal constant

    ([[:digit:]]{-}[0])[[:digit:]]*

C99 hexadecimal constant

    0[xX][[:xdigit:]]+

C99 octal constant

    0[01234567]*

C99 floating point constant

    dseq       ([[:digit:]]+)
    dseq_opt   ([[:digit:]]*)
    frac       (({dseq_opt}"."{dseq})|{dseq}".")
    exp        ([eE][+-]?{dseq})
    exp_opt    ({exp}?)
    fsuff      [flFL]
    fsuff_opt  ({fsuff}?)
    hpref      (0[xX])
    hdseq      ([[:xdigit:]]+)
    hdseq_opt  ([[:xdigit:]]*)
    hfrac      (({hdseq_opt}"."{hdseq})|({hdseq}"."))
    bexp       ([pP][+-]?{dseq})
    dfc        (({frac}{exp_opt}{fsuff_opt})|({dseq}{exp}{fsuff_opt}))
    hfc        (({hpref}{hfrac}{bexp}{fsuff_opt})|({hpref}{hdseq}{bexp}{fsuff_opt}))

    c99_floating_point_constant  ({dfc}|{hfc})

See C99 section 6.4.4.2 for the gory details.
A.4.2 Identifiers
C99 identifier

    ucn        ((\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))
    nondigit   [_[:alpha:]]
    c99_id     ([_[:alpha:]]|{ucn})([_[:alnum:]]|{ucn})*

Technically, the above pattern does not encompass all possible C99 identifiers, since C99 allows for "implementation-defined" characters. In practice, C compilers follow the above pattern, with the addition of the ‘$’ character.
UTF-8 encoded Unicode code point

    [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
A.4.3 Quoted Constructs
C99 string literal

    L?\"([^\"\\\n]|(\\['\"?\\abfnrtv])|(\\([0-7]{1,3}))|(\\x[[:xdigit:]]+)|(\\u([[:xdigit:]]{4}))|(\\U([[:xdigit:]]{8})))*\"

C99 comment

    ("/*"([^*]|"*"[^/])*"*/")|("/"(\\\n)*"/"[^\n]*)

Note that in C99, a ‘//’-style comment may be split across lines, and, contrary to popular belief, does not include the trailing ‘\n’ character.
A better way to scan ‘/* */’ comments is by line, rather than matching possibly huge comments all at once. This will allow you to scan comments of unlimited length, as long as line breaks appear at sane intervals. This is also more efficient when used with automatic line number processing. See option-yylineno.

    <INITIAL>{
        "/*"      BEGIN(COMMENT);
    }
    <COMMENT>{
        "*/"      BEGIN(0);
        [^*\n]+   ;
        "*"       ;
        \n        ;
    }
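
The two start conditions above behave like a small state machine. As a rough illustration only, the same idea can be hand-written in plain C. This is an analog of the flex rules, not code that flex generates, and `strip_block_comments` is a hypothetical name. The sketch ignores string literals and ‘//’ comments, and assumes the output buffer is at least as large as the input.

```c
#include <stddef.h>

enum state { INITIAL, COMMENT };

/* Copies `in` to `out`, dropping everything between block-comment
 * delimiters. Returns the number of characters written, not counting
 * the terminating null. */
size_t strip_block_comments(const char *in, char *out)
{
    enum state st = INITIAL;
    size_t n = 0;

    while (*in != '\0') {
        if (st == INITIAL && in[0] == '/' && in[1] == '*') {
            st = COMMENT;            /* like BEGIN(COMMENT) */
            in += 2;
        } else if (st == COMMENT && in[0] == '*' && in[1] == '/') {
            st = INITIAL;            /* like BEGIN(0) */
            in += 2;
        } else {
            if (st == INITIAL)
                out[n++] = *in;      /* ordinary text is kept */
            in++;                    /* comment text is discarded */
        }
    }
    out[n] = '\0';
    return n;
}
```

As in the flex version, the comment body is consumed in small pieces, so no single match ever has to hold an entire comment.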
A.4.4 Addresses
IPv4 address

    dec-octet     [0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
    IPv4address   {dec-octet}\.{dec-octet}\.{dec-octet}\.{dec-octet}

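
To make the intent of dec-octet concrete, here is a small C sketch that accepts the same strings as that pattern when it is applied to a whole string: decimal values 0 through 255 with no leading zeros. The function name `is_dec_octet` is made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s is a decimal number in the range 0..255 written
 * without leading zeros, and 0 otherwise. */
int is_dec_octet(const char *s)
{
    size_t len = strlen(s);
    int value = 0;

    if (len < 1 || len > 3)
        return 0;
    if (len > 1 && s[0] == '0')      /* "01" and "007" are rejected */
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (!isdigit((unsigned char)s[i]))
            return 0;
        value = value * 10 + (s[i] - '0');
    }
    return value <= 255;
}
```

The regular expression encodes the same range check purely by enumerating digit shapes, which is why it needs five alternatives.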
IPv6 address

    h16           [0-9A-Fa-f]{1,4}
    ls32          {h16}:{h16}|{IPv4address}
    IPv6address   ({h16}:){6}{ls32}|
                  ::({h16}:){5}{ls32}|
                  ({h16})?::({h16}:){4}{ls32}|
                  (({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|
                  (({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|
                  (({h16}:){0,3}{h16})?::{h16}:{ls32}|
                  (({h16}:){0,4}{h16})?::{ls32}|
                  (({h16}:){0,5}{h16})?::{h16}|
                  (({h16}:){0,6}{h16})?::

See RFC 2373 for details.
Note that you have to fold the definition of IPv6address into one line and that it also matches the “unspecified address” “::”.
URI

    (([^:/?#]+):)?("//"([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

This pattern is nearly useless, since it allows just about any character to appear in a URI, including spaces and control characters. See RFC 2396 for details.
This document was generated by Rick Perry on January 7, 2013 using texi2html 1.82.