4.6 The Lexical Analyzer Function yylex
The lexical analyzer function, yylex, recognizes tokens from the input stream and returns them to the parser. Bison does not create this function automatically; you must write it so that yyparse can call it. The function is sometimes referred to as a lexical scanner.
In simple programs, yylex is often defined at the end of the Bison grammar file. If yylex is defined in a separate source file, you need to arrange for the token-type macro definitions to be available there. To do this, use the ‘-d’ option when you run Bison, so that it will write these macro definitions into the separate parser header file, ‘name.tab.h’, which you can include in the other source files that need it. See section Invoking Bison.
4.6.1 Calling Convention for yylex: How yyparse calls yylex.
4.6.2 Semantic Values of Tokens: How yylex must return the semantic value of the token it has read.
4.6.3 Textual Locations of Tokens: How yylex must return the text location (line number, etc.) of the token, if the actions want that.
4.6.4 Calling Conventions for Pure Parsers: How the calling convention differs in a pure parser (see section A Pure (Reentrant) Parser).
4.6.1 Calling Convention for yylex
The value that yylex returns must be the positive numeric code for the type of token it has just found; a zero or negative value signifies end-of-input.
When a token is referred to in the grammar rules by a name, that name in the parser implementation file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See section Symbols, Terminal and Nonterminal.
When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code, possibly converted to unsigned char to avoid sign-extension. The null character must not be used this way, because its code is zero and that signifies end-of-input.
Here is an example showing these things:
    int
    yylex (void)
    {
      …
      if (c == EOF)    /* Detect end-of-input.  */
        return 0;
      …
      if (c == '+' || c == '-')
        return c;      /* Assume token type for '+' is '+'.  */
      …
      return INT;      /* Return the type of the token.  */
      …
    }
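As a fuller illustration of the same conventions, the following is a minimal sketch of a complete yylex that scans a string held in memory. The names input_text, input_pos and set_input are hypothetical helpers invented for this sketch, and the numeric code given to INT is an assumption; in a real program that definition comes from the Bison-generated parser implementation file or header.

```c
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the macro Bison would normally define; the value 258
   is an assumption made for this sketch.  */
enum { INT = 258 };

/* Hypothetical input source: a NUL-terminated string and a cursor.  */
static const char *input_text = "";
static size_t input_pos = 0;

/* Hypothetical helper: point the scanner at a new string.  */
void
set_input (const char *text)
{
  input_text = text;
  input_pos = 0;
}

int
yylex (void)
{
  /* Skip blanks between tokens.  */
  while (input_text[input_pos] == ' ' || input_text[input_pos] == '\t')
    input_pos++;

  /* Convert to unsigned char so that bytes above 127 are not
     sign-extended into negative token codes.  */
  int c = (unsigned char) input_text[input_pos];

  if (c == '\0')          /* End of the string: report end-of-input.  */
    return 0;

  if (isdigit (c))        /* A run of digits forms one INT token.  */
    {
      while (isdigit ((unsigned char) input_text[input_pos]))
        input_pos++;
      return INT;
    }

  input_pos++;            /* Any other character is its own token type.  */
  return c;
}
```

After set_input ("12 + 3"), successive calls would return INT, '+', INT, and then 0 for end-of-input. The sketch treats the terminating null character as end-of-input, which is consistent with the rule that the null character cannot serve as a token.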
This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.
If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them:
- If the grammar defines symbolic token names as aliases for the literal string tokens, yylex can use these symbolic names like all others. In this case, the use of the literal string tokens in the grammar file has no effect on yylex.
- yylex can find the multicharacter token in the yytname table. The index of the token in the table is the token type’s code. The name of a multicharacter token is recorded in yytname with a double-quote, the token’s characters, and another double-quote. The token’s characters are escaped as necessary to be suitable as input to Bison.
Here’s code for looking up a multicharacter token in yytname, assuming that the characters of the token are stored in token_buffer, and assuming that the token does not contain any characters like ‘"’ that require escaping.
    for (i = 0; i < YYNTOKENS; i++)
      {
        if (yytname[i] != 0
            && yytname[i][0] == '"'
            && ! strncmp (yytname[i] + 1, token_buffer,
                          strlen (token_buffer))
            && yytname[i][strlen (token_buffer) + 1] == '"'
            && yytname[i][strlen (token_buffer) + 2] == 0)
          break;
      }
The yytname table is generated only if you use the %token-table declaration. See section Bison Declaration Summary.
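The lookup loop can be wrapped in a function so that it can be tried outside a generated parser. This is only a sketch: the yytname contents and the YYNTOKENS value below are invented stand-ins for what Bison would generate with %token-table, and find_string_token is a hypothetical helper name. As in the loop above, the token is assumed to need no escaping.

```c
#include <string.h>

/* Stand-ins for what Bison generates when %token-table is used.  The
   entries and the YYNTOKENS value are invented for illustration.  */
#define YYNTOKENS 5
static const char *const yytname[] = {
  "$end", "error", "$undefined", "\"<=\"", "\"<\"", "expr", 0
};

/* Hypothetical helper: return the index in yytname of the literal
   string token whose characters are in token_buffer, or -1 if no
   entry matches.  */
int
find_string_token (const char *token_buffer)
{
  size_t len = strlen (token_buffer);
  for (int i = 0; i < YYNTOKENS; i++)
    {
      if (yytname[i] != 0
          && yytname[i][0] == '"'                        /* opening quote */
          && strncmp (yytname[i] + 1, token_buffer, len) == 0
          && yytname[i][len + 1] == '"'                  /* closing quote */
          && yytname[i][len + 2] == 0)                   /* nothing after */
        return i;
    }
  return -1;
}
```

The checks on the closing quote and the terminator are what keep "<" from matching the longer entry for "<=".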
4.6.2 Semantic Values of Tokens
In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable yylval. When you are using just one data type for semantic values, yylval has that type. Thus, if the type is int (the default), you might write this in yylex:
    …
    yylval = value;  /* Put value onto Bison stack.  */
    return INT;      /* Return the type of the token.  */
    …
When you are using multiple data types, yylval’s type is a union made from the %union declaration (see section The Union Declaration). So when you store a token’s value, you must use the proper member of the union. If the %union declaration looks like this:
    %union {
      int intval;
      double val;
      symrec *tptr;
    }
then the code in yylex might look like this:
    …
    yylval.intval = value;  /* Put value onto Bison stack.  */
    return INT;             /* Return the type of the token.  */
    …
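To make the choice of union member concrete, here is a small sketch that stores either an integer or a floating-point value, depending on the text of the number. The YYSTYPE union and yylval below are stand-ins for what Bison would generate from the %union declaration; the tptr member is left out to keep the sketch self-contained. NUM, the numeric token codes, and the helper name number_token are assumptions introduced for illustration.

```c
#include <stdlib.h>

/* Stand-ins for Bison-generated declarations.  INT appears in the
   text above; NUM and both numeric codes are assumptions.  */
enum { INT = 258, NUM = 259 };
typedef union { int intval; double val; } YYSTYPE;
YYSTYPE yylval;

/* Hypothetical helper that yylex could call once it has collected the
   text of a number (assumed non-empty and well formed): store the
   value in the matching union member and return the token type.  */
int
number_token (const char *text)
{
  char *end;
  long n = strtol (text, &end, 10);
  if (*end == '\0')              /* Whole text was an integer.  */
    {
      yylval.intval = (int) n;
      return INT;
    }
  yylval.val = strtod (text, &end);   /* Otherwise treat it as a double.  */
  return NUM;
}
```

The member written must agree with the type that the grammar declares for the returned token, since the parser reads the value back through that member.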
4.6.3 Textual Locations of Tokens
If you are using the ‘@n’-feature (see section Tracking Locations) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in yylex. The function yyparse expects to find the textual location of a token just parsed in the global variable yylloc. So yylex must store the proper data in that variable.
By default, the value of yylloc is a structure and you need only initialize the members that are going to be used by the actions. The four members are called first_line, first_column, last_line and last_column. Note that the use of this feature makes the parser noticeably slower.
The data type of yylloc has the name YYLTYPE.
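One possible way to maintain yylloc is sketched below. The YYLTYPE structure is a stand-in for the default one that Bison defines, using the four member names listed above. The helper name update_location is hypothetical, and the sketch assumes a convention in which last_line and last_column point just past the final character of the token, so that the next token starts where the previous one ended.

```c
/* Stand-in for the default YYLTYPE; the member names are the four
   listed in the text.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Lines and columns are counted from 1 in this sketch.  */
YYLTYPE yylloc = { 1, 1, 1, 1 };

/* Hypothetical helper: record the location of a token whose text is
   text[0] .. text[len - 1].  */
void
update_location (const char *text, int len)
{
  /* The token begins where the previous one ended.  */
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;

  for (int i = 0; i < len; i++)
    {
      if (text[i] == '\n')
        {
          yylloc.last_line++;       /* A newline starts a new line ...  */
          yylloc.last_column = 1;   /* ... at its first column.  */
        }
      else
        yylloc.last_column++;
    }
}
```

A yylex written this way would call the helper once per token, and also for skipped white space, just before returning.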
4.6.4 Calling Conventions for Pure Parsers
When you use the Bison declaration %define api.pure full to request a pure, reentrant parser, the global communication variables yylval and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by pointers passed as arguments to yylex. You must declare them as shown here, and pass the information back by storing it through those pointers.
    int
    yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
    {
      …
      *lvalp = value;  /* Put value onto Bison stack.  */
      return INT;      /* Return the type of the token.  */
      …
    }
If the grammar file does not use the ‘@’ constructs to refer to textual locations, then the type YYLTYPE will not be defined. In this case, omit the second argument; yylex will be called with only one argument.
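The following sketch shows a two-argument yylex of this kind, for a grammar that does use locations. YYSTYPE, YYLTYPE and the code for INT are stand-ins for the Bison-generated definitions. The cursor variable and set_input are hypothetical, and the cursor is kept in a static only for brevity; a scanner meant to be truly reentrant would receive its input state through an extra argument instead.

```c
#include <ctype.h>

/* Stand-ins for Bison-generated declarations; the token code is an
   assumption.  */
enum { INT = 258 };
typedef int YYSTYPE;
typedef struct
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;

/* Hypothetical input cursor, static only to keep the sketch short.  */
static const char *cursor = "";

void
set_input (const char *text)
{
  cursor = text;
}

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  while (*cursor == ' ')            /* Skip blanks, tracking columns.  */
    {
      cursor++;
      llocp->last_column++;
    }

  /* The token starts where scanning stopped.  */
  llocp->first_line = llocp->last_line;
  llocp->first_column = llocp->last_column;

  if (*cursor == '\0')
    return 0;                       /* End-of-input.  */

  if (isdigit ((unsigned char) *cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *cursor))
        {
          value = value * 10 + (*cursor - '0');
          cursor++;
          llocp->last_column++;
        }
      *lvalp = value;               /* Store through the pointer.  */
      return INT;
    }

  llocp->last_column++;
  return (unsigned char) *cursor++; /* Single-character token.  */
}
```

Nothing is written to a global yylval or yylloc here; the caller supplies the storage for both the semantic value and the location.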
If you wish to pass additional arguments to yylex, use %lex-param just like %parse-param (see section The Parser Function yyparse). To pass additional arguments to both yylex and yyparse, use %param.
Directive: %lex-param {argument-declaration} …
Specifies that each argument-declaration is an additional yylex argument declaration. You may pass one or more such declarations, which is equivalent to repeating %lex-param.
Directive: %param {argument-declaration} …
Specifies that each argument-declaration is an additional argument declaration for both yylex and yyparse. This is equivalent to ‘%lex-param {argument-declaration} … %parse-param {argument-declaration} …’. You may pass one or more declarations, which is equivalent to repeating %param.
For instance:
    %lex-param   {scanner_mode *mode}
    %parse-param {parser_mode *mode}
    %param       {environment_type *env}
results in the following signatures:
    int yylex   (scanner_mode *mode, environment_type *env);
    int yyparse (parser_mode *mode, environment_type *env);
If ‘%define api.pure full’ is added:
    int yylex   (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
    int yyparse (parser_mode *mode, environment_type *env);
and finally, if both ‘%define api.pure full’ and %locations are used:
    int yylex   (YYSTYPE *lvalp, YYLTYPE *llocp,
                 scanner_mode *mode, environment_type *env);
    int yyparse (parser_mode *mode, environment_type *env);
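To suggest how the extra arguments might be used inside the scanner, here is a sketch of a yylex with the first of the signatures above (no api.pure and no %locations). The text names the types scanner_mode and environment_type but does not define them, so the members fold_case and input are purely hypothetical, chosen only to make the example runnable.

```c
#include <ctype.h>

/* Hypothetical definitions: every member here is invented for
   illustration.  */
typedef struct { int fold_case; } scanner_mode;
typedef struct { const char *input; } environment_type;

/* yylex receives only the extra arguments declared with %lex-param
   and %param, in the order in which they were declared.  */
int
yylex (scanner_mode *mode, environment_type *env)
{
  int c = (unsigned char) *env->input;
  if (c == '\0')
    return 0;                  /* End-of-input.  */

  env->input++;                /* Consume the character.  */
  if (mode->fold_case)
    c = tolower (c);           /* The mode changes how input is read.  */
  return c;                    /* Each character is its own token.  */
}
```

Because all scanner state arrives through the arguments, two parsers could run side by side with separate mode and environment objects.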
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.