4.6 The Lexical Analyzer Function yylex
The lexical analyzer function, yylex, recognizes tokens from
the input stream and returns them to the parser. Bison does not create
this function automatically; you must write it so that yyparse can
call it. The function is sometimes referred to as a lexical scanner.
In simple programs, yylex is often defined at the end of the
Bison grammar file. If yylex is defined in a separate source
file, you need to arrange for the token-type macro definitions to be
available there. To do this, use the ‘-d’ option when you run
Bison, so that it will write these macro definitions into the separate
parser header file, ‘name.tab.h’, which you can include in
the other source files that need it. See section Invoking Bison.
4.6.1 Calling Convention for yylex: How yyparse calls yylex.
4.6.2 Semantic Values of Tokens: How yylex must return the semantic value of the token it has read.
4.6.3 Textual Locations of Tokens: How yylex must return the text location (line number, etc.) of the token, if the actions want that.
4.6.4 Calling Conventions for Pure Parsers: How the calling convention differs in a pure parser (see section A Pure (Reentrant) Parser).
4.6.1 Calling Convention for yylex
The value that yylex returns must be the positive numeric code
for the type of token it has just found; a zero or negative value
signifies end-of-input.
When a token is referred to in the grammar rules by a name, that name
in the parser implementation file becomes a C macro whose definition
is the proper numeric code for that token type. So yylex can
use the name to indicate that type. See section Symbols, Terminal and Nonterminal.
When a token is referred to in the grammar rules by a character literal,
the numeric code for that character is also the code for the token type.
So yylex can simply return that character code, possibly converted
to unsigned char to avoid sign-extension. The null character
must not be used this way, because its code is zero and that
signifies end-of-input.
Here is an example showing these things:
int
yylex (void)
{
  …
  if (c == EOF)    /* Detect end-of-input. */
    return 0;
  …
  if (c == '+' || c == '-')
    return c;      /* Assume token type for '+' is '+'. */
  …
  return INT;      /* Return the type of the token. */
  …
}
This interface has been designed so that the output from the lex
utility can be used without change as the definition of yylex.
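To see the calling convention end to end, here is a minimal, self-contained sketch of a scanner for a toy language of unsigned integers, '+' and '-'. It assumes the input is a NUL-terminated string reached through a hypothetical variable scan_ptr, and it defines INT as 258 purely as a placeholder for the macro Bison would generate; semantic values are ignored. Note the conversion to unsigned char before a character code is returned.

```c
#include <ctype.h>

/* Placeholder for the token-type macro Bison would generate. */
#define INT 258

/* Hypothetical input source: a NUL-terminated string. */
static const char *scan_ptr = "";

int
yylex (void)
{
  int c;

  /* Skip blanks and tabs. */
  while (*scan_ptr == ' ' || *scan_ptr == '\t')
    scan_ptr++;

  /* Convert to unsigned char to avoid sign-extension. */
  c = (unsigned char) *scan_ptr;
  if (c == '\0')                /* Detect end-of-input. */
    return 0;

  if (c == '+' || c == '-')
    {
      scan_ptr++;
      return c;                 /* Token type for '+' is '+'. */
    }

  if (isdigit (c))
    {
      /* Consume the whole run of digits. */
      while (isdigit ((unsigned char) *scan_ptr))
        scan_ptr++;
      return INT;               /* Return the type of the token. */
    }

  /* Any other character: return its code and let the parser
     report a syntax error. */
  scan_ptr++;
  return c;
}
```

Calling it repeatedly on "12 + 3-4" yields INT, '+', INT, '-', INT, and then 0 on every later call.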
If the grammar uses literal string tokens, there are two ways that
yylex can determine the token type codes for them:
• If the grammar defines symbolic token names as aliases for the
literal string tokens, yylex can use these symbolic names like
all others. In this case, the use of the literal string tokens in
the grammar file has no effect on yylex.
• yylex can find the multicharacter token in the yytname
table. The index of the token in the table is the token type’s code.
The name of a multicharacter token is recorded in yytname with a
double-quote, the token’s characters, and another double-quote. The
token’s characters are escaped as necessary to be suitable as input
to Bison.
Here’s code for looking up a multicharacter token in yytname,
assuming that the characters of the token are stored in
token_buffer, and assuming that the token does not contain any
characters like ‘"’ that require escaping.
for (i = 0; i < YYNTOKENS; i++)
  {
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && ! strncmp (yytname[i] + 1, token_buffer,
                      strlen (token_buffer))
        && yytname[i][strlen (token_buffer) + 1] == '"'
        && yytname[i][strlen (token_buffer) + 2] == 0)
      break;
  }
The yytname table is generated only if you use the
%token-table declaration. See section Bison Declaration Summary.
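The loop above can be packaged as a self-contained function. In this sketch, yytname and YYNTOKENS are hand-written stand-ins (a real table comes from %token-table), the function name lookup_string_token is hypothetical, and the same no-escaping assumption applies. It computes strlen (token_buffer) once and returns the matching index instead of breaking out of the loop, or -1 when nothing matches.

```c
#include <string.h>

/* Hand-written stand-in for the table Bison generates with
   %token-table; the entries here are illustrative only. */
#define YYNTOKENS 6
static const char *const yytname[] =
{
  "$end", "error", "$undefined", "\"<=\"", "\">=\"", "NUM", 0
};

/* Return the yytname index of the literal string token spelled in
   token_buffer, or -1 if none matches.  Assumes the token contains
   no characters (such as '"') that yytname would escape. */
int
lookup_string_token (const char *token_buffer)
{
  size_t len = strlen (token_buffer);
  int i;

  for (i = 0; i < YYNTOKENS; i++)
    {
      /* Match: opening quote, the token's characters, closing
         quote, then the end of the string. */
      if (yytname[i] != 0
          && yytname[i][0] == '"'
          && ! strncmp (yytname[i] + 1, token_buffer, len)
          && yytname[i][len + 1] == '"'
          && yytname[i][len + 2] == 0)
        return i;
    }
  return -1;
}
```

With this mock table, lookup_string_token ("<=") returns 3, while "<" and "NUM" both return -1, the latter because NUM is a symbolic name rather than a quoted literal.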
4.6.2 Semantic Values of Tokens
In an ordinary (nonreentrant) parser, the semantic value of the token must
be stored into the global variable yylval. When you are using
just one data type for semantic values, yylval has that type.
Thus, if the type is int (the default), you might write this in
yylex:
…
yylval = value;  /* Put value onto Bison stack. */
return INT;      /* Return the type of the token. */
…
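A complete version of that fragment, as a sketch: it assumes the default int semantic type, a hypothetical input string scan_ptr, and a placeholder INT value. yylval is defined here only so the example compiles on its own; in a real program the Bison-generated parser defines it. White space handling and integer overflow checks are omitted.

```c
#include <ctype.h>

#define INT 258                 /* Placeholder for Bison's macro. */

/* Normally defined by the Bison-generated parser; defined here only
   so the sketch compiles on its own.  int is the default type. */
int yylval;

/* Hypothetical input source: a NUL-terminated string. */
static const char *scan_ptr = "";

int
yylex (void)
{
  int c = (unsigned char) *scan_ptr;

  if (c == '\0')
    return 0;                   /* End-of-input. */

  if (isdigit (c))
    {
      int value = 0;
      /* Accumulate the decimal value of the digit run. */
      while (isdigit ((unsigned char) *scan_ptr))
        value = value * 10 + (*scan_ptr++ - '0');
      yylval = value;           /* Put value onto Bison stack. */
      return INT;               /* Return the type of the token. */
    }

  scan_ptr++;
  return c;                     /* Single-character token. */
}
```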
When you are using multiple data types, yylval’s type is a union
made from the %union declaration (see section The Union Declaration). So when you store a token’s value, you
must use the proper member of the union. If the %union
declaration looks like this:
%union {
  int intval;
  double val;
  symrec *tptr;
}
then the code in yylex might look like this:
…
yylval.intval = value;  /* Put value onto Bison stack. */
return INT;             /* Return the type of the token. */
…
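The following sketch mirrors that %union and shows the scanner choosing a member according to the token it found. INT and the second token NUM are hypothetical placeholders, symrec is left as an incomplete type since its definition is not given here, and YYSTYPE and yylval are defined locally only because no Bison-generated parser is linked in. Digits followed by '.' are read with strtod into the double member val; other digit runs go into intval via strtol.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical placeholders for macros Bison would generate. */
#define INT 258
#define NUM 259

typedef struct symrec symrec;   /* Hypothetical; left incomplete. */

/* Mirrors the %union declaration; Bison normally generates this
   type as YYSTYPE. */
typedef union YYSTYPE
{
  int intval;
  double val;
  symrec *tptr;
} YYSTYPE;

YYSTYPE yylval;                 /* Normally defined by the parser. */

/* Hypothetical input source: a NUL-terminated string. */
static const char *scan_ptr = "";

int
yylex (void)
{
  int c;

  while (*scan_ptr == ' ')
    scan_ptr++;
  c = (unsigned char) *scan_ptr;
  if (c == '\0')
    return 0;

  if (isdigit (c))
    {
      char *end;
      const char *p = scan_ptr;
      /* Look past the digits to decide which member to fill. */
      while (isdigit ((unsigned char) *p))
        p++;
      if (*p == '.')
        {
          yylval.val = strtod (scan_ptr, &end);   /* double member */
          scan_ptr = end;
          return NUM;
        }
      yylval.intval = (int) strtol (scan_ptr, &end, 10);  /* int member */
      scan_ptr = end;
      return INT;
    }

  scan_ptr++;
  return c;
}
```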
4.6.3 Textual Locations of Tokens
If you are using the ‘@n’-feature (see section Tracking Locations)
in actions to keep track of the textual locations of tokens and groupings,
then you must provide this information in yylex. The function
yyparse expects to find the textual location of a token just parsed
in the global variable yylloc. So yylex must store the proper
data in that variable.
By default, the value of yylloc is a structure and you need only
initialize the members that are going to be used by the actions. The
four members are called first_line, first_column,
last_line and last_column. Note that the use of this
feature makes the parser noticeably slower.
The data type of yylloc has the name YYLTYPE.
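Here is a sketch of a scanner that keeps yylloc up to date. YYLTYPE is spelled out with the four default members only so the example compiles standalone; the input cursor, the line and column counters, and the INT value are hypothetical. Lines and columns start at 1, and this sketch records last_column as the column of the token's final character; other conventions work as long as your actions agree with them.

```c
#include <ctype.h>

#define INT 258                 /* Placeholder for Bison's macro. */

/* Mirrors the default YYLTYPE; normally generated by Bison. */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

YYLTYPE yylloc;                 /* Normally defined by the parser. */

/* Hypothetical scanner state: input cursor and current position. */
static const char *scan_ptr = "";
static int cur_line = 1, cur_column = 1;

int
yylex (void)
{
  int c;

  /* Skip spaces and newlines, keeping the position up to date. */
  while (*scan_ptr == ' ' || *scan_ptr == '\n')
    {
      if (*scan_ptr == '\n')
        {
          cur_line++;
          cur_column = 1;
        }
      else
        cur_column++;
      scan_ptr++;
    }

  /* The token starts here. */
  yylloc.first_line = cur_line;
  yylloc.first_column = cur_column;

  c = (unsigned char) *scan_ptr;
  if (c == '\0')
    {
      /* End-of-input: an empty span at the current position. */
      yylloc.last_line = cur_line;
      yylloc.last_column = cur_column;
      return 0;
    }

  if (isdigit (c))
    {
      while (isdigit ((unsigned char) *scan_ptr))
        {
          scan_ptr++;
          cur_column++;
        }
      c = INT;
    }
  else
    {
      scan_ptr++;
      cur_column++;
    }

  /* last_column is the column of the token's final character. */
  yylloc.last_line = cur_line;
  yylloc.last_column = cur_column - 1;
  return c;
}
```

For the input "12 +\n 7", the first INT spans columns 1 to 2 of line 1, '+' starts at column 4, and the second INT starts at line 2, column 2.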
4.6.4 Calling Conventions for Pure Parsers
When you use the Bison declaration %define api.pure full to request a
pure, reentrant parser, the global communication variables yylval
and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by
pointers passed as arguments to yylex. You must declare them as
shown here, and pass the information back by storing it through those
pointers.
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  …
  *lvalp = value;  /* Put value onto Bison stack. */
  return INT;      /* Return the type of the token. */
  …
}
If the grammar file does not use the ‘@’ constructs to refer to
textual locations, then the type YYLTYPE will not be defined. In
this case, omit the second argument; yylex will be called with
only one argument.
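A sketch of the two-argument form: the scanner stores the value and location through lvalp and llocp instead of touching yylval or yylloc. YYSTYPE, YYLTYPE and INT are local stand-ins for what Bison generates, and the input cursor is a hypothetical static variable kept global only for brevity; a fully reentrant scanner would receive it as an extra argument, as described next. The input is assumed to fit on one line.

```c
#include <ctype.h>

#define INT 258                 /* Placeholder for Bison's macro. */

typedef int YYSTYPE;            /* Default semantic type. */

/* Mirrors the default YYLTYPE; normally generated by Bison. */
typedef struct YYLTYPE
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;

/* Hypothetical input cursor, global only to keep the sketch short. */
static const char *scan_ptr = "";
static int scan_column = 1;

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  int c = (unsigned char) *scan_ptr;

  llocp->first_line = llocp->last_line = 1;   /* One-line input. */
  llocp->first_column = scan_column;

  if (c == '\0')
    {
      llocp->last_column = scan_column;
      return 0;
    }

  if (isdigit (c))
    {
      int value = 0;
      while (isdigit ((unsigned char) *scan_ptr))
        {
          value = value * 10 + (*scan_ptr++ - '0');
          scan_column++;
        }
      *lvalp = value;           /* Store through the pointer. */
      llocp->last_column = scan_column - 1;
      return INT;
    }

  scan_ptr++;
  llocp->last_column = scan_column++;
  return c;
}
```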
If you wish to pass additional arguments to yylex, use
%lex-param just like %parse-param (see section The Parser Function yyparse). To pass additional arguments to both yylex and
yyparse, use %param.
Directive: %lex-param {argument-declaration} …
Specify that each argument-declaration is an additional yylex argument
declaration. You may pass one or more such declarations, which is
equivalent to repeating %lex-param.
Directive: %param {argument-declaration} …
Specify that each argument-declaration is an additional
yylex/yyparse argument declaration. This is equivalent to
‘%lex-param {argument-declaration} … %parse-param
{argument-declaration} …’. You may pass one or more
declarations, which is equivalent to repeating %param.
For instance:
%lex-param {scanner_mode *mode}
%parse-param {parser_mode *mode}
%param {environment_type *env}
results in the following signatures:
int yylex (scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);
If ‘%define api.pure full’ is added:
int yylex (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);
and finally, if both ‘%define api.pure full’ and %locations are
used:
int yylex (YYSTYPE *lvalp, YYLTYPE *llocp,
           scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);
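The manual names scanner_mode and environment_type but not their contents, so the fields below are hypothetical, as is the placeholder INT value. This sketch implements the second signature (api.pure full without locations) and keeps all scanner state in scanner_mode, so two inputs can be scanned independently, while environment_type carries a counter that yyparse could also see because it was declared with %param.

```c
#include <ctype.h>

#define INT 258                 /* Placeholder for Bison's macro. */
typedef int YYSTYPE;            /* Default semantic type. */

/* Hypothetical contents: the manual only names these types. */
typedef struct scanner_mode
{
  const char *cursor;           /* Per-scanner input position. */
} scanner_mode;

typedef struct environment_type
{
  int tokens_seen;              /* Shared with yyparse via %param. */
} environment_type;

/* Signature for %define api.pure full together with
   %lex-param {scanner_mode *mode} and %param {environment_type *env}. */
int
yylex (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env)
{
  int c = (unsigned char) *mode->cursor;

  if (c == '\0')
    return 0;                   /* End-of-input is not counted. */

  env->tokens_seen++;
  if (isdigit (c))
    {
      int value = 0;
      while (isdigit ((unsigned char) *mode->cursor))
        value = value * 10 + (*mode->cursor++ - '0');
      *lvalp = value;
      return INT;
    }

  mode->cursor++;
  return c;
}
```

Interleaving calls on scanner_mode values for "1+2" and "30" returns each input's tokens in order and leaves env.tokens_seen at 4.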
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.