The Bison parser is actually a C function named yyparse. Here we describe the interface conventions of yyparse and the other functions that it needs to use.
Keep in mind that the parser uses many C identifiers starting with ‘yy’ and ‘YY’ for internal purposes. If you use such an identifier (aside from those in this manual) in an action or in the epilogue of the grammar file, you are likely to run into trouble.
4.1 The Parser Function yyparse | How to call yyparse and what it returns.
4.2 The Push Parser Function yypush_parse | How to call yypush_parse and what it returns.
4.3 The Pull Parser Function yypull_parse | How to call yypull_parse and what it returns.
4.4 The Parser Create Function yypstate_new | How to call yypstate_new and what it returns.
4.5 The Parser Delete Function yypstate_delete | How to call yypstate_delete and what it returns.
4.6 The Lexical Analyzer Function yylex | You must supply a function yylex which reads tokens.
4.7 The Error Reporting Function yyerror | You must supply a function yyerror.
4.8 Special Features for Use in Actions | Special features for use in actions.
4.9 Parser Internationalization | How to let the parser speak in the user’s native language.
4.1 The Parser Function yyparse
You call the function yyparse to cause parsing to occur. This function reads tokens, executes actions, and ultimately returns when it encounters end-of-input or an unrecoverable syntax error. You can also write an action which directs yyparse to return immediately without reading further.
The value returned by yyparse is 0 if parsing was successful (return is due to end-of-input).
The value is 1 if parsing failed because of invalid input, i.e., input that contains a syntax error or that causes YYABORT to be invoked.
The value is 2 if parsing failed due to memory exhaustion.
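The three return values can be told apart by the caller. The following is only a minimal sketch: the helper name parse_status_message is hypothetical, and the messages are illustrative rather than anything Bison prints itself.

```c
#include <stdio.h>

/* Hypothetical helper: describe the value returned by yyparse. */
static const char *
parse_status_message (int status)
{
  switch (status)
    {
    case 0:
      return "parsing succeeded";   /* end-of-input reached */
    case 1:
      return "invalid input";       /* syntax error or YYABORT */
    case 2:
      return "memory exhausted";
    default:
      return "unexpected status";   /* not documented for yyparse */
    }
}
```

A caller would typically write something like ‘int status = yyparse (); if (status != 0) fprintf (stderr, "%s\n", parse_status_message (status));’.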
In an action, you can cause immediate return from yyparse by using these macros:
Macro: YYACCEPT
Return immediately with value 0 (to report success).
Macro: YYABORT
Return immediately with value 1 (to report failure).
If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. To do so, use the declaration %parse-param:
Directive: %parse-param {argument-declaration} …
Declare that one or more argument-declaration are additional yyparse arguments. The argument-declaration is used when declaring functions or prototypes. The last identifier in argument-declaration must be the argument name.
Here’s an example. Write this in the parser:
    %parse-param {int *nastiness} {int *randomness}
Then call the parser like this:
    {
      int nastiness, randomness;
      … /* Store proper data in nastiness and randomness. */
      value = yyparse (&nastiness, &randomness);
      …
    }
In the grammar actions, use expressions like this to refer to the data:
    exp: … { …; *randomness += 1; … }
Using the following:
    %parse-param {int *randomness}
results in these signatures:
    void yyerror (int *randomness, const char *msg);
    int yyparse (int *randomness);
Or, if both %define api.pure full (or just %define api.pure) and %locations are used:
    void yyerror (YYLTYPE *llocp, int *randomness, const char *msg);
    int yyparse (int *randomness);
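Because %parse-param arguments also reach yyerror, they are a convenient way to carry per-parse state without globals. The sketch below assumes a hypothetical declaration ‘%parse-param {struct parse_context *ctx}’; the structure and its members are invented for illustration and are not part of Bison.

```c
#include <stdio.h>

/* Hypothetical per-parse state, shared through
   %parse-param {struct parse_context *ctx}.  */
struct parse_context
{
  const char *file_name;   /* used to prefix diagnostics */
  int error_count;         /* errors reported during this parse */
};

/* With the declaration above, the extra argument comes before
   the message, matching the signatures shown in the text.  */
void
yyerror (struct parse_context *ctx, const char *msg)
{
  ctx->error_count++;
  fprintf (stderr, "%s: %s\n", ctx->file_name, msg);
}
```

The caller would then run ‘yyparse (&ctx)’ and inspect ctx.error_count afterwards.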
4.2 The Push Parser Function yypush_parse
(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
You call the function yypush_parse to parse a single token. This function is available if either the ‘%define api.push-pull push’ or ‘%define api.push-pull both’ declaration is used. See section A Push Parser.
The value returned by yypush_parse is the same as for yyparse with the following exception: it returns YYPUSH_MORE if more input is required to finish parsing the grammar.
4.3 The Pull Parser Function yypull_parse
(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
You call the function yypull_parse to parse the rest of the input stream. This function is available if the ‘%define api.push-pull both’ declaration is used. See section A Push Parser.
The value returned by yypull_parse is the same as for yyparse.
4.4 The Parser Create Function yypstate_new
(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
You call the function yypstate_new to create a new parser instance. This function is available if either the ‘%define api.push-pull push’ or ‘%define api.push-pull both’ declaration is used. See section A Push Parser.
The function will return a valid parser instance if there was memory available or 0 if no memory was available. In impure mode, it will also return 0 if a parser instance is currently allocated.
4.5 The Parser Delete Function yypstate_delete
(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
You call the function yypstate_delete to delete a parser instance. This function is available if either the ‘%define api.push-pull push’ or ‘%define api.push-pull both’ declaration is used. See section A Push Parser.
This function will reclaim the memory associated with a parser instance. After this call, you should no longer attempt to use the parser instance.
4.6 The Lexical Analyzer Function yylex
The lexical analyzer function, yylex, recognizes tokens from the input stream and returns them to the parser. Bison does not create this function automatically; you must write it so that yyparse can call it. The function is sometimes referred to as a lexical scanner.
In simple programs, yylex is often defined at the end of the Bison grammar file. If yylex is defined in a separate source file, you need to arrange for the token-type macro definitions to be available there. To do this, use the ‘-d’ option when you run Bison, so that it will write these macro definitions into the separate parser header file, ‘name.tab.h’, which you can include in the other source files that need it. See section Invoking Bison.
4.6.1 Calling Convention for yylex | How yyparse calls yylex.
4.6.2 Semantic Values of Tokens | How yylex must return the semantic value of the token it has read.
4.6.3 Textual Locations of Tokens | How yylex must return the text location (line number, etc.) of the token, if the actions want that.
4.6.4 Calling Conventions for Pure Parsers | How the calling convention differs in a pure parser (see section A Pure (Reentrant) Parser).
4.6.1 Calling Convention for yylex
The value that yylex returns must be the positive numeric code for the type of token it has just found; a zero or negative value signifies end-of-input.
When a token is referred to in the grammar rules by a name, that name in the parser implementation file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See section Symbols, Terminal and Nonterminal.
When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code, possibly converted to unsigned char to avoid sign-extension. The null character must not be used this way, because its code is zero and that signifies end-of-input.
Here is an example showing these things:
    int
    yylex (void)
    {
      …
      if (c == EOF)    /* Detect end-of-input. */
        return 0;
      …
      if (c == '+' || c == '-')
        return c;      /* Assume token type for '+' is '+'. */
      …
      return INT;      /* Return the type of the token. */
      …
    }
This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.
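For readers writing yylex by hand, here is a minimal sketch of a complete scanner that follows the convention above. It is an illustration under stated assumptions: the token code 258 for INT, the local definition of yylval, and the input_cursor variable are all stand-ins for what the Bison-generated parser and your program would normally provide.

```c
#include <ctype.h>

/* Stand-ins for definitions normally supplied by the generated parser. */
enum { INT = 258 };      /* hypothetical numeric code for the INT token */
int yylval;              /* semantic value of the token just read */

/* Hypothetical input source: a NUL-terminated string.  */
static const char *input_cursor = "";

int
yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input_cursor == ' ' || *input_cursor == '\t')
    input_cursor++;

  if (*input_cursor == '\0')
    return 0;                          /* end-of-input */

  if (isdigit ((unsigned char) *input_cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input_cursor))
        value = value * 10 + (*input_cursor++ - '0');
      yylval = value;                  /* semantic value for the parser */
      return INT;                      /* token type */
    }

  /* Any other character is its own token type; the cast to
     unsigned char avoids sign-extension.  */
  return (unsigned char) *input_cursor++;
}
```

Given the input "12 + 3", successive calls return INT (with yylval set to 12), '+', INT (with yylval set to 3), and finally 0.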
If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them:
- If the grammar defines symbolic token names as aliases for the literal string tokens, yylex can use these symbolic names like all others. In this case, the use of the literal string tokens in the grammar file has no effect on yylex.
- yylex can find the multicharacter token in the yytname table. The index of the token in the table is the token type’s code. The name of a multicharacter token is recorded in yytname with a double-quote, the token’s characters, and another double-quote. The token’s characters are escaped as necessary to be suitable as input to Bison.
Here’s code for looking up a multicharacter token in yytname, assuming that the characters of the token are stored in token_buffer, and assuming that the token does not contain any characters like ‘"’ that require escaping.
    for (i = 0; i < YYNTOKENS; i++)
      {
        if (yytname[i] != 0
            && yytname[i][0] == '"'
            && ! strncmp (yytname[i] + 1, token_buffer,
                          strlen (token_buffer))
            && yytname[i][strlen (token_buffer) + 1] == '"'
            && yytname[i][strlen (token_buffer) + 2] == 0)
          break;
      }
The yytname table is generated only if you use the %token-table declaration. See section Bison Declaration Summary.
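The lookup loop shown earlier can be wrapped in a function so that it compiles and can be tried in isolation. This is a sketch only: the yytname contents and the YYNTOKENS value below are made-up stand-ins for the table Bison generates under %token-table, and lookup_string_token is a hypothetical helper name.

```c
#include <string.h>

/* Stand-in table; a real one is generated by Bison.  */
#define YYNTOKENS 5
static const char *const yytname[] =
  { "$end", "error", "$undefined", "\"<=\"", "\">=\"", 0 };

/* Return the index in yytname of the literal string token whose
   characters are in TOKEN_BUFFER, or -1 if there is none.  Assumes
   the token needs no escaping.  */
static int
lookup_string_token (const char *token_buffer)
{
  size_t len = strlen (token_buffer);
  int i;

  for (i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && strncmp (yytname[i] + 1, token_buffer, len) == 0
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == 0)
      return i;
  return -1;
}
```

Computing the length once and returning -1 on failure are small conveniences over the inline loop; the comparison logic is unchanged.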
4.6.2 Semantic Values of Tokens
In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable yylval. When you are using just one data type for semantic values, yylval has that type. Thus, if the type is int (the default), you might write this in yylex:
    …
    yylval = value;  /* Put value onto Bison stack. */
    return INT;      /* Return the type of the token. */
    …
When you are using multiple data types, yylval’s type is a union made from the %union declaration (see section The Union Declaration). So when you store a token’s value, you must use the proper member of the union. If the %union declaration looks like this:
    %union {
      int intval;
      double val;
      symrec *tptr;
    }
then the code in yylex might look like this:
    …
    yylval.intval = value;  /* Put value onto Bison stack. */
    return INT;             /* Return the type of the token. */
    …
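To make the choice of union member concrete, the sketch below stores either an integer or a floating-point value depending on the lexeme. It is illustrative only: the YYSTYPE definition mirrors the %union above but is written out by hand, the token codes and the REAL token are assumptions, and number_token is a hypothetical helper that yylex would call once it has isolated a numeric lexeme.

```c
#include <stdlib.h>
#include <string.h>

typedef struct symrec symrec;   /* opaque in this sketch */

/* Hand-written mirror of the %union shown in the text.  */
typedef union
{
  int intval;
  double val;
  symrec *tptr;
} YYSTYPE;

YYSTYPE yylval;                        /* stand-in for the parser's global */
enum { INT = 258, REAL = 259 };        /* hypothetical token codes */

/* Hypothetical helper: store LEXEME in the proper union member and
   return the matching token type.  */
static int
number_token (const char *lexeme)
{
  if (strchr (lexeme, '.') != NULL)
    {
      yylval.val = strtod (lexeme, NULL);
      return REAL;
    }
  yylval.intval = (int) strtol (lexeme, NULL, 10);
  return INT;
}
```

The member written must match the type declared for the returned token in the grammar, since the parser reads the value back through that member.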
4.6.3 Textual Locations of Tokens
If you are using the ‘@n’-feature (see section Tracking Locations) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in yylex. The function yyparse expects to find the textual location of a token just parsed in the global variable yylloc. So yylex must store the proper data in that variable.
By default, the value of yylloc is a structure and you need only initialize the members that are going to be used by the actions. The four members are called first_line, first_column, last_line and last_column. Note that the use of this feature makes the parser noticeably slower.
The data type of yylloc has the name YYLTYPE.
4.6.4 Calling Conventions for Pure Parsers
When you use the Bison declaration %define api.pure full to request a pure, reentrant parser, the global communication variables yylval and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by pointers passed as arguments to yylex. You must declare them as shown here, and pass the information back by storing it through those pointers.
    int
    yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
    {
      …
      *lvalp = value;  /* Put value onto Bison stack. */
      return INT;      /* Return the type of the token. */
      …
    }
If the grammar file does not use the ‘@’ constructs to refer to textual locations, then the type YYLTYPE will not be defined. In this case, omit the second argument; yylex will be called with only one argument.
If you wish to pass additional arguments to yylex, use %lex-param just like %parse-param (see section The Parser Function yyparse). To pass additional arguments to both yylex and yyparse, use %param.
Directive: %lex-param {argument-declaration} …
Specify that argument-declaration are additional yylex argument declarations. You may pass one or more such declarations, which is equivalent to repeating %lex-param.
Directive: %param {argument-declaration} …
Specify that argument-declaration are additional yylex/yyparse argument declarations. This is equivalent to ‘%lex-param {argument-declaration} … %parse-param {argument-declaration} …’. You may pass one or more declarations, which is equivalent to repeating %param.
For instance:
    %lex-param   {scanner_mode *mode}
    %parse-param {parser_mode *mode}
    %param       {environment_type *env}
results in the following signatures:
    int yylex   (scanner_mode *mode, environment_type *env);
    int yyparse (parser_mode *mode, environment_type *env);
If ‘%define api.pure full’ is added:
    int yylex   (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
    int yyparse (parser_mode *mode, environment_type *env);
and finally, if both ‘%define api.pure full’ and %locations are used:
    int yylex   (YYSTYPE *lvalp, YYLTYPE *llocp,
                 scanner_mode *mode, environment_type *env);
    int yyparse (parser_mode *mode, environment_type *env);
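The following sketch shows what a pure, location-tracking yylex might look like when it also receives an extra argument. It rests on several assumptions: YYSTYPE, YYLTYPE and INT are defined by hand here as stand-ins for the generated definitions, and the third parameter corresponds to a hypothetical ‘%lex-param {const char **cursor}’ that hands the scanner its input position. Only columns are tracked; newline handling is left out for brevity.

```c
#include <ctype.h>

/* Stand-ins for definitions normally emitted by Bison.  */
typedef int YYSTYPE;
typedef struct
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;
enum { INT = 258 };   /* hypothetical token code */

/* Signature as it would appear with %define api.pure full, %locations
   and the hypothetical %lex-param {const char **cursor}.  */
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp, const char **cursor)
{
  const char *p = *cursor;
  int token;

  while (*p == ' ')                 /* skip blanks, advancing the column */
    {
      p++;
      llocp->last_column++;
    }

  /* The new token starts where the previous text ended.  */
  llocp->first_line = llocp->last_line;
  llocp->first_column = llocp->last_column;

  if (*p == '\0')
    token = 0;                      /* end-of-input */
  else if (isdigit ((unsigned char) *p))
    {
      *lvalp = 0;
      while (isdigit ((unsigned char) *p))
        {
          *lvalp = *lvalp * 10 + (*p++ - '0');
          llocp->last_column++;
        }
      token = INT;
    }
  else
    {
      token = (unsigned char) *p++; /* character literal token */
      llocp->last_column++;
    }

  *cursor = p;                      /* hand the position back to the caller */
  return token;
}
```

Since all state flows through the arguments, several such scanners can run at once without interfering with each other.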
4.7 The Error Reporting Function yyerror
The Bison parser detects a syntax error (or parse error) whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro YYERROR (see section Special Features for Use in Actions).
The Bison parser expects to report the error by calling an error reporting function named yyerror, which you must supply. It is called by yyparse whenever a syntax error is found, and it receives one argument. For a syntax error, the string is normally "syntax error".
If you invoke ‘%define parse.error verbose’ in the Bison declarations section (see section The Bison Declarations Section), then Bison provides a more verbose and specific error message string instead of just plain "syntax error". However, that message sometimes contains incorrect information if LAC is not enabled (see section LAC).
The parser can detect one other kind of error: memory exhaustion. This can happen when the input contains constructions that are very deeply nested. It isn’t likely you will encounter this, since the Bison parser normally extends its stack automatically up to a very large limit. But if memory is exhausted, yyparse calls yyerror in the usual fashion, except that the argument string is "memory exhausted".
In some cases diagnostics like "syntax error" are translated automatically from English to some other language before they are passed to yyerror. See section Parser Internationalization.
The following definition suffices in simple programs:
    void
    yyerror (char const *s)
    {
      fprintf (stderr, "%s\n", s);
    }
After yyerror returns to yyparse, the latter will attempt error recovery if you have written suitable error recovery grammar rules (see section Error Recovery). If recovery is impossible, yyparse will immediately return 1.
In location-tracking pure parsers, yyerror should have access to the current location. With %define api.pure, this is indeed the case for the GLR parsers, but not for the Yacc parser, for historical reasons. This is why %define api.pure full should be preferred over %define api.pure.
When both %locations and %define api.pure full are used, yyerror has the following signature:
    void yyerror (YYLTYPE *locp, char const *msg);
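A yyerror with this signature can prefix the message with the position it is given. The sketch below is one possible shape, not a prescribed one: YYLTYPE is defined by hand as a stand-in for the generated type, the "LINE.COLUMN: MESSAGE" layout is an arbitrary choice, and format_error is a hypothetical helper split out so the formatting can be checked separately from the output to stderr.

```c
#include <stdio.h>

/* Stand-in for the location type normally emitted by Bison.  */
typedef struct
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;

/* Hypothetical helper: render "LINE.COLUMN: MESSAGE" into BUF and
   return the number of characters the full text needs.  */
static int
format_error (char *buf, size_t size, const YYLTYPE *locp, const char *msg)
{
  return snprintf (buf, size, "%d.%d: %s",
                   locp->first_line, locp->first_column, msg);
}

void
yyerror (YYLTYPE *locp, char const *msg)
{
  char buf[256];
  format_error (buf, sizeof buf, locp, msg);   /* truncates if too long */
  fprintf (stderr, "%s\n", buf);
}
```

The message is passed through "%s" rather than used as a format string, so a ‘%’ inside a diagnostic is printed literally.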
The prototypes are only indications of how the code produced by Bison uses yyerror. Bison-generated code always ignores the returned value, so yyerror can return any type, including void. Also, yyerror can be a variadic function; that is why the message is always passed last.
Traditionally yyerror returns an int that is always ignored, but this is purely for historical reasons, and void is preferable since it more accurately describes the return type for yyerror.
The variable yynerrs contains the number of syntax errors reported so far. Normally this variable is global; but if you request a pure parser (see section A Pure (Reentrant) Parser) then it is a local variable which only the actions can access.
4.8 Special Features for Use in Actions
Here is a table of Bison constructs, variables and macros that are useful in actions.
Variable: $$
Acts like a variable that contains the semantic value for the grouping made by the current rule. See section Actions.
Variable: $n
Acts like a variable that contains the semantic value for the nth component of the current rule. See section Actions.
Variable: $<typealt>$
Like $$ but specifies alternative typealt in the union specified by the %union declaration. See section Data Types of Values in Actions.
Variable: $<typealt>n
Like $n but specifies alternative typealt in the union specified by the %union declaration. See section Data Types of Values in Actions.
Macro: YYABORT;
Return immediately from yyparse, indicating failure. See section The Parser Function yyparse.
Macro: YYACCEPT;
Return immediately from yyparse, indicating success. See section The Parser Function yyparse.
Macro: YYBACKUP (token, value);
Unshift a token. This macro is allowed only for rules that reduce a single value, and only when there is no lookahead token. It is also disallowed in GLR parsers. It installs a lookahead token with token type token and semantic value value; then it discards the value that was going to be reduced by this rule.
If the macro is used when it is not valid, such as when there is a lookahead token already, then it reports a syntax error with a message ‘cannot back up’ and performs ordinary error recovery.
In either case, the rest of the action is not executed.
Macro: YYEMPTY
Value stored in yychar when there is no lookahead token.
Macro: YYEOF
Value stored in yychar when the lookahead is the end of the input stream.
Macro: YYERROR;
Cause an immediate syntax error. This statement initiates error recovery just as if the parser itself had detected an error; however, it does not call yyerror, and does not print any message. If you want to print an error message, call yyerror explicitly before the ‘YYERROR;’ statement. See section Error Recovery.
Macro: YYRECOVERING
The expression YYRECOVERING () yields 1 when the parser is recovering from a syntax error, and 0 otherwise. See section Error Recovery.
Variable: yychar
Variable containing either the lookahead token, or YYEOF when the lookahead is the end of the input stream, or YYEMPTY when no lookahead has been performed so the next token is not yet known. Do not modify yychar in a deferred semantic action (see section GLR Semantic Actions). See section Lookahead Tokens.
Macro: yyclearin;
Discard the current lookahead token. This is useful primarily in error rules. Do not invoke yyclearin in a deferred semantic action (see section GLR Semantic Actions). See section Error Recovery.
Macro: yyerrok;
Resume generating error messages immediately for subsequent syntax errors. This is useful primarily in error rules. See section Error Recovery.
Variable: yylloc
Variable containing the lookahead token location when yychar is not set to YYEMPTY or YYEOF. Do not modify yylloc in a deferred semantic action (see section GLR Semantic Actions). See section Actions and Locations.
Variable: yylval
Variable containing the lookahead token semantic value when yychar is not set to YYEMPTY or YYEOF. Do not modify yylval in a deferred semantic action (see section GLR Semantic Actions). See section Actions.
Value: @$
Acts like a structure variable containing information on the textual location of the grouping made by the current rule. See section Tracking Locations.
Value: @n
Acts like a structure variable containing information on the textual location of the nth component of the current rule. See section Tracking Locations.
4.9 Parser Internationalization
A Bison-generated parser can print diagnostics, including error and tracing messages. By default, they appear in English. However, Bison also supports outputting diagnostics in the user’s native language. To make this work, the user should set the usual environment variables. See section ‘The User’s View’ in GNU gettext utilities. For example, the shell command ‘export LC_ALL=fr_CA.UTF-8’ might set the user’s locale to French Canadian using the UTF-8 encoding. The exact set of available locales depends on the user’s installation.
The maintainer of a package that uses a Bison-generated parser enables the internationalization of the parser’s output through the following steps. Here we assume a package that uses GNU Autoconf and GNU Automake.
1. Copy the ‘bison-i18n.m4’ file installed by Bison into the directory that holds the package’s Autoconf macros (often called ‘m4’). For example:
    cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
2. In the top-level ‘configure.ac’, after the AM_GNU_GETTEXT invocation, add an invocation of BISON_I18N. This macro is defined in the file ‘bison-i18n.m4’ that you copied earlier. It causes ‘configure’ to find the value of the BISON_LOCALEDIR variable, and it defines the source-language symbol YYENABLE_NLS to enable translations in the Bison-generated parser.
3. In the main function of your program, designate the directory containing Bison’s runtime message catalog, through a call to ‘bindtextdomain’ with domain name ‘bison-runtime’. For example:
    bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
   Typically this appears after any other call bindtextdomain (PACKAGE, LOCALEDIR) that your package already has. Here we rely on ‘BISON_LOCALEDIR’ to be defined as a string through the ‘Makefile’.
4. In the ‘Makefile.am’ that controls the compilation of the main function, make ‘BISON_LOCALEDIR’ available as a C preprocessor macro, either in ‘DEFS’ or in ‘AM_CPPFLAGS’. For example:
    DEFS = @DEFS@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
   or:
    AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
5. Finally, invoke the command autoreconf to generate the build infrastructure.
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.