The Bison declarations section of a Bison grammar defines the symbols used in formulating the grammar and the data types of semantic values. See section Symbols, Terminal and Nonterminal.
All token type names (but not single-character literal tokens such as '+' and '*') must be declared. Nonterminal symbols must be declared if you need to specify which data type to use for the semantic value (see section More Than One Value Type).
The first rule in the grammar file also specifies the start symbol, by default. If you want some other symbol to be the start symbol, you must declare it explicitly (see section Languages and Context-Free Grammars).
3.7.1 Require a Version of Bison | Requiring a Bison version.
3.7.2 Token Type Names | Declaring terminal symbols.
3.7.3 Operator Precedence | Declaring terminals with precedence and associativity.
3.7.4 Nonterminal Symbols | Declaring the choice of type for a nonterminal symbol.
3.7.5 Performing Actions before Parsing | Code run before parsing starts.
3.7.6 Freeing Discarded Symbols | Declaring how symbols are freed.
3.7.7 Printing Semantic Values | Declaring how symbol values are displayed.
3.7.8 Suppressing Conflict Warnings | Suppressing warnings about parsing conflicts.
3.7.9 The Start-Symbol | Specifying the start symbol.
3.7.10 A Pure (Reentrant) Parser | Requesting a reentrant parser.
3.7.11 A Push Parser | Requesting a push parser.
3.7.12 Bison Declaration Summary | Table of all Bison declarations.
3.7.13 %define Summary | Defining variables to adjust Bison’s behavior.
3.7.14 %code Summary | Inserting code into the parser source.
3.7.1 Require a Version of Bison
You may require a minimum version of Bison to process the grammar. If the requirement is not met, bison exits with an error (exit status 63).
    %require "version"
3.7.2 Token Type Names
The basic way to declare a token type name (terminal symbol) is as follows:
    %token name
Bison will convert this into a #define directive in the parser, so that the function yylex (if it is in this file) can use the name name to stand for this token type’s code.
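As a hedged illustration of how yylex can use such a name, the sketch below writes by hand the macro that Bison would normally generate for ‘%token NUM 300’. The set_input helper, the input buffer, and the choice of double for YYSTYPE are assumptions made only so that the fragment stands alone; in a real parser these definitions come from the Bison output.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-in for what Bison would emit for "%token NUM 300"; in a real
   project this comes from the generated parser or its header. */
#define NUM 300

typedef double YYSTYPE;         /* assumption: a single semantic type */
YYSTYPE yylval;

static const char *input = "";  /* hypothetical input buffer */

/* Hypothetical helper: point the scanner at a new string. */
void set_input(const char *s)
{
  input = s;
}

int yylex(void)
{
  /* Skip blanks between tokens. */
  while (*input == ' ' || *input == '\t')
    ++input;
  if (*input == '\0')
    return 0;                   /* token 0 marks the end of the input */
  if (isdigit((unsigned char) *input))
    {
      char *end;
      yylval = strtod(input, &end);
      input = end;
      return NUM;               /* the declared token type name */
    }
  /* Single-character literal tokens need no declaration. */
  return (unsigned char) *input++;
}
```

Calling set_input ("12 + 3") and then yylex repeatedly yields NUM, '+', NUM, and finally 0.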
Alternatively, you can use %left, %right, %precedence, or %nonassoc instead of %token, if you wish to specify associativity and precedence. See section Operator Precedence.
You can explicitly specify the numeric code for a token type by appending a nonnegative decimal or hexadecimal integer value in the field immediately following the token name:
    %token NUM 300
    %token XNUM 0x12d // a GNU extension
It is generally best, however, to let Bison choose the numeric codes for all token types. Bison will automatically select codes that don’t conflict with each other or with normal characters.
In the event that the stack type is a union, you must augment the %token or other token declaration to include the data type alternative delimited by angle-brackets (see section More Than One Value Type).
For example:
    %union {              /* define stack type */
      double val;
      symrec *tptr;
    }
    %token <val> NUM      /* define token NUM and its type */
You can associate a literal string token with a token type name by writing the literal string at the end of a %token declaration which declares the name. For example:
    %token arrow "=>"
For example, a grammar for the C language might specify these names with equivalent literal string tokens:
    %token  <operator>  OR      "||"
    %token  <operator>  LE 134  "<="
    %left  OR  "<="
Once you equate the literal string and the token name, you can use them interchangeably in further declarations or the grammar rules. The yylex function can use the token name or the literal string to obtain the token type code number (see section Calling Convention for yylex). Syntax error messages passed to yyerror from the parser will reference the literal string instead of the token name.
The token numbered as 0 corresponds to end of file; the following line allows for nicer error messages referring to “end of file” instead of “$end”:
    %token END 0 "end of file"
3.7.3 Operator Precedence
Use the %left, %right, %nonassoc, or %precedence declaration to declare a token and specify its precedence and associativity, all at once. These are called precedence declarations. See section Operator Precedence, for general information on operator precedence.
The syntax of a precedence declaration is nearly the same as that of %token: either
    %left symbols…
or
    %left <type> symbols…
And indeed any of these declarations serves the purposes of %token. But in addition, they specify the associativity and relative precedence for all the symbols.
The associativity of an operator op determines how repeated uses of the operator nest, that is, whether ‘x op y op z’ is parsed by grouping x with y first or by grouping y with z first. %left specifies left-associativity (grouping x with y first) and %right specifies right-associativity (grouping y with z first). %nonassoc specifies no associativity, which means that ‘x op y op z’ is considered a syntax error.
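To make the difference between the two groupings concrete, here is a small sketch in plain C (not Bison output) that evaluates a chain of subtractions both ways. The function names are invented for this illustration.

```c
#include <stddef.h>

/* Left-associative grouping: ((v[0] - v[1]) - v[2]) - ...
   Assumes n >= 1. */
long fold_left_minus(const long *v, size_t n)
{
  long acc = v[0];
  for (size_t i = 1; i < n; ++i)
    acc = acc - v[i];
  return acc;
}

/* Right-associative grouping: v[0] - (v[1] - (v[2] - ...))
   Assumes n >= 1. */
long fold_right_minus(const long *v, size_t n)
{
  long acc = v[n - 1];
  for (size_t i = n - 1; i-- > 0; )
    acc = v[i] - acc;
  return acc;
}
```

For the operands 8, 3, 2 the left grouping gives (8 - 3) - 2 = 3 while the right grouping gives 8 - (3 - 2) = 7, which is why the declaration chosen for an operator such as ‘-’ matters.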
%precedence gives only precedence to the symbols, and defines no associativity at all. Use this to define precedence only, and leave any potential conflict due to associativity enabled.
For backward compatibility, there is a confusing difference between the argument lists of %token and precedence declarations. Only a %token can associate a literal string with a token type name. A precedence declaration always interprets a literal string as a reference to a separate token. For example:
    %left  OR "<="         // Does not declare an alias.
    %left  OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".
3.7.4 Nonterminal Symbols
When you use %union to specify multiple value types, you must declare the value type of each nonterminal symbol for which values are used. This is done with a %type declaration, like this:
    %type <type> nonterminal…
Here nonterminal is the name of a nonterminal symbol, and type is the name given in the %union to the alternative that you want (see section The Union Declaration). You can give any number of nonterminal symbols in the same %type declaration, if they have the same value type. Use spaces to separate the symbol names.
You can also declare the value type of a terminal symbol. To do this, use the same <type> construction in a declaration for the terminal symbol. All kinds of token declarations allow <type>.
3.7.5 Performing Actions before Parsing
Sometimes your parser needs to perform some initializations before parsing. The %initial-action directive allows for such arbitrary code.
Directive: %initial-action { code }
Declare that the braced code must be invoked before parsing each time yyparse is called. The code may use $$ (or $<tag>$) and @$ (the initial value and location of the lookahead) as well as the %parse-param.
For instance, if your locations use a file name, you may use
    %parse-param { char const *file_name };
    %initial-action
    {
      @$.initialize (file_name);
    };
3.7.6 Freeing Discarded Symbols
During error recovery (see section Error Recovery), symbols already pushed on the stack and tokens coming from the rest of the file are discarded until the parser recovers. If the parser runs out of memory, or if it returns via YYABORT or YYACCEPT, all the symbols on the stack must be discarded. Even if the parser succeeds, it must discard the start symbol.
When discarded symbols convey heap-based information, this memory is lost. While this behavior can be tolerable for batch parsers, such as in traditional compilers, it is unacceptable for programs like shells or protocol implementations that may parse and execute indefinitely.
The %destructor directive defines code that is called when a symbol is automatically discarded.
Directive: %destructor { code } symbols
Invoke the braced code whenever the parser discards one of the symbols. Within code, $$ (or $<tag>$) designates the semantic value associated with the discarded symbol, and @$ designates its location. The additional parser parameters are also available (see section The Parser Function yyparse).
When a symbol is listed among symbols, its %destructor is called a per-symbol %destructor. You may also define a per-type %destructor by listing a semantic type tag among symbols. In that case, the parser will invoke this code whenever it discards any grammar symbol that has that semantic type tag unless that symbol has its own per-symbol %destructor.
Finally, you can define two different kinds of default %destructors. (These default forms are experimental. More user feedback will help to determine whether they should become permanent features.) You can place each of <*> and <> in the symbols list of exactly one %destructor declaration in your grammar file. The parser will invoke the code associated with one of these whenever it discards any user-defined grammar symbol that has no per-symbol and no per-type %destructor. The parser uses the code for <*> in the case of such a grammar symbol for which you have formally declared a semantic type tag (%type counts as such a declaration, but $<tag>$ does not). The parser uses the code for <> in the case of such a grammar symbol that has no declared semantic type tag.
For example:
    %union { char *string; }
    %token <string> STRING1 STRING2
    %type  <string> string1 string2
    %union { char character; }
    %token <character> CHR
    %type  <character> chr
    %token TAGLESS
    %destructor { } <character>
    %destructor { free ($$); } <*>
    %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1
    %destructor { printf ("Discarding tagless symbol.\n"); } <>
guarantees that, when the parser discards any user-defined symbol that has a semantic type tag other than <character>, it passes its semantic value to free by default. However, when the parser discards a STRING1 or a string1, it also prints its line number to stdout. It performs only the second %destructor in this case, so it invokes free only once. Finally, the parser merely prints a message whenever it discards any symbol, such as TAGLESS, that has no semantic type tag.
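The selection order described above (per-symbol first, then per-type, then one of the two defaults) can be modelled in a few lines of C. This is only a sketch of the rule as stated in the text, not code taken from Bison; every type and function name below is hypothetical.

```c
/* Hypothetical model of which %destructor applies to a discarded
   user-defined symbol.  None of these names come from Bison. */
enum destructor_kind
{
  DESTRUCTOR_NONE,            /* no applicable %destructor */
  DESTRUCTOR_PER_SYMBOL,      /* e.g. %destructor { ... } STRING1 */
  DESTRUCTOR_PER_TYPE,        /* e.g. %destructor { ... } <character> */
  DESTRUCTOR_DEFAULT_TAGGED,  /* <*> */
  DESTRUCTOR_DEFAULT_TAGLESS  /* <> */
};

struct symbol_info
{
  int has_per_symbol;  /* the symbol is listed in a %destructor */
  int has_per_type;    /* its type tag is listed in a %destructor */
  int has_tag;         /* a semantic type tag was formally declared */
};

struct grammar_info
{
  int has_tagged_default;   /* the grammar declares a <*> destructor */
  int has_tagless_default;  /* the grammar declares a <> destructor */
};

enum destructor_kind
choose_destructor(const struct symbol_info *sym,
                  const struct grammar_info *g)
{
  if (sym->has_per_symbol)
    return DESTRUCTOR_PER_SYMBOL;
  if (sym->has_tag && sym->has_per_type)
    return DESTRUCTOR_PER_TYPE;
  if (sym->has_tag)
    return g->has_tagged_default ? DESTRUCTOR_DEFAULT_TAGGED
                                 : DESTRUCTOR_NONE;
  return g->has_tagless_default ? DESTRUCTOR_DEFAULT_TAGLESS
                                : DESTRUCTOR_NONE;
}
```

Applied to the example grammar, STRING1 selects its per-symbol code, CHR the per-type code for <character>, STRING2 the <*> default, and TAGLESS the <> default.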
A Bison-generated parser invokes the default %destructors only for user-defined as opposed to Bison-defined symbols. For example, the parser will not invoke either kind of default %destructor for the special Bison-defined symbols $accept, $undefined, or $end (see section Bison Symbols), none of which you can reference in your grammar. It also will not invoke either for the error token (see section error), which is always defined by Bison regardless of whether you reference it in your grammar. However, it may invoke one of them for the end token (token 0) if you redefine it from $end to, for example, END:
    %token END 0
Finally, Bison will never invoke a %destructor for an unreferenced mid-rule semantic value (see section Actions in Mid-Rule). That is, Bison does not consider a mid-rule to have a semantic value if you do not reference $$ in the mid-rule’s action or $n (where n is the right-hand side symbol position of the mid-rule) in any later action in that rule. However, if you do reference either, the Bison-generated parser will invoke the <> %destructor whenever it discards the mid-rule symbol.
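Discarded symbols are the following:
- stacked symbols popped during the first phase of error recovery,
- incoming terminals during the second phase of error recovery,
- the current lookahead and the entire stack (except the current right-hand side symbols) when the parser returns immediately,
- the current lookahead and the entire stack (including the current right-hand side symbols) when the C++ parser (‘lalr1.cc’) catches an exception in parse,
- the start symbol, when the parser succeeds.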
The parser can return immediately because of an explicit call to YYABORT or YYACCEPT, or failed error recovery, or memory exhaustion.
Right-hand side symbols of a rule that explicitly triggers a syntax error via YYERROR are not discarded automatically. As a rule of thumb, destructors are invoked only when user actions cannot manage the memory.
3.7.7 Printing Semantic Values
When run-time traces are enabled (see section Tracing Your Parser), the parser reports its actions, such as reductions. When a symbol involved in an action is reported, only its kind is displayed, as the parser cannot know how semantic values should be formatted.
The %printer directive defines code that is called when a symbol is reported. Its syntax is the same as %destructor (see section Freeing Discarded Symbols).
Directive: %printer { code } symbols
Invoke the braced code whenever the parser displays one of the symbols. Within code, yyoutput denotes the output stream (a FILE* in C, and an std::ostream& in C++), $$ (or $<tag>$) designates the semantic value associated with the symbol, and @$ its location. The additional parser parameters are also available (see section The Parser Function yyparse).
The symbols are defined as for %destructor (see section Freeing Discarded Symbols): they can be per-type (e.g., ‘<ival>’), per-symbol (e.g., ‘exp’, ‘NUM’, ‘"float"’), typed per-default (i.e., ‘<*>’), or untyped per-default (i.e., ‘<>’).
For example:
    %union { char *string; }
    %token <string> STRING1 STRING2
    %type  <string> string1 string2
    %union { char character; }
    %token <character> CHR
    %type  <character> chr
    %token TAGLESS
    %printer { fprintf (yyoutput, "'%c'", $$); } <character>
    %printer { fprintf (yyoutput, "&%p", $$); } <*>
    %printer { fprintf (yyoutput, "\"%s\"", $$); } STRING1 string1
    %printer { fprintf (yyoutput, "<>"); } <>
guarantees that, when the parser prints any symbol that has a semantic type tag other than <character>, it displays the address of the semantic value by default. However, when the parser displays a STRING1 or a string1, it formats it as a string in double quotes. It performs only the second %printer in this case, so it prints only once. Finally, the parser prints ‘<>’ for any symbol, such as TAGLESS, that has no semantic type tag.
3.7.8 Suppressing Conflict Warnings
Bison normally warns if there are any conflicts in the grammar (see section Shift/Reduce Conflicts), but most real grammars have harmless shift/reduce conflicts which are resolved in a predictable way and would be difficult to eliminate. It is desirable to suppress the warning about these conflicts unless the number of conflicts changes. You can do this with the %expect declaration.
The declaration looks like this:
    %expect n
Here n is a decimal integer. The declaration says there should be n shift/reduce conflicts and no reduce/reduce conflicts. Bison reports an error if the number of shift/reduce conflicts differs from n, or if there are any reduce/reduce conflicts.
For deterministic parsers, reduce/reduce conflicts are more serious, and should be eliminated entirely. Bison will always report reduce/reduce conflicts for these parsers. With GLR parsers, however, both kinds of conflicts are routine; otherwise, there would be no need to use GLR parsing. Therefore, it is also possible to specify an expected number of reduce/reduce conflicts in GLR parsers, using the declaration:
    %expect-rr n
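The rule stated above can be written down as a small predicate. The following C sketch is a model of the prose only, not Bison’s implementation; the function name and parameters are invented for illustration, and expect_rr should be passed as 0 when no %expect-rr declaration is present.

```c
/* Hypothetical model of the %expect / %expect-rr check described in
   the text.  Returns 1 when the conflict counts match the declared
   expectations (Bison stays silent), 0 when Bison would report an
   error. */
int conflicts_as_expected(int expect_sr, int expect_rr,
                          int found_sr, int found_rr)
{
  /* Both counts must match exactly; fewer conflicts than expected is
     also a mismatch. */
  return found_sr == expect_sr && found_rr == expect_rr;
}
```

For instance, a grammar declared with ‘%expect 2’ and no %expect-rr passes with two shift/reduce conflicts, and fails as soon as a third shift/reduce conflict or any reduce/reduce conflict appears.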
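In general, using %expect involves these steps:
1. Compile your grammar without %expect. Use the ‘-v’ option to get a verbose list of where the conflicts occur. Bison will also print the number of conflicts.
2. Check each of the conflicts to make sure that Bison’s default resolution is what you really want. If not, rewrite the grammar and go back to the beginning.
3. Add an %expect declaration, copying the number n from the number which Bison printed. With GLR parsers, add an %expect-rr declaration as well.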
Now Bison will report an error if you introduce an unexpected conflict, but will keep silent otherwise.
3.7.9 The Start-Symbol
Bison assumes by default that the start symbol for the grammar is the first nonterminal specified in the grammar specification section. The programmer may override this default with the %start declaration as follows:
    %start symbol
3.7.10 A Pure (Reentrant) Parser
A reentrant program is one which does not alter in the course of execution; in other words, it consists entirely of pure (read-only) code. Reentrancy is important whenever asynchronous execution is possible; for example, a nonreentrant program may not be safe to call from a signal handler. In systems with multiple threads of control, a nonreentrant program must be called only within interlocks.
Normally, Bison generates a parser which is not reentrant. This is suitable for most uses, and it permits compatibility with Yacc. (The standard Yacc interfaces are inherently nonreentrant, because they use statically allocated variables for communication with yylex, including yylval and yylloc.)
Alternatively, you can generate a pure, reentrant parser. The Bison declaration ‘%define api.pure’ says that you want the parser to be reentrant. It looks like this:
    %define api.pure full
The result is that the communication variables yylval and yylloc become local variables in yyparse, and a different calling convention is used for the lexical analyzer function yylex. See section Calling Conventions for Pure Parsers, for the details of this. The variable yynerrs becomes local in yyparse in pull mode but it becomes a member of yypstate in push mode (see section The Error Reporting Function yyerror). The convention for calling yyparse itself is unchanged.
Whether the parser is pure has nothing to do with the grammar rules. You can generate either a pure parser or a nonreentrant parser from any valid grammar.
3.7.11 A Push Parser
(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
A pull parser is called once and it takes control until all its input is completely parsed. A push parser, on the other hand, is called each time a new token is made available.
A push parser is typically useful when the parser is part of a main event loop in the client’s application. This is typically a requirement of a GUI, when the main event loop needs to be triggered within a certain time period.
Normally, Bison generates a pull parser. The following Bison declaration says that you want the parser to be a push parser (see section api.push-pull):
    %define api.push-pull push
In almost all cases, you want to ensure that your push parser is also a pure parser (see section A Pure (Reentrant) Parser). The only time you should create an impure push parser is to have backwards compatibility with the impure Yacc pull mode interface. Unless you know what you are doing, your declarations should look like this:
    %define api.pure full
    %define api.push-pull push
There is one notable functional difference between the pure push parser and the impure push parser. A pure push parser may have many parser instances, of the same type of parser, in memory at the same time. An impure push parser should only use one parser at a time.
When a push parser is selected, Bison will generate some new symbols in the generated parser. yypstate is a structure that the generated parser uses to store the parser’s state. yypstate_new is the function that will create a new parser instance. yypstate_delete will free the resources associated with the corresponding parser instance. Finally, yypush_parse is the function that should be called whenever a token is available to provide the parser. A trivial example of using a pure push parser would look like this:
    int status;
    yypstate *ps = yypstate_new ();
    do {
      status = yypush_parse (ps, yylex (), NULL);
    } while (status == YYPUSH_MORE);
    yypstate_delete (ps);
If the user decides to use an impure push parser, a few things about the generated parser will change. The yychar variable becomes a global variable instead of a variable in the yypush_parse function. For this reason, the signature of the yypush_parse function is changed to remove the token as a parameter. A nonreentrant push parser example would thus look like this:
    extern int yychar;
    int status;
    yypstate *ps = yypstate_new ();
    do {
      yychar = yylex ();
      status = yypush_parse (ps);
    } while (status == YYPUSH_MORE);
    yypstate_delete (ps);
That’s it. Notice the next token is put into the global variable yychar for use by the next invocation of the yypush_parse function.
Bison also supports both the push parser interface and the pull parser interface in the same generated parser. In order to get this functionality, you should replace the ‘%define api.push-pull push’ declaration with the ‘%define api.push-pull both’ declaration. Doing this will create all of the symbols mentioned earlier along with the two extra symbols, yyparse and yypull_parse. yyparse can be used exactly as it normally would be used. However, the user should note that it is implemented in the generated parser by calling yypull_parse. This makes the yyparse function that is generated with the ‘%define api.push-pull both’ declaration slower than the normal yyparse function. If the user calls the yypull_parse function it will parse the rest of the input stream. It is possible to yypush_parse tokens to select a subgrammar and then yypull_parse the rest of the input stream. If you would like to switch back and forth between parsing styles, you would have to write your own yypull_parse function that knows when to quit looking for input. An example of using the yypull_parse function would look like this:
    yypstate *ps = yypstate_new ();
    yypull_parse (ps); /* Will call the lexer */
    yypstate_delete (ps);
Adding the ‘%define api.pure’ declaration does exactly the same thing to the generated parser with ‘%define api.push-pull both’ as it did for ‘%define api.push-pull push’.
3.7.12 Bison Declaration Summary
Here is a summary of the declarations used to define a grammar:
Directive: %union
Declare the collection of data types that semantic values may have (see section The Union Declaration).
Directive: %token
Declare a terminal symbol (token type name) with no precedence or associativity specified (see section Token Type Names).
Directive: %right
Declare a terminal symbol (token type name) that is right-associative (see section Operator Precedence).
Directive: %left
Declare a terminal symbol (token type name) that is left-associative (see section Operator Precedence).
Directive: %nonassoc
Declare a terminal symbol (token type name) that is nonassociative (see section Operator Precedence). Using it in a way that would be associative is a syntax error.
Directive: %type
Declare the type of semantic values for a nonterminal symbol (see section Nonterminal Symbols).
Directive: %start
Specify the grammar’s start symbol (see section The Start-Symbol).
Directive: %expect
Declare the expected number of shift-reduce conflicts (see section Suppressing Conflict Warnings).
In order to change the behavior of bison, use the following directives:
Directive: %code {code}
Directive: %code qualifier {code}
Insert code verbatim into the output parser source at the default location or at the location specified by qualifier. See section %code Summary.
Directive: %debug
Instrument the parser for traces. Obsoleted by ‘%define parse.trace’. See section Tracing Your Parser.
Directive: %define variable
Directive: %define variable value
Directive: %define variable {value}
Directive: %define variable "value"
Define a variable to adjust Bison’s behavior. See section %define Summary.
Directive: %defines
Write a parser header file containing macro definitions for the token type names defined in the grammar as well as a few other declarations. If the parser implementation file is named ‘name.c’ then the parser header file is named ‘name.h’.
For C parsers, the parser header file declares YYSTYPE unless YYSTYPE is already defined as a macro or you have used a <type> tag without using %union. Therefore, if you are using a %union (see section More Than One Value Type) with components that require other definitions, or if you have defined a YYSTYPE macro or type definition (see section Data Types of Semantic Values), you need to arrange for these definitions to be propagated to all modules, e.g., by putting them in a prerequisite header that is included both by your parser and by any other module that needs YYSTYPE.
Unless your parser is pure, the parser header file declares yylval as an external variable. See section A Pure (Reentrant) Parser.
If you have also used locations, the parser header file declares YYLTYPE and yylloc using a protocol similar to that of the YYSTYPE macro and yylval. See section Tracking Locations.
This parser header file is normally essential if you wish to put the definition of yylex in a separate source file, because yylex typically needs to be able to refer to the above-mentioned declarations and to the token type codes. See section Semantic Values of Tokens.
If you have declared %code requires or %code provides, the output header also contains their code. See section %code Summary.
The generated header is protected against multiple inclusions with a C preprocessor guard: ‘YY_PREFIX_FILE_INCLUDED’, where PREFIX and FILE are the prefix (see section Multiple Parsers in the Same Program) and generated file name turned uppercase, with each series of non-alphanumerical characters converted to a single underscore.
For instance with ‘%define api.prefix {calc}’ and ‘%defines "lib/parse.h"’, the header will be guarded as follows.
    #ifndef YY_CALC_LIB_PARSE_H_INCLUDED
    # define YY_CALC_LIB_PARSE_H_INCLUDED
    ...
    #endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */
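The naming rule for the guard can be reproduced with a short C routine. This is a sketch written from the description above, not code from Bison; make_guard and append_mangled are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Append s to out, upper-cased, turning each run of non-alphanumeric
   characters into one underscore.  Returns the new length; the result
   is always NUL-terminated and silently truncated if out is full. */
static size_t append_mangled(char *out, size_t len, size_t cap,
                             const char *s)
{
  int in_run = 0;
  for (; *s != '\0' && len + 1 < cap; ++s)
    {
      unsigned char c = (unsigned char) *s;
      if (isalnum(c))
        {
          out[len++] = (char) toupper(c);
          in_run = 0;
        }
      else if (!in_run)
        {
          out[len++] = '_';
          in_run = 1;
        }
    }
  out[len] = '\0';
  return len;
}

/* Hypothetical helper: build "YY_<PREFIX>_<FILE>_INCLUDED" in out. */
void make_guard(char *out, size_t cap, const char *prefix,
                const char *file)
{
  size_t len = 0;
  if (cap == 0)
    return;
  len = append_mangled(out, len, cap, "YY_");
  len = append_mangled(out, len, cap, prefix);
  len = append_mangled(out, len, cap, "_");
  len = append_mangled(out, len, cap, file);
  append_mangled(out, len, cap, "_INCLUDED");
}
```

With the prefix "calc" and the file name "lib/parse.h" this produces YY_CALC_LIB_PARSE_H_INCLUDED, matching the guard shown above.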
Directive: %defines defines-file
Same as above, but save in the file ‘defines-file’.
Directive: %destructor
Specify how the parser should reclaim the memory associated to discarded symbols. See section Freeing Discarded Symbols.
Directive: %file-prefix "prefix"
Specify a prefix to use for all Bison output file names. The names are chosen as if the grammar file were named ‘prefix.y’.
Directive: %language "language"
Specify the programming language for the generated parser. Currently supported languages include C, C++, and Java. language is case-insensitive.
Directive: %locations
Generate the code processing the locations (see section Special Features for Use in Actions). This mode is enabled as soon as the grammar uses the special ‘@n’ tokens, but if your grammar does not use it, using ‘%locations’ allows for more accurate syntax error messages.
Directive: %name-prefix "prefix"
Rename the external symbols used in the parser so that they start with prefix instead of ‘yy’. The precise list of symbols renamed in C parsers is yyparse, yylex, yyerror, yynerrs, yylval, yychar, yydebug, and (if locations are used) yylloc. If you use a push parser, yypush_parse, yypull_parse, yypstate, yypstate_new and yypstate_delete will also be renamed. For example, if you use ‘%name-prefix "c_"’, the names become c_parse, c_lex, and so on. For C++ parsers, see the ‘%define api.namespace’ documentation in this section. See section Multiple Parsers in the Same Program.
Directive: %no-lines
Don’t generate any #line preprocessor commands in the parser implementation file. Ordinarily Bison writes these commands in the parser implementation file so that the C compiler and debuggers will associate errors and object code with your source file (the grammar file). This directive causes them to associate errors with the parser implementation file, treating it as an independent source file in its own right.
Directive: %output "file"
Generate the parser implementation in ‘file’.
Directive: %pure-parser
Deprecated version of ‘%define api.pure’ (see section api.pure), for which Bison is more careful to warn about unreasonable usage.
Directive: %require "version"
Require version version or higher of Bison. See section Require a Version of Bison.
Directive: %skeleton "file"
Specify the skeleton to use.
If file does not contain a /, file is the name of a skeleton file in the Bison installation directory. If it does, file is an absolute file name or a file name relative to the directory of the grammar file. This is similar to how most shells resolve commands.
Directive: %token-table
Generate an array of token names in the parser implementation file. The name of the array is yytname; yytname[i] is the name of the token whose internal Bison token code number is i. The first three elements of yytname correspond to the predefined tokens "$end", "error", and "$undefined"; after these come the symbols defined in the grammar file.
The name in the table includes all the characters needed to represent the token in Bison. For single-character literals and literal strings, this includes the surrounding quoting characters and any escape sequences. For example, the Bison single-character literal '+' corresponds to a three-character name, represented in C as "'+'"; and the Bison two-character literal string "\\/" corresponds to a five-character name, represented in C as "\"\\\\/\"".
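The character counts quoted above can be checked directly in C. The snippet below is a small illustration with invented variable and function names; it simply stores the two names as C string literals and measures them.

```c
#include <stddef.h>
#include <string.h>

/* The two token names from the text, written as C string literals. */
static const char plus_name[]  = "'+'";        /* characters: ' + '     */
static const char slash_name[] = "\"\\\\/\"";  /* characters: " \ \ / " */

/* Length of the name for the single-character literal '+'. */
size_t plus_name_length(void)
{
  return strlen(plus_name);
}

/* Length of the name for the two-character literal string "\\/". */
size_t slash_name_length(void)
{
  return strlen(slash_name);
}
```

plus_name_length returns 3 and slash_name_length returns 5, which are the three-character and five-character names mentioned in the text.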
When you specify %token-table, Bison also generates macro definitions for macros YYNTOKENS, YYNNTS, YYNRULES, and YYNSTATES:
YYNTOKENS
The highest token number, plus one.
YYNNTS
The number of nonterminal symbols.
YYNRULES
The number of grammar rules.
YYNSTATES
The number of parser states (see section Parser States).
Directive: %verbose
Write an extra output file containing verbose descriptions of the parser states and what is done for each type of lookahead token in that state. See section Understanding Your Parser, for more information.
Directive: %yacc
Pretend the option ‘--yacc’ was given, i.e., imitate Yacc, including its naming conventions. See section Bison Options, for more.
3.7.13 %define Summary
There are many features of Bison’s behavior that can be controlled by assigning the feature a single value. For historical reasons, some such features are assigned values by dedicated directives, such as %start, which assigns the start symbol. However, newer such features are associated with variables, which are assigned by the %define directive:
Directive: %define variable
Directive: %define variable value
Directive: %define variable {value}
Directive: %define variable "value"
Define variable to value.
The type of the values depends on the syntax. Braces denote value in the target language (e.g., a namespace, a type, etc.). Keyword values (no delimiters) denote finite choice (e.g., a variation of a feature). String values denote remaining cases (e.g., a file name).
It is an error if a variable is defined by %define multiple times, but see -D name[=value].
The rest of this section summarizes variables and values that %define accepts.
Some variables take Boolean values. In this case, Bison will complain if the variable definition does not meet one of the following four conditions:
1. value is true.
2. value is omitted (or "" is specified). This is equivalent to true.
3. value is false.
4. variable is never defined. In this case, Bison selects a default value.
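The accepted spellings for a Boolean variable can be summarized as a tiny classifier. The C sketch below models only the conditions stated in the text for a variable that has been defined; parse_boolean_define is a hypothetical name, and a NULL argument stands for an omitted value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of how a Boolean %define value is interpreted.
   Returns 1 for true, 0 for false, and -1 for a value that Bison
   would complain about.  value == NULL models an omitted value. */
int parse_boolean_define(const char *value)
{
  if (value == NULL || value[0] == '\0' || strcmp(value, "true") == 0)
    return 1;   /* omitted, "", and true all mean true */
  if (strcmp(value, "false") == 0)
    return 0;
  return -1;    /* anything else is rejected */
}
```

So parse_boolean_define (NULL) and parse_boolean_define ("true") both report true, while a value such as "yes" is rejected.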
What variables are accepted, as well as their meanings and default values, depend on the selected target language and/or the parser skeleton (see section %language, see section %skeleton). Unaccepted variables produce an error. Some of the accepted variables are described below.
    %define api.namespace {foo::bar}
Bison uses foo::bar verbatim in references such as:
    foo::bar::parser::semantic_type
However, to open a namespace, Bison removes any leading :: and then splits on any remaining occurrences:
    namespace foo { namespace bar {
      class position;
      class location;
    } }
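The "remove a leading ::, then split on the rest" step can be sketched in C as follows. This is an illustration of the rule as described, not Bison’s implementation; open_namespaces is a hypothetical helper that writes the opening part of the nested namespaces into a buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: for "a::b" (a leading "::" is ignored) write
   "namespace a { namespace b {" into out.  Returns the nesting depth
   that was written. */
int open_namespaces(char *out, size_t cap, const char *ns)
{
  int depth = 0;
  size_t len = 0;
  if (cap == 0)
    return 0;
  out[0] = '\0';
  if (strncmp(ns, "::", 2) == 0)
    ns += 2;                      /* drop the leading "::" */
  while (*ns != '\0')
    {
      const char *sep = strstr(ns, "::");
      size_t n = sep ? (size_t) (sep - ns) : strlen(ns);
      int w = snprintf(out + len, cap - len, "%snamespace %.*s {",
                       depth ? " " : "", (int) n, ns);
      if (w < 0 || (size_t) w >= cap - len)
        break;                    /* buffer full: stop early */
      len += (size_t) w;
      ++depth;
      ns += n;
      if (sep)
        ns += 2;                  /* skip the "::" separator */
    }
  return depth;
}
```

For the value "::foo::bar" the buffer receives "namespace foo { namespace bar {" and the function returns 2, the number of closing braces needed later.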
"::"
. For example, "foo"
or "::foo::bar"
.
%name-prefix, which defaults to yy. This usage of %name-prefix is for backward compatibility and can be confusing since %name-prefix also specifies the textual prefix for the lexical analyzer function. Thus, if you specify %name-prefix, it is best to also specify ‘%define api.namespace’ so that %name-prefix only affects the lexical analyzer function. For example, if you specify:
    %define api.namespace {foo}
    %name-prefix "bar::"
The parser namespace is foo and yylex is referenced as bar::lex.
location_type
for C++ in Bison 2.5 and for Java in Bison 2.4.
yy
true
, false
, full
The value may be omitted: this is equivalent to specifying true, as is the case for Boolean values.
When %define api.pure full is used, the parser is made reentrant. This changes the signature for yylex (see section Calling Conventions for Pure Parsers), and also that of yyerror when the tracking of locations has been activated, as shown below.
The true value is very similar to the full value; the only difference is in the signature of yyerror on Yacc parsers without %parse-param, for historical reasons.
I.e., if ‘%locations %define api.pure’ is passed then the prototypes for yyerror are:
    void yyerror (char const *msg);                 // Yacc parsers.
    void yyerror (YYLTYPE *locp, char const *msg);  // GLR parsers.
But if ‘%locations %define api.pure %parse-param {int *nastiness}’ is used, then both parsers have the same signature:
    void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg);
(see section The Error Reporting Function yyerror)
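A user-supplied yyerror matching the last prototype might look like the sketch below. The YYLTYPE definition is a stand-in for the location type that the generated parser would provide, and the last_error buffer is an invention of this example that keeps the formatted message for later inspection.

```c
#include <stdio.h>

/* Stand-in for the location type provided by the generated parser
   when locations are enabled. */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical: the most recent message, kept for inspection. */
char last_error[128];

void yyerror(YYLTYPE *llocp, int *nastiness, char const *msg)
{
  ++*nastiness;   /* count the error in the caller-supplied parameter */
  snprintf(last_error, sizeof last_error, "%d.%d: %s",
           llocp->first_line, llocp->first_column, msg);
  fprintf(stderr, "%s\n", last_error);
}
```

Because both the location and the extra parameter arrive as arguments, this function needs no global parser state, which is the point of the pure interface.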
false
full
value was introduced in Bison 2.7
pull
, push
, both
pull
false
    %token FILE for ERROR
    %define api.token.prefix {TOK_}
    %%
    start: FILE for ERROR;
generates the definition of the symbols TOK_FILE, TOK_for, and TOK_ERROR in the generated source files. In particular, the scanner must use these prefixed token names, while the grammar itself may still use the short names (as in the sample rule given above). The generated informational files (‘*.output’, ‘*.xml’, ‘*.dot’) are not modified by this prefix.
Bison also prefixes the generated member names of the semantic value union. See section Generating the Semantic Value Type, for more details.
See Calc++ Parser and Calc++ Scanner, for a complete example.
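To make the scanner side concrete, here is a minimal sketch of scanner code that returns the prefixed names. The enum stands in for the token definitions Bison would generate for the grammar above. Its numeric values, the `classify_word` function, and the mapping from the words "file" and "for" to tokens are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the generated token definitions (values are assumed).
enum yytokentype
{
  TOK_FILE = 258,
  TOK_for = 259,
  TOK_ERROR = 260
};

// Hypothetical keyword classifier for a hand-written scanner: it must use
// the prefixed names, since those are what the generated sources define.
int classify_word(const char *word)
{
  if (std::strcmp(word, "file") == 0)
    return TOK_FILE;
  if (std::strcmp(word, "for") == 0)
    return TOK_for;
  return TOK_ERROR;
}
```

Unprefixed, names such as FILE and for would collide with the standard I/O type and a language keyword in the generated code, which is presumably why the sample grammar uses them.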
This grammar has no semantic value at all. This is not properly supported yet.
The type is defined thanks to the %union directive. You don’t have to define api.value.type in that case; using %union suffices. See section The Union Declaration. For instance:
    %define api.value.type union-directive
    %union
    {
      int ival;
      char *sval;
    }
    %token <ival> INT "integer"
    %token <sval> STR "string"
The symbols are defined with type names, from which Bison will generate a union. For instance:
    %define api.value.type union
    %token <int> INT "integer"
    %token <char *> STR "string"
This feature needs user feedback to stabilize. Note that most C++ objects cannot be stored in a union.
This is similar to union, but special storage techniques are used to allow any kind of C++ object to be used. For instance:
    %define api.value.type variant
    %token <int> INT "integer"
    %token <std::string> STR "string"
This feature needs user feedback to stabilize. See section C++ Variants.
Use this type as semantic value.
    %code requires
    {
      struct my_value
      {
        enum { is_int, is_str } kind;
        union { int ival; char *sval; } u;
      };
    }
    %define api.value.type {struct my_value}
    %token <u.ival> INT "integer"
    %token <u.sval> STR "string"
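A short sketch of how code outside the grammar might fill and read such a value. The struct is copied from the example above; `make_int`, `make_str`, and `int_or` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>

// Same definition as in the %code requires block above.
struct my_value
{
  enum { is_int, is_str } kind;
  union { int ival; char *sval; } u;
};

// Hypothetical helper: what a scanner might store for an INT token.
my_value make_int(int i)
{
  my_value v;
  v.kind = my_value::is_int;
  v.u.ival = i;
  return v;
}

// Hypothetical helper: what a scanner might store for a STR token.
my_value make_str(char *s)
{
  my_value v;
  v.kind = my_value::is_str;
  v.u.sval = s;
  return v;
}

// Hypothetical helper: read the integer only when the tag says it is one.
int int_or(const my_value &v, int fallback)
{
  return v.kind == my_value::is_int ? v.u.ival : fallback;
}
```

The type tags `<u.ival>` and `<u.sval>` in the grammar name the member path inside the struct, so an action's `$1` for an INT token refers to the same `u.ival` field that `make_int` sets here.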
Default value:
%union if %union is used, otherwise …
int if type tags are used (i.e., ‘%token <type>…’ or ‘%type <type>…’ is used), otherwise …
""
stype
.
Obsoleted by api.location.type since Bison 2.7.
Accepted values: most, consistent, accepting
Default value: accepting if lr.type is canonical-lr; most otherwise.
History: introduced as lr.default-reductions in 2.5, renamed as lr.default-reduction in 3.0.
false
History: introduced as lr.keep_unreachable_states in 2.3b, renamed as lr.keep-unreachable-states in 2.5, and as lr.keep-unreachable-state in 3.0.
Accepted values: lalr, ielr, canonical-lr
Default value: lalr
Obsoleted by api.namespace
false
yyerror
.
Accepted values:
simple
Error messages passed to yyerror are simply "syntax error".
verbose
Error messages report the unexpected token, and possibly the expected ones. However, this report can often be incorrect when LAC is not enabled (see section LAC).
Default value: simple
Accepted values: none, full
Default value: none
In C/C++, define the macro YYDEBUG (or prefixDEBUG with ‘%define api.prefix {prefix}’; see Multiple Parsers in the Same Program) to 1 in the parser implementation file if it is not already defined, so that the debugging facilities are compiled.
false
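The "define to 1 if it is not already defined" behaviour described for YYDEBUG corresponds to the usual guarded-macro idiom, sketched below. This illustrates the idiom only and is not a copy of the generated parser code. `tracing_compiled` is a hypothetical helper that shows how code can test the macro.

```cpp
#include <cassert>

// Define YYDEBUG to 1 only when nothing (for example, a -DYYDEBUG=0
// compiler flag) has defined it earlier.
#ifndef YYDEBUG
# define YYDEBUG 1
#endif

// Hypothetical helper: report whether the debugging code was compiled in.
bool tracing_compiled()
{
#if YYDEBUG
  return true;
#else
  return false;
#endif
}
```

Because of the `#ifndef` guard, a definition supplied before this point takes precedence over the default of 1.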
The %code directive inserts code verbatim into the output parser source at any of a predefined set of locations. It thus serves as a flexible and user-friendly alternative to the traditional Yacc prologue, %{code%}. This section summarizes the functionality of %code for the various target languages supported by Bison. For a detailed discussion of how to use %code in place of %{code%} for C/C++ and why it is advantageous to do so, see section Prologue Alternatives.
This is the unqualified form of the %code directive. It inserts code verbatim at a language-dependent default location in the parser implementation.
For C/C++, the default location is the parser implementation file after the usual contents of the parser header file. Thus, the unqualified form replaces %{code%} for most purposes.
For Java, the default location is inside the parser class.
This is the qualified form of the %code directive. qualifier identifies the purpose of code and thus the location(s) where Bison should insert it. That is, if you need to specify location-sensitive code that does not belong at the default location selected by the unqualified %code form, use this form instead.
For any particular qualifier or for the unqualified form, if there are multiple occurrences of the %code directive, Bison concatenates the specified code in the order in which it appears in the grammar file.
Not all qualifiers are accepted for all target languages. Unaccepted qualifiers produce an error. Some of the accepted qualifiers are:
requires
Purpose: This is the best place to write dependency code required for YYSTYPE and YYLTYPE. In other words, it’s the best place to define types referenced in %union directives. If you use #define to override Bison’s default YYSTYPE and YYLTYPE definitions, then it is also the best place. However, you should prefer to %define api.value.type and api.location.type instead.
Location(s): before the Bison-generated YYSTYPE and YYLTYPE definitions.
provides
Location(s): after the Bison-generated YYSTYPE, YYLTYPE, and token definitions.
top
Purpose: The unqualified %code or %code requires should usually be more appropriate than %code top. However, occasionally it is necessary to insert code much nearer the top of the parser implementation file. For example:
    %code top {
      #define _GNU_SOURCE
      #include <stdio.h>
    }
imports
Though we say the insertion locations are language-dependent, they are technically skeleton-dependent. Writers of non-standard skeletons, however, should choose their locations consistently with the behavior of the standard Bison skeletons.
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.