Bison 3.0.2: 3.7 Bison Declarations

3.7 Bison Declarations

The Bison declarations section of a Bison grammar defines the symbols used in formulating the grammar and the data types of semantic values. See section Symbols, Terminal and Nonterminal.

All token type names (but not single-character literal tokens such as '+' and '*') must be declared. Nonterminal symbols must be declared if you need to specify which data type to use for the semantic value (see section More Than One Value Type).

The first rule in the grammar file also specifies the start symbol, by default. If you want some other symbol to be the start symbol, you must declare it explicitly (see section Languages and Context-Free Grammars).

3.7.1 Require a Version of Bison		Requiring a Bison version.
3.7.2 Token Type Names		Declaring terminal symbols.
3.7.3 Operator Precedence		Declaring terminals with precedence and associativity.
3.7.4 Nonterminal Symbols		Declaring the choice of type for a nonterminal symbol.
3.7.5 Performing Actions before Parsing		Code run before parsing starts.
3.7.6 Freeing Discarded Symbols		Declaring how symbols are freed.
3.7.7 Printing Semantic Values		Declaring how symbol values are displayed.
3.7.8 Suppressing Conflict Warnings		Suppressing warnings about parsing conflicts.
3.7.9 The Start-Symbol		Specifying the start symbol.
3.7.10 A Pure (Reentrant) Parser		Requesting a reentrant parser.
3.7.11 A Push Parser		Requesting a push parser.
3.7.12 Bison Declaration Summary		Table of all Bison declarations.
3.7.13 %define Summary		Defining variables to adjust Bison’s behavior.
3.7.14 %code Summary		Inserting code into the parser source.

3.7.1 Require a Version of Bison

You may require the minimum version of Bison to process the grammar. If the requirement is not met, bison exits with an error (exit status 63).

%require "version"

3.7.2 Token Type Names

The basic way to declare a token type name (terminal symbol) is as follows:

%token name

Bison will convert this into a #define directive in the parser, so that the function yylex (if it is in this file) can use the name name to stand for this token type’s code.

Alternatively, you can use %left, %right, %precedence, or %nonassoc instead of %token, if you wish to specify associativity and precedence. See section Operator Precedence.

You can explicitly specify the numeric code for a token type by appending a nonnegative decimal or hexadecimal integer value in the field immediately following the token name:

%token NUM 300
%token XNUM 0x12d // a GNU extension

It is generally best, however, to let Bison choose the numeric codes for all token types. Bison will automatically select codes that don’t conflict with each other or with normal characters.

In the event that the stack type is a union, you must augment the %token or other token declaration to include the data type alternative delimited by angle-brackets (see section More Than One Value Type).

For example:

%union {              /* define stack type */
  double val;
  symrec *tptr;
}
%token <val> NUM      /* define token NUM and its type */
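As a rough illustration of how a hand-written lexer might fill in the val alternative for a NUM token, consider the sketch below. It is self-contained rather than Bison-generated: the union TOY_STYPE, the variable toy_lval, the code 258 for TOY_NUM, and the function toy_lex are all hypothetical stand-ins for what a real parser header and yylex would provide.

```c
#include <stdlib.h>

/* Stand-in for the union that %union would produce (assumption: only the
   val alternative is needed for this sketch). */
typedef union { double val; } TOY_STYPE;

/* Stand-in for a Bison-assigned token code; the value 258 is an assumption. */
enum { TOY_NUM = 258 };

/* Stand-in for the yylval communication variable. */
static TOY_STYPE toy_lval;

/* Read one number from *input.  On success, store it in toy_lval.val,
   advance *input past it, and return TOY_NUM.  Return 0 (end of input)
   when no number can be read. */
static int toy_lex(const char **input)
{
  char *end;
  double d = strtod(*input, &end);
  if (end == *input)
    return 0;
  toy_lval.val = d;     /* the <val> tag selects this union member */
  *input = end;
  return TOY_NUM;
}
```

The point of the tag is visible in the assignment: the lexer writes the member that the declaration named, and actions that read the value of NUM see the same member.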

You can associate a literal string token with a token type name by writing the literal string at the end of a %token declaration which declares the name. For example:

%token arrow "=>"

For example, a grammar for the C language might specify these names with equivalent literal string tokens:

%token  <operator>  OR      "||"
%token  <operator>  LE 134  "<="
%left  OR  "<="

Once you equate the literal string and the token name, you can use them interchangeably in further declarations or the grammar rules. The yylex function can use the token name or the literal string to obtain the token type code number (see section Calling Convention for yylex). Syntax error messages passed to yyerror from the parser will reference the literal string instead of the token name.

The token numbered as 0 corresponds to end of file; the following line allows for nicer error messages referring to “end of file” instead of “$end”:

%token END 0 "end of file"

3.7.3 Operator Precedence

Use the %left, %right, %nonassoc, or %precedence declaration to declare a token and specify its precedence and associativity, all at once. These are called precedence declarations. See section Operator Precedence, for general information on operator precedence.

The syntax of a precedence declaration is nearly the same as that of %token: either

%left symbols…

or

%left <type> symbols…

And indeed any of these declarations serves the purposes of %token. But in addition, they specify the associativity and relative precedence for all the symbols:

The associativity of an operator op determines how repeated uses of the operator nest: whether ‘x op y op z’ is parsed by grouping x with y first or by grouping y with z first. %left specifies left-associativity (grouping x with y first) and %right specifies right-associativity (grouping y with z first). %nonassoc specifies no associativity, which means that ‘x op y op z’ is considered a syntax error.
%precedence gives only precedence to the symbols, and defines no associativity at all. Use this to define precedence only, and leave any potential conflict due to associativity enabled.
The precedence of an operator determines how it nests with other operators. All the tokens declared in a single precedence declaration have equal precedence and nest together according to their associativity. When two tokens declared in different precedence declarations associate, the one declared later has the higher precedence and is grouped first.
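The groupings described above can be made concrete with plain C arithmetic. The sketch below does not involve Bison at all; the function names are hypothetical, and each one simply hard-codes the grouping that the corresponding declaration would select.

```c
/* "x - y - z" grouped as %left would group it: x with y first. */
static int toy_group_left(int x, int y, int z)
{
  return (x - y) - z;
}

/* "x - y - z" grouped as %right would group it: y with z first. */
static int toy_group_right(int x, int y, int z)
{
  return x - (y - z);
}

/* "x + y * z" when '*' is declared after '+': '*' has the higher
   precedence and is grouped first. */
static int toy_times_declared_later(int x, int y, int z)
{
  return x + (y * z);
}

/* "x + y * z" when '+' is declared after '*': '+' is grouped first. */
static int toy_plus_declared_later(int x, int y, int z)
{
  return (x + y) * z;
}
```

With x = 8, y = 3, z = 2, the left grouping of the subtraction yields 3 and the right grouping yields 7, which is why the choice of declaration matters for operators that are not associative in the mathematical sense.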

For backward compatibility, there is a confusing difference between the argument lists of %token and precedence declarations. Only a %token can associate a literal string with a token type name. A precedence declaration always interprets a literal string as a reference to a separate token. For example:

%left  OR "<="         // Does not declare an alias.
%left  OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".

3.7.4 Nonterminal Symbols

When you use %union to specify multiple value types, you must declare the value type of each nonterminal symbol for which values are used. This is done with a %type declaration, like this:

%type <type> nonterminal…

Here nonterminal is the name of a nonterminal symbol, and type is the name given in the %union to the alternative that you want (see section The Union Declaration). You can give any number of nonterminal symbols in the same %type declaration, if they have the same value type. Use spaces to separate the symbol names.

You can also declare the value type of a terminal symbol. To do this, use the same <type> construction in a declaration for the terminal symbol. All kinds of token declarations allow <type>.

3.7.5 Performing Actions before Parsing

Sometimes your parser needs to perform some initializations before parsing. The %initial-action directive allows for such arbitrary code.

Directive: %initial-action { code }: Declare that the braced code must be invoked before parsing each time yyparse is called. The code may use $$ (or $<tag>$ ) and @$ — initial value and location of the lookahead — and the %parse-param.

For instance, if your locations use a file name, you may use

%parse-param { char const *file_name };
%initial-action
{
  @$.initialize (file_name);
};

3.7.6 Freeing Discarded Symbols

During error recovery (see section Error Recovery), symbols already pushed on the stack and tokens coming from the rest of the file are discarded until the parser recovers. If the parser runs out of memory, or if it returns via YYABORT or YYACCEPT, all the symbols on the stack must be discarded. Even if the parser succeeds, it must discard the start symbol.

When discarded symbols convey heap-based information, this memory is lost. While this behavior can be tolerable for batch parsers, such as in traditional compilers, it is unacceptable for programs like shells or protocol implementations that may parse and execute indefinitely.

The %destructor directive defines code that is called when a symbol is automatically discarded.

Directive: %destructor { code } symbols

Invoke the braced code whenever the parser discards one of the symbols. Within code, $$ (or $<tag>$ ) designates the semantic value associated with the discarded symbol, and @$ designates its location. The additional parser parameters are also available (see section The Parser Function yyparse).

When a symbol is listed among symbols, its %destructor is called a per-symbol %destructor. You may also define a per-type %destructor by listing a semantic type tag among symbols. In that case, the parser will invoke this code whenever it discards any grammar symbol that has that semantic type tag unless that symbol has its own per-symbol %destructor.

Finally, you can define two different kinds of default %destructors. (These default forms are experimental. More user feedback will help to determine whether they should become permanent features.) You can place each of <*> and <> in the symbols list of exactly one %destructor declaration in your grammar file. The parser will invoke the code associated with one of these whenever it discards any user-defined grammar symbol that has no per-symbol and no per-type %destructor. The parser uses the code for <*> in the case of such a grammar symbol for which you have formally declared a semantic type tag (%type counts as such a declaration, but $<tag>$ does not). The parser uses the code for <> in the case of such a grammar symbol that has no declared semantic type tag.

For example:

%union { char *string; }
%token <string> STRING1 STRING2
%type  <string> string1 string2
%union { char character; }
%token <character> CHR
%type  <character> chr
%token TAGLESS

%destructor { } <character>
%destructor { free ($$); } <*>
%destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1
%destructor { printf ("Discarding tagless symbol.\n"); } <>

guarantees that, when the parser discards any user-defined symbol that has a semantic type tag other than <character>, it passes its semantic value to free by default. However, when the parser discards a STRING1 or a string1, it also prints its line number to stdout. It runs only the per-symbol %destructor in this case, so it invokes free only once. Finally, the parser merely prints a message whenever it discards any symbol, such as TAGLESS, that has no semantic type tag.
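The selection order among per-symbol, per-type, and default %destructors can be modelled as a small decision function. This is only a sketch of the rule as stated in this section, not Bison's implementation; the enum, the struct, and toy_pick_destructor are hypothetical names, and the model covers user-defined symbols only.

```c
/* Which %destructor, if any, applies to a discarded user-defined symbol. */
enum toy_choice {
  TOY_NONE,            /* nothing is invoked */
  TOY_PER_SYMBOL,      /* a %destructor that lists the symbol itself */
  TOY_PER_TYPE,        /* a %destructor that lists the symbol's <tag> */
  TOY_DEFAULT_TAGGED,  /* the <*> %destructor */
  TOY_DEFAULT_TAGLESS  /* the <> %destructor */
};

struct toy_symbol {
  int has_tag;         /* a semantic type tag was declared for the symbol */
  int has_per_symbol;  /* some %destructor names the symbol */
  int has_per_type;    /* some %destructor names the symbol's tag */
};

/* have_star and have_empty say whether the grammar defines a <*> and a <>
   %destructor, respectively. */
static enum toy_choice
toy_pick_destructor(const struct toy_symbol *sym, int have_star, int have_empty)
{
  if (sym->has_per_symbol)
    return TOY_PER_SYMBOL;           /* most specific wins */
  if (sym->has_tag && sym->has_per_type)
    return TOY_PER_TYPE;
  if (sym->has_tag)
    return have_star ? TOY_DEFAULT_TAGGED : TOY_NONE;
  return have_empty ? TOY_DEFAULT_TAGLESS : TOY_NONE;
}
```

Applied to the example grammar, STRING1 maps to TOY_PER_SYMBOL, STRING2 to TOY_DEFAULT_TAGGED, CHR to TOY_PER_TYPE, and TAGLESS to TOY_DEFAULT_TAGLESS.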

A Bison-generated parser invokes the default %destructors only for user-defined as opposed to Bison-defined symbols. For example, the parser will not invoke either kind of default %destructor for the special Bison-defined symbols $accept, $undefined, or $end (see section Bison Symbols), none of which you can reference in your grammar. It also will not invoke either for the error token (see section error), which is always defined by Bison regardless of whether you reference it in your grammar. However, it may invoke one of them for the end token (token 0) if you redefine it from $end to, for example, END:

%token END 0

Finally, Bison will never invoke a %destructor for an unreferenced mid-rule semantic value (see section Actions in Mid-Rule). That is, Bison does not consider a mid-rule to have a semantic value if you do not reference $$ in the mid-rule’s action or $n (where n is the right-hand side symbol position of the mid-rule) in any later action in that rule. However, if you do reference either, the Bison-generated parser will invoke the <> %destructor whenever it discards the mid-rule symbol.

Discarded symbols are the following:

stacked symbols popped during the first phase of error recovery,
incoming terminals during the second phase of error recovery,
the current lookahead and the entire stack (except the current right-hand side symbols) when the parser returns immediately,
the current lookahead and the entire stack (including the current right-hand side symbols) when the C++ parser (‘lalr1.cc’) catches an exception in parse, and
the start symbol, when the parser succeeds.

The parser can return immediately because of an explicit call to YYABORT or YYACCEPT, or failed error recovery, or memory exhaustion.

Right-hand side symbols of a rule that explicitly triggers a syntax error via YYERROR are not discarded automatically. As a rule of thumb, destructors are invoked only when user actions cannot manage the memory.

3.7.7 Printing Semantic Values

When run-time traces are enabled (see section Tracing Your Parser), the parser reports its actions, such as reductions. When a symbol involved in an action is reported, only its kind is displayed, as the parser cannot know how semantic values should be formatted.

The %printer directive defines code that is called when a symbol is reported. Its syntax is the same as %destructor (see section Freeing Discarded Symbols).

Directive: %printer { code } symbols

Invoke the braced code whenever the parser displays one of the symbols. Within code, yyoutput denotes the output stream (a FILE* in C, and an std::ostream& in C++), $$ (or $<tag>$ ) designates the semantic value associated with the symbol, and @$ its location. The additional parser parameters are also available (see section The Parser Function yyparse).

The symbols are defined as for %destructor (see section Freeing Discarded Symbols): they can be per-type (e.g., ‘<ival>’), per-symbol (e.g., ‘exp’, ‘NUM’, ‘"float"’), typed per-default (i.e., ‘<*>’), or untyped per-default (i.e., ‘<>’).

For example:

%union { char *string; }
%token <string> STRING1 STRING2
%type  <string> string1 string2
%union { char character; }
%token <character> CHR
%type  <character> chr
%token TAGLESS

%printer { fprintf (yyoutput, "'%c'", $$); } <character>
%printer { fprintf (yyoutput, "&%p", $$); } <*>
%printer { fprintf (yyoutput, "\"%s\"", $$); } STRING1 string1
%printer { fprintf (yyoutput, "<>"); } <>

guarantees that, when the parser prints any symbol that has a semantic type tag other than <character>, it displays the address of the semantic value by default. However, when the parser displays a STRING1 or a string1, it formats it as a string in double quotes. It runs only the per-symbol %printer in this case, so it prints only once. Finally, the parser prints ‘<>’ for any symbol, such as TAGLESS, that has no semantic type tag. See also

3.7.8 Suppressing Conflict Warnings

Bison normally warns if there are any conflicts in the grammar (see section Shift/Reduce Conflicts), but most real grammars have harmless shift/reduce conflicts which are resolved in a predictable way and would be difficult to eliminate. It is desirable to suppress the warning about these conflicts unless the number of conflicts changes. You can do this with the %expect declaration.

The declaration looks like this:

%expect n

Here n is a decimal integer. The declaration says there should be n shift/reduce conflicts and no reduce/reduce conflicts. Bison reports an error if the number of shift/reduce conflicts differs from n, or if there are any reduce/reduce conflicts.

For deterministic parsers, reduce/reduce conflicts are more serious, and should be eliminated entirely. Bison will always report reduce/reduce conflicts for these parsers. With GLR parsers, however, both kinds of conflicts are routine; otherwise, there would be no need to use GLR parsing. Therefore, it is also possible to specify an expected number of reduce/reduce conflicts in GLR parsers, using the declaration:

%expect-rr n
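The check that these two declarations describe amounts to comparing conflict counts. The function below is a minimal model of that comparison, with hypothetical names; it assumes that a grammar without %expect-rr is treated as expecting zero reduce/reduce conflicts, as the text above states for %expect.

```c
/* Return 1 if the observed conflict counts match the declared ones, or 0
   if Bison would report an error.  Pass expected_rr = 0 when the grammar
   has no %expect-rr declaration. */
static int toy_expect_ok(int expected_sr, int expected_rr,
                         int actual_sr, int actual_rr)
{
  return actual_sr == expected_sr && actual_rr == expected_rr;
}
```

Note that the comparison is an equality, not an upper bound: removing a conflict without updating the declaration is reported just like introducing one.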

In general, using %expect involves these steps:

Compile your grammar without %expect. Use the ‘-v’ option to get a verbose list of where the conflicts occur. Bison will also print the number of conflicts.
Check each of the conflicts to make sure that Bison’s default resolution is what you really want. If not, rewrite the grammar and go back to the beginning.
Add an %expect declaration, copying the number n from the number which Bison printed. With GLR parsers, add an %expect-rr declaration as well.

Now Bison will report an error if you introduce an unexpected conflict, but will keep silent otherwise.

3.7.9 The Start-Symbol

Bison assumes by default that the start symbol for the grammar is the first nonterminal specified in the grammar specification section. The programmer may override this restriction with the %start declaration as follows:

%start symbol

3.7.10 A Pure (Reentrant) Parser

A reentrant program is one which does not alter in the course of execution; in other words, it consists entirely of pure (read-only) code. Reentrancy is important whenever asynchronous execution is possible; for example, a nonreentrant program may not be safe to call from a signal handler. In systems with multiple threads of control, a nonreentrant program must be called only within interlocks.

Normally, Bison generates a parser which is not reentrant. This is suitable for most uses, and it permits compatibility with Yacc. (The standard Yacc interfaces are inherently nonreentrant, because they use statically allocated variables for communication with yylex, including yylval and yylloc.)

Alternatively, you can generate a pure, reentrant parser. The Bison declaration ‘%define api.pure’ says that you want the parser to be reentrant. It looks like this:

%define api.pure full

The result is that the communication variables yylval and yylloc become local variables in yyparse, and a different calling convention is used for the lexical analyzer function yylex. See section Calling Conventions for Pure Parsers, for the details of this. The variable yynerrs becomes local in yyparse in pull mode, but it becomes a member of yypstate in push mode (see section The Error Reporting Function yyerror). The convention for calling yyparse itself is unchanged.

Whether the parser is pure has nothing to do with the grammar rules. You can generate either a pure parser or a nonreentrant parser from any valid grammar.
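The difference between communicating through static variables and communicating through caller-owned state can be seen in a few lines of C that have nothing to do with parsing. The names below are hypothetical; the static variable plays the role that yylval and yylloc play in a nonreentrant parser.

```c
/* Nonreentrant style: the running total lives in one static variable that
   every caller shares. */
static int toy_shared_total;

static int toy_add_shared(int n)
{
  toy_shared_total += n;
  return toy_shared_total;
}

/* Reentrant style: each caller owns its state and passes it explicitly. */
struct toy_state { int total; };

static int toy_add_pure(struct toy_state *st, int n)
{
  st->total += n;
  return st->total;
}
```

Two interleaved users of toy_add_shared see each other's updates, whereas two toy_state objects evolve independently. The same reasoning explains why a pure parser can safely be active more than once at a time.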

3.7.11 A Push Parser

(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)

A pull parser is called once and it takes control until all its input is completely parsed. A push parser, on the other hand, is called each time a new token is made available.

A push parser is typically useful when the parser is part of a main event loop in the client’s application. This is typically a requirement of a GUI, when the main event loop needs to be triggered within a certain time period.

Normally, Bison generates a pull parser. The following Bison declaration says that you want the parser to be a push parser (see section api.push-pull):

%define api.push-pull push

In almost all cases, you want to ensure that your push parser is also a pure parser (see section A Pure (Reentrant) Parser). The only time you should create an impure push parser is to have backwards compatibility with the impure Yacc pull mode interface. Unless you know what you are doing, your declarations should look like this:

%define api.pure full
%define api.push-pull push

There is a notable functional difference between the pure push parser and the impure push parser. It is acceptable for a pure push parser to have many parser instances, of the same type of parser, in memory at the same time. An impure push parser should only use one parser at a time.

When a push parser is selected, Bison will generate some new symbols in the generated parser. yypstate is a structure that the generated parser uses to store the parser’s state. yypstate_new is the function that will create a new parser instance. yypstate_delete will free the resources associated with the corresponding parser instance. Finally, yypush_parse is the function that should be called whenever a token is available to provide to the parser. A trivial example of using a pure push parser would look like this:

int status;
yypstate *ps = yypstate_new ();
do {
  status = yypush_parse (ps, yylex (), NULL);
} while (status == YYPUSH_MORE);
yypstate_delete (ps);

If the user decided to use an impure push parser, a few things about the generated parser will change. The yychar variable becomes a global variable instead of a variable in the yypush_parse function. For this reason, the signature of the yypush_parse function is changed to remove the token as a parameter. A nonreentrant push parser example would thus look like this:

extern int yychar;
int status;
yypstate *ps = yypstate_new ();
do {
  yychar = yylex ();
  status = yypush_parse (ps);
} while (status == YYPUSH_MORE);
yypstate_delete (ps);

That’s it. Notice the next token is put into the global variable yychar for use by the next invocation of the yypush_parse function.
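To show the shape of a push interface without relying on generated code, here is a self-contained toy that mimics the new / push / delete life cycle. It is not a parser and not Bison's API: toy_pstate, toy_pstate_new, toy_pstate_delete, toy_push_parse, and the status values are hypothetical, and the "grammar" merely sums the values of the tokens it is given until it sees token 0.

```c
#include <stdlib.h>

enum { TOY_DONE = 0, TOY_MORE = 1 };   /* status values (assumption) */

/* All state lives in the instance, so several instances can coexist. */
struct toy_pstate {
  long sum;     /* sum of the token values seen so far */
  int count;    /* number of tokens consumed */
};

static struct toy_pstate *toy_pstate_new(void)
{
  return calloc(1, sizeof(struct toy_pstate));
}

static void toy_pstate_delete(struct toy_pstate *ps)
{
  free(ps);
}

/* Consume exactly one token and return to the caller.  Token 0 stands for
   end of input; any other token contributes its value to the sum. */
static int toy_push_parse(struct toy_pstate *ps, int token, long value)
{
  if (token == 0)
    return TOY_DONE;
  ps->sum += value;
  ps->count++;
  return TOY_MORE;
}
```

The caller, not the parser, owns the loop: it obtains a token from wherever tokens come from, pushes it, and regains control immediately, which is what makes this style fit inside an event loop.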

Bison also supports providing both the push parser interface and the pull parser interface in the same generated parser. In order to get this functionality, you should replace the ‘%define api.push-pull push’ declaration with the ‘%define api.push-pull both’ declaration. Doing this will create all of the symbols mentioned earlier along with the two extra symbols, yyparse and yypull_parse. yyparse can be used exactly as it normally would be used. However, the user should note that it is implemented in the generated parser by calling yypull_parse. This makes the yyparse function that is generated with the ‘%define api.push-pull both’ declaration slower than the normal yyparse function. If the user calls the yypull_parse function, it will parse the rest of the input stream. It is possible to yypush_parse tokens to select a subgrammar and then yypull_parse the rest of the input stream. If you would like to switch back and forth between parsing styles, you would have to write your own yypull_parse function that knows when to quit looking for input. An example of using the yypull_parse function would look like this:

yypstate *ps = yypstate_new ();
yypull_parse (ps); /* Will call the lexer */
yypstate_delete (ps);

Adding the ‘%define api.pure’ declaration does exactly the same thing to the generated parser with ‘%define api.push-pull both’ as it did for ‘%define api.push-pull push’.

3.7.12 Bison Declaration Summary

Here is a summary of the declarations used to define a grammar:

Directive: %union: Declare the collection of data types that semantic values may have (see section The Union Declaration).

Directive: %token: Declare a terminal symbol (token type name) with no precedence or associativity specified (see section Token Type Names).

Directive: %right: Declare a terminal symbol (token type name) that is right-associative (see section Operator Precedence).

Directive: %left: Declare a terminal symbol (token type name) that is left-associative (see section Operator Precedence).

Directive: %nonassoc: Declare a terminal symbol (token type name) that is nonassociative (see section Operator Precedence). Using it in a way that would be associative is a syntax error.

Directive: %type: Declare the type of semantic values for a nonterminal symbol (see section Nonterminal Symbols).

Directive: %start: Specify the grammar’s start symbol (see section The Start-Symbol).

Directive: %expect: Declare the expected number of shift-reduce conflicts (see section Suppressing Conflict Warnings).

In order to change the behavior of bison, use the following directives:

Directive: %code {code}
Directive: %code qualifier {code}: Insert code verbatim into the output parser source at the default location or at the location specified by qualifier. See section %code Summary.

Directive: %debug: Instrument the parser for traces. Obsoleted by ‘%define parse.trace’. See section Tracing Your Parser.

Directive: %define variable
Directive: %define variable value
Directive: %define variable {value}
Directive: %define variable "value": Define a variable to adjust Bison’s behavior. See section %define Summary.

Directive: %defines

Write a parser header file containing macro definitions for the token type names defined in the grammar as well as a few other declarations. If the parser implementation file is named ‘name.c’ then the parser header file is named ‘name.h’.

For C parsers, the parser header file declares YYSTYPE unless YYSTYPE is already defined as a macro or you have used a <type> tag without using %union. Therefore, if you are using a %union (see section More Than One Value Type) with components that require other definitions, or if you have defined a YYSTYPE macro or type definition (see section Data Types of Semantic Values), you need to arrange for these definitions to be propagated to all modules, e.g., by putting them in a prerequisite header that is included both by your parser and by any other module that needs YYSTYPE.

Unless your parser is pure, the parser header file declares yylval as an external variable. See section A Pure (Reentrant) Parser.

If you have also used locations, the parser header file declares YYLTYPE and yylloc using a protocol similar to that of the YYSTYPE macro and yylval. See section Tracking Locations.

This parser header file is normally essential if you wish to put the definition of yylex in a separate source file, because yylex typically needs to be able to refer to the above-mentioned declarations and to the token type codes. See section Semantic Values of Tokens.

If you have declared %code requires or %code provides, the output header also contains their code. See section %code Summary.

The generated header is protected against multiple inclusions with a C preprocessor guard: ‘YY_PREFIX_FILE_INCLUDED’, where PREFIX and FILE are the prefix (see section Multiple Parsers in the Same Program) and generated file name turned uppercase, with each series of non alphanumerical characters converted to a single underscore.

For instance with ‘%define api.prefix {calc}’ and ‘%defines "lib/parse.h"’, the header will be guarded as follows.

#ifndef YY_CALC_LIB_PARSE_H_INCLUDED
# define YY_CALC_LIB_PARSE_H_INCLUDED
...
#endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */
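The naming rule for the guard can be sketched as a string transformation. The code below is an illustration of the rule as described here, not Bison's implementation; toy_mangle and toy_header_guard are hypothetical names, and how Bison treats edge cases (such as a file name that begins with a non-alphanumeric character) is not modelled.

```c
#include <ctype.h>
#include <string.h>

/* Append src to dst, uppercasing alphanumerics and collapsing each run of
   other characters into a single underscore.  Returns the new end of dst. */
static char *toy_mangle(char *dst, const char *src)
{
  int in_run = 0;                 /* inside a run of non-alphanumerics? */
  for (; *src; src++) {
    unsigned char c = (unsigned char) *src;
    if (isalnum(c)) {
      *dst++ = (char) toupper(c);
      in_run = 0;
    } else if (!in_run) {
      *dst++ = '_';
      in_run = 1;
    }
  }
  *dst = '\0';
  return dst;
}

/* Build "YY_<PREFIX>_<FILE>_INCLUDED" in out.  Returns 0 on success, or -1
   if out (of size outsz) might be too small. */
static int toy_header_guard(const char *prefix, const char *file,
                            char *out, size_t outsz)
{
  /* Mangling never lengthens its input, so this bound is safe:
     "YY_" (3) + "_" (1) + "_INCLUDED" (9) + terminating NUL (1) = 14. */
  if (strlen(prefix) + strlen(file) + 14 > outsz)
    return -1;
  char *p = out;
  memcpy(p, "YY_", 3);
  p += 3;
  p = toy_mangle(p, prefix);
  *p++ = '_';
  p = toy_mangle(p, file);
  memcpy(p, "_INCLUDED", 10);     /* copies the terminating NUL as well */
  return 0;
}
```

Called with the prefix "calc" and the file name "lib/parse.h", the function produces the guard name used in the example above.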

Directive: %defines defines-file: Same as above, but save in the file ‘defines-file’.

Directive: %destructor: Specify how the parser should reclaim the memory associated to discarded symbols. See section Freeing Discarded Symbols.

Directive: %file-prefix "prefix": Specify a prefix to use for all Bison output file names. The names are chosen as if the grammar file were named ‘prefix.y’.

Directive: %language "language": Specify the programming language for the generated parser. Currently supported languages include C, C++, and Java. language is case-insensitive.

Directive: %locations: Generate the code processing the locations (see section Special Features for Use in Actions). This mode is enabled as soon as the grammar uses the special ‘@n’ tokens, but if your grammar does not use it, using ‘%locations’ allows for more accurate syntax error messages.

Directive: %name-prefix "prefix": Rename the external symbols used in the parser so that they start with prefix instead of ‘yy’. The precise list of symbols renamed in C parsers is yyparse, yylex, yyerror, yynerrs, yylval, yychar, yydebug, and (if locations are used) yylloc. If you use a push parser, yypush_parse, yypull_parse, yypstate, yypstate_new and yypstate_delete will also be renamed. For example, if you use ‘%name-prefix "c_"’, the names become c_parse, c_lex, and so on. For C++ parsers, see the ‘%define api.namespace’ documentation in this section. See section Multiple Parsers in the Same Program.
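The effect of the renaming on a single identifier can be modelled as follows. This sketch covers only the textual substitution of the leading ‘yy’; it does not show how the generated parser applies the renaming, and toy_rename is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Write to out the name that results from replacing a leading "yy" in name
   with prefix.  Names that do not start with "yy" are copied unchanged.
   Returns 0 on success, or -1 if out (of size outsz) is too small. */
static int toy_rename(const char *name, const char *prefix,
                      char *out, size_t outsz)
{
  int n;
  if (strncmp(name, "yy", 2) == 0)
    n = snprintf(out, outsz, "%s%s", prefix, name + 2);
  else
    n = snprintf(out, outsz, "%s", name);
  return (n >= 0 && (size_t) n < outsz) ? 0 : -1;
}
```

With the prefix "c_", the names yyparse and yylex become c_parse and c_lex, matching the example in the directive's description.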

Directive: %no-lines: Don’t generate any #line preprocessor commands in the parser implementation file. Ordinarily Bison writes these commands in the parser implementation file so that the C compiler and debuggers will associate errors and object code with your source file (the grammar file). This directive causes them to associate errors with the parser implementation file, treating it as an independent source file in its own right.

Directive: %output "file": Generate the parser implementation in ‘file’.

Directive: %pure-parser: Deprecated version of ‘%define api.pure’ (see section api.pure), for which Bison is more careful to warn about unreasonable usage.

Directive: %require "version": Require version version or higher of Bison. See section Require a Version of Bison.

Directive: %skeleton "file"

Specify the skeleton to use.

If file does not contain a /, file is the name of a skeleton file in the Bison installation directory. If it does, file is an absolute file name or a file name relative to the directory of the grammar file. This is similar to how most shells resolve commands.

Directive: %token-table

Generate an array of token names in the parser implementation file. The name of the array is yytname; yytname[i] is the name of the token whose internal Bison token code number is i. The first three elements of yytname correspond to the predefined tokens "$end", "error", and "$undefined"; after these come the symbols defined in the grammar file.

The name in the table includes all the characters needed to represent the token in Bison. For single-character literals and literal strings, this includes the surrounding quoting characters and any escape sequences. For example, the Bison single-character literal '+' corresponds to a three-character name, represented in C as "'+'"; and the Bison two-character literal string "\\/" corresponds to a five-character name, represented in C as "\"\\\\/\"".
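The two names from this paragraph can be written out as C string literals to check the character counts. The array names and the helper toy_is_literal are hypothetical; the helper assumes, as described above, that names of literal tokens begin with their quoting character.

```c
#include <string.h>

/* Name recorded for the single-character literal '+': three characters. */
static const char toy_plus_name[] = "'+'";

/* Name recorded for the literal string written "\\/" in the grammar:
   five characters, namely a quote, two backslashes, a slash, a quote. */
static const char toy_slash_name[] = "\"\\\\/\"";

/* A name that starts with a quoting character denotes a literal token
   rather than a named one. */
static int toy_is_literal(const char *name)
{
  return name[0] == '\'' || name[0] == '"';
}
```

Applying strlen to the two arrays gives 3 and 5 respectively, in agreement with the counts stated above.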

When you specify %token-table, Bison also generates macro definitions for macros YYNTOKENS, YYNNTS, YYNRULES, and YYNSTATES:

YYNTOKENS: The highest token number, plus one.
YYNNTS: The number of nonterminal symbols.
YYNRULES: The number of grammar rules.
YYNSTATES: The number of parser states (see section Parser States).

Directive: %verbose: Write an extra output file containing verbose descriptions of the parser states and what is done for each type of lookahead token in that state. See section Understanding Your Parser, for more information.

Directive: %yacc: Pretend the option ‘--yacc’ was given, i.e., imitate Yacc, including its naming conventions. See section Bison Options, for more.

3.7.13 %define Summary

There are many features of Bison’s behavior that can be controlled by assigning the feature a single value. For historical reasons, some such features are assigned values by dedicated directives, such as %start, which assigns the start symbol. However, newer such features are associated with variables, which are assigned by the %define directive:

Directive: %define variable

Directive: %define variable value

Directive: %define variable {value}

Directive: %define variable "value"

Define variable to value.

The type of the values depends on the syntax. Braces denote value in the target language (e.g., a namespace, a type, etc.). Keyword values (no delimiters) denote finite choice (e.g., a variation of a feature). String values denote remaining cases (e.g., a file name).

It is an error if a variable is defined by %define multiple times, but see -D name[=value].

The rest of this section summarizes variables and values that %define accepts.

Some variables take Boolean values. In this case, Bison will complain if the variable definition does not meet one of the following four conditions:

value is true.
value is omitted (or "" is specified). This is equivalent to true.
value is false.
variable is never defined. In this case, Bison selects a default value.

What variables are accepted, as well as their meanings and default values, depend on the selected target language and/or the parser skeleton (see section %language, see section %skeleton). Unaccepted variables produce an error. Some of the accepted variables are described below.

Directive: %define api.namespace {namespace}

Language(s): C++
Purpose: Specify the namespace for the parser class. For example, if you specify:
%define api.namespace {foo::bar}
Bison uses foo::bar verbatim in references such as:
foo::bar::parser::semantic_type
However, to open a namespace, Bison removes any leading :: and then splits on any remaining occurrences:
namespace foo { namespace bar { class position; class location; } }
Accepted Values: Any absolute or relative C++ namespace reference without a trailing "::". For example, "foo" or "::foo::bar".
Default Value: The value specified by %name-prefix, which defaults to yy. This usage of %name-prefix is for backward compatibility and can be confusing since %name-prefix also specifies the textual prefix for the lexical analyzer function. Thus, if you specify %name-prefix, it is best to also specify ‘%define api.namespace’ so that %name-prefix only affects the lexical analyzer function. For example, if you specify:
%define api.namespace {foo}
%name-prefix "bar::"
The parser namespace is foo and yylex is referenced as bar::lex.

Directive: %define api.location.type {type}

Language(s): C++, Java
Purpose: Define the location type. See section User Defined Location Type.
Accepted Values: String
Default Value: none
History: Introduced in Bison 2.7 for C, C++ and Java. Introduced under the name location_type for C++ in Bison 2.5 and for Java in Bison 2.4.

Directive: %define api.prefix {prefix}

Language(s): All
Purpose: Rename exported symbols. See section Multiple Parsers in the Same Program.
Accepted Values: String
Default Value: yy
History: introduced in Bison 2.6

Directive: %define api.pure purity

Language(s): C
Purpose: Request a pure (reentrant) parser program. See section A Pure (Reentrant) Parser.
Accepted Values: true, false, full
The value may be omitted: this is equivalent to specifying true, as is the case for Boolean values.

When %define api.pure full is used, the parser is made reentrant. This changes the signature for yylex (see section Calling Conventions for Pure Parsers), and also that of yyerror when the tracking of locations has been activated, as shown below.

The true value is very similar to the full value; for historical reasons, the only difference is in the signature of yyerror on Yacc parsers without %parse-param.

I.e., if ‘%locations %define api.pure’ is passed then the prototypes for yyerror are:
void yyerror (char const *msg);                 // Yacc parsers.
void yyerror (YYLTYPE *locp, char const *msg);  // GLR parsers.
But if ‘%locations %define api.pure %parse-param {int *nastiness}’ is used, then both parsers have the same signature:
void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg);
(see section The Error Reporting Function yyerror)
Default Value: false
History: the full value was introduced in Bison 2.7
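As an illustration of the last prototype shown for api.pure, here is a minimal sketch of a matching yyerror definition. The YYLTYPE structure below is a stand-in with the four members of Bison's default location type; in a real parser it comes from the generated code. Counting errors in *nastiness is an assumption made for the example, not something the manual prescribes.

```c
#include <stdio.h>

/* Stand-in for the location type. In a real parser, the code
   generated by Bison defines YYLTYPE when %locations is used. */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Matches the prototype used with
   '%locations %define api.pure %parse-param {int *nastiness}'.  */
void
yyerror (YYLTYPE *llocp, int *nastiness, char const *msg)
{
  /* Illustrative use of the extra parameter: count the errors. */
  ++*nastiness;
  fprintf (stderr, "%d.%d: %s\n",
           llocp->first_line, llocp->first_column, msg);
}
```

Because the location and the extra parameter are passed as arguments, this function needs no global variables, which is the point of a pure parser.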

Directive: %define api.push-pull kind

Language(s): C (deterministic parsers only)
Purpose: Request a pull parser, a push parser, or both. See section A Push Parser. (The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
Accepted Values: pull, push, both
Default Value: pull

Directive: %define api.token.constructor

Language(s): C++
Purpose: When variant-based semantic values are enabled (see section C++ Variants), request that symbols be handled as a whole (type, value, and possibly location) in the scanner. See section Complete Symbols, for details.
Accepted Values: Boolean.
Default Value: false
History: introduced in Bison 3.0

Directive: %define api.token.prefix {prefix}

Language(s): all
Purpose: Add a prefix to the token names when generating their definition in the target language. For instance
%token FILE for ERROR
%define api.token.prefix {TOK_}
%%
start: FILE for ERROR;
generates the definition of the symbols TOK_FILE, TOK_for, and TOK_ERROR in the generated source files. In particular, the scanner must use these prefixed token names, while the grammar itself may still use the short names (as in the sample rule given above). The generated informational files (‘*.output’, ‘*.xml’, ‘*.dot’) are not modified by this prefix.

Bison also prefixes the generated member names of the semantic value union. See section Generating the Semantic Value Type, for more details.

See Calc++ Parser and Calc++ Scanner, for a complete example.
Accepted Values: Any string. Should be a valid identifier prefix in the target language, in other words, it should typically be an identifier itself (sequence of letters, underscores, and —not at the beginning— digits).
Default Value: empty
History: introduced in Bison 3.0
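To illustrate how a scanner refers to the prefixed names from the api.token.prefix example, here is a minimal C sketch. The enumeration is a stand-in for the definitions Bison would generate, its numeric values are arbitrary, and the function example_keyword_token and the input spellings it recognizes are hypothetical. Note that without the prefix, names such as FILE and for could not be used as C enumerators at all.

```c
#include <string.h>

/* Stand-in for the token definitions that Bison would generate from
   '%token FILE for ERROR' with '%define api.token.prefix {TOK_}'.
   The numeric values are arbitrary and chosen for illustration. */
enum example_token
{
  TOK_FILE = 258,
  TOK_for = 259,
  TOK_ERROR = 260
};

/* Hypothetical scanner helper: map an input word to a token, using
   the prefixed names as a scanner must. */
int
example_keyword_token (char const *text)
{
  if (strcmp (text, "file") == 0)
    return TOK_FILE;
  if (strcmp (text, "for") == 0)
    return TOK_for;
  return TOK_ERROR;
}
```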

Directive: %define api.value.type support

Directive: %define api.value.type {type}

Language(s): all
Purpose: The type for semantic values.

Accepted Values:

‘{}’

This grammar has no semantic value at all. This is not properly supported yet.

‘union-directive’ (C, C++)

The type is defined thanks to the %union directive. You don’t have to define api.value.type in that case, using %union suffices. See section The Union Declaration. For instance:

%define api.value.type union-directive
%union
{
  int ival;
  char *sval;
}
%token <ival> INT "integer"
%token <sval> STR "string"

‘union’ (C, C++)

The symbols are defined with type names, from which Bison will generate a union. For instance:

%define api.value.type union
%token <int> INT "integer"
%token <char *> STR "string"

This feature needs user feedback to stabilize. Note that most C++ objects cannot be stored in a union.

‘variant’ (C++)

This is similar to union, but special storage techniques are used to allow any kind of C++ object to be used. For instance:

%define api.value.type variant
%token <int> INT "integer"
%token <std::string> STR "string"

This feature needs user feedback to stabilize. See section C++ Variants.

‘{type}’

Use this type as semantic value.

%code requires
{
  struct my_value
  {
    enum
    {
      is_int, is_str
    } kind;
    union
    {
      int ival;
      char *sval;
    } u;
  };
}
%define api.value.type {struct my_value}
%token <u.ival> INT "integer"
%token <u.sval> STR "string"

Default Value:
- %union if %union is used, otherwise …
- int if type tags are used (i.e., ‘%token <type>…’ or ‘%type <type>…’ is used), otherwise …
- ""
History: introduced in Bison 3.0. Was introduced for Java only in 2.3b as stype.
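For the ‘{type}’ case of api.value.type, a scanner has to fill in values of the user's type itself. The following minimal C sketch repeats struct my_value from the example and adds two helper functions; example_make_int and example_make_str are hypothetical names, not part of Bison. With a tag such as <u.ival>, the semantic value of an INT token is read through the u.ival member, so the helpers set the matching member together with kind.

```c
/* The user-defined semantic value type from the '{type}' example. */
struct my_value
{
  enum
  {
    is_int, is_str
  } kind;
  union
  {
    int ival;
    char *sval;
  } u;
};

/* Hypothetical helper: build the value of an INT token. */
struct my_value
example_make_int (int i)
{
  struct my_value v;
  v.kind = is_int;
  v.u.ival = i;
  return v;
}

/* Hypothetical helper: build the value of a STR token. The string
   is not copied; the caller keeps ownership of it. */
struct my_value
example_make_str (char *s)
{
  struct my_value v;
  v.kind = is_str;
  v.u.sval = s;
  return v;
}
```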

Directive: %define location_type: Obsoleted by api.location.type since Bison 2.7.

Directive: %define lr.default-reduction when

Language(s): all
Purpose: Specify the kind of states that are permitted to contain default reductions. See section Default Reductions. (The ability to specify where default reductions should be used is experimental. More user feedback will help to stabilize it.)
Accepted Values: most, consistent, accepting
Default Value:
- accepting if lr.type is canonical-lr.
- most otherwise.
History: introduced as lr.default-reductions in 2.5, renamed as lr.default-reduction in 3.0.

Directive: %define lr.keep-unreachable-state

Language(s): all
Purpose: Request that Bison allow unreachable parser states to remain in the parser tables. See section Unreachable States.
Accepted Values: Boolean
Default Value: false
History: introduced as lr.keep_unreachable_states in 2.3b, renamed as lr.keep-unreachable-states in 2.5, and as lr.keep-unreachable-state in 3.0.

Directive: %define lr.type type

Language(s): all
Purpose: Specify the type of parser tables within the LR(1) family. See section LR Table Construction. (This feature is experimental. More user feedback will help to stabilize it.)
Accepted Values: lalr, ielr, canonical-lr
Default Value: lalr

Directive: %define namespace {namespace}: Obsoleted by api.namespace

Directive: %define parse.assert

Language(s): C++
Purpose: Issue runtime assertions to catch invalid uses. In C++, when variants are used (see section C++ Variants), symbols must be constructed and destroyed properly. This option checks these constraints.
Accepted Values: Boolean
Default Value: false

Directive: %define parse.error verbosity

Language(s): all
Purpose: Control the kind of error messages passed to the error reporting function. See section The Error Reporting Function yyerror.
Accepted Values:
- simple Error messages passed to yyerror are simply "syntax error".
- verbose Error messages report the unexpected token, and possibly the expected ones. However, this report can often be incorrect when LAC is not enabled (see section LAC).
Default Value: simple

Directive: %define parse.lac when

Language(s): C (deterministic parsers only)
Purpose: Enable LAC (lookahead correction) to improve syntax error handling. See section LAC.
Accepted Values: none, full
Default Value: none

Directive: %define parse.trace

Language(s): C, C++, Java
Purpose: Require parser instrumentation for tracing. See section Tracing Your Parser.
In C/C++, define the macro YYDEBUG (or prefixDEBUG with ‘%define api.prefix {prefix}’, see section Multiple Parsers in the Same Program) to 1 in the parser implementation file if it is not already defined, so that the debugging facilities are compiled.
Accepted Values: Boolean
Default Value: false
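Compiling the debugging facilities does not by itself produce traces: in a C parser, tracing is turned on at run time by setting the variable yydebug to a nonzero value. The following minimal sketch shows the pattern. The definition of yydebug here is a stand-in for the one in the generated parser, the YYDEBUG definition mimics what parse.trace arranges, and example_enable_tracing is a hypothetical helper name.

```c
/* Mimics the effect of '%define parse.trace': YYDEBUG is defined
   to 1 unless it is already defined. */
#ifndef YYDEBUG
# define YYDEBUG 1
#endif

/* Stand-in for the variable defined by the generated parser. In a
   real program, declare it with 'extern int yydebug;' instead. */
int yydebug;

/* Hypothetical helper: request traces if they were compiled in. */
void
example_enable_tracing (void)
{
#if YYDEBUG
  yydebug = 1;
#endif
}
```

Guarding the assignment with #if YYDEBUG keeps the code valid when the parser is built without the debugging facilities.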

3.7.14 %code Summary

The %code directive inserts code verbatim into the output parser source at any of a predefined set of locations. It thus serves as a flexible and user-friendly alternative to the traditional Yacc prologue, %{code%}. This section summarizes the functionality of %code for the various target languages supported by Bison. For a detailed discussion of how to use %code in place of %{code%} for C/C++ and why it is advantageous to do so, see section Prologue Alternatives.

Directive: %code {code}

This is the unqualified form of the %code directive. It inserts code verbatim at a language-dependent default location in the parser implementation.

For C/C++, the default location is the parser implementation file after the usual contents of the parser header file. Thus, the unqualified form replaces %{code%} for most purposes.

For Java, the default location is inside the parser class.

Directive: %code qualifier {code}: This is the qualified form of the %code directive. qualifier identifies the purpose of code and thus the location(s) where Bison should insert it. That is, if you need to specify location-sensitive code that does not belong at the default location selected by the unqualified %code form, use this form instead.

For any particular qualifier or for the unqualified form, if there are multiple occurrences of the %code directive, Bison concatenates the specified code in the order in which it appears in the grammar file.

Not all qualifiers are accepted for all target languages. Unaccepted qualifiers produce an error. Some of the accepted qualifiers are:

requires

Language(s): C, C++
Purpose: This is the best place to write dependency code required for YYSTYPE and YYLTYPE. In other words, it is the best place to define types referenced in %union directives. If you use #define to override Bison’s default YYSTYPE and YYLTYPE definitions, it is also the best place for those definitions. However, you should prefer ‘%define api.value.type’ and ‘%define api.location.type’ for that purpose.
Location(s): The parser header file and the parser implementation file before the Bison-generated YYSTYPE and YYLTYPE definitions.

provides

Language(s): C, C++
Purpose: This is the best place to write additional definitions and declarations that should be provided to other modules.
Location(s): The parser header file and the parser implementation file after the Bison-generated YYSTYPE, YYLTYPE, and token definitions.

top

Language(s): C, C++
Purpose: The unqualified %code or %code requires should usually be more appropriate than %code top. However, occasionally it is necessary to insert code much nearer the top of the parser implementation file. For example:
%code top {
  #define _GNU_SOURCE
  #include <stdio.h>
}
Location(s): Near the top of the parser implementation file.

imports

Language(s): Java
Purpose: This is the best place to write Java import directives.
Location(s): The parser Java file after any Java package directive and before any class definitions.

Though we say the insertion locations are language-dependent, they are technically skeleton-dependent. Writers of non-standard skeletons, however, should choose their locations consistently with the behavior of the standard Bison skeletons.

This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.