[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Bison takes as input a context-free grammar specification and produces a C-language function that recognizes correct instances of the grammar.
The Bison grammar file conventionally has a name ending in ‘.y’. See section Invoking Bison.
3.1 Outline of a Bison Grammar | Overall layout of the grammar file. | |
3.2 Symbols, Terminal and Nonterminal | Terminal and nonterminal symbols. | |
3.3 Grammar Rules | How to write grammar rules. | |
3.4 Defining Language Semantics | Semantic values and actions. | |
3.5 Tracking Locations | Locations and actions. | |
3.6 Named References | Using named references in actions. | |
3.7 Bison Declarations | All kinds of Bison declarations are described here. | |
3.8 Multiple Parsers in the Same Program | Putting more than one Bison parser in one program. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A Bison grammar file has four main sections, shown here with the appropriate delimiters:
%{ Prologue %} Bison declarations %% Grammar rules %% Epilogue |
Comments enclosed in ‘/* … */’ may appear in any of the sections. As a GNU extension, ‘//’ introduces a comment that continues until end of line.
3.1.1 The prologue | Syntax and usage of the prologue. | |
3.1.2 Prologue Alternatives | Syntax and usage of alternatives to the prologue. | |
3.1.3 The Bison Declarations Section | Syntax and usage of the Bison declarations section. | |
3.1.4 The Grammar Rules Section | Syntax and usage of the grammar rules section. | |
3.1.5 The epilogue | Syntax and usage of the epilogue. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Prologue section contains macro definitions and declarations
of functions and variables that are used in the actions in the grammar
rules. These are copied to the beginning of the parser implementation
file so that they precede the definition of yyparse
. You can
use ‘#include’ to get the declarations from a header file. If
you don’t need any C declarations, you may omit the ‘%{’ and
‘%}’ delimiters that bracket this section.
The Prologue section is terminated by the first occurrence of ‘%}’ that is outside a comment, a string literal, or a character constant.
You may have more than one Prologue section, intermixed with the
Bison declarations. This allows you to have C and Bison
declarations that refer to each other. For example, the %union
declaration may use types defined in a header file, and you may wish to
prototype functions that take arguments of type YYSTYPE
. This
can be done with two Prologue blocks, one before and one after the
%union
declaration.
%{ #define _GNU_SOURCE #include <stdio.h> #include "ptypes.h" %} %union {
long int n;
tree t; /* %{ static void print_token_value (FILE *, int, YYSTYPE); #define YYPRINT(F, N, L) print_token_value (F, N, L) %} … |
When in doubt, it is usually safer to put prologue code before all
Bison declarations, rather than after. For example, any definitions
of feature test macros like _GNU_SOURCE
or
_POSIX_C_SOURCE
should appear before all Bison declarations, as
feature test macros can affect the behavior of Bison-generated
#include
directives.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The functionality of Prologue sections can often be subtle and
inflexible. As an alternative, Bison provides a %code
directive with an explicit qualifier field, which identifies the
purpose of the code and thus the location(s) where Bison should
generate it. For C/C++, the qualifier can be omitted for the default
location, or it can be one of requires
, provides
,
top
. See section %code Summary.
Look again at the example of the previous section:
%{ #define _GNU_SOURCE #include <stdio.h> #include "ptypes.h" %} %union {
long int n;
tree t; /* %{ static void print_token_value (FILE *, int, YYSTYPE); #define YYPRINT(F, N, L) print_token_value (F, N, L) %} … |
Notice that there are two Prologue sections here, but there’s a
subtle distinction between their functionality. For example, if you
decide to override Bison’s default definition for YYLTYPE
, in
which Prologue section should you write your new definition?
You should write it in the first since Bison will insert that code
into the parser implementation file before the default
YYLTYPE
definition. In which Prologue section should you
prototype an internal function, trace_token
, that accepts
YYLTYPE
and yytokentype
as arguments? You should
prototype it in the second since Bison will insert that code
after the YYLTYPE
and yytokentype
definitions.
This distinction in functionality between the two Prologue sections is
established by the appearance of the %union
between them.
This behavior raises a few questions.
First, why should the position of a %union
affect definitions related to
YYLTYPE
and yytokentype
?
Second, what if there is no %union
?
In that case, the second kind of Prologue section is not available.
This behavior is not intuitive.
To avoid this subtle %union
dependency, rewrite the example using a
%code top
and an unqualified %code
.
Let’s go ahead and add the new YYLTYPE
definition and the
trace_token
prototype at the same time:
%code top { #define _GNU_SOURCE #include <stdio.h> /* WARNING: The following code really belongs * in a '%code requires'; see below. */ #include "ptypes.h" #define YYLTYPE YYLTYPE typedef struct YYLTYPE { int first_line; int first_column; int last_line; int last_column; char *filename; } YYLTYPE; } %union {
long int n;
tree t; /* %code { static void print_token_value (FILE *, int, YYSTYPE); #define YYPRINT(F, N, L) print_token_value (F, N, L) static void trace_token (enum yytokentype token, YYLTYPE loc); } … |
In this way, %code top
and the unqualified %code
achieve the same
functionality as the two kinds of Prologue sections, but it’s always
explicit which kind you intend.
Moreover, both kinds are always available even in the absence of %union
.
The %code top
block above logically contains two parts. The
first two lines before the warning need to appear near the top of the
parser implementation file. The first line after the warning is
required by YYSTYPE
and thus also needs to appear in the parser
implementation file. However, if you’ve instructed Bison to generate
a parser header file (see section %defines), you probably
want that line to appear before the YYSTYPE
definition in that
header file as well. The YYLTYPE
definition should also appear
in the parser header file to override the default YYLTYPE
definition there.
In other words, in the %code top
block above, all but the first two
lines are dependency code required by the YYSTYPE
and YYLTYPE
definitions.
Thus, they belong in one or more %code requires
:
%code top { #define _GNU_SOURCE #include <stdio.h> } %code requires { #include "ptypes.h" } %union {
long int n;
tree t; /* %code requires { #define YYLTYPE YYLTYPE typedef struct YYLTYPE { int first_line; int first_column; int last_line; int last_column; char *filename; } YYLTYPE; } %code { static void print_token_value (FILE *, int, YYSTYPE); #define YYPRINT(F, N, L) print_token_value (F, N, L) static void trace_token (enum yytokentype token, YYLTYPE loc); } … |
Now Bison will insert #include "ptypes.h"
and the new
YYLTYPE
definition before the Bison-generated YYSTYPE
and YYLTYPE
definitions in both the parser implementation file
and the parser header file. (By the same reasoning, %code
requires
would also be the appropriate place to write your own
definition for YYSTYPE
.)
When you are writing dependency code for YYSTYPE
and
YYLTYPE
, you should prefer %code requires
over
%code top
regardless of whether you instruct Bison to generate
a parser header file. When you are writing code that you need Bison
to insert only into the parser implementation file and that has no
special need to appear at the top of that file, you should prefer the
unqualified %code
over %code top
. These practices will
make the purpose of each block of your code explicit to Bison and to
other developers reading your grammar file. Following these
practices, we expect the unqualified %code
and %code
requires
to be the most important of the four Prologue
alternatives.
At some point while developing your parser, you might decide to
provide trace_token
to modules that are external to your
parser. Thus, you might wish for Bison to insert the prototype into
both the parser header file and the parser implementation file. Since
this function is not a dependency required by YYSTYPE
or
YYLTYPE
, it doesn’t make sense to move its prototype to a
%code requires
. More importantly, since it depends upon
YYLTYPE
and yytokentype
, %code requires
is not
sufficient. Instead, move its prototype from the unqualified
%code
to a %code provides
:
%code top { #define _GNU_SOURCE #include <stdio.h> } %code requires { #include "ptypes.h" } %union {
long int n;
tree t; /* %code requires { #define YYLTYPE YYLTYPE typedef struct YYLTYPE { int first_line; int first_column; int last_line; int last_column; char *filename; } YYLTYPE; } %code provides { void trace_token (enum yytokentype token, YYLTYPE loc); } %code { static void print_token_value (FILE *, int, YYSTYPE); #define YYPRINT(F, N, L) print_token_value (F, N, L) } … |
Bison will insert the trace_token
prototype into both the
parser header file and the parser implementation file after the
definitions for yytokentype
, YYLTYPE
, and
YYSTYPE
.
The above examples are careful to write directives in an order that
reflects the layout of the generated parser implementation and header
files: %code top
, %code requires
, %code provides
,
and then %code
. While your grammar files may generally be
easier to read if you also follow this order, Bison does not require
it. Instead, Bison lets you choose an organization that makes sense
to you.
You may declare any of these directives multiple times in the grammar file. In that case, Bison concatenates the contained code in declaration order. This is the only way in which the position of one of these directives within the grammar file affects its functionality.
The result of the previous two properties is greater flexibility in how you may organize your grammar file. For example, you may organize semantic-type-related directives by semantic type:
%code requires { #include "type1.h" } %union { type1 field1; } %destructor { type1_free ($$); } <field1> %printer { type1_print (yyoutput, $$); } <field1> %code requires { #include "type2.h" } %union { type2 field2; } %destructor { type2_free ($$); } <field2> %printer { type2_print (yyoutput, $$); } <field2> |
You could even place each of the above directive groups in the rules section of
the grammar file next to the set of rules that uses the associated semantic
type.
(In the rules section, you must terminate each of those directives with a
semicolon.)
And you don’t have to worry that some directive (like a %union
) in the
definitions section is going to adversely affect their functionality in some
counter-intuitive manner just because it comes first.
Such an organization is not possible using Prologue sections.
This section has been concerned with explaining the advantages of the four
Prologue alternatives over the original Yacc Prologue.
However, in most cases when using these directives, you shouldn’t need to
think about all the low-level ordering issues discussed here.
Instead, you should simply use these directives to label each block of your
code according to its purpose and let Bison handle the ordering.
%code
is the most generic label.
Move code to %code requires
, %code provides
, or %code top
as needed.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Bison declarations section contains declarations that define terminal and nonterminal symbols, specify precedence, and so on. In some simple grammars you may not need any declarations. See section Bison Declarations.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The grammar rules section contains one or more Bison grammar rules, and nothing else. See section Syntax of Grammar Rules.
There must always be at least one grammar rule, and the first ‘%%’ (which precedes the grammar rules) may never be omitted even if it is the first thing in the file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Epilogue is copied verbatim to the end of the parser
implementation file, just as the Prologue is copied to the
beginning. This is the most convenient place to put anything that you
want to have in the parser implementation file but which need not come
before the definition of yyparse
. For example, the definitions
of yylex
and yyerror
often go here. Because C requires
functions to be declared before being used, you often need to declare
functions like yylex
and yyerror
in the Prologue, even
if you define them in the Epilogue. See section Parser C-Language Interface.
If the last section is empty, you may omit the ‘%%’ that separates it from the grammar rules.
The Bison parser itself contains many macros and identifiers whose names start with ‘yy’ or ‘YY’, so it is a good idea to avoid using any such names (except those documented in this manual) in the epilogue of the grammar file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Symbols in Bison grammars represent the grammatical classifications of the language.
A terminal symbol (also known as a token type) represents a
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the yylex
function returns a token type code to indicate what kind of token has
been read. You don’t need to know what the code value is; you can use
the symbol to stand for it.
A nonterminal symbol stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case.
Symbol names can contain letters, underscores, periods, and non-initial digits and dashes. Dashes in symbol names are a GNU extension, incompatible with POSIX Yacc. Periods and dashes make symbol names less convenient to use with named references, which require brackets around such names (see section Named References). Terminal symbols that contain periods or dashes make little sense: since they are not valid symbols (in most programming languages) they are not exported as token names.
There are three ways of writing terminal symbols in the grammar:
%token
. See section Token Type Names.
'+'
is a character token type. A
character token type doesn’t need to be declared unless you need to
specify its semantic value data type (see section Data Types of Semantic Values), associativity, or precedence (see section Operator Precedence).
By convention, a character token type is used only to represent a
token that consists of that particular character. Thus, the token
type '+'
is used to represent the character ‘+’ as a
token. Nothing enforces this convention, but if you depart from it,
your program will confuse other readers.
All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
character literal because its numeric code, zero, signifies
end-of-input (see section Calling Convention for yylex
). Also, unlike standard C, trigraphs have no
special meaning in Bison character literals, nor is backslash-newline
allowed.
"<="
is a literal string token. A literal string token
doesn’t need to be declared unless you need to specify its semantic
value data type (see section Data Types of Semantic Values), associativity, or precedence
(see section Operator Precedence).
You can associate the literal string token with a symbolic name as an
alias, using the %token
declaration (see section Token Declarations). If you don’t do that, the lexical analyzer has to
retrieve the token number for the literal string token from the
yytname
table (see section Calling Convention for yylex
).
Warning: literal string tokens do not work in Yacc.
By convention, a literal string token is used only to represent a token
that consists of that particular string. Thus, you should use the token
type "<="
to represent the string ‘<=’ as a token. Bison
does not enforce this convention, but if you depart from it, people who
read your program will be confused.
All the escape sequences used in string literals in C can be used in Bison as well, except that you must not use a null character within a string literal. Also, unlike Standard C, trigraphs have no special meaning in Bison string literals, nor is backslash-newline allowed. A literal string token must contain two or more characters; for a token containing just one character, use a character token (see above).
How you choose to write a terminal symbol has no effect on its grammatical meaning. That depends only on where it appears in rules and on when the parser function returns that symbol.
The value returned by yylex
is always one of the terminal
symbols, except that a zero or negative value signifies end-of-input.
Whichever way you write the token type in the grammar rules, you write
it the same way in the definition of yylex
. The numeric code
for a character token type is simply the positive numeric code of the
character, so yylex
can use the identical value to generate the
requisite code, though you may need to convert it to unsigned
char
to avoid sign-extension on hosts where char
is signed.
Each named token type becomes a C macro in the parser implementation
file, so yylex
can use the name to stand for the code. (This
is why periods don’t make sense in terminal symbols.) See section Calling Convention for yylex
.
If yylex
is defined in a separate file, you need to arrange for the
token-type macro definitions to be available there. Use the ‘-d’
option when you run Bison, so that it will write these macro definitions
into a separate header file ‘name.tab.h’ which you can include
in the other source files that need it. See section Invoking Bison.
If you want to write a grammar that is portable to any Standard C host, you must use only nonnull character tokens taken from the basic execution character set of Standard C. This set consists of the ten digits, the 52 lower- and upper-case English letters, and the characters in the following C-language string:
"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~" |
The yylex
function and Bison must use a consistent character set
and encoding for character tokens. For example, if you run Bison in an
ASCII environment, but then compile and run the resulting
program in an environment that uses an incompatible character set like
EBCDIC, the resulting program may not work because the tables
generated by Bison will assume ASCII numeric values for
character tokens. It is standard practice for software distributions to
contain C source files that were generated by Bison in an
ASCII environment, so installers on platforms that are
incompatible with ASCII must rebuild those files before
compiling them.
The symbol error
is a terminal symbol reserved for error recovery
(see section Error Recovery); you shouldn’t use it for any other purpose.
In particular, yylex
should never return this value. The default
value of the error token is 256, unless you explicitly assigned 256 to
one of your tokens with a %token
declaration.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A Bison grammar is a list of rules.
3.3.1 Syntax of Grammar Rules | Syntax of the rules. | |
3.3.2 Empty Rules | Symbols that can match the empty string. | |
3.3.3 Recursive Rules | Writing recursive rules. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A Bison grammar rule has the following general form:
result: components…; |
where result is the nonterminal symbol that this rule describes, and components are various terminal and nonterminal symbols that are put together by this rule (see section Symbols, Terminal and Nonterminal).
For example,
exp: exp '+' exp; |
says that two groupings of type exp
, with a ‘+’ token in between,
can be combined into a larger grouping of type exp
.
White space in rules is significant only to separate symbols. You can add extra white space as you wish.
Scattered among the components can be actions that determine the semantics of the rule. An action looks like this:
{C statements} |
This is an example of braced code, that is, C code surrounded by braces, much like a compound statement in C. Braced code can contain any sequence of C tokens, so long as its braces are balanced. Bison does not check the braced code for correctness directly; it merely copies the code to the parser implementation file, where the C compiler can check it.
Within braced code, the balanced-brace count is not affected by braces within comments, string literals, or character constants, but it is affected by the C digraphs ‘<%’ and ‘%>’ that represent braces. At the top level braced code must be terminated by ‘}’ and not by a digraph. Bison does not look for trigraphs, so if braced code uses trigraphs you should ensure that they do not affect the nesting of braces or the boundaries of comments, string literals, or character constants.
Usually there is only one action and it follows the components. See section Actions.
Multiple rules for the same result can be written separately or can be joined with the vertical-bar character ‘|’ as follows:
result: rule1-components… | rule2-components… … ; |
They are still considered distinct rules even when joined in this way.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A rule is said to be empty if its right-hand side (components) is empty. It means that result can match the empty string. For example, here is how to define an optional semicolon:
semicolon.opt: | ";"; |
It is easy not to see an empty rule, especially when |
is used. The
%empty
directive allows to make explicit that a rule is empty on
purpose:
semicolon.opt: %empty | ";" ; |
Flagging a non-empty rule with %empty
is an error. If run with
‘-Wempty-rule’, bison
will report empty rules without
%empty
. Using %empty
enables this warning, unless
‘-Wno-empty-rule’ was specified.
The %empty
directive is a Bison extension, it does not work with
Yacc. To remain compatible with POSIX Yacc, it is customary to write a
comment ‘/* empty */’ in each rule with no components:
semicolon.opt: /* empty */ | ";" ; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A rule is called recursive when its result nonterminal appears also on its right hand side. Nearly all Bison grammars need to use recursion, because that is the only way to define a sequence of any number of a particular thing. Consider this recursive definition of a comma-separated sequence of one or more expressions:
expseq1: exp | expseq1 ',' exp ; |
Since the recursive use of expseq1
is the leftmost symbol in the
right hand side, we call this left recursion. By contrast, here
the same construct is defined using right recursion:
expseq1: exp | exp ',' expseq1 ; |
Any kind of sequence can be defined using either left recursion or right recursion, but you should always use left recursion, because it can parse a sequence of any number of elements with bounded stack space. Right recursion uses up space on the Bison stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once. See section The Bison Parser Algorithm, for further explanation of this.
Indirect or mutual recursion occurs when the result of the rule does not appear directly on its right hand side, but does appear in rules for other nonterminals which do appear on its right hand side.
For example:
expr: primary | primary '+' primary ; primary: constant | '(' expr ')' ; |
defines two mutually-recursive nonterminals, since each refers to the other.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The grammar rules for a language determine only the syntax. The semantics are determined by the semantic values associated with various tokens and groupings, and by the actions taken when various groupings are recognized.
For example, the calculator calculates properly because the value associated with each expression is the proper number; it adds properly because the action for the grouping ‘x + y’ is to add the numbers associated with x and y.
3.4.1 Data Types of Semantic Values | Specifying one data type for all semantic values. | |
3.4.2 More Than One Value Type | Specifying several alternative data types. | |
3.4.3 Generating the Semantic Value Type | Generating the semantic value type. | |
3.4.4 The Union Declaration | Declaring the set of all semantic value types. | |
3.4.5 Providing a Structured Semantic Value Type | Providing a structured semantic value type. | |
3.4.6 Actions | An action is the semantic definition of a grammar rule. | |
3.4.7 Data Types of Values in Actions | Specifying data types for actions to operate on. | |
3.4.8 Actions in Mid-Rule | Most actions go at the end of a rule. This says when, why and how to use the exceptional action in the middle of a rule. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In a simple program it may be sufficient to use the same data type for the semantic values of all language constructs. This was true in the RPN and infix calculator examples (see section Reverse Polish Notation Calculator).
Bison normally uses the type int
for semantic values if your
program uses the same data type for all language constructs. To
specify some other type, define the %define
variable
api.value.type
like this:
%define api.value.type {double} |
or
%define api.value.type {struct semantic_type} |
The value of api.value.type
should be a type name that does not
contain parentheses or square brackets.
Alternatively, instead of relying of Bison’s %define
support, you may
rely on the C/C++ preprocessor and define YYSTYPE
as a macro, like
this:
#define YYSTYPE double |
This macro definition must go in the prologue of the grammar file
(see section Outline of a Bison Grammar). If compatibility
with POSIX Yacc matters to you, use this. Note however that Bison cannot
know YYSTYPE
’s value, not even whether it is defined, so there are
services it cannot provide. Besides this works only for languages that have
a preprocessor.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In most programs, you will need different data types for different kinds
of tokens and groupings. For example, a numeric constant may need type
int
or long int
, while a string constant needs type
char *
, and an identifier might need a pointer to an entry in the
symbol table.
To use more than one data type for semantic values in one parser, Bison requires you to do two things:
%union
Bison declaration (see section The Union Declaration);
%define
variable api.value.type
to be a union type
whose members are the type tags (see section Providing a Structured Semantic Value Type);
typedef
or a #define
to define YYSTYPE
to be a
union type whose member names are the type tags.
%token
Bison declaration (see section Token Type Names)
and for groupings with the %type
Bison declaration (see section Nonterminal Symbols).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The special value union
of the %define
variable
api.value.type
instructs Bison that the tags used with the
%token
and %type
directives are genuine types, not names of
members of YYSTYPE
.
For example:
%define api.value.type union %token <int> INT "integer" %token <int> 'n' %type <int> expr %token <char const *> ID "identifier" |
generates an appropriate value of YYSTYPE
to support each symbol
type. The name of the member of YYSTYPE
for tokens than have a
declared identifier id (such as INT
and ID
above, but
not 'n'
) is id
. The other symbols have unspecified
names on which you should not depend; instead, relying on C casts to access
the semantic value with the appropriate type:
/* For an "integer". */ yylval.INT = 42; return INT; /* For an 'n', also declared as int. */ *((int*)&yylval) = 42; return 'n'; /* For an "identifier". */ yylval.ID = "42"; return ID; |
If the %define
variable api.token.prefix
is defined
(see section api.token.prefix), then it is also used to prefix
the union member names. For instance, with ‘%define api.token.prefix
{TOK_}’:
/* For an "integer". */ yylval.TOK_INT = 42; return TOK_INT; |
This Bison extension cannot work if %yacc
(or
‘-y’/‘--yacc’) is enabled, as POSIX mandates that Yacc
generate tokens as macros (e.g., ‘#define INT 258’, or ‘#define
TOK_INT 258’).
This feature is new, and user feedback would be most welcome.
A similar feature is provided for C++ that in addition overcomes C++
limitations (that forbid non-trivial objects to be part of a union
):
‘%define api.value.type variant’, see C++ Variants.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The %union
declaration specifies the entire collection of possible
data types for semantic values. The keyword %union
is followed by
braced code containing the same thing that goes inside a union
in C.
For example:
%union { double val; symrec *tptr; } |
This says that the two alternative types are double
and symrec
*
. They are given names val
and tptr
; these names are used
in the %token
and %type
declarations to pick one of the types
for a terminal or nonterminal symbol (see section Nonterminal Symbols).
As an extension to POSIX, a tag is allowed after the %union
. For
example:
%union value { double val; symrec *tptr; } |
specifies the union tag value
, so the corresponding C type is
union value
. If you do not specify a tag, it defaults to
YYSTYPE
.
As another extension to POSIX, you may specify multiple %union
declarations; their contents are concatenated. However, only the first
%union
declaration can specify a tag.
Note that, unlike making a union
declaration in C, you need not write
a semicolon after the closing brace.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Instead of %union
, you can define and use your own union type
YYSTYPE
if your grammar contains at least one ‘<type>’
tag. For example, you can put the following into a header file
‘parser.h’:
union YYSTYPE { double val; symrec *tptr; }; |
and then your grammar can use the following instead of %union
:
%{ #include "parser.h" %} %define api.value.type {union YYSTYPE} %type <val> expr %token <tptr> ID |
Actually, you may also provide a struct
rather that a union
,
which may be handy if you want to track information for every symbol (such
as preceding comments).
The type you provide may even be structured and include pointers, in which case the type tags you provide may be composite, with ‘.’ and ‘->’ operators.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
An action accompanies a syntactic rule and contains C code to be executed each time an instance of that rule is recognized. The task of most actions is to compute a semantic value for the grouping built by the rule from the semantic values associated with tokens or smaller groupings.
An action consists of braced code containing C statements, and can be placed at any position in the rule; it is executed at that position. Most rules have just one action at the end of the rule, following all the components. Actions in the middle of a rule are tricky and used only for special purposes (see section Actions in Mid-Rule).
The C code in an action can refer to the semantic values of the
components matched by the rule with the construct $n
,
which stands for the value of the nth component. The semantic
value for the grouping being constructed is $$
. In addition,
the semantic values of symbols can be accessed with the named
references construct $name
or $[name]
.
Bison translates both of these constructs into expressions of the
appropriate type when it copies the actions into the parser
implementation file. $$
(or $name
, when it stands
for the current grouping) is translated to a modifiable lvalue, so it
can be assigned to.
Here is a typical example:
exp: … | exp '+' exp { $$ = $1 + $3; } |
Or, in terms of named references:
exp[result]: … | exp[left] '+' exp[right] { $result = $left + $right; } |
This rule constructs an exp
from two smaller exp
groupings
connected by a plus-sign token. In the action, $1
and $3
($left
and $right
)
refer to the semantic values of the two component exp
groupings,
which are the first and third symbols on the right hand side of the rule.
The sum is stored into $$
($result
) so that it becomes the
semantic value of
the addition-expression just recognized by the rule. If there were a
useful semantic value associated with the ‘+’ token, it could be
referred to as $2
.
See section Named References, for more information about using the named references construct.
Note that the vertical-bar character ‘|’ is really a rule separator, and actions are attached to a single rule. This is a difference with tools like Flex, for which ‘|’ stands for either “or”, or “the same action as that of the next rule”. In the following example, the action is triggered only when ‘b’ is found:
a-or-b: 'a'|'b' { a_or_b_found = 1; }; |
If you don’t specify an action for a rule, Bison supplies a default:
$$ = $1
. Thus, the value of the first symbol in the rule
becomes the value of the whole rule. Of course, the default action is
valid only if the two data types match. There is no meaningful default
action for an empty rule; every empty rule must have an explicit action
unless the rule’s value does not matter.
$n
with n zero or negative is allowed for reference
to tokens and groupings on the stack before those that match the
current rule. This is a very risky practice, and to use it reliably
you must be certain of the context in which the rule is applied. Here
is a case in which you can use this reliably:
foo: expr bar '+' expr { … } | expr bar '-' expr { … } ; bar: %empty { previous_expr = $0; } ; |
As long as bar
is used only in the fashion shown here, $0
always refers to the expr
which precedes bar
in the
definition of foo
.
It is also possible to access the semantic value of the lookahead token, if
any, from a semantic action.
This semantic value is stored in yylval
.
See section Special Features for Use in Actions.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
If you have chosen a single data type for semantic values, the $$
and $n
constructs always have that data type.
If you have used %union
to specify a variety of data types, then you
must declare a choice among these types for each terminal or nonterminal
symbol that can have a semantic value. Then each time you use $$
or
$n
, its data type is determined by which symbol it refers to
in the rule. In this example,
exp: … | exp '+' exp { $$ = $1 + $3; } |
$1
and $3
refer to instances of exp
, so they all
have the data type declared for the nonterminal symbol exp
. If
$2
were used, it would have the data type declared for the
terminal symbol '+'
, whatever that might be.
Alternatively, you can specify the data type when you refer to the value, by inserting ‘<type>’ after the ‘$’ at the beginning of the reference. For example, if you have defined types as shown here:
%union { int itype; double dtype; } |
then you can write $<itype>1
to refer to the first subunit of the
rule as an integer, or $<dtype>1
to refer to it as a double.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Occasionally it is useful to put an action in the middle of a rule. These actions are written just like usual end-of-rule actions, but they are executed before the parser even recognizes the following components.
3.4.8.1 Using Mid-Rule Actions | Putting an action in the middle of a rule. | |
3.4.8.2 Mid-Rule Action Translation | How mid-rule actions are actually processed. | |
3.4.8.3 Conflicts due to Mid-Rule Actions | Mid-rule actions can cause conflicts. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A mid-rule action may refer to the components preceding it using
$n
, but it may not refer to subsequent components because
it is run before they are parsed.
The mid-rule action itself counts as one of the components of the rule.
This makes a difference when there is another action later in the same rule
(and usually there is another at the end): you have to count the actions
along with the symbols when working out which number n to use in
$n
.
The mid-rule action can also have a semantic value. The action can set
its value with an assignment to $$
, and actions later in the rule
can refer to the value using $n
. Since there is no symbol
to name the action, there is no way to declare a data type for the value
in advance, so you must use the ‘$<…>n’ construct to
specify a data type each time you refer to this value.
There is no way to set the value of the entire rule with a mid-rule
action, because assignments to $$
do not have that effect. The
only way to set the value for the entire rule is with an ordinary action
at the end of the rule.
Here is an example from a hypothetical compiler, handling a let
statement that looks like ‘let (variable) statement’ and
serves to create a variable named variable temporarily for the
duration of statement. To parse this construct, we must put
variable into the symbol table while statement is parsed, then
remove it afterward. Here is how it is done:
stmt: "let" '(' var ')' { $<context>$ = push_context (); declare_variable ($3); } stmt { $$ = $6; pop_context ($<context>5); } |
As soon as ‘let (variable)’ has been recognized, the first
action is run. It saves a copy of the current semantic context (the
list of accessible variables) as its semantic value, using alternative
context
in the data-type union. Then it calls
declare_variable
to add the new variable to that list. Once the
first action is finished, the embedded statement stmt
can be
parsed.
Note that the mid-rule action is component number 5, so the ‘stmt’ is component number 6. Named references can be used to improve the readability and maintainability (see section Named References):
stmt: "let" '(' var ')' { $<context>let = push_context (); declare_variable ($3); }[let] stmt { $$ = $6; pop_context ($<context>let); } |
After the embedded statement is parsed, its semantic value becomes the
value of the entire let
-statement. Then the semantic value from the
earlier action is used to restore the prior list of variables. This
removes the temporary let
-variable from the list so that it won’t
appear to exist while the rest of the program is parsed.
In the above example, if the parser initiates error recovery (see section Error Recovery) while parsing the tokens in the embedded statement stmt
,
it might discard the previous semantic context $<context>5
without
restoring it.
Thus, $<context>5
needs a destructor (see section Freeing Discarded Symbols).
However, Bison currently provides no means to declare a destructor specific to
a particular mid-rule action’s semantic value.
One solution is to bury the mid-rule action inside a nonterminal symbol and to declare a destructor for that symbol:
%type <context> let %destructor { pop_context ($$); } let %% stmt: let stmt { $$ = $2; pop_context ($let); }; let: "let" '(' var ')' { $let = push_context (); declare_variable ($3); }; |
Note that the action is now at the end of its rule. Any mid-rule action can be converted to an end-of-rule action in this way, and this is what Bison actually does to implement mid-rule actions.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As hinted earlier, mid-rule actions are actually transformed into regular rules and actions. The various reports generated by Bison (textual, graphical, etc., see Understanding Your Parser) reveal this translation, best explained by means of an example. The following rule:
exp: { a(); } "b" { c(); } { d(); } "e" { f(); }; |
is translated into:
$@1: %empty { a(); }; $@2: %empty { c(); }; $@3: %empty { d(); }; exp: $@1 "b" $@2 $@3 "e" { f(); }; |
with new nonterminal symbols $@n
, where n is a number.
A mid-rule action is expected to generate a value if it uses $$
, or
the (final) action uses $n
where n denote the mid-rule
action. In that case its nonterminal is rather named @n
:
exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; |
is translated into
@1: %empty { a(); }; @2: %empty { $$ = c(); }; $@3: %empty { d(); }; exp: @1 "b" @2 $@3 "e" { f = $1; } |
There are probably two errors in the above example: the first mid-rule
action does not generate a value (it does not use $$
although the
final action uses it), and the value of the second one is not used (the
final action does not use $3
). Bison reports these errors when the
midrule-value
warnings are enabled (see section Invoking Bison):
$ bison -fcaret -Wmidrule-value mid.y mid.y:2.6-13: warning: unset value: $$ exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; ^^^^^^^^ mid.y:2.19-31: warning: unused value: $3 exp: { a(); } "b" { $$ = c(); } { d(); } "e" { f = $1; }; ^^^^^^^^^^^^^ |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Taking action before a rule is completely recognized often leads to conflicts since the parser must commit to a parse in order to execute the action. For example, the following two rules, without mid-rule actions, can coexist in a working parser because the parser can shift the open-brace token and look at what follows before deciding whether there is a declaration or not:
compound: '{' declarations statements '}' | '{' statements '}' ; |
But when we add a mid-rule action as follows, the rules become nonfunctional:
compound: { prepare_for_local_variables (); } '{' declarations statements '}' | '{' statements '}' ; |
Now the parser is forced to decide whether to run the mid-rule action when it has read no farther than the open-brace. In other words, it must commit to using one rule or the other, without sufficient information to do it correctly. (The open-brace token is what is called the lookahead token at this time, since the parser is still deciding what to do about it. See section Lookahead Tokens.)
You might think that you could correct the problem by putting identical actions into the two rules, like this:
compound: { prepare_for_local_variables (); } '{' declarations statements '}' | { prepare_for_local_variables (); } '{' statements '}' ; |
But this does not help, because Bison does not realize that the two actions are identical. (Bison never tries to understand the C code in an action.)
If the grammar is such that a declaration can be distinguished from a statement by the first token (which is true in C), then one solution which does work is to put the action after the open-brace, like this:
compound: '{' { prepare_for_local_variables (); } declarations statements '}' | '{' statements '}' ; |
Now the first token of the following declaration or statement, which would in any case tell Bison which rule to use, can still do so.
Another solution is to bury the action inside a nonterminal symbol which serves as a subroutine:
subroutine: %empty { prepare_for_local_variables (); } ; compound: subroutine '{' declarations statements '}' | subroutine '{' statements '}' ; |
Now Bison can execute the action in the rule for subroutine
without
deciding which rule for compound
it will eventually use.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Though grammar rules and semantic actions are enough to write a fully functional parser, it can be useful to process some additional information, especially symbol locations.
The way locations are handled is defined by providing a data type, and actions to take when rules are matched.
3.5.1 Data Type of Locations | Specifying a data type for locations. | |
3.5.2 Actions and Locations | Using locations in actions. | |
3.5.3 Default Action for Locations | Defining a general way to compute locations. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Defining a data type for locations is much simpler than for semantic values, since all tokens and groupings always use the same type.
You can specify the type of locations by defining a macro called
YYLTYPE
, just as you can specify the semantic value type by
defining a YYSTYPE
macro (see section Data Types of Semantic Values).
When YYLTYPE
is not defined, Bison uses a default structure type with
four members:
typedef struct YYLTYPE { int first_line; int first_column; int last_line; int last_column; } YYLTYPE; |
When YYLTYPE
is not defined, at the beginning of the parsing, Bison
initializes all these fields to 1 for yylloc
. To initialize
yylloc
with a custom location type (or to chose a different
initialization), use the %initial-action
directive. See section Performing Actions before Parsing.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Actions are not only useful for defining language semantics, but also for describing the behavior of the output parser with locations.
The most obvious way for building locations of syntactic groupings is very
similar to the way semantic values are computed. In a given rule, several
constructs can be used to access the locations of the elements being matched.
The location of the nth component of the right hand side is
@n
, while the location of the left hand side grouping is
@$
.
In addition, the named references construct @name
and
@[name]
may also be used to address the symbol locations.
See section Named References, for more information about using the named
references construct.
Here is a basic example using the default data type for locations:
exp: … | exp '/' exp { @$.first_column = @1.first_column; @$.first_line = @1.first_line; @$.last_column = @3.last_column; @$.last_line = @3.last_line; if ($3) $$ = $1 / $3; else { $$ = 1; fprintf (stderr, "%d.%d-%d.%d: division by zero", @3.first_line, @3.first_column, @3.last_line, @3.last_column); } } |
As for semantic values, there is a default action for locations that is
run each time a rule is matched. It sets the beginning of @$
to the
beginning of the first symbol, and the end of @$
to the end of the
last symbol.
With this default action, the location tracking can be fully automatic. The example above simply rewrites this way:
exp: … | exp '/' exp { if ($3) $$ = $1 / $3; else { $$ = 1; fprintf (stderr, "%d.%d-%d.%d: division by zero", @3.first_line, @3.first_column, @3.last_line, @3.last_column); } } |
It is also possible to access the location of the lookahead token, if any,
from a semantic action.
This location is stored in yylloc
.
See section Special Features for Use in Actions.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Actually, actions are not the best place to compute locations. Since
locations are much more general than semantic values, there is room in
the output parser to redefine the default action to take for each
rule. The YYLLOC_DEFAULT
macro is invoked each time a rule is
matched, before the associated action is run. It is also invoked
while processing a syntax error, to compute the error’s location.
Before reporting an unresolvable syntactic ambiguity, a GLR
parser invokes YYLLOC_DEFAULT
recursively to compute the location
of that ambiguity.
Most of the time, this macro is general enough to suppress location dedicated code from semantic actions.
The YYLLOC_DEFAULT
macro takes three parameters. The first one is
the location of the grouping (the result of the computation). When a
rule is matched, the second parameter identifies locations of
all right hand side elements of the rule being matched, and the third
parameter is the size of the rule’s right hand side.
When a GLR parser reports an ambiguity, which of multiple candidate
right hand sides it passes to YYLLOC_DEFAULT
is undefined.
When processing a syntax error, the second parameter identifies locations
of the symbols that were discarded during error processing, and the third
parameter is the number of discarded symbols.
By default, YYLLOC_DEFAULT
is defined this way:
# define YYLLOC_DEFAULT(Cur, Rhs, N) \ do \ if (N) \ { \ (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \ (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \ (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \ (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \ } \ else \ { \ (Cur).first_line = (Cur).last_line = \ YYRHSLOC(Rhs, 0).last_line; \ (Cur).first_column = (Cur).last_column = \ YYRHSLOC(Rhs, 0).last_column; \ } \ while (0) |
where YYRHSLOC (rhs, k)
is the location of the kth symbol
in rhs when k is positive, and the location of the symbol
just before the reduction when k and n are both zero.
When defining YYLLOC_DEFAULT
, you should consider that:
YYLLOC_DEFAULT
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As described in the preceding sections, the traditional way to refer to any
semantic value or location is a positional reference, which takes the
form $n
, $$
, @n
, and @$
. However,
such a reference is not very descriptive. Moreover, if you later decide to
insert or remove symbols in the right-hand side of a grammar rule, the need
to renumber such references can be tedious and error-prone.
To avoid these issues, you can also refer to a semantic value or location using a named reference. First of all, original symbol names may be used as named references. For example:
invocation: op '(' args ')' { $invocation = new_invocation ($op, $args, @invocation); } |
Positional and named references can be mixed arbitrarily. For example:
invocation: op '(' args ')' { $$ = new_invocation ($op, $args, @$); } |
However, sometimes regular symbol names are not sufficient due to ambiguities:
exp: exp '/' exp { $exp = $exp / $exp; } // $exp is ambiguous. exp: exp '/' exp { $$ = $1 / $exp; } // One usage is ambiguous. exp: exp '/' exp { $$ = $1 / $3; } // No error. |
When ambiguity occurs, explicitly declared names may be used for values and locations. Explicit names are declared as a bracketed name after a symbol appearance in rule definitions. For example:
exp[result]: exp[left] '/' exp[right] { $result = $left / $right; } |
In order to access a semantic value generated by a mid-rule action, an explicit name may also be declared by putting a bracketed name after the closing brace of the mid-rule action code:
exp[res]: exp[x] '+' {$left = $x;}[left] exp[right] { $res = $left + $right; } |
In references, in order to specify names containing dots and dashes, an explicit
bracketed syntax $[name]
and @[name]
must be used:
if-stmt: "if" '(' expr ')' "then" then.stmt ';' { $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); } |
It often happens that named references are followed by a dot, dash or other
C punctuation marks and operators. By default, Bison will read
‘$name.suffix’ as a reference to symbol value $name
followed by
‘.suffix’, i.e., an access to the suffix
field of the semantic
value. In order to force Bison to recognize ‘name.suffix’ in its
entirety as the name of a semantic value, the bracketed syntax
‘$[name.suffix]’ must be used.
The named references feature is experimental. More user feedback will help to stabilize it.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Bison declarations section of a Bison grammar defines the symbols used in formulating the grammar and the data types of semantic values. See section Symbols, Terminal and Nonterminal.
All token type names (but not single-character literal tokens such as
'+'
and '*'
) must be declared. Nonterminal symbols must be
declared if you need to specify which data type to use for the semantic
value (see section More Than One Value Type).
The first rule in the grammar file also specifies the start symbol, by default. If you want some other symbol to be the start symbol, you must declare it explicitly (see section Languages and Context-Free Grammars).
3.7.1 Require a Version of Bison | Requiring a Bison version. | |
3.7.2 Token Type Names | Declaring terminal symbols. | |
3.7.3 Operator Precedence | Declaring terminals with precedence and associativity. | |
3.7.4 Nonterminal Symbols | Declaring the choice of type for a nonterminal symbol. | |
3.7.5 Performing Actions before Parsing | Code run before parsing starts. | |
3.7.6 Freeing Discarded Symbols | Declaring how symbols are freed. | |
3.7.7 Printing Semantic Values | Declaring how symbol values are displayed. | |
3.7.8 Suppressing Conflict Warnings | Suppressing warnings about parsing conflicts. | |
3.7.9 The Start-Symbol | Specifying the start symbol. | |
3.7.10 A Pure (Reentrant) Parser | Requesting a reentrant parser. | |
3.7.11 A Push Parser | Requesting a push parser. | |
3.7.12 Bison Declaration Summary | Table of all Bison declarations. | |
3.7.13 %define Summary | Defining variables to adjust Bison’s behavior. | |
3.7.14 %code Summary | Inserting code into the parser source. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
You may require the minimum version of Bison to process the grammar. If
the requirement is not met, bison
exits with an error (exit
status 63).
%require "version" |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The basic way to declare a token type name (terminal symbol) is as follows:
%token name |
Bison will convert this into a #define
directive in
the parser, so that the function yylex
(if it is in this file)
can use the name name to stand for this token type’s code.
Alternatively, you can use %left
, %right
,
%precedence
, or
%nonassoc
instead of %token
, if you wish to specify
associativity and precedence. See section Operator Precedence.
You can explicitly specify the numeric code for a token type by appending a nonnegative decimal or hexadecimal integer value in the field immediately following the token name:
%token NUM 300 %token XNUM 0x12d // a GNU extension |
It is generally best, however, to let Bison choose the numeric codes for all token types. Bison will automatically select codes that don’t conflict with each other or with normal characters.
In the event that the stack type is a union, you must augment the
%token
or other token declaration to include the data type
alternative delimited by angle-brackets (see section More Than One Value Type).
For example:
%union { /* define stack type */ double val; symrec *tptr; } %token <val> NUM /* define token NUM and its type */ |
You can associate a literal string token with a token type name by
writing the literal string at the end of a %token
declaration which declares the name. For example:
%token arrow "=>" |
For example, a grammar for the C language might specify these names with equivalent literal string tokens:
%token <operator> OR "||" %token <operator> LE 134 "<=" %left OR "<=" |
Once you equate the literal string and the token name, you can use them
interchangeably in further declarations or the grammar rules. The
yylex
function can use the token name or the literal string to
obtain the token type code number (see section Calling Convention for yylex
).
Syntax error messages passed to yyerror
from the parser will reference
the literal string instead of the token name.
The token numbered as 0 corresponds to end of file; the following line allows for nicer error messages referring to “end of file” instead of “$end”:
%token END 0 "end of file" |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Use the %left
, %right
, %nonassoc
, or
%precedence
declaration to
declare a token and specify its precedence and associativity, all at
once. These are called precedence declarations.
See section Operator Precedence, for general information on
operator precedence.
The syntax of a precedence declaration is nearly the same as that of
%token
: either
%left symbols… |
or
%left <type> symbols… |
And indeed any of these declarations serves the purposes of %token
.
But in addition, they specify the associativity and relative precedence for
all the symbols:
%left
specifies
left-associativity (grouping x with y first) and
%right
specifies right-associativity (grouping y with
z first). %nonassoc
specifies no associativity, which
means that ‘x op y op z’ is
considered a syntax error.
%precedence
gives only precedence to the symbols, and
defines no associativity at all. Use this to define precedence only,
and leave any potential conflict due to associativity enabled.
For backward compatibility, there is a confusing difference between the
argument lists of %token
and precedence declarations.
Only a %token
can associate a literal string with a token type name.
A precedence declaration always interprets a literal string as a reference to a
separate token.
For example:
%left OR "<=" // Does not declare an alias. %left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=". |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When you use %union
to specify multiple value types, you must
declare the value type of each nonterminal symbol for which values are
used. This is done with a %type
declaration, like this:
%type <type> nonterminal… |
Here nonterminal is the name of a nonterminal symbol, and
type is the name given in the %union
to the alternative
that you want (see section The Union Declaration). You
can give any number of nonterminal symbols in the same %type
declaration, if they have the same value type. Use spaces to separate
the symbol names.
You can also declare the value type of a terminal symbol. To do this,
use the same <type>
construction in a declaration for the
terminal symbol. All kinds of token declarations allow
<type>
.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Sometimes your parser needs to perform some initializations before
parsing. The %initial-action
directive allows for such arbitrary
code.
Declare that the braced code must be invoked before parsing each time
yyparse
is called. The code may use $$
(or
$<tag>$
) and @$
— initial value and location of the
lookahead — and the %parse-param
.
For instance, if your locations use a file name, you may use
%parse-param { char const *file_name }; %initial-action { @$.initialize (file_name); }; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
During error recovery (see section Error Recovery), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
until the parser falls on its feet. If the parser runs out of memory,
or if it returns via YYABORT
or YYACCEPT
, all the
symbols on the stack must be discarded. Even if the parser succeeds, it
must discard the start symbol.
When discarded symbols convey heap based information, this memory is lost. While this behavior can be tolerable for batch parsers, such as in traditional compilers, it is unacceptable for programs like shells or protocol implementations that may parse and execute indefinitely.
The %destructor
directive defines code that is called when a
symbol is automatically discarded.
Invoke the braced code whenever the parser discards one of the
symbols. Within code, $$
(or $<tag>$
)
designates the semantic value associated with the discarded symbol, and
@$
designates its location. The additional parser parameters are
also available (see section The Parser Function yyparse
).
When a symbol is listed among symbols, its %destructor
is called a
per-symbol %destructor
.
You may also define a per-type %destructor
by listing a semantic type
tag among symbols.
In that case, the parser will invoke this code whenever it discards any
grammar symbol that has that semantic type tag unless that symbol has its own
per-symbol %destructor
.
Finally, you can define two different kinds of default %destructor
s.
(These default forms are experimental.
More user feedback will help to determine whether they should become permanent
features.)
You can place each of <*>
and <>
in the symbols list of
exactly one %destructor
declaration in your grammar file.
The parser will invoke the code associated with one of these whenever it
discards any user-defined grammar symbol that has no per-symbol and no per-type
%destructor
.
The parser uses the code for <*>
in the case of such a grammar
symbol for which you have formally declared a semantic type tag (%type
counts as such a declaration, but $<tag>$
does not).
The parser uses the code for <>
in the case of such a grammar
symbol that has no declared semantic type tag.
For example:
%union { char *string; } %token <string> STRING1 STRING2 %type <string> string1 string2 %union { char character; } %token <character> CHR %type <character> chr %token TAGLESS %destructor { } <character> %destructor { free ($$); } <*> %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1 %destructor { printf ("Discarding tagless symbol.\n"); } <> |
guarantees that, when the parser discards any user-defined symbol that has a
semantic type tag other than <character>
, it passes its semantic value
to free
by default.
However, when the parser discards a STRING1
or a string1
, it also
prints its line number to stdout
.
It performs only the second %destructor
in this case, so it invokes
free
only once.
Finally, the parser merely prints a message whenever it discards any symbol,
such as TAGLESS
, that has no semantic type tag.
A Bison-generated parser invokes the default %destructor
s only for
user-defined as opposed to Bison-defined symbols.
For example, the parser will not invoke either kind of default
%destructor
for the special Bison-defined symbols $accept
,
$undefined
, or $end
(see section Bison Symbols),
none of which you can reference in your grammar.
It also will not invoke either for the error
token (see section error), which is always defined by Bison regardless of whether you
reference it in your grammar.
However, it may invoke one of them for the end token (token 0) if you
redefine it from $end
to, for example, END
:
%token END 0 |
Finally, Bison will never invoke a %destructor
for an unreferenced
mid-rule semantic value (see section Actions in Mid-Rule).
That is, Bison does not consider a mid-rule to have a semantic value if you
do not reference $$
in the mid-rule’s action or $n
(where n is the right-hand side symbol position of the mid-rule) in
any later action in that rule. However, if you do reference either, the
Bison-generated parser will invoke the <>
%destructor
whenever
it discards the mid-rule symbol.
Discarded symbols are the following:
parse
,
The parser can return immediately because of an explicit call to
YYABORT
or YYACCEPT
, or failed error recovery, or memory
exhaustion.
Right-hand side symbols of a rule that explicitly triggers a syntax
error via YYERROR
are not discarded automatically. As a rule
of thumb, destructors are invoked only when user actions cannot manage
the memory.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When run-time traces are enabled (see section Tracing Your Parser), the parser reports its actions, such as reductions. When a symbol involved in an action is reported, only its kind is displayed, as the parser cannot know how semantic values should be formatted.
The %printer
directive defines code that is called when a symbol is
reported. Its syntax is the same as %destructor
(see section Freeing Discarded Symbols).
Invoke the braced code whenever the parser displays one of the
symbols. Within code, yyoutput
denotes the output stream
(a FILE*
in C, and an std::ostream&
in C++), $$
(or
$<tag>$
) designates the semantic value associated with the
symbol, and @$
its location. The additional parser parameters are
also available (see section The Parser Function yyparse
).
The symbols are defined as for %destructor
(see section Freeing Discarded Symbols.): they can be per-type (e.g.,
‘<ival>’), per-symbol (e.g., ‘exp’, ‘NUM’, ‘"float"’),
typed per-default (i.e., ‘<*>’, or untyped per-default (i.e.,
‘<>’).
For example:
%union { char *string; } %token <string> STRING1 STRING2 %type <string> string1 string2 %union { char character; } %token <character> CHR %type <character> chr %token TAGLESS %printer { fprintf (yyoutput, "'%c'", $$); } <character> %printer { fprintf (yyoutput, "&%p", $$); } <*> %printer { fprintf (yyoutput, "\"%s\"", $$); } STRING1 string1 %printer { fprintf (yyoutput, "<>"); } <> |
guarantees that, when the parser print any symbol that has a semantic type
tag other than <character>
, it display the address of the semantic
value by default. However, when the parser displays a STRING1
or a
string1
, it formats it as a string in double quotes. It performs
only the second %printer
in this case, so it prints only once.
Finally, the parser print ‘<>’ for any symbol, such as TAGLESS
,
that has no semantic type tag. See also
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Bison normally warns if there are any conflicts in the grammar
(see section Shift/Reduce Conflicts), but most real grammars
have harmless shift/reduce conflicts which are resolved in a predictable
way and would be difficult to eliminate. It is desirable to suppress
the warning about these conflicts unless the number of conflicts
changes. You can do this with the %expect
declaration.
The declaration looks like this:
%expect n |
Here n is a decimal integer. The declaration says there should be n shift/reduce conflicts and no reduce/reduce conflicts. Bison reports an error if the number of shift/reduce conflicts differs from n, or if there are any reduce/reduce conflicts.
For deterministic parsers, reduce/reduce conflicts are more serious, and should be eliminated entirely. Bison will always report reduce/reduce conflicts for these parsers. With GLR parsers, however, both kinds of conflicts are routine; otherwise, there would be no need to use GLR parsing. Therefore, it is also possible to specify an expected number of reduce/reduce conflicts in GLR parsers, using the declaration:
%expect-rr n |
In general, using %expect
involves these steps:
%expect
. Use the ‘-v’ option
to get a verbose list of where the conflicts occur. Bison will also
print the number of conflicts.
%expect
declaration, copying the number n from the
number which Bison printed. With GLR parsers, add an
%expect-rr
declaration as well.
Now Bison will report an error if you introduce an unexpected conflict, but will keep silent otherwise.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Bison assumes by default that the start symbol for the grammar is the first
nonterminal specified in the grammar specification section. The programmer
may override this restriction with the %start
declaration as follows:
%start symbol |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A reentrant program is one which does not alter in the course of execution; in other words, it consists entirely of pure (read-only) code. Reentrancy is important whenever asynchronous execution is possible; for example, a nonreentrant program may not be safe to call from a signal handler. In systems with multiple threads of control, a nonreentrant program must be called only within interlocks.
Normally, Bison generates a parser which is not reentrant. This is
suitable for most uses, and it permits compatibility with Yacc. (The
standard Yacc interfaces are inherently nonreentrant, because they use
statically allocated variables for communication with yylex
,
including yylval
and yylloc
.)
Alternatively, you can generate a pure, reentrant parser. The Bison declaration ‘%define api.pure’ says that you want the parser to be reentrant. It looks like this:
%define api.pure full |
The result is that the communication variables yylval
and
yylloc
become local variables in yyparse
, and a different
calling convention is used for the lexical analyzer function
yylex
. See section Calling Conventions for Pure Parsers, for the details of this. The variable yynerrs
becomes local in yyparse
in pull mode but it becomes a member
of yypstate
in push mode. (see section The Error Reporting Function yyerror
). The convention for calling
yyparse
itself is unchanged.
Whether the parser is pure has nothing to do with the grammar rules. You can generate either a pure parser or a nonreentrant parser from any valid grammar.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)
A pull parser is called once and it takes control until all its input is completely parsed. A push parser, on the other hand, is called each time a new token is made available.
A push parser is typically useful when the parser is part of a main event loop in the client’s application. This is typically a requirement of a GUI, when the main event loop needs to be triggered within a certain time period.
Normally, Bison generates a pull parser. The following Bison declaration says that you want the parser to be a push parser (see section api.push-pull):
%define api.push-pull push |
In almost all cases, you want to ensure that your push parser is also a pure parser (see section A Pure (Reentrant) Parser). The only time you should create an impure push parser is to have backwards compatibility with the impure Yacc pull mode interface. Unless you know what you are doing, your declarations should look like this:
%define api.pure full %define api.push-pull push |
There is a major notable functional difference between the pure push parser and the impure push parser. It is acceptable for a pure push parser to have many parser instances, of the same type of parser, in memory at the same time. An impure push parser should only use one parser at a time.
When a push parser is selected, Bison will generate some new symbols in
the generated parser. yypstate
is a structure that the generated
parser uses to store the parser’s state. yypstate_new
is the
function that will create a new parser instance. yypstate_delete
will free the resources associated with the corresponding parser instance.
Finally, yypush_parse
is the function that should be called whenever a
token is available to provide the parser. A trivial example
of using a pure push parser would look like this:
int status; yypstate *ps = yypstate_new (); do { status = yypush_parse (ps, yylex (), NULL); } while (status == YYPUSH_MORE); yypstate_delete (ps); |
If the user decided to use an impure push parser, a few things about
the generated parser will change. The yychar
variable becomes
a global variable instead of a variable in the yypush_parse
function.
For this reason, the signature of the yypush_parse
function is
changed to remove the token as a parameter. A nonreentrant push parser
example would thus look like this:
extern int yychar; int status; yypstate *ps = yypstate_new (); do { yychar = yylex (); status = yypush_parse (ps); } while (status == YYPUSH_MORE); yypstate_delete (ps); |
That’s it. Notice the next token is put into the global variable yychar
for use by the next invocation of the yypush_parse
function.
Bison also supports both the push parser interface along with the pull parser
interface in the same generated parser. In order to get this functionality,
you should replace the ‘%define api.push-pull push’ declaration with the
‘%define api.push-pull both’ declaration. Doing this will create all of
the symbols mentioned earlier along with the two extra symbols, yyparse
and yypull_parse
. yyparse
can be used exactly as it normally
would be used. However, the user should note that it is implemented in the
generated parser by calling yypull_parse
.
This makes the yyparse
function that is generated with the
‘%define api.push-pull both’ declaration slower than the normal
yyparse
function. If the user
calls the yypull_parse
function it will parse the rest of the input
stream. It is possible to yypush_parse
tokens to select a subgrammar
and then yypull_parse
the rest of the input stream. If you would like
to switch back and forth between between parsing styles, you would have to
write your own yypull_parse
function that knows when to quit looking
for input. An example of using the yypull_parse
function would look
like this:
yypstate *ps = yypstate_new (); yypull_parse (ps); /* Will call the lexer */ yypstate_delete (ps); |
Adding the ‘%define api.pure’ declaration does exactly the same thing to the generated parser with ‘%define api.push-pull both’ as it did for ‘%define api.push-pull push’.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Here is a summary of the declarations used to define a grammar:
Declare the collection of data types that semantic values may have (see section The Union Declaration).
Declare a terminal symbol (token type name) with no precedence or associativity specified (see section Token Type Names).
Declare a terminal symbol (token type name) that is right-associative (see section Operator Precedence).
Declare a terminal symbol (token type name) that is left-associative (see section Operator Precedence).
Declare a terminal symbol (token type name) that is nonassociative (see section Operator Precedence). Using it in a way that would be associative is a syntax error.
Declare the type of semantic values for a nonterminal symbol (see section Nonterminal Symbols).
Specify the grammar’s start symbol (see section The Start-Symbol).
Declare the expected number of shift-reduce conflicts (see section Suppressing Conflict Warnings).
In order to change the behavior of bison
, use the following
directives:
Insert code verbatim into the output parser source at the default location or at the location specified by qualifier. See section %code Summary.
Instrument the parser for traces. Obsoleted by ‘%define parse.trace’. See section Tracing Your Parser.
Define a variable to adjust Bison’s behavior. See section %define Summary.
Write a parser header file containing macro definitions for the token type names defined in the grammar as well as a few other declarations. If the parser implementation file is named ‘name.c’ then the parser header file is named ‘name.h’.
For C parsers, the parser header file declares YYSTYPE
unless
YYSTYPE
is already defined as a macro or you have used a
<type>
tag without using %union
. Therefore, if
you are using a %union
(see section More Than One Value Type) with components that require other definitions, or if you
have defined a YYSTYPE
macro or type definition (see section Data Types of Semantic Values), you need to arrange for these
definitions to be propagated to all modules, e.g., by putting them in
a prerequisite header that is included both by your parser and by any
other module that needs YYSTYPE
.
Unless your parser is pure, the parser header file declares
yylval
as an external variable. See section A Pure (Reentrant) Parser.
If you have also used locations, the parser header file declares
YYLTYPE
and yylloc
using a protocol similar to that of the
YYSTYPE
macro and yylval
. See section Tracking Locations.
This parser header file is normally essential if you wish to put the
definition of yylex
in a separate source file, because
yylex
typically needs to be able to refer to the
above-mentioned declarations and to the token type codes. See section Semantic Values of Tokens.
If you have declared %code requires
or %code provides
, the output
header also contains their code.
See section %code Summary.
The generated header is protected against multiple inclusions with a C preprocessor guard: ‘YY_PREFIX_FILE_INCLUDED’, where PREFIX and FILE are the prefix (see section Multiple Parsers in the Same Program) and generated file name turned uppercase, with each series of non alphanumerical characters converted to a single underscore.
For instance with ‘%define api.prefix {calc}’ and ‘%defines "lib/parse.h"’, the header will be guarded as follows.
#ifndef YY_CALC_LIB_PARSE_H_INCLUDED # define YY_CALC_LIB_PARSE_H_INCLUDED ... #endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */ |
Same as above, but save in the file ‘defines-file’.
Specify how the parser should reclaim the memory associated to discarded symbols. See section Freeing Discarded Symbols.
Specify a prefix to use for all Bison output file names. The names are chosen as if the grammar file were named ‘prefix.y’.
Specify the programming language for the generated parser. Currently supported languages include C, C++, and Java. language is case-insensitive.
Generate the code processing the locations (see section Special Features for Use in Actions). This mode is enabled as soon as the grammar uses the special ‘@n’ tokens, but if your grammar does not use it, using ‘%locations’ allows for more accurate syntax error messages.
Rename the external symbols used in the parser so that they start with
prefix instead of ‘yy’. The precise list of symbols renamed
in C parsers
is yyparse
, yylex
, yyerror
, yynerrs
,
yylval
, yychar
, yydebug
, and
(if locations are used) yylloc
. If you use a push parser,
yypush_parse
, yypull_parse
, yypstate
,
yypstate_new
and yypstate_delete
will
also be renamed. For example, if you use ‘%name-prefix "c_"’, the
names become c_parse
, c_lex
, and so on.
For C++ parsers, see the ‘%define api.namespace’ documentation in this
section.
See section Multiple Parsers in the Same Program.
Don’t generate any #line
preprocessor commands in the parser
implementation file. Ordinarily Bison writes these commands in the
parser implementation file so that the C compiler and debuggers will
associate errors and object code with your source file (the grammar
file). This directive causes them to associate errors with the parser
implementation file, treating it as an independent source file in its
own right.
Generate the parser implementation in ‘file’.
Deprecated version of ‘%define api.pure’ (see section api.pure), for which Bison is more careful to warn about unreasonable usage.
Require version version or higher of Bison. See section Require a Version of Bison.
Specify the skeleton to use.
If file does not contain a /
, file is the name of a skeleton
file in the Bison installation directory.
If it does, file is an absolute file name or a file name relative to the
directory of the grammar file.
This is similar to how most shells resolve commands.
Generate an array of token names in the parser implementation file.
The name of the array is yytname
; yytname[i]
is
the name of the token whose internal Bison token code number is
i. The first three elements of yytname
correspond to the
predefined tokens "$end"
, "error"
, and
"$undefined"
; after these come the symbols defined in the
grammar file.
The name in the table includes all the characters needed to represent
the token in Bison. For single-character literals and literal
strings, this includes the surrounding quoting characters and any
escape sequences. For example, the Bison single-character literal
'+'
corresponds to a three-character name, represented in C as
"'+'"
; and the Bison two-character literal string "\\/"
corresponds to a five-character name, represented in C as
"\"\\\\/\""
.
When you specify %token-table
, Bison also generates macro
definitions for macros YYNTOKENS
, YYNNTS
, and
YYNRULES
, and YYNSTATES
:
YYNTOKENS
The highest token number, plus one.
YYNNTS
The number of nonterminal symbols.
YYNRULES
The number of grammar rules,
YYNSTATES
The number of parser states (see section Parser States).
Write an extra output file containing verbose descriptions of the parser states and what is done for each type of lookahead token in that state. See section Understanding Your Parser, for more information.
Pretend the option ‘--yacc’ was given, i.e., imitate Yacc, including its naming conventions. See section Bison Options, for more.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are many features of Bison’s behavior that can be controlled by
assigning the feature a single value. For historical reasons, some
such features are assigned values by dedicated directives, such as
%start
, which assigns the start symbol. However, newer such
features are associated with variables, which are assigned by the
%define
directive:
Define variable to value.
The type of the values depend on the syntax. Braces denote value in the target language (e.g., a namespace, a type, etc.). Keyword values (no delimiters) denote finite choice (e.g., a variation of a feature). String values denote remaining cases (e.g., a file name).
It is an error if a variable is defined by %define
multiple
times, but see -D name[=value].
The rest of this section summarizes variables and values that
%define
accepts.
Some variables take Boolean values. In this case, Bison will complain if the variable definition does not meet one of the following four conditions:
value
is true
value
is omitted (or ""
is specified).
This is equivalent to true
.
value
is false
.
What variables are accepted, as well as their meanings and default values, depend on the selected target language and/or the parser skeleton (see section %language, see section %skeleton). Unaccepted variables produce an error. Some of the accepted variables are described below.
%define api.namespace {foo::bar} |
Bison uses foo::bar
verbatim in references such as:
foo::bar::parser::semantic_type |
However, to open a namespace, Bison removes any leading ::
and then
splits on any remaining occurrences:
namespace foo { namespace bar { class position; class location; } } |
"::"
. For example, "foo"
or "::foo::bar"
.
%name-prefix
, which defaults to yy
.
This usage of %name-prefix
is for backward compatibility and can
be confusing since %name-prefix
also specifies the textual prefix
for the lexical analyzer function. Thus, if you specify
%name-prefix
, it is best to also specify ‘%define
api.namespace’ so that %name-prefix
only affects the
lexical analyzer function. For example, if you specify:
%define api.namespace {foo} %name-prefix "bar::" |
The parser namespace is foo
and yylex
is referenced as
bar::lex
.
location_type
for C++ in Bison 2.5 and for Java in Bison 2.4.
yy
true
, false
, full
The value may be omitted: this is equivalent to specifying true
, as is
the case for Boolean values.
When %define api.pure full
is used, the parser is made reentrant. This
changes the signature for yylex
(see section Calling Conventions for Pure Parsers), and also that of
yyerror
when the tracking of locations has been activated, as shown
below.
The true
value is very similar to the full
value, the only
difference is in the signature of yyerror
on Yacc parsers without
%parse-param
, for historical reasons.
I.e., if ‘%locations %define api.pure’ is passed then the prototypes for
yyerror
are:
void yyerror (char const *msg); // Yacc parsers. void yyerror (YYLTYPE *locp, char const *msg); // GLR parsers. |
But if ‘%locations %define api.pure %parse-param {int *nastiness}’ is used, then both parsers have the same signature:
void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg); |
(see section The Error Reporting Function yyerror
)
false
full
value was introduced in Bison 2.7
pull
, push
, both
pull
false
%token FILE for ERROR %define api.token.prefix {TOK_} %% start: FILE for ERROR; |
generates the definition of the symbols TOK_FILE
, TOK_for
,
and TOK_ERROR
in the generated source files. In particular, the
scanner must use these prefixed token names, while the grammar itself
may still use the short names (as in the sample rule given above). The
generated informational files (‘*.output’, ‘*.xml’,
‘*.dot’) are not modified by this prefix.
Bison also prefixes the generated member names of the semantic value union. See section Generating the Semantic Value Type, for more details.
See Calc++ Parser and Calc++ Scanner, for a complete example.
This grammar has no semantic value at all. This is not properly supported yet.
The type is defined thanks to the %union
directive. You don’t have
to define api.value.type
in that case, using %union
suffices.
See section The Union Declaration.
For instance:
%define api.value.type union-directive %union { int ival; char *sval; } %token <ival> INT "integer" %token <sval> STR "string" |
The symbols are defined with type names, from which Bison will generate a
union
. For instance:
%define api.value.type union %token <int> INT "integer" %token <char *> STR "string" |
This feature needs user feedback to stabilize. Note that most C++ objects
cannot be stored in a union
.
This is similar to union
, but special storage techniques are used to
allow any kind of C++ object to be used. For instance:
%define api.value.type variant %token <int> INT "integer" %token <std::string> STR "string" |
This feature needs user feedback to stabilize. See section C++ Variants.
Use this type as semantic value.
%code requires { struct my_value { enum { is_int, is_str } kind; union { int ival; char *sval; } u; }; } %define api.value.type {struct my_value} %token <u.ival> INT "integer" %token <u.sval> STR "string" |
%union
if %union
is used, otherwise …
int
if type tags are used (i.e., ‘%token <type>…’ or
‘%token <type>…’ is used), otherwise …
""
stype
.
Obsoleted by api.location.type
since Bison 2.7.
most
, consistent
, accepting
accepting
if lr.type
is canonical-lr
.
most
otherwise.
lr.default-reductions
in 2.5, renamed as
lr.default-reduction
in 3.0.
false
lr.keep_unreachable_states
in 2.3b, renamed as
lr.keep-unreachable-states
in 2.5, and as
lr.keep-unreachable-state
in 3.0.
lalr
, ielr
, canonical-lr
lalr
Obsoleted by api.namespace
false
yyerror
.
simple
Error messages passed to yyerror
are simply "syntax
error"
.
verbose
Error messages report the unexpected token, and possibly the expected ones.
However, this report can often be incorrect when LAC is not enabled
(see section LAC).
simple
none
, full
none
In C/C++, define the macro YYDEBUG
(or prefixDEBUG
with
‘%define api.prefix {prefix}’), see Multiple Parsers in the Same Program) to 1 in the parser implementation
file if it is not already defined, so that the debugging facilities are
compiled.
false
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The %code
directive inserts code verbatim into the output
parser source at any of a predefined set of locations. It thus serves
as a flexible and user-friendly alternative to the traditional Yacc
prologue, %{code%}
. This section summarizes the
functionality of %code
for the various target languages
supported by Bison. For a detailed discussion of how to use
%code
in place of %{code%}
for C/C++ and why it
is advantageous to do so, see section Prologue Alternatives.
This is the unqualified form of the %code
directive. It
inserts code verbatim at a language-dependent default location
in the parser implementation.
For C/C++, the default location is the parser implementation file
after the usual contents of the parser header file. Thus, the
unqualified form replaces %{code%}
for most purposes.
For Java, the default location is inside the parser class.
This is the qualified form of the %code
directive.
qualifier identifies the purpose of code and thus the
location(s) where Bison should insert it. That is, if you need to
specify location-sensitive code that does not belong at the
default location selected by the unqualified %code
form, use
this form instead.
For any particular qualifier or for the unqualified form, if there are
multiple occurrences of the %code
directive, Bison concatenates
the specified code in the order in which it appears in the grammar
file.
Not all qualifiers are accepted for all target languages. Unaccepted qualifiers produce an error. Some of the accepted qualifiers are:
requires
YYSTYPE
and YYLTYPE
. In other words, it’s the best place to
define types referenced in %union
directives. If you use
#define
to override Bison’s default YYSTYPE
and YYLTYPE
definitions, then it is also the best place. However you should rather
%define
api.value.type
and api.location.type
.
YYSTYPE
and YYLTYPE
definitions.
provides
YYSTYPE
, YYLTYPE
, and
token definitions.
top
%code
or %code requires
should usually be more appropriate than %code top
. However,
occasionally it is necessary to insert code much nearer the top of the
parser implementation file. For example:
%code top { #define _GNU_SOURCE #include <stdio.h> } |
imports
Though we say the insertion locations are language-dependent, they are technically skeleton-dependent. Writers of non-standard skeletons however should choose their locations consistently with the behavior of the standard Bison skeletons.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Most programs that use Bison parse only one language and therefore contain
only one Bison parser. But what if you want to parse more than one language
with the same program? Then you need to avoid name conflicts between
different definitions of functions and variables such as yyparse
,
yylval
. To use different parsers from the same compilation unit, you
also need to avoid conflicts on types and macros (e.g., YYSTYPE
)
exported in the generated header.
The easy way to do this is to define the %define
variable
api.prefix
. With different api.prefix
s it is guaranteed that
headers do not conflict when included together, and that compiled objects
can be linked together too. Specifying ‘%define api.prefix
{prefix}’ (or passing the option ‘-Dapi.prefix={prefix}’, see
Invoking Bison) renames the interface functions and
variables of the Bison parser to start with prefix instead of
‘yy’, and all the macros to start by PREFIX (i.e., prefix
upper-cased) instead of ‘YY’.
The renamed symbols include yyparse
, yylex
, yyerror
,
yynerrs
, yylval
, yylloc
, yychar
and
yydebug
. If you use a push parser, yypush_parse
,
yypull_parse
, yypstate
, yypstate_new
and
yypstate_delete
will also be renamed. The renamed macros include
YYSTYPE
, YYLTYPE
, and YYDEBUG
, which is treated
specifically — more about this below.
For example, if you use ‘%define api.prefix {c}’, the names become
cparse
, clex
, …, CSTYPE
, CLTYPE
, and so
on.
The %define
variable api.prefix
works in two different ways.
In the implementation file, it works by adding macro definitions to the
beginning of the parser implementation file, defining yyparse
as
prefixparse
, and so on:
#define YYSTYPE CTYPE #define yyparse cparse #define yylval clval ... YYSTYPE yylval; int yyparse (void); |
This effectively substitutes one name for the other in the entire parser
implementation file, thus the “original” names (yylex
,
YYSTYPE
, …) are also usable in the parser implementation file.
However, in the parser header file, the symbols are defined renamed, for instance:
extern CSTYPE clval; int cparse (void); |
The macro YYDEBUG
is commonly used to enable the tracing support in
parsers. To comply with this tradition, when api.prefix
is used,
YYDEBUG
(not renamed) is used as a default value:
/* Debug traces. */ #ifndef CDEBUG # if defined YYDEBUG # if YYDEBUG # define CDEBUG 1 # else # define CDEBUG 0 # endif # else # define CDEBUG 0 # endif #endif #if CDEBUG extern int cdebug; #endif |
Prior to Bison 2.6, a feature similar to api.prefix
was provided by
the obsolete directive %name-prefix
(see section Bison Symbols) and the option --name-prefix
(see section Bison Options).
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.