[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The first example is that of a simple double-precision reverse polish notation calculator (a calculator using postfix operators). This example provides a good starting point, since operator precedence is not an issue. The second example will illustrate how operator precedence is handled.
The source code for this calculator is named ‘rpcalc.y’. The ‘.y’ extension is a convention used for Bison grammar files.
2.1.1 Declarations for rpcalc | Prologue (declarations) for rpcalc. | |
2.1.2 Grammar Rules for rpcalc | Grammar Rules for rpcalc, with explanation. | |
2.1.3 The rpcalc Lexical Analyzer | The lexical analyzer. | |
2.1.4 The Controlling Function | The controlling function. | |
2.1.5 The Error Reporting Routine | The error reporting function. | |
2.1.6 Running Bison to Make the Parser | Running Bison on the grammar file. | |
2.1.7 Compiling the Parser Implementation File | Run the C compiler on the output code. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
rpcalc
Here are the C and Bison declarations for the reverse polish notation calculator. As in C, comments are placed between ‘/*…*/’.
/* Reverse polish notation calculator. */ %{ #include <stdio.h> #include <math.h> int yylex (void); void yyerror (char const *); %} %define api.value.type {double} %token NUM %% /* Grammar rules and actions follow. */ |
The declarations section (see section The prologue) contains two preprocessor directives and two forward declarations.
The #include
directive is used to declare the exponentiation
function pow
.
The forward declarations for yylex
and yyerror
are
needed because the C language requires that functions be declared
before they are used. These functions will be defined in the
epilogue, but the parser calls them so they must be declared in the
prologue.
The second section, Bison declarations, provides information to Bison about the tokens and their types (see section The Bison Declarations Section).
The %define
directive defines the variable api.value.type
,
thus specifying the C data type for semantic values of both tokens and
groupings (see section Data Types of Semantic Values). The Bison
parser will use whatever type api.value.type
is defined as; if you
don’t define it, int
is the default. Because we specify
‘{double}’, each token and each expression has an associated value,
which is a floating point number. C code can use YYSTYPE
to refer to
the value api.value.type
.
Each terminal symbol that is not a single-character literal must be
declared. (Single-character literals normally don’t need to be declared.)
In this example, all the arithmetic operators are designated by
single-character literals, so the only terminal symbol that needs to be
declared is NUM
, the token type for numeric constants.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
rpcalc
Here are the grammar rules for the reverse polish notation calculator.
input: %empty | input line ; line: '\n' | exp '\n' { printf ("%.10g\n", $1); } ; exp: NUM { $$ = $1; } | exp exp '+' { $$ = $1 + $2; } | exp exp '-' { $$ = $1 - $2; } | exp exp '*' { $$ = $1 * $2; } | exp exp '/' { $$ = $1 / $2; } | exp exp '^' { $$ = pow ($1, $2); } /* Exponentiation */ | exp 'n' { $$ = -$1; } /* Unary minus */ ; %% |
The groupings of the rpcalc “language” defined here are the expression
(given the name exp
), the line of input (line
), and the
complete input transcript (input
). Each of these nonterminal
symbols has several alternate rules, joined by the vertical bar ‘|’
which is read as “or”. The following sections explain what these rules
mean.
The semantics of the language is determined by the actions taken when a grouping is recognized. The actions are the C code that appears inside braces. See section Actions.
You must specify these actions in C, but Bison provides the means for
passing semantic values between the rules. In each action, the
pseudo-variable $$
stands for the semantic value for the grouping
that the rule is going to construct. Assigning a value to $$
is the
main job of most actions. The semantic values of the components of the
rule are referred to as $1
, $2
, and so on.
2.1.2.1 Explanation of input | Explanation of the input nonterminal
| |
2.1.2.2 Explanation of line | Explanation of the line nonterminal
| |
2.1.2.3 Explanation of expr | Explanation of the expr nonterminal
|
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
input
Consider the definition of input
:
input: %empty | input line ; |
This definition reads as follows: “A complete input is either an empty
string, or a complete input followed by an input line”. Notice that
“complete input” is defined in terms of itself. This definition is said
to be left recursive since input
appears always as the
leftmost symbol in the sequence. See section Recursive Rules.
The first alternative is empty because there are no symbols between the
colon and the first ‘|’; this means that input
can match an
empty string of input (no tokens). We write the rules this way because it
is legitimate to type Ctrl-d right after you start the calculator.
It’s conventional to put an empty alternative first and to use the
(optional) %empty
directive, or to write the comment ‘/* empty
*/’ in it (see section Empty Rules).
The second alternate rule (input line
) handles all nontrivial input.
It means, “After reading any number of lines, read one more line if
possible.” The left recursion makes this rule into a loop. Since the
first alternative matches empty input, the loop can be executed zero or
more times.
The parser function yyparse
continues to process input until a
grammatical error is seen or the lexical analyzer says there are no more
input tokens; we will arrange for the latter to happen at end-of-input.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
line
Now consider the definition of line
:
line: '\n' | exp '\n' { printf ("%.10g\n", $1); } ; |
The first alternative is a token which is a newline character; this means
that rpcalc accepts a blank line (and ignores it, since there is no
action). The second alternative is an expression followed by a newline.
This is the alternative that makes rpcalc useful. The semantic value of
the exp
grouping is the value of $1
because the exp
in
question is the first symbol in the alternative. The action prints this
value, which is the result of the computation the user asked for.
This action is unusual because it does not assign a value to $$
. As
a consequence, the semantic value associated with the line
is
uninitialized (its value will be unpredictable). This would be a bug if
that value were ever used, but we don’t use it: once rpcalc has printed the
value of the user’s input line, that value is no longer needed.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
expr
The exp
grouping has several rules, one for each kind of expression.
The first rule handles the simplest expressions: those that are just numbers.
The second handles an addition-expression, which looks like two expressions
followed by a plus-sign. The third handles subtraction, and so on.
exp: NUM | exp exp '+' { $$ = $1 + $2; } | exp exp '-' { $$ = $1 - $2; } … ; |
We have used ‘|’ to join all the rules for exp
, but we could
equally well have written them separately:
exp: NUM ; exp: exp exp '+' { $$ = $1 + $2; }; exp: exp exp '-' { $$ = $1 - $2; }; … |
Most of the rules have actions that compute the value of the expression in
terms of the value of its parts. For example, in the rule for addition,
$1
refers to the first component exp
and $2
refers to
the second one. The third component, '+'
, has no meaningful
associated semantic value, but if it had one you could refer to it as
$3
. When yyparse
recognizes a sum expression using this
rule, the sum of the two subexpressions’ values is produced as the value of
the entire expression. See section Actions.
You don’t have to give an action for every rule. When a rule has no
action, Bison by default copies the value of $1
into $$
.
This is what happens in the first rule (the one that uses NUM
).
The formatting shown here is the recommended convention, but Bison does not require it. You can add or change white space as much as you wish. For example, this:
exp: NUM | exp exp '+' {$$ = $1 + $2; } | … ; |
means the same thing as this:
exp: NUM | exp exp '+' { $$ = $1 + $2; } | … ; |
The latter, however, is much more readable.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
rpcalc
Lexical AnalyzerThe lexical analyzer’s job is low-level parsing: converting characters
or sequences of characters into tokens. The Bison parser gets its
tokens by calling the lexical analyzer. See section The Lexical Analyzer Function yylex
.
Only a simple lexical analyzer is needed for the RPN
calculator. This
lexical analyzer skips blanks and tabs, then reads in numbers as
double
and returns them as NUM
tokens. Any other character
that isn’t part of a number is a separate token. Note that the token-code
for such a single-character token is the character itself.
The return value of the lexical analyzer function is a numeric code which
represents a token type. The same text used in Bison rules to stand for
this token type is also a C expression for the numeric code for the type.
This works in two ways. If the token type is a character literal, then its
numeric code is that of the character; you can use the same
character literal in the lexical analyzer to express the number. If the
token type is an identifier, that identifier is defined by Bison as a C
macro whose definition is the appropriate number. In this example,
therefore, NUM
becomes a macro for yylex
to use.
The semantic value of the token (if it has one) is stored into the
global variable yylval
, which is where the Bison parser will look
for it. (The C data type of yylval
is YYSTYPE
, whose value
was defined at the beginning of the grammar via ‘%define api.value.type
{double}’; see section Declarations for rpcalc
.)
A token type code of zero is returned if the end-of-input is encountered. (Bison recognizes any nonpositive value as indicating end-of-input.)
Here is the code for the lexical analyzer:
/* The lexical analyzer returns a double floating point number on the stack and the token NUM, or the numeric code of the character read if not a number. It skips all blanks and tabs, and returns 0 for end-of-input. */ #include <ctype.h> int yylex (void) { int c; /* Skip white space. */ while ((c = getchar ()) == ' ' || c == '\t') continue; /* Process numbers. */ if (c == '.' || isdigit (c)) { ungetc (c, stdin); scanf ("%lf", &yylval); return NUM; } /* Return end-of-input. */ if (c == EOF) return 0; /* Return a single char. */ return c; } |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In keeping with the spirit of this example, the controlling function is
kept to the bare minimum. The only requirement is that it call
yyparse
to start the process of parsing.
int main (void) { return yyparse (); } |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When yyparse
detects a syntax error, it calls the error reporting
function yyerror
to print an error message (usually but not
always "syntax error"
). It is up to the programmer to supply
yyerror
(see section Parser C-Language Interface), so
here is the definition we will use:
#include <stdio.h> /* Called by yyparse on error. */ void yyerror (char const *s) { fprintf (stderr, "%s\n", s); } |
After yyerror
returns, the Bison parser may recover from the error
and continue parsing if the grammar contains a suitable error rule
(see section Error Recovery). Otherwise, yyparse
returns nonzero. We
have not written any error rules in this example, so any invalid input will
cause the calculator program to exit. This is not clean behavior for a
real calculator, but it is adequate for the first example.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Before running Bison to produce a parser, we need to decide how to
arrange all the source code in one or more source files. For such a
simple example, the easiest thing is to put everything in one file,
the grammar file. The definitions of yylex
, yyerror
and
main
go at the end, in the epilogue of the grammar file
(see section The Overall Layout of a Bison Grammar).
For a large project, you would probably have several source files, and use
make
to arrange to recompile them.
With all the source in the grammar file, you use the following command to convert it into a parser implementation file:
bison file.y |
In this example, the grammar file is called ‘rpcalc.y’ (for
“Reverse Polish CALCulator”). Bison produces a parser
implementation file named ‘file.tab.c’, removing the
‘.y’ from the grammar file name. The parser implementation file
contains the source code for yyparse
. The additional functions
in the grammar file (yylex
, yyerror
and main
) are
copied verbatim to the parser implementation file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Here is how to compile and run the parser implementation file:
# List files in current directory.
$ ls
rpcalc.tab.c rpcalc.y
# Compile the Bison parser.
# ‘-lm’ tells compiler to search math library for # List files again.
$ ls
rpcalc rpcalc.tab.c rpcalc.y
|
The file ‘rpcalc’ now contains the executable code. Here is an
example session using rpcalc
.
$ rpcalc 4 9 + ⇒ 13 3 7 + 3 4 5 *+- ⇒ -13 3 7 + 3 4 5 * + - n Note the unary minus, ‘n’ ⇒ 13 5 6 / 4 n + ⇒ -3.166666667 3 4 ^ Exponentiation ⇒ 81 ^D End-of-file indicator $ |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] |
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.