| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The first example is that of a simple double-precision reverse polish notation calculator (a calculator using postfix operators). This example provides a good starting point, since operator precedence is not an issue. The second example will illustrate how operator precedence is handled.
The source code for this calculator is named ‘rpcalc.y’. The ‘.y’ extension is a convention used for Bison grammar files.
2.1.1 Declarations for rpcalc | Prologue (declarations) for rpcalc. | |
2.1.2 Grammar Rules for rpcalc | Grammar Rules for rpcalc, with explanation. | |
2.1.3 The rpcalc Lexical Analyzer | The lexical analyzer. | |
| 2.1.4 The Controlling Function | The controlling function. | |
| 2.1.5 The Error Reporting Routine | The error reporting function. | |
| 2.1.6 Running Bison to Make the Parser | Running Bison on the grammar file. | |
| 2.1.7 Compiling the Parser Implementation File | Run the C compiler on the output code. |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
rpcalcHere are the C and Bison declarations for the reverse polish notation calculator. As in C, comments are placed between ‘/*…*/’.
/* Reverse polish notation calculator. */ %{
#include <stdio.h>
#include <math.h>
int yylex (void);
void yyerror (char const *);
%}
%define api.value.type {double}
%token NUM
%% /* Grammar rules and actions follow. */
|
The declarations section (see section The prologue) contains two preprocessor directives and two forward declarations.
The #include directive is used to declare the exponentiation
function pow.
The forward declarations for yylex and yyerror are
needed because the C language requires that functions be declared
before they are used. These functions will be defined in the
epilogue, but the parser calls them so they must be declared in the
prologue.
The second section, Bison declarations, provides information to Bison about the tokens and their types (see section The Bison Declarations Section).
The %define directive defines the variable api.value.type,
thus specifying the C data type for semantic values of both tokens and
groupings (see section Data Types of Semantic Values). The Bison
parser will use whatever type api.value.type is defined as; if you
don’t define it, int is the default. Because we specify
‘{double}’, each token and each expression has an associated value,
which is a floating point number. C code can use YYSTYPE to refer to
the value api.value.type.
Each terminal symbol that is not a single-character literal must be
declared. (Single-character literals normally don’t need to be declared.)
In this example, all the arithmetic operators are designated by
single-character literals, so the only terminal symbol that needs to be
declared is NUM, the token type for numeric constants.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
rpcalcHere are the grammar rules for the reverse polish notation calculator.
input: %empty | input line ; line:
'\n'
| exp '\n' { printf ("%.10g\n", $1); }
;
exp:
NUM { $$ = $1; }
| exp exp '+' { $$ = $1 + $2; }
| exp exp '-' { $$ = $1 - $2; }
| exp exp '*' { $$ = $1 * $2; }
| exp exp '/' { $$ = $1 / $2; }
| exp exp '^' { $$ = pow ($1, $2); } /* Exponentiation */
| exp 'n' { $$ = -$1; } /* Unary minus */
;
%% |
The groupings of the rpcalc “language” defined here are the expression
(given the name exp), the line of input (line), and the
complete input transcript (input). Each of these nonterminal
symbols has several alternate rules, joined by the vertical bar ‘|’
which is read as “or”. The following sections explain what these rules
mean.
The semantics of the language is determined by the actions taken when a grouping is recognized. The actions are the C code that appears inside braces. See section Actions.
You must specify these actions in C, but Bison provides the means for
passing semantic values between the rules. In each action, the
pseudo-variable $$ stands for the semantic value for the grouping
that the rule is going to construct. Assigning a value to $$ is the
main job of most actions. The semantic values of the components of the
rule are referred to as $1, $2, and so on.
2.1.2.1 Explanation of input | Explanation of the input nonterminal
| |
2.1.2.2 Explanation of line | Explanation of the line nonterminal
| |
2.1.2.3 Explanation of expr | Explanation of the expr nonterminal
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
inputConsider the definition of input:
input: %empty | input line ; |
This definition reads as follows: “A complete input is either an empty
string, or a complete input followed by an input line”. Notice that
“complete input” is defined in terms of itself. This definition is said
to be left recursive since input appears always as the
leftmost symbol in the sequence. See section Recursive Rules.
The first alternative is empty because there are no symbols between the
colon and the first ‘|’; this means that input can match an
empty string of input (no tokens). We write the rules this way because it
is legitimate to type Ctrl-d right after you start the calculator.
It’s conventional to put an empty alternative first and to use the
(optional) %empty directive, or to write the comment ‘/* empty
*/’ in it (see section Empty Rules).
The second alternate rule (input line) handles all nontrivial input.
It means, “After reading any number of lines, read one more line if
possible.” The left recursion makes this rule into a loop. Since the
first alternative matches empty input, the loop can be executed zero or
more times.
The parser function yyparse continues to process input until a
grammatical error is seen or the lexical analyzer says there are no more
input tokens; we will arrange for the latter to happen at end-of-input.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lineNow consider the definition of line:
line:
'\n'
| exp '\n' { printf ("%.10g\n", $1); }
;
|
The first alternative is a token which is a newline character; this means
that rpcalc accepts a blank line (and ignores it, since there is no
action). The second alternative is an expression followed by a newline.
This is the alternative that makes rpcalc useful. The semantic value of
the exp grouping is the value of $1 because the exp in
question is the first symbol in the alternative. The action prints this
value, which is the result of the computation the user asked for.
This action is unusual because it does not assign a value to $$. As
a consequence, the semantic value associated with the line is
uninitialized (its value will be unpredictable). This would be a bug if
that value were ever used, but we don’t use it: once rpcalc has printed the
value of the user’s input line, that value is no longer needed.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
exprThe exp grouping has several rules, one for each kind of expression.
The first rule handles the simplest expressions: those that are just numbers.
The second handles an addition-expression, which looks like two expressions
followed by a plus-sign. The third handles subtraction, and so on.
exp:
NUM
| exp exp '+' { $$ = $1 + $2; }
| exp exp '-' { $$ = $1 - $2; }
…
;
|
We have used ‘|’ to join all the rules for exp, but we could
equally well have written them separately:
exp: NUM ;
exp: exp exp '+' { $$ = $1 + $2; };
exp: exp exp '-' { $$ = $1 - $2; };
…
|
Most of the rules have actions that compute the value of the expression in
terms of the value of its parts. For example, in the rule for addition,
$1 refers to the first component exp and $2 refers to
the second one. The third component, '+', has no meaningful
associated semantic value, but if it had one you could refer to it as
$3. When yyparse recognizes a sum expression using this
rule, the sum of the two subexpressions’ values is produced as the value of
the entire expression. See section Actions.
You don’t have to give an action for every rule. When a rule has no
action, Bison by default copies the value of $1 into $$.
This is what happens in the first rule (the one that uses NUM).
The formatting shown here is the recommended convention, but Bison does not require it. You can add or change white space as much as you wish. For example, this:
exp: NUM | exp exp '+' {$$ = $1 + $2; } | … ;
|
means the same thing as this:
exp:
NUM
| exp exp '+' { $$ = $1 + $2; }
| …
;
|
The latter, however, is much more readable.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
rpcalc Lexical AnalyzerThe lexical analyzer’s job is low-level parsing: converting characters
or sequences of characters into tokens. The Bison parser gets its
tokens by calling the lexical analyzer. See section The Lexical Analyzer Function yylex.
Only a simple lexical analyzer is needed for the RPN
calculator. This
lexical analyzer skips blanks and tabs, then reads in numbers as
double and returns them as NUM tokens. Any other character
that isn’t part of a number is a separate token. Note that the token-code
for such a single-character token is the character itself.
The return value of the lexical analyzer function is a numeric code which
represents a token type. The same text used in Bison rules to stand for
this token type is also a C expression for the numeric code for the type.
This works in two ways. If the token type is a character literal, then its
numeric code is that of the character; you can use the same
character literal in the lexical analyzer to express the number. If the
token type is an identifier, that identifier is defined by Bison as a C
macro whose definition is the appropriate number. In this example,
therefore, NUM becomes a macro for yylex to use.
The semantic value of the token (if it has one) is stored into the
global variable yylval, which is where the Bison parser will look
for it. (The C data type of yylval is YYSTYPE, whose value
was defined at the beginning of the grammar via ‘%define api.value.type
{double}’; see section Declarations for rpcalc.)
A token type code of zero is returned if the end-of-input is encountered. (Bison recognizes any nonpositive value as indicating end-of-input.)
Here is the code for the lexical analyzer:
/* The lexical analyzer returns a double floating point number on the stack and the token NUM, or the numeric code of the character read if not a number. It skips all blanks and tabs, and returns 0 for end-of-input. */ #include <ctype.h> int
yylex (void)
{
int c;
/* Skip white space. */
while ((c = getchar ()) == ' ' || c == '\t')
continue;
/* Process numbers. */
if (c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", &yylval);
return NUM;
}
/* Return end-of-input. */
if (c == EOF)
return 0;
/* Return a single char. */
return c;
}
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In keeping with the spirit of this example, the controlling function is
kept to the bare minimum. The only requirement is that it call
yyparse to start the process of parsing.
int
main (void)
{
return yyparse ();
}
|
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
When yyparse detects a syntax error, it calls the error reporting
function yyerror to print an error message (usually but not
always "syntax error"). It is up to the programmer to supply
yyerror (see section Parser C-Language Interface), so
here is the definition we will use:
#include <stdio.h> /* Called by yyparse on error. */
void
yyerror (char const *s)
{
fprintf (stderr, "%s\n", s);
}
|
After yyerror returns, the Bison parser may recover from the error
and continue parsing if the grammar contains a suitable error rule
(see section Error Recovery). Otherwise, yyparse returns nonzero. We
have not written any error rules in this example, so any invalid input will
cause the calculator program to exit. This is not clean behavior for a
real calculator, but it is adequate for the first example.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Before running Bison to produce a parser, we need to decide how to
arrange all the source code in one or more source files. For such a
simple example, the easiest thing is to put everything in one file,
the grammar file. The definitions of yylex, yyerror and
main go at the end, in the epilogue of the grammar file
(see section The Overall Layout of a Bison Grammar).
For a large project, you would probably have several source files, and use
make to arrange to recompile them.
With all the source in the grammar file, you use the following command to convert it into a parser implementation file:
bison file.y |
In this example, the grammar file is called ‘rpcalc.y’ (for
“Reverse Polish CALCulator”). Bison produces a parser
implementation file named ‘file.tab.c’, removing the
‘.y’ from the grammar file name. The parser implementation file
contains the source code for yyparse. The additional functions
in the grammar file (yylex, yyerror and main) are
copied verbatim to the parser implementation file.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Here is how to compile and run the parser implementation file:
# List files in current directory.
$ ls
rpcalc.tab.c rpcalc.y
# Compile the Bison parser.
# ‘-lm’ tells compiler to search math library for # List files again.
$ ls
rpcalc rpcalc.tab.c rpcalc.y
|
The file ‘rpcalc’ now contains the executable code. Here is an
example session using rpcalc.
$ rpcalc 4 9 + ⇒ 13 3 7 + 3 4 5 *+- ⇒ -13 3 7 + 3 4 5 * + - n Note the unary minus, ‘n’ ⇒ 13 5 6 / 4 n + ⇒ -3.166666667 3 4 ^ Exponentiation ⇒ 81 ^D End-of-file indicator $ |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] |
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.