[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1 Reverse Polish Notation Calculator

The first example is that of a simple double-precision reverse polish notation calculator (a calculator using postfix operators). This example provides a good starting point, since operator precedence is not an issue. The second example will illustrate how operator precedence is handled.

The source code for this calculator is named ‘rpcalc.y’. The ‘.y’ extension is a convention used for Bison grammar files.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.1 Declarations for rpcalc

Here are the C and Bison declarations for the reverse polish notation calculator. As in C, comments are placed between ‘/*…*/’.

 
/* Reverse polish notation calculator.  */

%{
  #include <stdio.h>
  #include <math.h>
  int yylex (void);
  void yyerror (char const *);
%}
%define api.value.type {double}
%token NUM

%% /* Grammar rules and actions follow.  */

The declarations section (see section The prologue) contains two preprocessor directives and two forward declarations.

The #include directive is used to declare the exponentiation function pow.

The forward declarations for yylex and yyerror are needed because the C language requires that functions be declared before they are used. These functions will be defined in the epilogue, but the parser calls them so they must be declared in the prologue.

The second section, Bison declarations, provides information to Bison about the tokens and their types (see section The Bison Declarations Section).

The %define directive defines the variable api.value.type, thus specifying the C data type for semantic values of both tokens and groupings (see section Data Types of Semantic Values). The Bison parser will use whatever type api.value.type is defined as; if you don’t define it, int is the default. Because we specify ‘{double}’, each token and each expression has an associated value, which is a floating point number. C code can use YYSTYPE to refer to the value api.value.type.

Each terminal symbol that is not a single-character literal must be declared. (Single-character literals normally don’t need to be declared.) In this example, all the arithmetic operators are designated by single-character literals, so the only terminal symbol that needs to be declared is NUM, the token type for numeric constants.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.2 Grammar Rules for rpcalc

Here are the grammar rules for the reverse polish notation calculator.

 
input:
  %empty
| input line
;
line:
  '\n'
| exp '\n'      { printf ("%.10g\n", $1); }
;
exp:
  NUM           { $$ = $1;           }
| exp exp '+'   { $$ = $1 + $2;      }
| exp exp '-'   { $$ = $1 - $2;      }
| exp exp '*'   { $$ = $1 * $2;      }
| exp exp '/'   { $$ = $1 / $2;      }
| exp exp '^'   { $$ = pow ($1, $2); }  /* Exponentiation */
| exp 'n'       { $$ = -$1;          }  /* Unary minus    */
;
%%

The groupings of the rpcalc “language” defined here are the expression (given the name exp), the line of input (line), and the complete input transcript (input). Each of these nonterminal symbols has several alternate rules, joined by the vertical bar ‘|’ which is read as “or”. The following sections explain what these rules mean.

The semantics of the language is determined by the actions taken when a grouping is recognized. The actions are the C code that appears inside braces. See section Actions.

You must specify these actions in C, but Bison provides the means for passing semantic values between the rules. In each action, the pseudo-variable $$ stands for the semantic value for the grouping that the rule is going to construct. Assigning a value to $$ is the main job of most actions. The semantic values of the components of the rule are referred to as $1, $2, and so on.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.2.1 Explanation of input

Consider the definition of input:

 
input:
  %empty
| input line
;

This definition reads as follows: “A complete input is either an empty string, or a complete input followed by an input line”. Notice that “complete input” is defined in terms of itself. This definition is said to be left recursive since input appears always as the leftmost symbol in the sequence. See section Recursive Rules.

The first alternative is empty because there are no symbols between the colon and the first ‘|’; this means that input can match an empty string of input (no tokens). We write the rules this way because it is legitimate to type Ctrl-d right after you start the calculator. It’s conventional to put an empty alternative first and to use the (optional) %empty directive, or to write the comment ‘/* empty */’ in it (see section Empty Rules).

The second alternate rule (input line) handles all nontrivial input. It means, “After reading any number of lines, read one more line if possible.” The left recursion makes this rule into a loop. Since the first alternative matches empty input, the loop can be executed zero or more times.

The parser function yyparse continues to process input until a grammatical error is seen or the lexical analyzer says there are no more input tokens; we will arrange for the latter to happen at end-of-input.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.2.2 Explanation of line

Now consider the definition of line:

 
line:
  '\n'
| exp '\n'  { printf ("%.10g\n", $1); }
;

The first alternative is a token which is a newline character; this means that rpcalc accepts a blank line (and ignores it, since there is no action). The second alternative is an expression followed by a newline. This is the alternative that makes rpcalc useful. The semantic value of the exp grouping is the value of $1 because the exp in question is the first symbol in the alternative. The action prints this value, which is the result of the computation the user asked for.

This action is unusual because it does not assign a value to $$. As a consequence, the semantic value associated with the line is uninitialized (its value will be unpredictable). This would be a bug if that value were ever used, but we don’t use it: once rpcalc has printed the value of the user’s input line, that value is no longer needed.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.2.3 Explanation of expr

The exp grouping has several rules, one for each kind of expression. The first rule handles the simplest expressions: those that are just numbers. The second handles an addition-expression, which looks like two expressions followed by a plus-sign. The third handles subtraction, and so on.

 
exp:
  NUM
| exp exp '+'     { $$ = $1 + $2;    }
| exp exp '-'     { $$ = $1 - $2;    }
…
;

We have used ‘|’ to join all the rules for exp, but we could equally well have written them separately:

 
exp: NUM ;
exp: exp exp '+'     { $$ = $1 + $2; };
exp: exp exp '-'     { $$ = $1 - $2; };
…

Most of the rules have actions that compute the value of the expression in terms of the value of its parts. For example, in the rule for addition, $1 refers to the first component exp and $2 refers to the second one. The third component, '+', has no meaningful associated semantic value, but if it had one you could refer to it as $3. When yyparse recognizes a sum expression using this rule, the sum of the two subexpressions’ values is produced as the value of the entire expression. See section Actions.

You don’t have to give an action for every rule. When a rule has no action, Bison by default copies the value of $1 into $$. This is what happens in the first rule (the one that uses NUM).

The formatting shown here is the recommended convention, but Bison does not require it. You can add or change white space as much as you wish. For example, this:

 
exp: NUM | exp exp '+' {$$ = $1 + $2; } | … ;

means the same thing as this:

 
exp:
  NUM
| exp exp '+'    { $$ = $1 + $2; }
| …
;

The latter, however, is much more readable.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.3 The rpcalc Lexical Analyzer

The lexical analyzer’s job is low-level parsing: converting characters or sequences of characters into tokens. The Bison parser gets its tokens by calling the lexical analyzer. See section The Lexical Analyzer Function yylex.

Only a simple lexical analyzer is needed for the RPN calculator. This lexical analyzer skips blanks and tabs, then reads in numbers as double and returns them as NUM tokens. Any other character that isn’t part of a number is a separate token. Note that the token-code for such a single-character token is the character itself.

The return value of the lexical analyzer function is a numeric code which represents a token type. The same text used in Bison rules to stand for this token type is also a C expression for the numeric code for the type. This works in two ways. If the token type is a character literal, then its numeric code is that of the character; you can use the same character literal in the lexical analyzer to express the number. If the token type is an identifier, that identifier is defined by Bison as a C macro whose definition is the appropriate number. In this example, therefore, NUM becomes a macro for yylex to use.

The semantic value of the token (if it has one) is stored into the global variable yylval, which is where the Bison parser will look for it. (The C data type of yylval is YYSTYPE, whose value was defined at the beginning of the grammar via ‘%define api.value.type {double}’; see section Declarations for rpcalc.)

A token type code of zero is returned if the end-of-input is encountered. (Bison recognizes any nonpositive value as indicating end-of-input.)

Here is the code for the lexical analyzer:

 
/* The lexical analyzer returns a double floating point
   number on the stack and the token NUM, or the numeric code
   of the character read if not a number.  It skips all blanks
   and tabs, and returns 0 for end-of-input.  */

#include <ctype.h>
int
yylex (void)
{
  int c;

  /* Skip white space.  */
  while ((c = getchar ()) == ' ' || c == '\t')
    continue;
  /* Process numbers.  */
  if (c == '.' || isdigit (c))
    {
      ungetc (c, stdin);
      scanf ("%lf", &yylval);
      return NUM;
    }
  /* Return end-of-input.  */
  if (c == EOF)
    return 0;
  /* Return a single char.  */
  return c;
}

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.4 The Controlling Function

In keeping with the spirit of this example, the controlling function is kept to the bare minimum. The only requirement is that it call yyparse to start the process of parsing.

 
int
main (void)
{
  return yyparse ();
}

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.5 The Error Reporting Routine

When yyparse detects a syntax error, it calls the error reporting function yyerror to print an error message (usually but not always "syntax error"). It is up to the programmer to supply yyerror (see section Parser C-Language Interface), so here is the definition we will use:

 
#include <stdio.h>

/* Called by yyparse on error.  */
void
yyerror (char const *s)
{
  fprintf (stderr, "%s\n", s);
}

After yyerror returns, the Bison parser may recover from the error and continue parsing if the grammar contains a suitable error rule (see section Error Recovery). Otherwise, yyparse returns nonzero. We have not written any error rules in this example, so any invalid input will cause the calculator program to exit. This is not clean behavior for a real calculator, but it is adequate for the first example.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.6 Running Bison to Make the Parser

Before running Bison to produce a parser, we need to decide how to arrange all the source code in one or more source files. For such a simple example, the easiest thing is to put everything in one file, the grammar file. The definitions of yylex, yyerror and main go at the end, in the epilogue of the grammar file (see section The Overall Layout of a Bison Grammar).

For a large project, you would probably have several source files, and use make to arrange to recompile them.

With all the source in the grammar file, you use the following command to convert it into a parser implementation file:

 
bison file.y

In this example, the grammar file is called ‘rpcalc.y’ (for “Reverse Polish CALCulator”). Bison produces a parser implementation file named ‘file.tab.c’, removing the ‘.y’ from the grammar file name. The parser implementation file contains the source code for yyparse. The additional functions in the grammar file (yylex, yyerror and main) are copied verbatim to the parser implementation file.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

2.1.7 Compiling the Parser Implementation File

Here is how to compile and run the parser implementation file:

 
# List files in current directory.
$ ls
rpcalc.tab.c  rpcalc.y
# Compile the Bison parser.
# -lm’ tells compiler to search math library for pow.
$ cc -lm -o rpcalc rpcalc.tab.c
# List files again.
$ ls
rpcalc  rpcalc.tab.c  rpcalc.y

The file ‘rpcalc’ now contains the executable code. Here is an example session using rpcalc.

 
$ rpcalc
4 9 +
⇒ 13
3 7 + 3 4 5 *+-
⇒ -13
3 7 + 3 4 5 * + - n              Note the unary minus, ‘n
⇒ 13
5 6 / 4 n +
⇒ -3.166666667
3 4 ^                            Exponentiation
⇒ 81
^D                               End-of-file indicator
$

[ < ] [ > ]   [ << ] [ Up ] [ >> ]

This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.