2.4 Location Tracking Calculator: ltcalc
This example extends the infix notation calculator with location tracking. This feature will be used to improve the error messages. For the sake of clarity, this example is a simple integer calculator, since most of the work needed to use locations will be done in the lexical analyzer.
2.4.1 Declarations for ltcalc: Bison and C declarations for ltcalc.
2.4.2 Grammar Rules for ltcalc: Grammar rules for ltcalc, with explanations.
2.4.3 The ltcalc Lexical Analyzer: The lexical analyzer.
2.4.1 Declarations for ltcalc
The C and Bison declarations for the location tracking calculator are essentially the same as the declarations for the infix notation calculator, except that semantic values are declared as int.
    /* Location tracking calculator. */

    %{
      #include <ctype.h>  /* isdigit, used by yylex. */
      #include <math.h>   /* pow. */
      #include <stdio.h>  /* printf, fprintf, getchar, ungetc. */
      int yylex (void);
      void yyerror (char const *);
    %}

    /* Bison declarations. */
    %define api.value.type {int}
    %token NUM

    %left '-' '+'
    %left '*' '/'
    %precedence NEG
    %right '^'

    %% /* The grammar follows. */
Note that there are no declarations specific to locations. Defining a data type for storing locations is not needed: we will use the type provided by default (see section Data Types of Locations), which is a four-member structure with the following integer fields: first_line, first_column, last_line and last_column. By convention, and in accordance with the GNU Coding Standards and common practice, the line and column counts both start at 1.
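As an illustration of the four fields described above, here is a minimal sketch of a structure with the same layout, together with a helper that renders a location in the "line.column-line.column" style. The names location and format_location are hypothetical and exist only for this sketch; in a real parser the type is YYLTYPE, generated by Bison.

```c
#include <stdio.h>

/* Hypothetical stand-in that mirrors the four integer fields of the
   default location type.  A real parser would use YYLTYPE instead.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Write LOC into BUF (of SIZE bytes) as
   "first_line.first_column-last_line.last_column".
   Return the value returned by snprintf.  */
static int
format_location (char *buf, size_t size, location const *loc)
{
  return snprintf (buf, size, "%d.%d-%d.%d",
                   loc->first_line, loc->first_column,
                   loc->last_line, loc->last_column);
}
```

For example, a location whose fields are 1, 5, 1 and 7 is rendered as "1.5-1.7".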
2.4.2 Grammar Rules for ltcalc
Whether or not locations are handled has no effect on the syntax of your language. Therefore, the grammar rules for this example will be very close to those of the previous example: we will only modify them to benefit from the new information.
Here, we will use locations to report divisions by zero, and to locate the offending expressions or subexpressions.
    input:
      %empty
    | input line
    ;

    line:
      '\n'
    | exp '\n' { printf ("%d\n", $1); }
    ;

    exp:
      NUM                 { $$ = $1; }
    | exp '+' exp         { $$ = $1 + $3; }
    | exp '-' exp         { $$ = $1 - $3; }
    | exp '*' exp         { $$ = $1 * $3; }
    | exp '/' exp
        {
          if ($3)
            $$ = $1 / $3;
          else
            {
              $$ = 1;
              fprintf (stderr, "%d.%d-%d.%d: division by zero\n",
                       @3.first_line, @3.first_column,
                       @3.last_line, @3.last_column);
            }
        }
    | '-' exp %prec NEG   { $$ = -$2; }
    | exp '^' exp         { $$ = pow ($1, $3); }
    | '(' exp ')'         { $$ = $2; }
    ;
This code shows how to reach locations inside of semantic actions, by using the pseudo-variables @n for rule components, and the pseudo-variable @$ for groupings.
We don’t need to assign a value to @$: the output parser does it automatically. By default, before executing the C code of each action, @$ is set to range from the beginning of @1 to the end of @n, for a rule with n components. This behavior can be redefined (see section Default Action for Locations), and for very specific rules, @$ can be computed by hand.
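The default computation described above can be pictured as a small function. This is only a sketch under stated assumptions: the type location and the function span_locations are hypothetical names, the components are stored in a zero-based array (so @1 corresponds to rhs[0]), and the rule is assumed to have at least one component. It is not the code that Bison generates.

```c
/* Hypothetical stand-in for the default location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Return the location of a grouping built from the N components
   RHS[0] .. RHS[N - 1], with N >= 1: it starts where the first
   component starts and ends where the last component ends.  */
static location
span_locations (location const *rhs, int n)
{
  location res;
  res.first_line   = rhs[0].first_line;
  res.first_column = rhs[0].first_column;
  res.last_line    = rhs[n - 1].last_line;
  res.last_column  = rhs[n - 1].last_column;
  return res;
}
```

With three components located at 1.0-1.1, 1.2-1.3 and 1.4-1.5, the grouping spans 1.0-1.5.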
2.4.3 The ltcalc Lexical Analyzer
Until now, we relied on Bison’s defaults to enable location tracking. The next step is to rewrite the lexical analyzer so that it feeds the parser with token locations, as it already does for semantic values.
To this end, we must take into account every single character of the input text, to keep the computed locations from being fuzzy or wrong:
    int
    yylex (void)
    {
      int c;

      /* Skip white space. */
      while ((c = getchar ()) == ' ' || c == '\t')
        ++yylloc.last_column;

      /* Step. */
      yylloc.first_line = yylloc.last_line;
      yylloc.first_column = yylloc.last_column;

      /* Process numbers. */
      if (isdigit (c))
        {
          yylval = c - '0';
          ++yylloc.last_column;
          while (isdigit (c = getchar ()))
            {
              ++yylloc.last_column;
              yylval = yylval * 10 + c - '0';
            }
          ungetc (c, stdin);
          return NUM;
        }

      /* Return end-of-input. */
      if (c == EOF)
        return 0;

      /* Return a single char, and update location. */
      if (c == '\n')
        {
          ++yylloc.last_line;
          yylloc.last_column = 0;
        }
      else
        ++yylloc.last_column;
      return c;
    }
Basically, the lexical analyzer performs the same processing as before: it skips blanks and tabs, and reads numbers or single-character tokens. In addition, it updates yylloc, the global variable (of type YYLTYPE) containing the token’s location.
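To see how these updates play out on a concrete input, the following sketch applies the same steps to a string instead of standard input, so that the resulting locations can be inspected token by token. Everything here is hypothetical scaffolding for illustration: the type location, the constant NUM_TOKEN (standing in for the NUM token that Bison would define), and the function next_token are not part of the calculator.

```c
#include <ctype.h>

/* Hypothetical stand-in for the default location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Hypothetical token kind standing in for NUM.  */
enum { NUM_TOKEN = 258 };

/* Scan one token from *INPUT, following the same steps as yylex.
   Update LOC, store the value of a number in *VALUE, and advance
   *INPUT past the token.  Return 0 at the end of the string,
   NUM_TOKEN for a number, or the character itself otherwise.  */
static int
next_token (char const **input, location *loc, int *value)
{
  char const *p = *input;
  int c;

  /* Skip white space, counting each character.  */
  while (*p == ' ' || *p == '\t')
    {
      ++loc->last_column;
      ++p;
    }

  /* Step: the new token starts where the previous text ended.  */
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;

  /* Process numbers, one column per digit.  */
  if (isdigit ((unsigned char) *p))
    {
      *value = 0;
      while (isdigit ((unsigned char) *p))
        {
          *value = *value * 10 + (*p - '0');
          ++loc->last_column;
          ++p;
        }
      *input = p;
      return NUM_TOKEN;
    }

  /* End of input.  */
  if (*p == '\0')
    {
      *input = p;
      return 0;
    }

  /* Single character: a newline starts a new line at column 0.  */
  c = (unsigned char) *p;
  if (c == '\n')
    {
      ++loc->last_line;
      loc->last_column = 0;
    }
  else
    ++loc->last_column;
  *input = p + 1;
  return c;
}
```

Starting from a location of 1.0-1.0 and the input "12 + 3\n", successive calls yield the number 12 at 1.0-1.2, the '+' at 1.3-1.4, the number 3 at 1.5-1.6, and then the newline, after which last_line is 2 and last_column is 0.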
Now, each time this function returns a token, the parser has its number as well as its semantic value and its location in the text. The last change needed is to initialize yylloc, for example in the controlling function:
    int
    main (void)
    {
      yylloc.first_line = yylloc.last_line = 1;
      yylloc.first_column = yylloc.last_column = 0;
      return yyparse ();
    }
Remember that computing locations is not a matter of syntax. Every character must be associated with a location update, whether it is in valid input, in comments, in literal strings, and so on.
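The calculator itself has no comments or strings, but the point can be sketched with a helper that a scanner might call on any run of text it consumes without returning a token. The names location and advance_location are hypothetical and only serve this illustration; the column and line rules are the same ones used in yylex.

```c
#include <stddef.h>

/* Hypothetical stand-in for the default location type.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Advance the end of LOC over the LEN characters of TEXT, for
   instance a comment that the scanner discards.  A newline moves
   to the next line and resets the column to 0; any other
   character moves one column to the right.  */
static void
advance_location (location *loc, char const *text, size_t len)
{
  size_t i;
  for (i = 0; i < len; ++i)
    {
      if (text[i] == '\n')
        {
          ++loc->last_line;
          loc->last_column = 0;
        }
      else
        ++loc->last_column;
    }
}
```

For instance, if last_line is 1 and last_column is 4 before a discarded two-line comment such as "/* a\nb */", they are 2 and 4 afterwards, so the next token is still reported at the right place.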
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.