10. Parsers Written In Other Languages

10.1 C++ Parsers

10.1.1 C++ Bison Interface

The C++ deterministic parser is selected using the skeleton directive, ‘%skeleton "lalr1.cc"’, or the synonymous command-line option ‘--skeleton=lalr1.cc’. See section Bison Declaration Summary.

When run, bison will create several entities in the ‘yy’ namespace. Use the ‘%define api.namespace’ directive to change the namespace name, see api.namespace. The various classes are generated in the following files:

‘position.hh’

‘location.hh’

The definition of the classes position and location, used for location tracking when enabled. These files are not generated if the %define variable api.location.type is defined. See section C++ Location Values.

‘stack.hh’

An auxiliary class stack used by the parser.

‘file.hh’

‘file.cc’

(Assuming the extension of the grammar file was ‘.yy’.) The declaration and implementation of the C++ parser class. The basename and extension of these two files follow the same rules as with regular C parsers (see section Invoking Bison).

The header is mandatory; you must either pass ‘-d’/‘--defines’ to bison, or use the ‘%defines’ directive.

All these files are documented using Doxygen; run doxygen for complete and accurate documentation.

10.1.2 C++ Semantic Values

Bison supports two different means to handle semantic values in C++. One is similar to the C interface and relies on unions (see section C++ Unions). As C++ practitioners know, unions are inconvenient in C++; therefore another approach is provided, based on variants (see section C++ Variants).

10.1.2.1 C++ Unions

The %union directive works as for C, see The Union Declaration. In particular it produces a genuine union, which has a few specific features in C++.

- The type YYSTYPE is defined but its use is discouraged: rather you should refer to the parser’s encapsulated type yy::parser::semantic_type.
- Non POD (Plain Old Data) types cannot be used. C++ forbids any instance of classes with constructors in unions: only pointers to such objects are allowed.

Because objects have to be stored via pointers, memory is not reclaimed automatically: using the %destructor directive is the only means to avoid leaks. See section Freeing Discarded Symbols.
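
To make the consequence concrete, here is a minimal stand-alone sketch in plain C++ (it is not generated by Bison). The names value_union, make_string_value and discard are hypothetical: value_union stands in for a %union-generated type, and discard does by hand what a %destructor for the string member would be expected to do.

```cpp
#include <string>

// Hypothetical stand-in for a %union-generated type: only plain data
// and pointers are stored, as in the C interface.
union value_union
{
  int ival;
  std::string* sval;
};

// What a scanner action would typically do: allocate the string on the heap.
inline value_union make_string_value (const char* text)
{
  value_union v;
  v.sval = new std::string (text);
  return v;
}

// What a %destructor for the string member would do: release the object.
// The union itself has no destructor, so forgetting this call leaks.
inline void discard (value_union& v)
{
  delete v.sval;
  v.sval = nullptr;
}
```

Every path that drops such a value without handing it to the grammar (error recovery in particular) needs the equivalent of discard.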

10.1.2.2 C++ Variants

Bison provides a variant-based implementation of semantic values for C++. This alleviates all the limitations reported in the previous section, and in particular, object types can be used without pointers.

To enable variant-based semantic values, set the %define variable api.value.type to variant (see section variant). Once this is defined, %union is ignored, and instead of using the names of the fields of the %union to “type” the symbols, use genuine types.

For instance, instead of

%union
{
  int ival;
  std::string* sval;
}
%token <ival> NUMBER;
%token <sval> STRING;

write

%token <int> NUMBER;
%token <std::string> STRING;

STRING is no longer a pointer, which should considerably simplify the user actions in the grammar and in the scanner (in particular the memory management).

Since C++ features destructors, and since it is customary to specialize operator<< to support uniform printing of values, variants also typically simplify Bison printers and destructors.

Variants are stricter than unions. When based on unions, you may play any dirty game with yylval, say storing an int, reading a char*, and then storing a double in it. This is no longer possible with variants: they must be initialized, then assigned to, and eventually, destroyed.

Method on semantic_type: T& build<T> (): Initialize, but leave empty. Returns a reference to the storage where the actual value may be stored. Requires that the variant was not initialized yet.

Method on semantic_type: T& build<T> (const T& t): Initialize, and copy-construct from t.

Warning: We do not use Boost.Variant, for two reasons. First, it appeared unacceptable to require Boost on the user’s machine (i.e., the machine on which the generated parser will be compiled, not the machine on which bison was run). Second, for each possible semantic value, Boost.Variant not only stores the value, but also a tag specifying its type. But the parser already “knows” the type of the semantic value, so that would be duplicating the information.

Therefore we developed light-weight variants whose type tag is external (so they are really like unions for C++ actually). But our code is much less mature than Boost.Variant. So there are a number of limitations in (the current implementation of) variants:

- Alignment must be enforced: values should be aligned in memory according to the most demanding type. Computing the smallest alignment possible requires meta-programming techniques that are not currently implemented in Bison, and therefore, since, as far as we know, double is the most demanding type on all platforms, alignments are enforced for double whatever types are actually used. This may waste space in some cases.
- There might be portability issues we are not aware of.

As far as we know, these limitations can be alleviated. All it takes is some time and/or some talented C++ hacker willing to contribute to Bison.
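
The idea of a variant whose type tag is external can be illustrated with the following stand-alone sketch. It is not the code Bison generates: the class mini_variant and its members as and destroy are hypothetical names chosen for this illustration, and only the two build signatures follow the ones documented above. The caller names the type on every access, just as the parser knows the type of each symbol.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical miniature of a tag-less variant: raw storage large enough
// for the supported types.  No type tag is stored inside the object.
class mini_variant
{
public:
  mini_variant () : built_ (false) {}

  // Initialize with a default-constructed T; requires an empty variant.
  template <typename T>
  T& build ()
  {
    assert (!built_);
    built_ = true;
    return *new (storage_.raw) T ();
  }

  // Initialize by copy-constructing from t.
  template <typename T>
  T& build (const T& t)
  {
    assert (!built_);
    built_ = true;
    return *new (storage_.raw) T (t);
  }

  // Access the live value; the caller supplies the type.
  template <typename T>
  T& as ()
  {
    assert (built_);
    return *reinterpret_cast<T*> (storage_.raw);
  }

  // Destroy the live value; again the caller supplies the type.
  template <typename T>
  void destroy ()
  {
    assert (built_);
    as<T> ().~T ();
    built_ = false;
  }

private:
  union
  {
    double align_me;  // forces alignment for double, as described above
    char raw[sizeof (std::string) > sizeof (double)
             ? sizeof (std::string) : sizeof (double)];
  } storage_;
  bool built_;
};
```

For example, ‘v.build<std::string> () = "hello";’ followed by ‘v.destroy<std::string> ();’ leaves v empty and ready for ‘v.build (42);’. The assertions play a role comparable to what enabling assertions in the generated parser is meant to catch: building twice, or reading before building.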

10.1.3 C++ Location Values

When the directive %locations is used, the C++ parser supports location tracking, see Tracking Locations.

By default, two auxiliary classes define a position, a single point in a file, and a location, a range composed of a pair of positions (possibly spanning several files). But if the %define variable api.location.type is defined, then these classes will not be generated, and the user defined type will be used.

In this section uint is an abbreviation for unsigned int: in genuine code only the latter is used.

10.1.3.1 C++ `position`

Constructor on position: position (std::string* file = 0, uint line = 1, uint col = 1): Create a position denoting a given point. Note that file is not reclaimed when the position is destroyed: memory management must be handled elsewhere.

Method on position: void initialize (std::string* file = 0, uint line = 1, uint col = 1): Reset the position to the given values.

Instance Variable of position: std::string* file: The name of the file. It will always be handled as a pointer, the parser will never duplicate nor deallocate it. As an experimental feature you may change it to ‘type*’ using ‘%define filename_type "type"’.

Instance Variable of position: uint line: The line, starting at 1.

Method on position: void lines (int height = 1): If height is nonzero, advance by height lines, resetting the column number. The resulting line number cannot be less than 1.

Instance Variable of position: uint column: The column, starting at 1.

Method on position: void columns (int width = 1): Advance by width columns, without changing the line number. The resulting column number cannot be less than 1.

Method on position: position& operator+= (int width)
Method on position: position operator+ (int width)
Method on position: position& operator-= (int width)
Method on position: position operator- (int width): Various forms of syntactic sugar for columns.

Method on position: bool operator== (const position& that)
Method on position: bool operator!= (const position& that): Whether *this and that denote equal/different positions.

Function: std::ostream& operator<< (std::ostream& o, const position& p): Report p on o like this: ‘file:line.column’, or ‘line.column’ if file is null.
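
The behaviour documented above can be summarized by the following stand-alone sketch. The struct pos and the helper render are hypothetical stand-ins written for this illustration; they mimic the documented semantics (clamping at 1, column reset on a new line, the ‘file:line.column’ format) and are not the generated position class.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in that mimics the documented behaviour of position.
struct pos
{
  std::string* file;    // never owned, never freed
  unsigned int line;
  unsigned int column;

  explicit pos (std::string* f = 0, unsigned int l = 1, unsigned int c = 1)
    : file (f), line (l), column (c) {}

  // Advance by height lines and reset the column; clamp at line 1.
  void lines (int height = 1)
  {
    if (height)
      {
        column = 1;
        line = add (line, height);
      }
  }

  // Advance by width columns; clamp at column 1.
  void columns (int width = 1) { column = add (column, width); }

private:
  // Signed addition that never yields less than 1.
  static unsigned int add (unsigned int lhs, int rhs)
  {
    long long res = static_cast<long long> (lhs) + rhs;
    return res < 1 ? 1u : static_cast<unsigned int> (res);
  }
};

// Print as 'file:line.column', or 'line.column' if file is null.
inline std::ostream& operator<< (std::ostream& o, const pos& p)
{
  if (p.file)
    o << *p.file << ':';
  return o << p.line << '.' << p.column;
}

// Convenience for experimenting: render a pos as a string.
inline std::string render (const pos& p)
{
  std::ostringstream s;
  s << p;
  return s.str ();
}
```

Starting from a pos on file "input.txt", columns (4) gives "input.txt:1.5", and a subsequent lines (2) gives "input.txt:3.1".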

10.1.3.2 C++ `location`

Constructor on location: location (const position& begin, const position& end): Create a Location from the endpoints of the range.

Constructor on location: location (const position& pos = position())
Constructor on location: location (std::string* file, uint line, uint col): Create a Location denoting an empty range located at a given point.

Method on location: void initialize (std::string* file = 0, uint line = 1, uint col = 1): Reset the location to an empty range at the given values.

Instance Variable of location: position begin
Instance Variable of location: position end: The first, inclusive, position of the range, and the first beyond.

Method on location: void columns (int width = 1)
Method on location: void lines (int height = 1): Forwarded to the end position.

Method on location: location operator+ (const location& end)
Method on location: location operator+ (int width)
Method on location: location operator+= (int width)
Method on location: location operator- (int width)
Method on location: location operator-= (int width): Various forms of syntactic sugar.

Method on location: void step (): Move begin onto end.

Method on location: bool operator== (const location& that)
Method on location: bool operator!= (const location& that): Whether *this and that denote equal/different ranges of positions.

Function: std::ostream& operator<< (std::ostream& o, const location& p): Report p on o, taking care of special cases such as: no filename defined, or equal filename/line or column.
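
As a rough illustration of how begin, end, step and the joining operator+ relate, consider the following stand-alone sketch. The types loc_point and loc_range and the function render are hypothetical and simplified (no file name, no clamping); the output format is chosen for this sketch and is not necessarily what the generated operator<< prints.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-ins; file names are omitted for brevity.
struct loc_point
{
  unsigned int line;
  unsigned int column;
};

struct loc_range
{
  loc_point begin;  // first position of the range (inclusive)
  loc_point end;    // first position beyond the range

  loc_range () : begin {1, 1}, end {1, 1} {}

  // Both are forwarded to the end position.
  void columns (int width = 1) { end.column += width; }
  void lines (int height = 1)
  {
    if (height)
      {
        end.line += height;
        end.column = 1;
      }
  }

  // Move begin onto end: start a fresh, empty range.
  void step () { begin = end; }
};

// Join two ranges: from the start of lhs to the end of rhs.
inline loc_range operator+ (loc_range lhs, const loc_range& rhs)
{
  lhs.end = rhs.end;
  return lhs;
}

// Render as "line.column-line.column" (format chosen for this sketch).
inline std::string render (const loc_range& r)
{
  std::ostringstream s;
  s << r.begin.line << '.' << r.begin.column << '-'
    << r.end.line << '.' << r.end.column;
  return s.str ();
}
```

A three-column token starting at the origin covers "1.1-1.4"; after step and two more columns the next token covers "1.4-1.6", and joining the two yields "1.1-1.6".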

10.1.3.3 User Defined Location Type

Instead of using the built-in types you may use the %define variable api.location.type to specify your own type:

%define api.location.type {LocationType}

The requirements over your LocationType are:

- it must be copyable;
- in order to compute the (default) value of @$ in a reduction, the parser basically runs

  @$.begin = @1.begin;
  @$.end   = @N.end; // The location of the last right-hand side symbol.

  so there must be copyable begin and end members;
- alternatively you may redefine the computation of the default location, in which case these members are not required (see section Default Action for Locations);
- if traces are enabled, then there must exist an ‘std::ostream& operator<< (std::ostream& o, const LocationType& s)’ function.
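
A type meeting these requirements can be quite small. The following sketch is only an illustration: offset_location (which tracks byte offsets instead of lines and columns) and default_location are hypothetical names, and default_location merely spells out, as an ordinary function, the default computation described above.

```cpp
#include <ostream>

// Hypothetical user-defined location: byte offsets instead of line/column.
// It is copyable and has copyable begin and end members.
struct offset_location
{
  unsigned long begin;
  unsigned long end;
};

// Needed only when traces are enabled.
inline std::ostream& operator<< (std::ostream& o, const offset_location& l)
{
  return o << '[' << l.begin << ',' << l.end << ')';
}

// What the default location computation amounts to for a rule whose
// right-hand side starts at 'first' and finishes at 'last'.
inline offset_location default_location (const offset_location& first,
                                         const offset_location& last)
{
  offset_location res;
  res.begin = first.begin;
  res.end = last.end;
  return res;
}
```

Such a type would then be named in ‘%define api.location.type {offset_location}’, with its declaration made visible to the parser header.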

In programs with several C++ parsers, you may also use the %define variable api.location.type to share a common set of built-in definitions for position and location. For instance, one parser ‘master/parser.yy’ might use:

%defines
%locations
%define api.namespace {master}

to generate the ‘master/position.hh’ and ‘master/location.hh’ files, reused by other parsers as follows:

%define api.location.type {master::location}
%code requires { #include <master/location.hh> }

10.1.4 C++ Parser Interface

The output files ‘output.hh’ and ‘output.cc’ declare and define the parser class in the namespace yy. The class name defaults to parser, but may be changed using ‘%define parser_class_name {name}’. The interface of this class is detailed below. It can be extended using the %parse-param feature: its semantics is slightly changed since it describes an additional member of the parser class, and an additional argument for its constructor.

Type of parser: semantic_type
Type of parser: location_type: The types for semantic values and locations (if enabled).

Type of parser: token: A structure that contains (only) the yytokentype enumeration, which defines the tokens. To refer to the token FOO, use yy::parser::token::FOO. The scanner can use ‘typedef yy::parser::token token;’ to “import” the token enumeration (see section Calc++ Scanner).

Type of parser: syntax_error: This class derives from std::runtime_error. Throw instances of it from the scanner or from the user actions to raise parse errors. This is equivalent to first invoking error to report the location and message of the syntax error, and then invoking YYERROR to enter the error-recovery mode. But contrary to YYERROR, which can only be invoked from user actions (i.e., written in the action itself), the exception can be thrown from functions invoked from the user action.

Method on parser: parser (type1 arg1, ...): Build a new parser object. There are no arguments by default, unless ‘%parse-param {type1 arg1}’ was used.

Method on syntax_error: syntax_error (const location_type& l, const std::string& m)
Method on syntax_error: syntax_error (const std::string& m): Instantiate a syntax-error exception.

Method on parser: int parse ()

Run the syntactic analysis, and return 0 on success, 1 otherwise.

The whole function is wrapped in a try/catch block, so that when an exception is thrown, the %destructors are called to release the lookahead symbol, and the symbols pushed on the stack.

Method on parser: std::ostream& debug_stream ()
Method on parser: void set_debug_stream (std::ostream& o): Get or set the stream used for tracing the parsing. It defaults to std::cerr.

Method on parser: debug_level_type debug_level ()
Method on parser: void set_debug_level (debug_level_type l): Get or set the tracing level. Currently its value is either 0, no trace, or nonzero, full tracing.

Method on parser: void error (const location_type& l, const std::string& m)
Method on parser: void error (const std::string& m): The definition for this member function must be supplied by the user: the parser uses it to report a parser error occurring at l, described by m. If location tracking is not enabled, the second signature is used.
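
The interplay between syntax_error, error and the return value of parse can be pictured with the following stand-alone sketch. All names (mini_syntax_error, checked_divide, mini_parser) are hypothetical, and the control flow is deliberately simplified: a real generated parser enters error recovery rather than returning at once, and it also runs the %destructors.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical miniature of an exception deriving from std::runtime_error.
struct mini_syntax_error : std::runtime_error
{
  explicit mini_syntax_error (const std::string& m)
    : std::runtime_error (m) {}
};

// A helper called from a user action: it cannot use YYERROR,
// but it can throw.
inline int checked_divide (int lhs, int rhs)
{
  if (rhs == 0)
    throw mini_syntax_error ("division by zero");
  return lhs / rhs;
}

struct mini_parser
{
  std::string last_error;

  // Stand-in for the user-supplied error member function.
  void error (const std::string& m) { last_error = m; }

  // Stand-in for parse (): 0 on success, 1 otherwise.
  int parse (int lhs, int rhs, int& result)
  {
    try
      {
        result = checked_divide (lhs, rhs);
        return 0;
      }
    catch (const mini_syntax_error& e)
      {
        // Report through error (), as the generated parser would.
        error (e.what ());
        return 1;
      }
  }
};
```

Here parse (6, 3, r) returns 0 with r set to 2, while parse (1, 0, r) returns 1 and records "division by zero" through error.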

10.1.5 C++ Scanner Interface

The parser invokes the scanner by calling yylex. Contrary to C parsers, C++ parsers are always pure: there is no point in using the ‘%define api.pure’ directive. The actual interface with yylex depends on whether you use unions or variants.

10.1.5.1 Split Symbols

The interface is as follows.

Method on parser: int yylex (semantic_type* yylval, location_type* yylloc, type1 arg1, ...)
Method on parser: int yylex (semantic_type* yylval, type1 arg1, ...): Return the next token. Its type is the return value, its semantic value and location (if enabled) being yylval and yylloc. Invocations of ‘%lex-param {type1 arg1}’ yield additional arguments.

Note that when using variants, the interface for yylex is the same, but yylval is handled differently.

Regular union-based code in a Lex scanner typically looks like:

[0-9]+   {
           yylval->ival = text_to_int (yytext);
           return yy::parser::token::INTEGER;
         }
[a-z]+   {
           yylval->sval = new std::string (yytext);
           return yy::parser::token::IDENTIFIER;
         }

Using variants, yylval is already constructed, but it is not initialized. So the code would look like:

[0-9]+   {
           yylval->build<int> () = text_to_int (yytext);
           return yy::parser::token::INTEGER;
         }
[a-z]+   {
           yylval->build<std::string> () = yytext;
           return yy::parser::token::IDENTIFIER;
         }

or, copy-constructing the value directly:

[0-9]+   {
           yylval->build (text_to_int (yytext));
           return yy::parser::token::INTEGER;
         }
[a-z]+   {
           // Name the type explicitly: yytext is a char*, not a std::string.
           yylval->build<std::string> (yytext);
           return yy::parser::token::IDENTIFIER;
         }

10.1.5.2 Complete Symbols

If you specified both %define api.value.type variant and %define api.token.constructor, the parser class also defines the class parser::symbol_type which defines a complete symbol, aggregating its type (i.e., the traditional value returned by yylex), its semantic value (i.e., the value passed in yylval), and possibly its location (yylloc).

Method on symbol_type: symbol_type (token_type type, const semantic_type& value, const location_type& location): Build a complete terminal symbol whose token type is type, and whose semantic value is value. If location tracking is enabled, also pass the location.

This interface is low-level and should not be used for two reasons. First, it is inconvenient, as you still have to build the semantic value, which is a variant, and second, because consistency is not enforced: as with unions, it is still possible to give an integer as semantic value for a string.

So for each token type, Bison generates named constructors as follows.

Method on symbol_type: make_token (const value_type& value, const location_type& location)
Method on symbol_type: make_token (const location_type& location): Build a complete terminal symbol for the token type token (not including the api.token.prefix) whose possible semantic value is value of adequate value_type. If location tracking is enabled, also pass the location.

For instance, given the following declarations:

%define api.token.prefix {TOK_}
%token <std::string> IDENTIFIER;
%token <int> INTEGER;
%token COLON;

Bison generates the following functions:

symbol_type make_IDENTIFIER(const std::string& v,
                            const location_type& l);
symbol_type make_INTEGER(const int& v,
                         const location_type& loc);
symbol_type make_COLON(const location_type& loc);

which should be used in a Lex-scanner as follows.

[0-9]+   return yy::parser::make_INTEGER(text_to_int (yytext), loc);
[a-z]+   return yy::parser::make_IDENTIFIER(yytext, loc);
":"      return yy::parser::make_COLON(loc);

Tokens that do not have an identifier are not accessible: you cannot simply use characters such as ':', they must be declared with %token.
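
The benefit of named constructors is that the token kind and the type of its value cannot get out of step. The following stand-alone sketch illustrates the pattern only: mini_symbol is a hypothetical type that stores its value in ordinary members rather than in a variant, and it uses a bare column number where a real parser would use location_type.

```cpp
#include <string>

// Hypothetical miniature of a "complete symbol" with named constructors.
struct mini_symbol
{
  enum kind_type { IDENTIFIER, INTEGER, COLON };

  kind_type kind;
  std::string text;     // meaningful only for IDENTIFIER
  int number;           // meaningful only for INTEGER
  unsigned int column;  // simplified location

  // Each constructor fixes the kind and accepts only the matching value.
  static mini_symbol make_IDENTIFIER (const std::string& v, unsigned int col)
  {
    mini_symbol s = {IDENTIFIER, v, 0, col};
    return s;
  }

  static mini_symbol make_INTEGER (int v, unsigned int col)
  {
    mini_symbol s = {INTEGER, std::string (), v, col};
    return s;
  }

  // A token without a semantic value takes only its location.
  static mini_symbol make_COLON (unsigned int col)
  {
    mini_symbol s = {COLON, std::string (), 0, col};
    return s;
  }
};
```

With this shape, building an INTEGER that carries a string is not something a caller can write through these functions: no make_INTEGER overload accepts one.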

10.1.6 A Complete C++ Example

This section demonstrates the use of a C++ parser with a simple but complete example. This example should be available on your system, ready to compile, in the directory .../bison/examples/calc++. It focuses on the use of Bison, therefore the design of the various C++ classes is very naive: no accessors, no encapsulation of members etc. We will use a Lex scanner, and more precisely, a Flex scanner, to demonstrate the various interactions. A hand-written scanner is actually easier to interface with.

10.1.6.1 Calc++ — C++ Calculator

Of course the grammar is dedicated to arithmetic: a single expression, possibly preceded by variable assignments. An environment containing possibly predefined variables, such as one and two, is exchanged with the parser. An example of valid input follows.

three := 3
seven := one + two * three
seven * seven

10.1.6.2 Calc++ Parsing Driver

To support a pure interface with the parser (and the scanner) the technique of the “parsing context” is convenient: a structure containing all the data to exchange. Since, in addition to simply launching the parsing, there are several auxiliary tasks to execute (open the file for parsing, instantiate the parser, etc.), we recommend transforming the simple parsing context structure into a full-blown parsing driver class.

The declaration of this driver class, ‘calc++-driver.hh’, is as follows. The first part includes the CPP guard and imports the required standard library components, and the declaration of the parser class.

#ifndef CALCXX_DRIVER_HH
# define CALCXX_DRIVER_HH
# include <string>
# include <map>
# include "calc++-parser.hh"

Then comes the declaration of the scanning function. Flex expects the signature of yylex to be defined in the macro YY_DECL, and the C++ parser expects it to be declared. We can factor both as follows.

// Tell Flex the lexer's prototype ...
# define YY_DECL \
  yy::calcxx_parser::symbol_type yylex (calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;

The calcxx_driver class is then declared with its most obvious members.

// Conducting the whole scanning and parsing of Calc++.
class calcxx_driver
{
public:
  calcxx_driver ();
  virtual ~calcxx_driver ();

  std::map<std::string, int> variables;

  int result;

To encapsulate the coordination with the Flex scanner, it is useful to have member functions to open and close the scanning phase.

  // Handling the scanner.
  void scan_begin ();
  void scan_end ();
  bool trace_scanning;

Similarly for the parser itself.

  // Run the parser on file F.
  // Return 0 on success.
  int parse (const std::string& f);
  // The name of the file being parsed.
  // Used later to pass the file name to the location tracker.
  std::string file;
  // Whether parser traces should be generated.
  bool trace_parsing;

To demonstrate pure handling of parse errors, instead of simply dumping them on the standard error output, we will pass them to the compiler driver using the following two member functions. Finally, we close the class declaration and CPP guard.

  // Error handling.
  void error (const yy::location& l, const std::string& m);
  void error (const std::string& m);
};
#endif // ! CALCXX_DRIVER_HH

The implementation of the driver is straightforward. The parse member function deserves some attention. The error functions are simple stubs, they should actually register the located error messages and set error state.

#include <iostream>
#include "calc++-driver.hh"
#include "calc++-parser.hh"

calcxx_driver::calcxx_driver ()
  : trace_scanning (false), trace_parsing (false)
{
  variables["one"] = 1;
  variables["two"] = 2;
}

calcxx_driver::~calcxx_driver ()
{
}

int
calcxx_driver::parse (const std::string &f)
{
  file = f;
  scan_begin ();
  yy::calcxx_parser parser (*this);
  parser.set_debug_level (trace_parsing);
  int res = parser.parse ();
  scan_end ();
  return res;
}

void
calcxx_driver::error (const yy::location& l, const std::string& m)
{
  std::cerr << l << ": " << m << std::endl;
}

void
calcxx_driver::error (const std::string& m)
{
  std::cerr << m << std::endl;
}

10.1.6.3 Calc++ Parser

The grammar file ‘calc++-parser.yy’ starts by asking for the C++ deterministic parser skeleton, the creation of the parser header file, and specifies the name of the parser class. Because the C++ skeleton changed several times, it is safer to require the version you designed the grammar for.

%skeleton "lalr1.cc" /* -*- C++ -*- */
%require "3.0.2"
%defines
%define parser_class_name {calcxx_parser}

This example will use genuine C++ objects as semantic values, therefore, we require the variant-based interface. To make sure we properly use it, we enable assertions. To fully benefit from type-safety and more natural definition of “symbol”, we enable api.token.constructor.

%define api.token.constructor
%define api.value.type variant
%define parse.assert

Then come the declarations/inclusions needed by the semantic values. Because the parser uses the parsing driver and reciprocally, both would like to include the header of the other, which is, of course, insane. This mutual dependency will be broken using forward declarations. Because the driver’s header needs detailed knowledge about the parser class (in particular its inner types), it is the parser’s header which will use a forward declaration of the driver. See section %code Summary.

%code requires
{
# include <string>
class calcxx_driver;
}

The driver is passed by reference to the parser and to the scanner. This provides a simple but effective pure interface, not relying on global variables.

// The parsing context.
%param { calcxx_driver& driver }

Then we request location tracking, and initialize the first location’s file name. Afterward new locations are computed relatively to the previous locations: the file name will be propagated.

%locations
%initial-action
{
  // Initialize the initial location.
  @$.begin.filename = @$.end.filename = &driver.file;
};

Use the following two directives to enable parser tracing and verbose error messages. However, verbose error messages can contain incorrect information (see section LAC).

%define parse.trace
%define parse.error verbose

The code between ‘%code {’ and ‘}’ is output in the ‘*.cc’ file; it needs detailed knowledge about the driver.

%code
{
# include "calc++-driver.hh"
}

The token numbered as 0 corresponds to end of file; the following line allows for nicer error messages referring to “end of file” instead of “$end”. Similarly user friendly names are provided for each symbol. To avoid name clashes in the generated files (see section Calc++ Scanner), prefix tokens with TOK_ (see section api.token.prefix).

%define api.token.prefix {TOK_}
%token
  END  0  "end of file"
  ASSIGN  ":="
  MINUS   "-"
  PLUS    "+"
  STAR    "*"
  SLASH   "/"
  LPAREN  "("
  RPAREN  ")"
;

Since we use variant-based semantic values, %union is not used, and both %type and %token expect genuine types, as opposed to type tags.

%token <std::string> IDENTIFIER "identifier"
%token <int> NUMBER "number"
%type  <int> exp

No %destructor is needed to enable memory deallocation during error recovery; the memory, for strings for instance, will be reclaimed by the regular destructors. All the values are printed using their operator<< (see section Printing Semantic Values).

%printer { yyoutput << $$; } <*>;

The grammar itself is straightforward (see section Location Tracking Calculator: ltcalc).

%%
%start unit;
unit: assignments exp  { driver.result = $2; };

assignments:
  %empty                 {}
| assignments assignment {};

assignment:
  "identifier" ":=" exp { driver.variables[$1] = $3; };

%left "+" "-";
%left "*" "/";
exp:
  exp "+" exp   { $$ = $1 + $3; }
| exp "-" exp   { $$ = $1 - $3; }
| exp "*" exp   { $$ = $1 * $3; }
| exp "/" exp   { $$ = $1 / $3; }
| "(" exp ")"   { std::swap ($$, $2); }
| "identifier"  { $$ = driver.variables[$1]; }
| "number"      { std::swap ($$, $1); };
%%

Finally the error member function registers the errors to the driver.

void
yy::calcxx_parser::error (const location_type& l,
                          const std::string& m)
{
  driver.error (l, m);
}

10.1.6.4 Calc++ Scanner

The Flex scanner first includes the driver declaration, then the parser’s to get the set of defined tokens.

%{ /* -*- C++ -*- */
# include <cerrno>
# include <climits>
# include <cstdlib>
# include <cstring> // strerror
# include <string>
# include "calc++-driver.hh"
# include "calc++-parser.hh"

// Work around an incompatibility in flex (at least versions
// 2.5.31 through 2.5.33): it generates code that does
// not conform to C89.  See Debian bug 333231
// <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.
# undef yywrap
# define yywrap() 1

// The location of the current token.
static yy::location loc;
%}

Because there is no #include-like feature, we don’t need yywrap; we don’t need unput or input either; and since we parse an actual file rather than run an interactive session with the user, we request batch mode. Finally, we enable scanner tracing.

%option noyywrap nounput batch debug noinput

Abbreviations allow for more readable rules.

id    [a-zA-Z][a-zA-Z_0-9]*
int   [0-9]+
blank [ \t]

The following paragraph suffices to track locations accurately. Each time yylex is invoked, the begin position is moved onto the end position. Then when a pattern is matched, its width is added to the end column. When matching ends of lines, the end cursor is adjusted, and each time blanks are matched, the begin cursor is moved onto the end cursor to effectively ignore the blanks preceding tokens. Comments would be treated equally.

%{
  // Code run each time a pattern is matched.
  # define YY_USER_ACTION  loc.columns (yyleng);
%}

%%

%{
  // Code run each time yylex is called.
  loc.step ();
%}

{blank}+   loc.step ();
[\n]+      loc.lines (yyleng); loc.step ();
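
To see the bookkeeping in isolation, the same steps can be replayed by hand. The following stand-alone sketch is a hypothetical hand-written equivalent, not part of the Calc++ example: the function track treats every maximal run of non-blank characters as a token and reports each one as "text@line.column-line.column", using plain counters in place of loc.begin and loc.end.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical replay of the location rules: tokens are maximal runs of
// non-blank characters; each is reported with its begin and end position.
inline std::vector<std::string> track (const std::string& input)
{
  unsigned begin_line = 1, begin_col = 1;  // plays the role of loc.begin
  unsigned end_line = 1, end_col = 1;      // plays the role of loc.end
  std::vector<std::string> out;
  std::string::size_type i = 0;
  while (i < input.size ())
    {
      // loc.step (): done on entry to yylex and after blanks or newlines.
      begin_line = end_line;
      begin_col = end_col;
      if (input[i] == '\n')
        {
          // loc.lines (1): next line, column reset to 1.
          ++end_line;
          end_col = 1;
          ++i;
        }
      else if (input[i] == ' ' || input[i] == '\t')
        {
          // Blanks widen the range; the next step discards them.
          ++end_col;
          ++i;
        }
      else
        {
          std::string::size_type j = i;
          while (j < input.size ()
                 && !std::isspace (static_cast<unsigned char> (input[j])))
            ++j;
          // YY_USER_ACTION: loc.columns (yyleng).
          end_col += static_cast<unsigned> (j - i);
          std::ostringstream s;
          s << input.substr (i, j - i) << '@'
            << begin_line << '.' << begin_col << '-'
            << end_line << '.' << end_col;
          out.push_back (s.str ());
          i = j;
        }
    }
  return out;
}
```

On the input "ab  cd\nef" this reports ab at 1.1-1.3, cd at 1.5-1.7 and ef at 2.1-2.3: the two blanks and the newline move the begin cursor without producing tokens.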

The rules are simple. The driver is used to report errors.

"-"      return yy::calcxx_parser::make_MINUS(loc);
"+"      return yy::calcxx_parser::make_PLUS(loc);
"*"      return yy::calcxx_parser::make_STAR(loc);
"/"      return yy::calcxx_parser::make_SLASH(loc);
"("      return yy::calcxx_parser::make_LPAREN(loc);
")"      return yy::calcxx_parser::make_RPAREN(loc);
":="     return yy::calcxx_parser::make_ASSIGN(loc);

{int}      {
  errno = 0;
  long n = strtol (yytext, NULL, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    driver.error (loc, "integer is out of range");
  return yy::calcxx_parser::make_NUMBER(n, loc);
}

{id}       return yy::calcxx_parser::make_IDENTIFIER(yytext, loc);
.          driver.error (loc, "invalid character");
<<EOF>>    return yy::calcxx_parser::make_END(loc);
%%
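
The range check performed in the {int} rule above can be isolated and exercised on its own. The helper below is a hypothetical refactoring for illustration (the example itself keeps the check inline); it applies the same strtol, errno and INT_MIN/INT_MAX test, but reports failure through its return value instead of calling the driver.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: convert decimal text to int, detecting overflow.
// Returns true on success and stores the value in out; on failure out
// is left untouched.
inline bool text_to_int (const char* text, int& out)
{
  errno = 0;  // strtol only sets errno on failure, so clear it first
  long n = std::strtol (text, NULL, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    return false;
  out = static_cast<int> (n);
  return true;
}
```

Both conditions are needed: the errno test catches values too large for long, and the INT_MIN/INT_MAX test catches values that fit in long but not in int.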

Finally, because the scanner-related driver’s member-functions depend on the scanner’s data, it is simpler to implement them in this file.

void
calcxx_driver::scan_begin ()
{
  yy_flex_debug = trace_scanning;
  if (file.empty () || file == "-")
    yyin = stdin;
  else if (!(yyin = fopen (file.c_str (), "r")))
    {
      error ("cannot open " + file + ": " + strerror(errno));
      exit (EXIT_FAILURE);
    }
}

void
calcxx_driver::scan_end ()
{
  fclose (yyin);
}

10.1.6.5 Calc++ Top Level

The top level file, ‘calc++.cc’, poses no problem.

#include <iostream>
#include "calc++-driver.hh"

int
main (int argc, char *argv[])
{
  int res = 0;
  calcxx_driver driver;
  for (int i = 1; i < argc; ++i)
    if (argv[i] == std::string ("-p"))
      driver.trace_parsing = true;
    else if (argv[i] == std::string ("-s"))
      driver.trace_scanning = true;
    else if (!driver.parse (argv[i]))
      std::cout << driver.result << std::endl;
    else
      res = 1;
  return res;
}

10.2 Java Parsers

10.2.1 Java Bison Interface		Asking for Java parser generation
10.2.2 Java Semantic Values		%type and %token vs. Java
10.2.3 Java Location Values		The position and location classes
10.2.4 Java Parser Interface		Instantiating and running the parser
10.2.5 Java Scanner Interface		Specifying the scanner for the parser
10.2.6 Special Features for Use in Java Actions		Special features for use in actions
10.2.7 Java Push Parser Interface		Instantiating and running a push parser
10.2.8 Differences between C/C++ and Java Grammars
10.2.9 Java Declarations Summary		List of Bison declarations used with Java

10.2.1 Java Bison Interface

(The current Java interface is experimental and may evolve. More user feedback will help to stabilize it.)

The Java parser skeletons are selected using the %language "Java" directive or the ‘-L java’/‘--language=java’ option.

When generating a Java parser, bison basename.y will create a single Java source file named ‘basename.java’ containing the parser implementation. Using a grammar file without a ‘.y’ suffix is currently broken. The basename of the parser implementation file can be changed by the %file-prefix directive or the ‘-b’/‘--file-prefix’ option. The entire parser implementation file name can be changed by the %output directive or the ‘-o’/‘--output’ option. The parser implementation file contains a single class for the parser.

You can create documentation for generated parsers using Javadoc.

Contrary to C parsers, Java parsers do not use global variables; the state of the parser is always local to an instance of the parser class. Therefore, all Java parsers are “pure”, and the %pure-parser and %define api.pure directives do nothing when used in Java.

Push parsers are currently unsupported in Java and %define api.push-pull has no effect.

GLR parsers are currently unsupported in Java. Do not use the %glr-parser directive.

No header file can be generated for Java parsers. Do not use the %defines directive or the ‘-d’/‘--defines’ options.

Currently, support for tracing is always compiled in. Thus the ‘%define parse.trace’ and ‘%token-table’ directives and the ‘-t’/‘--debug’ and ‘-k’/‘--token-table’ options have no effect. This may change in the future to eliminate unused code in the generated parser, so use ‘%define parse.trace’ explicitly if needed. Also, in the future the %token-table directive might enable a public interface to access the token names and codes.

Getting a “code too large” error from the Java compiler means that a method exceeded the 64KB bytecode-per-method limit of the Java class file format. Try reducing the amount of code in actions and static initializers; otherwise, report a bug so that the parser skeleton can be improved.

10.2.2 Java Semantic Values

There is no %union directive in Java parsers. Instead, the semantic values’ types (class names) should be specified in the %type or %token directive:

%type <Expression> expr assignment_expr term factor
%type <Integer> number

By default, the semantic stack is declared to have Object members, which means that the class types you specify can be of any class. To improve the type safety of the parser, you can declare the common superclass of all the semantic values using the ‘%define api.value.type’ directive. For example, after the following declaration:

%define api.value.type {ASTNode}

any %type or %token specifying a semantic type that is not a subclass of ASTNode will cause a compile-time error.

Types used in the directives may be qualified with a package name. Primitive data types are accepted for Java version 1.5 or later. Note that in this case the autoboxing feature of Java 1.5 will be used. Generic types may not be used; this is due to a limitation in the implementation of Bison, and may change in future releases.

Java parsers do not support %destructor, since the language adopts garbage collection. The parser will try to hold references to semantic values for as little time as needed.

Java parsers do not support %printer, as toString() can be used to print the semantic values. This however may change (in a backwards-compatible way) in future versions of Bison.
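As a rough illustration of the two points above (a common superclass for semantic values, and toString() in place of %printer), the following sketch shows a class hierarchy that a grammar declaring ‘%define api.value.type {ASTNode}’ might rely on. The class bodies are assumptions made for this example: the manual names ASTNode and Expression but does not define them, and Num is entirely hypothetical.

```java
// Hypothetical semantic-value classes; none of these definitions come from Bison.
abstract class ASTNode {
    // toString() plays the role that %printer plays in C parsers,
    // so every subclass is forced to provide one.
    @Override
    public abstract String toString();
}

class Num extends ASTNode {
    final int value;

    Num(int value) { this.value = value; }

    @Override
    public String toString() { return Integer.toString(value); }
}

class Expression extends ASTNode {
    final ASTNode left;
    final char op;
    final ASTNode right;

    Expression(ASTNode left, char op, ASTNode right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }

    @Override
    public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}
```

With such a hierarchy, a declaration like ‘%type <Expression> expr’ names a subclass of the declared base type, whereas ‘%type <Integer> number’ would not.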

10.2.3 Java Location Values

When the directive %locations is used, the Java parser supports location tracking, see Tracking Locations. An auxiliary user-defined class defines a position, a single point in a file; Bison itself defines a class representing a location, a range composed of a pair of positions (possibly spanning several files). The location class is an inner class of the parser; the name is Location by default, and may also be renamed using %define api.location.type {class-name}.

The location class treats the position as a completely opaque value. By default, the class name is Position, but this can be changed with %define api.position.type {class-name}. This class must be supplied by the user.

Instance Variable of Location: Position begin
Instance Variable of Location: Position end: The first, inclusive, position of the range, and the first beyond.

Constructor on Location: Location (Position loc): Create a Location denoting an empty range located at a given point.

Constructor on Location: Location (Position begin, Position end): Create a Location from the endpoints of the range.

Method on Location: String toString (): Prints the range represented by the location. For this to work properly, the position class should override the equals and toString methods appropriately.
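Since the position class must be supplied by the user, a minimal sketch may help. The line/column representation below is an assumption, not something Bison requires, and RangeSketch is only a hand-written stand-in for the generated Location class, included to show why equals and toString matter when a range is printed.

```java
// Hypothetical user-supplied position class: a line and column in one file.
final class Position {
    final int line;
    final int column;

    Position(int line, int column) {
        this.line = line;
        this.column = column;
    }

    // Value equality, so that two positions at the same point compare equal.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Position)) return false;
        Position p = (Position) other;
        return line == p.line && column == p.column;
    }

    // Keep hashCode consistent with equals.
    @Override
    public int hashCode() { return 31 * line + column; }

    @Override
    public String toString() { return line + "." + column; }
}

// Stand-in for the generated Location class (not Bison output).
final class RangeSketch {
    final Position begin;
    final Position end;

    // Empty range located at a single point.
    RangeSketch(Position loc) { this(loc, loc); }

    RangeSketch(Position begin, Position end) {
        this.begin = begin;
        this.end = end;
    }

    // An empty range prints as one position, any other range as "begin-end".
    @Override
    public String toString() {
        return begin.equals(end) ? begin.toString() : begin + "-" + end;
    }
}
```

Without a value-based equals, an empty range built from two distinct but identical Position objects would be printed as a pair of positions rather than as a single point.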

10.2.4 Java Parser Interface

The name of the generated parser class defaults to YYParser. The YY prefix may be changed using the %name-prefix directive or the ‘-p’/‘--name-prefix’ option. Alternatively, use ‘%define parser_class_name {name}’ to give a custom name to the class. The interface of this class is detailed below.

By default, the parser class has package visibility. A declaration ‘%define public’ will change to public visibility. Remember that, according to the Java language specification, the name of the ‘.java’ file should match the name of the class in this case. Similarly, you can use abstract, final and strictfp with the %define declaration to add other modifiers to the parser class. A single ‘%define annotations {annotations}’ directive can be used to add any number of annotations to the parser class.

The Java package name of the parser class can be specified using the ‘%define package’ directive. The superclass and the implemented interfaces of the parser class can be specified with the %define extends and ‘%define implements’ directives.

The parser class defines an inner class, Location, that is used for location tracking (see Java Location Values), and an inner interface, Lexer (see Java Scanner Interface). Other than this inner class and interface, and the members described in the interface below, all the other members and fields are preceded with a yy or YY prefix to avoid clashes with user code.

The parser class can be extended using the %parse-param directive. Each occurrence of the directive will add a protected final field to the parser class, and an argument to its constructor, which initializes the field automatically.
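To make the effect of %parse-param concrete, here is a hand-written sketch of the shape described above, assuming a hypothetical directive ‘%parse-param {StringBuilder out}’. This is not Bison output: the class name and the actionSketch method are invented for illustration, and a real generated constructor also takes the scanner or its parameters, as listed below.

```java
// Hand-written stand-in showing the described shape; this is not Bison output.
class ParserShapeSketch {
    // One protected final field per %parse-param occurrence.
    protected final StringBuilder out;

    // The matching constructor argument initializes the field.
    ParserShapeSketch(StringBuilder out) {
        this.out = out;
    }

    // Stand-in for an action body, which can use the field directly.
    void actionSketch(int value) {
        out.append(value).append('\n');
    }
}
```

Because the field is final, an action can call methods on the object it refers to but cannot rebind the field itself.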

Constructor on YYParser: YYParser (lex_param, …, parse_param, …)

Build a new parser object with embedded %code lexer. There are no parameters, unless %params and/or %parse-params and/or %lex-params are used.

Use %code init for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use ‘%define init_throws’ to specify any uncaught exceptions.

Constructor on YYParser: YYParser (Lexer lexer, parse_param, …)

Build a new parser object using the specified scanner. There are no additional parameters unless %params and/or %parse-params are used.

If the scanner is defined by %code lexer, this constructor is declared protected and is called automatically with a scanner created with the correct %params and/or %lex-params.

Use %code init for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use ‘%define init_throws’ to specify any uncaught exceptions.

Method on YYParser: boolean parse (): Run the syntactic analysis, and return true on success, false otherwise.

Method on YYParser: boolean getErrorVerbose ()
Method on YYParser: void setErrorVerbose (boolean verbose): Get or set the option to produce verbose error messages. These are only available with ‘%define parse.error verbose’, which also turns on verbose error messages.

Method on YYParser: void yyerror (String msg)
Method on YYParser: void yyerror (Position pos, String msg)
Method on YYParser: void yyerror (Location loc, String msg): Print an error message using the yyerror method of the scanner instance in use. The Location and Position parameters are available only if location tracking is active.

Method on YYParser: boolean recovering (): During the syntactic analysis, return true if recovering from a syntax error. See section Error Recovery.

Method on YYParser: java.io.PrintStream getDebugStream ()
Method on YYParser: void setDebugStream (java.io.PrintStream o): Get or set the stream used for tracing the parsing. It defaults to System.err.

Method on YYParser: int getDebugLevel ()
Method on YYParser: void setDebugLevel (int l): Get or set the tracing level. Currently its value is either 0, no trace, or nonzero, full tracing.

Constant of YYParser: String bisonVersion
Constant of YYParser: String bisonSkeleton: Identify the Bison version and skeleton used to generate this parser.

10.2.5 Java Scanner Interface

There are two possible ways to interface a Bison-generated Java parser with a scanner: the scanner may be defined by %code lexer, or defined elsewhere. In either case, the scanner has to implement the Lexer inner interface of the parser class. This interface also contains constants for all user-defined token names and the predefined EOF token.

In the first case, the body of the scanner class is placed in %code lexer blocks. If you want to pass parameters from the parser constructor to the scanner constructor, specify them with %lex-param; they are passed before %parse-params to the constructor.

In the second case, the scanner has to implement the Lexer interface, which is defined within the parser class (e.g., YYParser.Lexer). The constructor of the parser object will then accept an object implementing the interface; %lex-param is not used in this case.

In both cases, the scanner has to implement the following methods.

Method on Lexer: void yyerror (Location loc, String msg): This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be changed using %define api.location.type {class-name}.

Method on Lexer: int yylex ()

Return the next token. Its type is the return value; its semantic value and location are saved and returned by the corresponding methods of the interface.

Use ‘%define lex_throws’ to specify any uncaught exceptions. Default is java.io.IOException.

Method on Lexer: Position getStartPos ()

Method on Lexer: Position getEndPos ()

Return respectively the first position of the last token that yylex returned, and the first position beyond it. These methods are not needed unless location tracking is active.

The return type can be changed using %define api.position.type {class-name}.

Method on Lexer: Object getLVal ()

Return the semantic value of the last token that yylex returned.

The return type can be changed using ‘%define api.value.type {class-name}’.
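The methods above can be sketched for the second case (a scanner defined outside the grammar file) with location tracking disabled. Because the generated parser class is not available here, LexerSketch is a hand-written stand-in for the nested Lexer interface. The token names NUM and PLUS and all numeric codes are placeholders, since the real constants are generated by Bison, and CalcLexer is a hypothetical scanner for integers and ‘+’.

```java
// Stand-in for the Lexer interface nested in the generated parser class.
// The token codes below are placeholders; the real ones are generated by Bison.
interface LexerSketch {
    int EOF = 0;
    int NUM = 258;
    int PLUS = 259;

    int yylex() throws java.io.IOException;
    Object getLVal();
    void yyerror(String msg);
}

// A hypothetical scanner for integers and '+', reading from a java.io.Reader.
class CalcLexer implements LexerSketch {
    private final java.io.PushbackReader in;
    private Object lval;  // semantic value of the last token returned

    CalcLexer(java.io.Reader reader) {
        this.in = new java.io.PushbackReader(reader);
    }

    @Override
    public int yylex() throws java.io.IOException {
        int c = in.read();
        while (c == ' ' || c == '\t' || c == '\n') c = in.read();  // skip blanks
        lval = null;
        if (c == -1) return EOF;
        if (c == '+') return PLUS;
        if (Character.isDigit(c)) {
            int n = 0;
            while (c != -1 && Character.isDigit(c)) {
                n = 10 * n + (c - '0');
                c = in.read();
            }
            if (c != -1) in.unread(c);  // give back the lookahead character
            lval = Integer.valueOf(n);
            return NUM;
        }
        yyerror("unexpected character: " + (char) c);
        return EOF;  // simplification: stop scanning after an error
    }

    @Override
    public Object getLVal() { return lval; }

    @Override
    public void yyerror(String msg) { System.err.println(msg); }
}
```

With location tracking enabled, such a scanner would also need getStartPos and getEndPos, and yyerror would take a location as its first parameter.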

10.2.6 Special Features for Use in Java Actions

The following special constructs can be used in Java actions. Other analogous C action features are currently unavailable for Java.

Use ‘%define throws’ to specify any uncaught exceptions from parser actions, and initial actions specified by %initial-action.

Variable: $n: The semantic value for the nth component of the current rule. This may not be assigned to. See section Java Semantic Values.

Variable: $<typealt>n: Like $n but specifies an alternative type typealt. See section Java Semantic Values.

Variable: $$: The semantic value for the grouping made by the current rule. As a value, this is in the base type (Object or as specified by ‘%define api.value.type’) and is not cast to the declared subtype, because casts are not allowed on the left-hand side of Java assignments. Use an explicit Java cast if the correct subtype is needed. See section Java Semantic Values.

Variable: $<typealt>$: Same as $$ since Java always allows assigning to the base type. Perhaps we should use this and $<>$ for the value and $$ for setting the value but there is currently no easy way to distinguish these constructs. See section Java Semantic Values.

Variable: @n: The location information of the nth component of the current rule. This may not be assigned to. See section Java Location Values.

Variable: @$: The location information of the grouping made by the current rule. See section Java Location Values.

Statement: return YYABORT ;: Return immediately from the parser, indicating failure. See section Java Parser Interface.

Statement: return YYACCEPT ;: Return immediately from the parser, indicating success. See section Java Parser Interface.

Statement: return YYERROR ;: Start error recovery (without printing an error message). See section Error Recovery.

Function: boolean recovering (): Return whether error recovery is being done. In this state, the parser reads tokens until it reaches a known state, and then restarts normal operation. See section Error Recovery.

Function: void yyerror (String msg)
Function: void yyerror (Position loc, String msg)
Function: void yyerror (Location loc, String msg): Print an error message using the yyerror method of the scanner instance in use. The Location and Position parameters are available only if location tracking is active.
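The remark on $$ above comes down to an ordinary Java rule, which the following sketch illustrates outside of any grammar. CastSketch and its methods are invented for this example, and the base type is assumed to be the default, Object.

```java
// Plain-Java illustration of the cast rule; these names are not part of Bison.
class CastSketch {
    static Object assign() {
        // Assigning a subtype value to a base-typed variable needs no cast,
        // which is why $$ can always be set directly.
        Object result = Integer.valueOf(41);
        return result;
    }

    static int addOne(Object baseTyped) {
        // Reading the value back as its declared subtype requires an explicit cast.
        return ((Integer) baseTyped) + 1;
    }
}
```

The same applies inside an action: a value read back from $$ behaves like baseTyped here and has to be cast before subtype-specific operations are used on it.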

10.2.7 Java Push Parser Interface

(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)

Normally, Bison generates a pull parser for Java. The following Bison declaration says that you want the parser to be a push parser (see section api.push-pull):

%define api.push-pull push

Most of the discussion about the Java pull parser interface (see section Java Parser Interface) applies to the push parser interface as well.

When generating a push parser, the method push_parse is created with one of the following signatures, depending on whether locations are enabled.

Method on YYParser: int push_parse (int token, Object yylval)
Method on YYParser: int push_parse (int token, Object yylval, Location yyloc)
Method on YYParser: int push_parse (int token, Object yylval, Position yypos)

The primary difference with respect to a pull parser is that the parser method push_parse is invoked repeatedly to parse each token. This function is available if either the "%define api.push-pull push" or "%define api.push-pull both" declaration is used (see section api.push-pull). The Location and Position parameters are available only if location tracking is active.

The value returned by the push_parse method is one of the following four constants: YYABORT, YYACCEPT, YYERROR, or YYPUSH_MORE. This new value, YYPUSH_MORE, may be returned if more input is required to finish parsing the grammar.

If api.push-pull is declared as both, then the generated parser class will also implement the parse method. This method’s body is a loop that repeatedly invokes the scanner and then passes the values obtained from the scanner to the push_parse method.

There is one additional complication. Technically, the push parser does not need to know about the scanner (i.e. an object implementing the YYParser.Lexer interface), but it does need access to the yyerror method. Currently, the yyerror method is defined in the YYParser.Lexer interface. Hence, an implementation of that interface is still required in order to provide an implementation of yyerror. The current approach (which is subject to change) is to require the YYParser constructor to be given an object implementing the YYParser.Lexer interface. This object need only implement the yyerror method; the other methods can be stubbed since they will never be invoked. The simplest way to do this is to add a trivial scanner implementation to your grammar file using whatever implementation of yyerror is desired. The following code sample shows a simple way to accomplish this.

%code lexer
{
  public Object getLVal () {return null;}
  public int yylex () {return 0;}
  public void yyerror (String s) {System.err.println(s);}
}
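On the calling side, a push parser is driven by a loop that feeds it one token at a time and stops as soon as the result is no longer YYPUSH_MORE. The sketch below shows such a loop without location tracking. PushParserSketch is a hand-written stand-in for a generated parser: it accepts a sum of numbers and adds them up. The token names and every constant value are placeholders for this example, not the values Bison generates.

```java
// Stand-in for a generated push parser; it accepts NUM ('+' NUM)* followed by EOF.
// All constant values here are placeholders, not the ones Bison generates.
class PushParserSketch {
    static final int YYACCEPT = 0, YYABORT = 1, YYERROR = 2, YYPUSH_MORE = 4;
    static final int EOF = 0, NUM = 258, PLUS = 259;

    private boolean expectNumber = true;
    int sum = 0;

    int push_parse(int token, Object yylval) {
        if (expectNumber) {
            if (token != NUM) return YYABORT;
            sum += (Integer) yylval;
            expectNumber = false;
            return YYPUSH_MORE;
        }
        if (token == PLUS) {
            expectNumber = true;
            return YYPUSH_MORE;
        }
        return token == EOF ? YYACCEPT : YYABORT;
    }
}

class PushDriverSketch {
    // Feed tokens one at a time until the parser stops asking for more.
    static boolean run(PushParserSketch parser, int[] tokens, Object[] values) {
        int status = PushParserSketch.YYPUSH_MORE;
        for (int i = 0; i < tokens.length && status == PushParserSketch.YYPUSH_MORE; i++) {
            status = parser.push_parse(tokens[i], values[i]);
        }
        return status == PushParserSketch.YYACCEPT;
    }
}
```

In a real program the tokens would typically arrive from an event source rather than from arrays, which is the main reason to prefer a push parser over a pull parser.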

10.2.8 Differences between C/C++ and Java Grammars

The different structure of the Java language forces several differences between C/C++ grammars, and grammars designed for Java parsers. This section summarizes these differences.

Java lacks a preprocessor, so the YYERROR, YYACCEPT, and YYABORT symbols (see section Bison Symbols) obviously cannot be macros. Instead, they should be preceded by return when they appear in an action. The actual definition of these symbols is opaque to the Bison grammar, and it might change in the future. The only meaningful operation that you can do is to return them. See section Special Features for Use in Java Actions.

Note that of these three symbols, only YYACCEPT and YYABORT will cause a return from the yyparse method.

Java lacks unions, so %union has no effect. Instead, semantic values have a common base type: Object or as specified by ‘%define api.value.type’. Angle brackets on %token, %type, $n and $$ specify subtypes rather than fields of a union. The type of $$, even with angle brackets, is the base type since Java casts are not allowed on the left-hand side of assignments. Also, $n and @n are not allowed on the left-hand side of assignments. See section Java Semantic Values, and Special Features for Use in Java Actions.

The prologue declarations have a different meaning than in C/C++ code.

%code imports

blocks are placed at the beginning of the Java source code. They may include copyright notices. For package declarations, it is suggested to use ‘%define package’ instead.

unqualified %code

blocks are placed inside the parser class.

%code lexer

blocks, if specified, should include the implementation of the scanner. If there is no such block, the scanner can be any class that implements the appropriate interface (see section Java Scanner Interface).

Other %code blocks are not supported in Java parsers. In particular, %{ … %} blocks should not be used and may give an error in future versions of Bison.

The epilogue has the same meaning as in C/C++ code and it can be used to define other classes used by the parser outside the parser class.

10.2.9 Java Declarations Summary

This summary only includes declarations that are specific to Java or that have a special meaning when used in a Java parser.

Directive: %language "Java": Generate a Java class for the parser.

Directive: %lex-param {type name}: A parameter for the lexer class defined by %code lexer only, added as parameters to the lexer constructor and the parser constructor that creates a lexer. Default is none. See section Java Scanner Interface.

Directive: %name-prefix "prefix": The prefix of the parser class name prefixParser if ‘%define parser_class_name’ is not used. Default is YY. See section Java Bison Interface.

Directive: %parse-param {type name}: A parameter for the parser class added as parameters to constructor(s) and as fields initialized by the constructor(s). Default is none. See section Java Parser Interface.

Directive: %token <type> token …: Declare tokens. Note that the angle brackets enclose a Java type. See section Java Semantic Values.

Directive: %type <type> nonterminal …: Declare the type of nonterminals. Note that the angle brackets enclose a Java type. See section Java Semantic Values.

Directive: %code { code … }: Code appended to the inside of the parser class. See section Differences between C/C++ and Java Grammars.

Directive: %code imports { code … }: Code inserted just after the package declaration. See section Differences between C/C++ and Java Grammars.

Directive: %code init { code … }: Code inserted at the beginning of the parser constructor body. See section Java Parser Interface.

Directive: %code lexer { code … }: Code added to the body of an inner lexer class within the parser class. See section Java Scanner Interface.

Directive: %% code …: Code (after the second %%) appended to the end of the file, outside the parser class. See section Differences between C/C++ and Java Grammars.

Directive: %{ code … %}: Not supported. Use %code imports instead. See section Differences between C/C++ and Java Grammars.

Directive: %define abstract: Whether the parser class is declared abstract. Default is false. See section Java Bison Interface.

Directive: %define annotations {annotations}: The Java annotations for the parser class. Default is none. See section Java Bison Interface.

Directive: %define extends {superclass}: The superclass of the parser class. Default is none. See section Java Bison Interface.

Directive: %define final: Whether the parser class is declared final. Default is false. See section Java Bison Interface.

Directive: %define implements {interfaces}: The implemented interfaces of the parser class, a comma-separated list. Default is none. See section Java Bison Interface.

Directive: %define init_throws {exceptions}: The exceptions thrown by %code init from the parser class constructor. Default is none. See section Java Parser Interface.

Directive: %define lex_throws {exceptions}: The exceptions thrown by the yylex method of the lexer, a comma-separated list. Default is java.io.IOException. See section Java Scanner Interface.

Directive: %define api.location.type {class}: The name of the class used for locations (a range between two positions). This class is generated as an inner class of the parser class by bison. Default is Location. Formerly named location_type. See section Java Location Values.

Directive: %define package {package}: The package to put the parser class in. Default is none. See section Java Bison Interface.

Directive: %define parser_class_name {name}: The name of the parser class. Default is YYParser or name-prefixParser. See section Java Bison Interface.

Directive: %define api.position.type {class}: The name of the class used for positions. This class must be supplied by the user. Default is Position. Formerly named position_type. See section Java Location Values.

Directive: %define public: Whether the parser class is declared public. Default is false. See section Java Bison Interface.

Directive: %define api.value.type {class}: The base type of semantic values. Default is Object. See section Java Semantic Values.

Directive: %define strictfp: Whether the parser class is declared strictfp. Default is false. See section Java Bison Interface.

Directive: %define throws {exceptions}: The exceptions thrown by user-supplied parser actions and %initial-action, a comma-separated list. Default is none. See section Java Parser Interface.

This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.

