10.2 Java Parsers


10.2.1 Java Bison Interface

(The current Java interface is experimental and may evolve. More user feedback will help to stabilize it.)

The Java parser skeletons are selected using the %language "Java" directive or the ‘-L java’/‘--language=java’ option.

When generating a Java parser, bison basename.y will create a single Java source file named ‘basename.java’ containing the parser implementation. Using a grammar file without a ‘.y’ suffix is currently broken. The basename of the parser implementation file can be changed by the %file-prefix directive or the ‘-b’/‘--file-prefix’ option. The entire parser implementation file name can be changed by the %output directive or the ‘-o’/‘--output’ option. The parser implementation file contains a single class for the parser.

You can create documentation for generated parsers using Javadoc.

Unlike C parsers, Java parsers do not use global variables; the state of the parser is always local to an instance of the parser class. Therefore, all Java parsers are “pure”, and the %pure-parser and %define api.pure directives do nothing when used in Java.

Push parsing, selected with %define api.push-pull, is only experimentally supported in Java; see section Java Push Parser Interface.

GLR parsers are currently unsupported in Java. Do not use the %glr-parser directive.

No header file can be generated for Java parsers. Do not use the %defines directive or the ‘-d’/‘--defines’ option.

Currently, support for tracing is always compiled in. Thus the ‘%define parse.trace’ and ‘%token-table’ directives and the ‘-t’/‘--debug’ and ‘-k’/‘--token-table’ options have no effect. This may change in the future to eliminate unused code in the generated parser, so use ‘%define parse.trace’ explicitly if needed. Also, in the future the %token-table directive might enable a public interface to access the token names and codes.

Getting a “code too large” error from the Java compiler means the code hit the 64 KB per-method bytecode limit of the Java class file format. Try reducing the amount of code in actions and static initializers; otherwise, report a bug so that the parser skeleton can be improved.


10.2.2 Java Semantic Values

There is no %union directive in Java parsers. Instead, the semantic values’ types (class names) should be specified in the %type or %token directive:

 
%type <Expression> expr assignment_expr term factor
%type <Integer> number

By default, the semantic stack is declared to have Object members, which means that the class types you specify can be of any class. To improve the type safety of the parser, you can declare the common superclass of all the semantic values using the ‘%define api.value.type’ directive. For example, after the following declaration:

 
%define api.value.type {ASTNode}

any %type or %token specifying a semantic type that is not a subclass of ASTNode will cause a compile-time error.
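To illustrate why a common base type helps, here is a hand-written toy, not Bison output: a value stack typed by a hypothetical ASTNode class, with hypothetical NumberNode and AddNode subclasses. Because the stack only accepts ASTNode values, javac rejects pushing anything else, which is the same kind of compile-time check described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration, not Bison output. ASTNode plays the role of the class
// named in a hypothetical "%define api.value.type {ASTNode}".
abstract class ASTNode {
    abstract int eval();
}

// Hypothetical leaf node holding a number.
class NumberNode extends ASTNode {
    final int value;
    NumberNode(int value) { this.value = value; }
    int eval() { return value; }
}

// Hypothetical interior node for "expr '+' term".
class AddNode extends ASTNode {
    final ASTNode left, right;
    AddNode(ASTNode left, ASTNode right) { this.left = left; this.right = right; }
    int eval() { return left.eval() + right.eval(); }
}

// A value stack typed by the common base class: push("text") or
// push(Integer.valueOf(1)) would be rejected by javac.
class ValueStack {
    private final Deque<ASTNode> stack = new ArrayDeque<ASTNode>();

    void push(ASTNode value) { stack.push(value); }

    // Mimics reducing "expr: expr '+' term": pop two operands, push the result.
    void reduceAdd() {
        ASTNode right = stack.pop();
        ASTNode left = stack.pop();
        stack.push(new AddNode(left, right));
    }

    ASTNode top() { return stack.peek(); }
}
```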

Types used in the directives may be qualified with a package name. Primitive data types are accepted for Java version 1.5 or later. Note that in this case the autoboxing feature of Java 1.5 will be used. Generic types may not be used; this is due to a limitation in the implementation of Bison, and may change in future releases.
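As a sketch of the autoboxing mentioned above, assuming the default base type Object: with a hypothetical declaration such as %type <int> number, an int result ends up stored as an Integer and is unboxed again when read. The class BoxingDemo is illustrative only; it does not reproduce the casts Bison actually emits.

```java
// Sketch, assuming the default base type Object: a primitive int stored in a
// semantic-value slot is autoboxed to Integer and unboxed again on the way out.
class BoxingDemo {
    // Storing: int -> Integer (autoboxing, Java 1.5+).
    static Object store(int n) {
        Object slot = n;
        return slot;
    }

    // Loading: cast to Integer, then auto-unbox to int on return.
    static int load(Object slot) {
        return (Integer) slot;
    }
}
```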

Java parsers do not support %destructor, since Java relies on garbage collection. The parser will try to hold references to semantic values for as little time as needed.

Java parsers do not support %printer, as toString() can be used to print the semantic values. This however may change (in a backwards-compatible way) in future versions of Bison.


10.2.3 Java Location Values

When the directive %locations is used, the Java parser supports location tracking, see Tracking Locations. An auxiliary user-defined class defines a position, a single point in a file; Bison itself defines a class representing a location, a range composed of a pair of positions (possibly spanning several files). The location class is an inner class of the parser; the name is Location by default, and may also be renamed using %define api.location.type {class-name}.

The location class treats the position as a completely opaque value. By default, the class name is Position, but this can be changed with %define api.position.type {class-name}. This class must be supplied by the user.

Instance Variable of Location: Position begin
Instance Variable of Location: Position end

The first, inclusive, position of the range, and the first beyond.

Constructor on Location: Location (Position loc)

Create a Location denoting an empty range located at a given point.

Constructor on Location: Location (Position begin, Position end)

Create a Location from the endpoints of the range.

Method on Location: String toString ()

Prints the range represented by the location. For this to work properly, the position class should override the equals and toString methods appropriately.
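Since the Position class must be supplied by the user, a minimal sketch follows. The line and column fields are assumptions (Bison treats the class as opaque); equals and toString are overridden as recommended above. The Location class here is a hand-written stand-in with the documented members, not the generated class; its output format (a single position for an empty range, begin-end otherwise) is an assumption of this sketch.

```java
// User-supplied position; the line/column representation is an assumption.
class Position {
    final int line;
    final int column;

    Position(int line, int column) { this.line = line; this.column = column; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Position)) return false;
        Position p = (Position) o;
        return line == p.line && column == p.column;
    }

    @Override public int hashCode() { return 31 * line + column; }

    @Override public String toString() { return line + "." + column; }
}

// Stand-in with the documented members of the generated Location class.
class Location {
    public Position begin;
    public Position end;

    // Empty range located at a given point.
    Location(Position loc) { this.begin = this.end = loc; }

    // Range from its endpoints.
    Location(Position begin, Position end) { this.begin = begin; this.end = end; }

    // Relies on Position.equals and Position.toString.
    @Override public String toString() {
        return begin.equals(end) ? begin.toString() : begin + "-" + end;
    }
}
```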


10.2.4 Java Parser Interface

The name of the generated parser class defaults to YYParser. The YY prefix may be changed using the %name-prefix directive or the ‘-p’/‘--name-prefix’ option. Alternatively, use ‘%define parser_class_name {name}’ to give a custom name to the class. The interface of this class is detailed below.

By default, the parser class has package visibility. A declaration ‘%define public’ will change to public visibility. Remember that, according to the Java language specification, the name of the ‘.java’ file should match the name of the class in this case. Similarly, you can use abstract, final and strictfp with the %define declaration to add other modifiers to the parser class. A single ‘%define annotations {annotations}’ directive can be used to add any number of annotations to the parser class.

The Java package name of the parser class can be specified using the ‘%define package’ directive. The superclass and the implemented interfaces of the parser class can be specified with the ‘%define extends’ and ‘%define implements’ directives.

The parser class defines an inner class, Location, that is used for location tracking (see Java Location Values), and an inner interface, Lexer (see Java Scanner Interface). Other than this inner class and interface, and the members described in the interface below, all the other members and fields are preceded with a yy or YY prefix to avoid clashes with user code.

The parser class can be extended using the %parse-param directive. Each occurrence of the directive adds a protected final field to the parser class and an argument to its constructor, which initializes the field automatically.
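The effect of a single hypothetical %parse-param {SymbolTable symbols} can be approximated by hand as below. SymbolTable, ParserWithParams and resolve are illustrative names, not generated code; the sketch only shows the field-plus-constructor-argument pattern described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter type for "%parse-param {SymbolTable symbols}".
class SymbolTable {
    private final Map<String, Integer> values = new HashMap<String, Integer>();
    void define(String name, int value) { values.put(name, value); }
    Integer lookup(String name) { return values.get(name); }
}

// Hand-written approximation of the members one %parse-param adds;
// the real parser class is generated by Bison and has many more members.
class ParserWithParams {
    protected final SymbolTable symbols;   // one field per %parse-param

    ParserWithParams(SymbolTable symbols) {
        this.symbols = symbols;            // initialized from the constructor argument
    }

    // Action code could use the field directly, e.g. { $$ = symbols.lookup($1); }
    Integer resolve(String name) { return symbols.lookup(name); }
}
```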

Constructor on YYParser: YYParser (lex_param, …, parse_param, …)

Build a new parser object with embedded %code lexer. There are no parameters, unless %params and/or %parse-params and/or %lex-params are used.

Use %code init for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use ‘%define init_throws’ to specify any uncaught exceptions.

Constructor on YYParser: YYParser (Lexer lexer, parse_param, …)

Build a new parser object using the specified scanner. There are no additional parameters unless %params and/or %parse-params are used.

If the scanner is defined by %code lexer, this constructor is declared protected and is called automatically with a scanner created with the correct %params and/or %lex-params.

Use %code init for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use ‘%define init_throws’ to specify any uncaught exceptions.

Method on YYParser: boolean parse ()

Run the syntactic analysis, and return true on success, false otherwise.

Method on YYParser: boolean getErrorVerbose ()
Method on YYParser: void setErrorVerbose (boolean verbose)

Get or set the option to produce verbose error messages. These are only available with ‘%define parse.error verbose’, which also turns on verbose error messages.

Method on YYParser: void yyerror (String msg)
Method on YYParser: void yyerror (Position pos, String msg)
Method on YYParser: void yyerror (Location loc, String msg)

Print an error message using the yyerror method of the scanner instance in use. The Location and Position parameters are available only if location tracking is active.

Method on YYParser: boolean recovering ()

During the syntactic analysis, return true if recovering from a syntax error. See section Error Recovery.

Method on YYParser: java.io.PrintStream getDebugStream ()
Method on YYParser: void setDebugStream (java.io.PrintStream o)

Get or set the stream used for tracing the parsing. It defaults to System.err.

Method on YYParser: int getDebugLevel ()
Method on YYParser: void setDebugLevel (int l)

Get or set the tracing level. Currently its value is either 0, no trace, or nonzero, full tracing.
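Because setDebugStream accepts any java.io.PrintStream, trace output can be captured in memory instead of going to System.err, for example in a unit test. A minimal sketch; the helper TraceCapture is hypothetical, and the parser calls are shown only as comments since the parser class is generated by Bison.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Collects trace output in memory. With a generated parser p one would call
//   p.setDebugStream(capture.stream());
//   p.setDebugLevel(1);
// before p.parse(), then inspect capture.contents().
class TraceCapture {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintStream stream = new PrintStream(buffer, true);

    PrintStream stream() { return stream; }

    // Everything written to the stream so far, decoded with the default charset.
    String contents() {
        stream.flush();
        return buffer.toString();
    }
}
```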

Constant of YYParser: String bisonVersion
Constant of YYParser: String bisonSkeleton

Identify the Bison version and skeleton used to generate this parser.


10.2.5 Java Scanner Interface

There are two possible ways to interface a Bison-generated Java parser with a scanner: the scanner may be defined by %code lexer, or defined elsewhere. In either case, the scanner has to implement the Lexer inner interface of the parser class. This interface also contains constants for all user-defined token names and the predefined EOF token.

In the first case, the body of the scanner class is placed in %code lexer blocks. If you want to pass parameters from the parser constructor to the scanner constructor, specify them with %lex-param; they are passed before %parse-params to the constructor.

In the second case, the scanner has to implement the Lexer interface, which is defined within the parser class (e.g., YYParser.Lexer). The constructor of the parser object will then accept an object implementing the interface; %lex-param is not used in this case.
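A minimal sketch of a scanner written outside the grammar file, without location tracking. Since YYParser.Lexer only exists after Bison generates the parser, the sketch declares a stand-in interface named Lexer with the methods listed below; the token code NUM = 258, the value EOF = 0 and the class name CalcLexer are assumptions. In a real project the class would implement YYParser.Lexer and be passed to the parser constructor.

```java
import java.io.IOException;
import java.io.Reader;

// Stand-in for the generated YYParser.Lexer (no location tracking).
// Token codes are assumptions for a hypothetical "%token <Integer> NUM".
interface Lexer {
    int EOF = 0;
    int NUM = 258;
    int yylex() throws IOException;
    Object getLVal();
    void yyerror(String msg);
}

class CalcLexer implements Lexer {
    private final Reader in;
    private int peek = -2;        // -2 means "no character pushed back"
    private Object lval;

    CalcLexer(Reader in) { this.in = in; }

    // Read one character, honoring a pushed-back one first.
    private int next() throws IOException {
        if (peek != -2) { int c = peek; peek = -2; return c; }
        return in.read();
    }

    public int yylex() throws IOException {
        int c = next();
        while (c == ' ' || c == '\t') c = next();   // skip blanks
        if (c == -1) return EOF;
        if (c >= '0' && c <= '9') {
            int value = 0;
            while (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                c = next();
            }
            peek = c;             // push back the first non-digit (or -1)
            lval = value;         // autoboxed to Integer
            return NUM;
        }
        lval = null;
        return c;                 // single-character tokens use their char code
    }

    public Object getLVal() { return lval; }

    public void yyerror(String msg) { System.err.println(msg); }
}
```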

In both cases, the scanner has to implement the following methods.

Method on Lexer: void yyerror (Location loc, String msg)

This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be changed using %define api.location.type {class-name}.

Method on Lexer: int yylex ()

Return the next token. Its type is the return value; its semantic value and location are saved and returned by the corresponding methods of the interface.

Use ‘%define lex_throws’ to specify any uncaught exceptions. Default is java.io.IOException.

Method on Lexer: Position getStartPos ()
Method on Lexer: Position getEndPos ()

Return respectively the first position of the last token that yylex returned, and the first position beyond it. These methods are not needed unless location tracking is active.

The return type can be changed using %define api.position.type {class-name}.

Method on Lexer: Object getLVal ()

Return the semantic value of the last token that yylex returned.

The return type can be changed using ‘%define api.value.type {class-name}’.
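When location tracking is active, the scanner must also report token boundaries. The sketch below tracks line and column while scanning a String. Position, Location and the Lexer interface are minimal stand-ins for the user-supplied and generated classes; the line/column representation, the token code NUM = 258 and the class name PositionedLexer are assumptions.

```java
import java.io.IOException;

// Minimal user-supplied position (line/column is an assumption).
class Position {
    final int line, column;
    Position(int line, int column) { this.line = line; this.column = column; }
    @Override public boolean equals(Object o) {
        return o instanceof Position
            && ((Position) o).line == line && ((Position) o).column == column;
    }
    @Override public int hashCode() { return 31 * line + column; }
    @Override public String toString() { return line + "." + column; }
}

// Stand-in for the generated Location class.
class Location {
    final Position begin, end;
    Location(Position begin, Position end) { this.begin = begin; this.end = end; }
    @Override public String toString() { return begin + "-" + end; }
}

// Stand-in for the generated YYParser.Lexer when %locations is used.
interface Lexer {
    int EOF = 0;
    int NUM = 258;   // hypothetical token code
    int yylex() throws IOException;
    Object getLVal();
    Position getStartPos();
    Position getEndPos();
    void yyerror(Location loc, String msg);
}

class PositionedLexer implements Lexer {
    private final String text;
    private int index = 0, line = 1, column = 1;
    private Position start, end;
    private Object lval;

    PositionedLexer(String text) { this.text = text; }

    // Consume one character, updating line and column.
    private char advance() {
        char c = text.charAt(index++);
        if (c == '\n') { line++; column = 1; } else { column++; }
        return c;
    }

    private boolean atDigit() {
        return index < text.length() && text.charAt(index) >= '0' && text.charAt(index) <= '9';
    }

    public int yylex() {
        while (index < text.length() && Character.isWhitespace(text.charAt(index))) advance();
        start = new Position(line, column);          // first position of the token
        if (index >= text.length()) { end = start; lval = null; return EOF; }
        if (atDigit()) {
            int value = 0;
            while (atDigit()) value = value * 10 + (advance() - '0');
            lval = value;
            end = new Position(line, column);        // first position beyond it
            return NUM;
        }
        char c = advance();
        lval = null;
        end = new Position(line, column);
        return c;
    }

    public Object getLVal() { return lval; }
    public Position getStartPos() { return start; }
    public Position getEndPos() { return end; }
    public void yyerror(Location loc, String msg) { System.err.println(loc + ": " + msg); }
}
```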


10.2.6 Special Features for Use in Java Actions

The following special constructs can be used in Java actions. Other analogous C action features are currently unavailable for Java.

Use ‘%define throws’ to specify any uncaught exceptions from parser actions, and initial actions specified by %initial-action.

Variable: $n

The semantic value for the nth component of the current rule. This may not be assigned to. See section Java Semantic Values.

Variable: $<typealt>n

Like $n but specifies an alternative type typealt. See section Java Semantic Values.

Variable: $$

The semantic value for the grouping made by the current rule. As a value, it has the base type (Object, or the type specified by ‘%define api.value.type’); that is, it is not cast to the declared subtype, because casts are not allowed on the left-hand side of Java assignments. Use an explicit Java cast if the correct subtype is needed. See section Java Semantic Values.
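A hand-written sketch (not generated code) of what this means inside an action for a hypothetical rule list: item whose nonterminal is declared with a raw %type <List>. The local variable yyval stands in for $$: assigning an ArrayList to it needs no cast, but calling a List method on it does. The names DollarDollarSketch and listAction are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// yyval plays the role of $$: a variable of the base type (Object by default).
class DollarDollarSketch {
    @SuppressWarnings("unchecked")
    static Object listAction(String item) {
        Object yyval;
        yyval = new ArrayList<String>();        // like: $$ = new ArrayList<String>();
        ((List<String>) yyval).add(item);       // like: ((List<String>) $$).add($1);
        return yyval;
    }
}
```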

Variable: $<typealt>$

Same as $$, since Java always allows assigning to the base type. Perhaps we should use this and $<>$ for the value and $$ for setting the value, but there is currently no easy way to distinguish these constructs. See section Java Semantic Values.

Variable: @n

The location information of the nth component of the current rule. This may not be assigned to. See section Java Location Values.

Variable: @$

The location information of the grouping made by the current rule. See section Java Location Values.

Statement: return YYABORT ;

Return immediately from the parser, indicating failure. See section Java Parser Interface.

Statement: return YYACCEPT ;

Return immediately from the parser, indicating success. See section Java Parser Interface.

Statement: return YYERROR ;

Start error recovery (without printing an error message). See section Error Recovery.

Function: boolean recovering ()

Return whether error recovery is being done. In this state, the parser reads tokens until it reaches a known state, and then resumes normal operation. See section Error Recovery.

Function: void yyerror (String msg)
Function: void yyerror (Position loc, String msg)
Function: void yyerror (Location loc, String msg)

Print an error message using the yyerror method of the scanner instance in use. The Location and Position parameters are available only if location tracking is active.


10.2.7 Java Push Parser Interface

(The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.)

Normally, Bison generates a pull parser for Java. The following Bison declaration says that you want the parser to be a push parser (see section api.push-pull):

 
%define api.push-pull push

Most of the discussion of the Java pull parser interface (see section Java Parser Interface) applies to the push parser interface as well.

When generating a push parser, Bison creates the method push_parse with one of the following signatures, depending on whether location tracking is enabled.

Method on YYParser: int push_parse (int token, Object yylval)
Method on YYParser: int push_parse (int token, Object yylval, Location yyloc)
Method on YYParser: int push_parse (int token, Object yylval, Position yypos)

The primary difference with respect to a pull parser is that push_parse is invoked repeatedly, once for each token. This method is available if either the "%define api.push-pull push" or "%define api.push-pull both" declaration is used (see section api.push-pull). The Location and Position parameters are available only if location tracking is active.

The value returned by the push_parse method is one of the following four constants: YYABORT, YYACCEPT, YYERROR, or YYPUSH_MORE. This new value, YYPUSH_MORE, may be returned if more input is required to finish parsing the grammar.

If api.push-pull is declared as both, then the generated parser class will also implement the parse method. This method’s body is a loop that repeatedly invokes the scanner and then passes the values obtained from the scanner to the push_parse method.
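The calling pattern described above can be sketched as a driver loop that feeds one token at a time and stops as soon as push_parse returns anything other than YYPUSH_MORE. PushParser and TokenSource are stand-in interfaces, and the numeric values of the constants are assumptions made for this sketch; use the constants of the generated parser class in real code.

```java
// Stand-in for the generated push parser; constant values are assumptions.
interface PushParser {
    int YYACCEPT = 0, YYABORT = 1, YYERROR = 2, YYPUSH_MORE = 4;
    int push_parse(int token, Object yylval);
}

// Hypothetical source of (token, value) pairs, e.g. an event handler.
interface TokenSource {
    int nextToken();
    Object value();
}

class PushDriver {
    // Push tokens until the parser no longer asks for more input.
    static int run(PushParser parser, TokenSource source) {
        int status;
        do {
            int token = source.nextToken();
            status = parser.push_parse(token, source.value());
        } while (status == PushParser.YYPUSH_MORE);
        return status;   // YYACCEPT, YYABORT or YYERROR
    }
}
```

Unlike the pull interface, the caller here owns the loop, so tokens can come from any source, such as callbacks or network events, rather than from a blocking yylex.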

There is one additional complication. Technically, the push parser does not need to know about the scanner (i.e., an object implementing the YYParser.Lexer interface), but it does need access to the yyerror method. Currently, the yyerror method is defined in the YYParser.Lexer interface. Hence, an implementation of that interface is still required in order to provide an implementation of yyerror. The current approach (subject to change) is to require the YYParser constructor to be given an object implementing the YYParser.Lexer interface. This object need only implement the yyerror method; the other methods can be stubbed, since they will never be invoked. The simplest way to do this is to add a trivial scanner implementation to your grammar file with whatever implementation of yyerror is desired. The following code sample shows a simple way to accomplish this.

 
%code lexer
{
  public Object getLVal () {return null;}
  public int yylex () {return 0;}
  public void yyerror (String s) {System.err.println(s);}
}

10.2.8 Differences between C/C++ and Java Grammars

The different structure of the Java language forces several differences between C/C++ grammars and grammars designed for Java parsers. This section summarizes these differences.


10.2.9 Java Declarations Summary

This summary includes only declarations that are specific to Java or that have a special meaning when used in a Java parser.

Directive: %language "Java"

Generate a Java class for the parser.

Directive: %lex-param {type name}

A parameter for the lexer class defined by %code lexer only, added as parameters to the lexer constructor and the parser constructor that creates a lexer. Default is none. See section Java Scanner Interface.

Directive: %name-prefix "prefix"

The prefix of the parser class name prefixParser if ‘%define parser_class_name’ is not used. Default is YY. See section Java Bison Interface.

Directive: %parse-param {type name}

A parameter for the parser class added as parameters to constructor(s) and as fields initialized by the constructor(s). Default is none. See section Java Parser Interface.

Directive: %token <type> token

Declare tokens. Note that the angle brackets enclose a Java type. See section Java Semantic Values.

Directive: %type <type> nonterminal

Declare the type of nonterminals. Note that the angle brackets enclose a Java type. See section Java Semantic Values.

Directive: %code { code … }

Code appended to the inside of the parser class. See section Differences between C/C++ and Java Grammars.

Directive: %code imports { code … }

Code inserted just after the package declaration. See section Differences between C/C++ and Java Grammars.

Directive: %code init { code … }

Code inserted at the beginning of the parser constructor body. See section Java Parser Interface.

Directive: %code lexer { code … }

Code added to the body of an inner lexer class within the parser class. See section Java Scanner Interface.

Directive: %% code

Code (after the second %%) appended to the end of the file, outside the parser class. See section Differences between C/C++ and Java Grammars.

Directive: %{ code … %}

Not supported. Use %code imports instead. See section Differences between C/C++ and Java Grammars.

Directive: %define abstract

Whether the parser class is declared abstract. Default is false. See section Java Bison Interface.

Directive: %define annotations {annotations}

The Java annotations for the parser class. Default is none. See section Java Bison Interface.

Directive: %define extends {superclass}

The superclass of the parser class. Default is none. See section Java Bison Interface.

Directive: %define final

Whether the parser class is declared final. Default is false. See section Java Bison Interface.

Directive: %define implements {interfaces}

The implemented interfaces of the parser class, a comma-separated list. Default is none. See section Java Bison Interface.

Directive: %define init_throws {exceptions}

The exceptions thrown by %code init from the parser class constructor. Default is none. See section Java Parser Interface.

Directive: %define lex_throws {exceptions}

The exceptions thrown by the yylex method of the lexer, a comma-separated list. Default is java.io.IOException. See section Java Scanner Interface.

Directive: %define api.location.type {class}

The name of the class used for locations (a range between two positions). This class is generated as an inner class of the parser class by bison. Default is Location. Formerly named location_type. See section Java Location Values.

Directive: %define package {package}

The package to put the parser class in. Default is none. See section Java Bison Interface.

Directive: %define parser_class_name {name}

The name of the parser class. Default is YYParser or name-prefixParser. See section Java Bison Interface.

Directive: %define api.position.type {class}

The name of the class used for positions. This class must be supplied by the user. Default is Position. Formerly named position_type. See section Java Location Values.

Directive: %define public

Whether the parser class is declared public. Default is false. See section Java Bison Interface.

Directive: %define api.value.type {class}

The base type of semantic values. Default is Object. See section Java Semantic Values.

Directive: %define strictfp

Whether the parser class is declared strictfp. Default is false. See section Java Bison Interface.

Directive: %define throws {exceptions}

The exceptions thrown by user-supplied parser actions and %initial-action, a comma-separated list. Default is none. See section Java Parser Interface.


This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.