[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A Bison grammar is a list of rules.
3.3.1 Syntax of Grammar Rules | Syntax of the rules. | |
3.3.2 Empty Rules | Symbols that can match the empty string. | |
3.3.3 Recursive Rules | Writing recursive rules. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A Bison grammar rule has the following general form:
result: components…; |
where result is the nonterminal symbol that this rule describes, and components are various terminal and nonterminal symbols that are put together by this rule (see section Symbols, Terminal and Nonterminal).
For example,
exp: exp '+' exp; |
says that two groupings of type exp
, with a ‘+’ token in between,
can be combined into a larger grouping of type exp
.
White space in rules is significant only to separate symbols. You can add extra white space as you wish.
Scattered among the components can be actions that determine the semantics of the rule. An action looks like this:
{C statements} |
This is an example of braced code, that is, C code surrounded by braces, much like a compound statement in C. Braced code can contain any sequence of C tokens, so long as its braces are balanced. Bison does not check the braced code for correctness directly; it merely copies the code to the parser implementation file, where the C compiler can check it.
Within braced code, the balanced-brace count is not affected by braces within comments, string literals, or character constants, but it is affected by the C digraphs ‘<%’ and ‘%>’ that represent braces. At the top level braced code must be terminated by ‘}’ and not by a digraph. Bison does not look for trigraphs, so if braced code uses trigraphs you should ensure that they do not affect the nesting of braces or the boundaries of comments, string literals, or character constants.
Usually there is only one action and it follows the components. See section Actions.
Multiple rules for the same result can be written separately or can be joined with the vertical-bar character ‘|’ as follows:
result: rule1-components… | rule2-components… … ; |
They are still considered distinct rules even when joined in this way.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A rule is said to be empty if its right-hand side (components) is empty. It means that result can match the empty string. For example, here is how to define an optional semicolon:
semicolon.opt: | ";"; |
It is easy not to see an empty rule, especially when |
is used. The
%empty
directive allows to make explicit that a rule is empty on
purpose:
semicolon.opt: %empty | ";" ; |
Flagging a non-empty rule with %empty
is an error. If run with
‘-Wempty-rule’, bison
will report empty rules without
%empty
. Using %empty
enables this warning, unless
‘-Wno-empty-rule’ was specified.
The %empty
directive is a Bison extension, it does not work with
Yacc. To remain compatible with POSIX Yacc, it is customary to write a
comment ‘/* empty */’ in each rule with no components:
semicolon.opt: /* empty */ | ";" ; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A rule is called recursive when its result nonterminal appears also on its right hand side. Nearly all Bison grammars need to use recursion, because that is the only way to define a sequence of any number of a particular thing. Consider this recursive definition of a comma-separated sequence of one or more expressions:
expseq1: exp | expseq1 ',' exp ; |
Since the recursive use of expseq1
is the leftmost symbol in the
right hand side, we call this left recursion. By contrast, here
the same construct is defined using right recursion:
expseq1: exp | exp ',' expseq1 ; |
Any kind of sequence can be defined using either left recursion or right recursion, but you should always use left recursion, because it can parse a sequence of any number of elements with bounded stack space. Right recursion uses up space on the Bison stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once. See section The Bison Parser Algorithm, for further explanation of this.
Indirect or mutual recursion occurs when the result of the rule does not appear directly on its right hand side, but does appear in rules for other nonterminals which do appear on its right hand side.
For example:
expr: primary | primary '+' primary ; primary: constant | '(' expr ')' ; |
defines two mutually-recursive nonterminals, since each refers to the other.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] |
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.