[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Another situation where shift/reduce conflicts appear is in arithmetic expressions. Here shifting is not always the preferred resolution; the Bison declarations for operator precedence allow you to specify when to shift and when to reduce.
5.3.1 When Precedence is Needed | An example showing why precedence is needed. | |
5.3.2 Specifying Operator Precedence | How to specify precedence and associativity. | |
5.3.3 Specifying Precedence Only | How to specify precedence only. | |
5.3.4 Precedence Examples | How these features are used in the previous example. | |
5.3.5 How Precedence Works | How they work. | |
5.3.6 Using Precedence For Non Operators | Using precedence for general conflicts. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Consider the following ambiguous grammar fragment (ambiguous because the input ‘1 - 2 * 3’ can be parsed in two different ways):
expr: expr '-' expr | expr '*' expr | expr '<' expr | '(' expr ')' … ; |
Suppose the parser has seen the tokens ‘1’, ‘-’ and ‘2’; should it reduce them via the rule for the subtraction operator? It depends on the next token. Of course, if the next token is ‘)’, we must reduce; shifting is invalid because no single rule can reduce the token sequence ‘- 2 )’ or anything starting with that. But if the next token is ‘*’ or ‘<’, we have a choice: either shifting or reduction would allow the parse to complete, but with different results.
To decide which one Bison should do, we must consider the results. If the next operator token op is shifted, then it must be reduced first in order to permit another opportunity to reduce the difference. The result is (in effect) ‘1 - (2 op 3)’. On the other hand, if the subtraction is reduced before shifting op, the result is ‘(1 - 2) op 3’. Clearly, then, the choice of shift or reduce should depend on the relative precedence of the operators ‘-’ and op: ‘*’ should be shifted first, but not ‘<’.
What about input such as ‘1 - 2 - 5’; should this be ‘(1 - 2) - 5’ or should it be ‘1 - (2 - 5)’? For most operators we prefer the former, which is called left association. The latter alternative, right association, is desirable for assignment operators. The choice of left or right association is a matter of whether the parser chooses to shift or reduce when the stack contains ‘1 - 2’ and the lookahead token is ‘-’: shifting makes right-associativity.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Bison allows you to specify these choices with the operator precedence
declarations %left
and %right
. Each such declaration
contains a list of tokens, which are operators whose precedence and
associativity is being declared. The %left
declaration makes all
those operators left-associative and the %right
declaration makes
them right-associative. A third alternative is %nonassoc
, which
declares that it is a syntax error to find the same operator twice “in a
row”.
The last alternative, %precedence
, allows to define only
precedence and no associativity at all. As a result, any
associativity-related conflict that remains will be reported as an
compile-time error. The directive %nonassoc
creates run-time
error: using the operator in a associative way is a syntax error. The
directive %precedence
creates compile-time errors: an operator
can be involved in an associativity-related conflict, contrary to
what expected the grammar author.
The relative precedence of different operators is controlled by the order in which they are declared. The first precedence/associativity declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Since POSIX Yacc defines only %left
, %right
, and
%nonassoc
, which all defines precedence and associativity, little
attention is paid to the fact that precedence cannot be defined without
defining associativity. Yet, sometimes, when trying to solve a
conflict, precedence suffices. In such a case, using %left
,
%right
, or %nonassoc
might hide future (associativity
related) conflicts that would remain hidden.
The dangling else
ambiguity (see section Shift/Reduce Conflicts) can be solved explicitly. This shift/reduce conflicts occurs
in the following situation, where the period denotes the current parsing
state:
if e1 then if e2 then s1 . else s2 |
The conflict involves the reduction of the rule ‘IF expr THEN
stmt’, which precedence is by default that of its last token
(THEN
), and the shifting of the token ELSE
. The usual
disambiguation (attach the else
to the closest if
),
shifting must be preferred, i.e., the precedence of ELSE
must be
higher than that of THEN
. But neither is expected to be involved
in an associativity related conflict, which can be specified as follows.
%precedence THEN %precedence ELSE |
The unary-minus is another typical example where associativity is
usually over-specified, see Infix Notation Calculator: calc
. The %left
directive is traditionally
used to declare the precedence of NEG
, which is more than needed
since it also defines its associativity. While this is harmless in the
traditional example, who knows how NEG
might be used in future
evolutions of the grammar…
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In our example, we would want the following declarations:
%left '<' %left '-' %left '*' |
In a more complete example, which supports other operators as well, we
would declare them in groups of equal precedence. For example, '+'
is
declared with '-'
:
%left '<' '>' '=' "!=" "<=" ">=" %left '+' '-' %left '*' '/' |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The first effect of the precedence declarations is to assign precedence levels to the terminal symbols declared. The second effect is to assign precedence levels to certain rules: each rule gets its precedence from the last terminal symbol mentioned in the components. (You can also specify explicitly the precedence of a rule. See section Context-Dependent Precedence.)
Finally, the resolution of conflicts works by comparing the precedence of the rule being considered with that of the lookahead token. If the token’s precedence is higher, the choice is to shift. If the rule’s precedence is higher, the choice is to reduce. If they have equal precedence, the choice is made based on the associativity of that precedence level. The verbose output file made by ‘-v’ (see section Invoking Bison) says how each conflict was resolved.
Not all rules and not all tokens have precedence. If either the rule or the lookahead token has no precedence, then the default is to shift.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Using properly precedence and associativity directives can help fixing
shift/reduce conflicts that do not involve arithmetics-like operators. For
instance, the “dangling else
” problem (see section Shift/Reduce Conflicts) can be solved elegantly in two different ways.
In the present case, the conflict is between the token "else"
willing
to be shifted, and the rule ‘if_stmt: "if" expr "then" stmt’, asking
for reduction. By default, the precedence of a rule is that of its last
token, here "then"
, so the conflict will be solved appropriately
by giving "else"
a precedence higher than that of "then"
, for
instance as follows:
%precedence "then" %precedence "else" |
Alternatively, you may give both tokens the same precedence, in which case associativity is used to solve the conflict. To preserve the shift action, use right associativity:
%right "then" "else" |
Neither solution is perfect however. Since Bison does not provide, so far, “scoped” precedence, both force you to declare the precedence of these keywords with respect to the other operators your grammar. Therefore, instead of being warned about new conflicts you would be unaware of (e.g., a shift/reduce conflict due to ‘if test then 1 else 2 + 3’ being ambiguous: ‘if test then 1 else (2 + 3)’ or ‘(if test then 1 else 2) + 3’?), the conflict will be already “fixed”.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] |
This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.