Recursive-descent parsing: Handling left-associativity?

Started by
7 comments, last by daerid 17 years, 6 months ago
I'm writing a small language parser. Addition and subtraction are left-associative. In other words, 3+3+3 means (3+3)+3. In my grammar, it's basically this:
Expression ::= Expression + Term  |  Term
I'm having trouble getting my mind around this problem. If it was right-associative, my parsing function for this rule would be something like this:

Expression ParseExpression() {
  Term tLeft = ParseTerm();
  if(NextToken().Type == Plus)
    return new Addition(tLeft, ParseExpression());
  else
    return tLeft;
}
In fact, I originally wrote the function that way, and it worked great--except that my additions were all right-associative instead of left-associative. The problem, of course, is that my function would get into an infinite loop if I tried to do this instead:

Expression ParseExpression() {
  Expression eLeft = ParseExpression();
  if(NextToken().Type == Plus)
    return new Addition(eLeft, ParseTerm());
  else
    return eLeft;
}
So, what do I have to change to implement left-associativity? [edit: Fixed grammar] Thanks, [Edited by - BeanDog on September 27, 2006 6:28:13 PM]
Advertisement
Change the grammar.

Try 'Expression ::= Term | Term '+' Expression'.
Note: Fixed original grammar in OP.

Unfortunately, changing the grammar as you suggested makes answers incorrect. You would make the operator right-associative. Although additions would work correctly, consider divisions. 1/2/3 would be 1/(2/3) = 3/2 instead of (1/2)/3 = 1/6. So I've got to figure out a way to do it with my original grammar (or come up with an equivalent grammar that's easier to implement).
Recursive descent parsers CANNOT handle left associativity (or right associativity, if they start parsing from the right :) ).

Fortunately, every left-associative BNF grammar can be rewritten into a non-left-associative, equivalent grammar. I believe Zahlman was giving an example, not a solution. So experiment with new ways of writing BNF for arithmetic (and remember, even if no particular RULE is left-recursive, the BNF might still be (cycles in the BNF rules).
Usually a better way to parse expressions of this type is to use only 1 recursive call per operator level/precedence. Normally you handle all operators (not just 1) of the same level in a single parsing function, while accumulating the result of this level in the return variable. In practice this means that your function ParseExpression should look something like this:

Expression ParseExpression() {  Term tLeft = ParseTerm();  while (NextToken().Type == Plus) {    tLeft = new Addition(tLeft, ParseTerm());  }  return tLeft;}
Notice that three things have changed:
1. The if-statement was replaced with a while-statement,
2. The call to Addition now uses ParseTerm as the second parameter, instead of ParseExpression (the function ParseExpression is now only called once per operator level, and the next level [ParseTerm] is invoked directly rather than through a recursive ParseExpression call),
3. The result is now accumulated in tLeft, which is returned from the function when there are no more operators of this level in the input string.

If you do all your arithmetics like this you get left-associativity for free, while also reducing recursion depth and making the code more understandable. Btw, the grammar this is parsing is: Expression = Term {"+" Term}.

P.S. I am not a C programmer so the code snippet I posted above might contain errors!

[Edited by - Devilogic on September 28, 2006 6:36:57 AM]
Sorry, my rewrite was equivalent to the right-associative parser you illustrated. For some reason I was thinking of the interpretation process as somehow decoupled from the construct ;)

You might find this useful though. Or search for yourself :)

Edit: NM, it's still all the same: the techniques for removing left-recursion also flip the natural associativity. You additionally need to do something like what Devilogic said. Alternatively, keep the recursion but transform as you go:

Expression ParseExpression() {  Term tLeft = ParseTerm();  if(NextToken().Type == Plus) {    Addition right(ParseExpression());    // Hmm... not sure about this at all, actually :P    return Addition(Addition(tLeft, right.lhs), right.rhs);  } else return tLeft;}


[Edited by - Zahlman on September 27, 2006 11:55:37 PM]
For example, if you have a grammar like this:

expr = expr * expr     | expr / expr     | expr + expr     | expr - expr     | <...>


It can be refactored to something like the following to eliminate left recursion:

expr = factor * factor     | factor / factorfactor = term + term       | term - termterm = number     | ( expr )      


I did a small(sorta) scripting language in Boost.Spirit, which is a recursive descent parser framework.
daerid@gmail.com
Daerid: Unfortunately your example refactored grammar eliminates *all* recursion, making something like 1/2/3 impossible.

Devilogic: Rating++ for helping me understand what I'd found on Google. They said that left-recursion should be done in a loop, exiting the loop if the next operator is a lower precedence and continuing the loop if the operator is equal or higher precedence. Didn't make much sense at first, but I see your quick pseudocode would do what I need.

Thanks all!
Quote:Original post by BeanDog
Daerid: Unfortunately your example refactored grammar eliminates *all* recursion, making something like 1/2/3 impossible.


oh, right.. sorry. the correct grammar would be:

expr = factor * expr     | factor / expr     | factorfactor = term + factor       | term - factor       | termterm = number     | ( expr )      


I was thinking with a spirit grammar, which has a nice little kleene star operator, like so:

add_expr  =   mul_expr >> *(add_op >> mul_expr)  ;
daerid@gmail.com

This topic is closed to new replies.

Advertisement