Compiler Design

Advanced Parsing & Ambiguity


Ambiguity Handling in Parser Generators

In real-world compiler construction, restructurings to make grammars completely unambiguous can lead to many extra production rules and complex parse trees.

To avoid this, parser generators (like Yacc, Bison, or ANTLR) allow developers to write compact, ambiguous grammars and resolve ambiguity by declaring operator precedence and associativity rules:

%left '+' '-'     /* Left-associative, lowest precedence */
%left '*' '/'     /* Left-associative, higher precedence */
%right '^'        /* Right-associative, highest precedence */

The parser generator uses these rules to decide whether to shift or reduce when a table conflict occurs, keeping the parser deterministic.

Solved Exam Problems

Consider the conditional grammar (dangling-else):

S → iCtS | iCtSeS | a C → b

Let us trace the input string: i b t i b t a e a (representing: if b then if b then a else a).

Interpretation 1:

The else associates with the INNER/second if.

       S
     / | \
    i  C  t   S
       |    / | \
       b   i  C  t  S  e  S
              |     |     |
              b     a     a
Interpretation 2:

The else associates with the OUTER/first if.

             S
        / / | \ \
       i C  t  S e S
         |     |   |
         b     S   a
             / | \
            i  C  t  S
               |     |
               b     a

Conclusion: Because the string matches two separate parse tree structures, the Dangling-Else grammar is ambiguous. (In practice, parsers default to matching the else with the nearest open if).

By definition, an LR parser operates deterministically, scanning the input left-to-right and constructing exactly one rightmost derivation in reverse.

If a grammar is ambiguous, it means there exists at least one string that has two or more distinct rightmost derivations. When compiling the parsing table for such a grammar, the parser generator will encounter cells where multiple actions are valid (e.g. both shift and reduce, or two different reduces).

Since these conflicts are inevitable for ambiguous grammars, no ambiguous grammar can ever be LR.

Because LALR merges states that have the same core, it combines lookahead sets. This merging can sometimes delay the detection of a syntax error.

Specifically, an LALR parser may perform one or more reduce actions using the combined lookahead sets before checking the input buffer and realizing that no valid transition exists.

A CLR(1) parser, keeping states separate with exact lookaheads, will immediately recognize the invalid lookahead symbol and report a syntax error before performing any incorrect reductions.

Syntax Error Recovery Strategies

If a parser encounters an error, it shouldn't just stop compiling. It needs to recover from the error to check the rest of the file for other syntax issues.

  • Panic-Mode Recovery: The parser discards input tokens one by one until it finds a synchronizing token (such as a semicolon ; or closing brace }) and then pops states off the stack until it can resume parsing.
  • Phrase-Level Recovery: The parser performs local correction on the input (e.g., inserting a missing semicolon or replacing a common typo) to allow the parsing engine to proceed.