Compiler Design

CLR(1) & LALR Parsing

LR(1) Items & Lookaheads

To build more powerful bottom-up parsers, we include a lookahead symbol inside each parsing item. An LR(1) item is written as:

[A → α • β, a]

Where A → αβ is a production, and a is a terminal symbol representing the lookahead. The lookahead acts as a gatekeeper: the parser is allowed to reduce A → α if and only if the next input terminal is exactly a.

CLR(1) vs LALR(1) Parsing

Both CLR(1) and LALR(1) use LR(1) items, but they trade off memory for power in different ways:

Canonical LR (CLR/LR(1)): Keeps every set of LR(1) items as a distinct state. This is the most powerful parsing method but creates a massive number of states, making tables very large.
Look-Ahead LR (LALR(1)): Merges CLR(1) states that have the same core (the same LR(0) items, ignoring the lookaheads). The lookaheads of the merged states are combined (unioned). This retains the small state count of SLR while providing almost the same power as CLR.

How to compute the lookahead in CLOSURE(I):

If [A → α • B β, a] is in the set of items, then for each production B → γ, we add the item [B → • γ, b] to the set for every terminal b in:

FIRST(βa)

Where β is the sequence of symbols following B, and a is the lookahead of the parent item.

Property	LR(0)	SLR	LALR(1)	CLR(1)
Lookahead	None	FOLLOW(A)	Merged LR(1)	Exact lookahead
States Count	Fewest	Same as LR(0)	Same as LR(0)	Most (10x more)
Parsing Power	Weakest	Medium-Low	Medium-High	Strongest
Conflicts	Many	Some	Few	Zero false conflicts
Real-world Use	None	Rare	Bison/Yacc standard	Research/LLVM

Containment Hierarchy

The relationships between grammar families parsed by bottom-up methods follow a strict containment hierarchy:

LR(0) ⊂ SLR ⊂ LALR ⊂ CLR(1)

Every LR(0) grammar is also SLR, every SLR grammar is also LALR, and every LALR grammar is also CLR. However, adding more lookahead symbols beyond 1 (e.g. LR(2)) does not increase the class of languages that can be parsed; all deterministic context-free languages can be parsed using just LR(1).

Solved Problem: SLR Solved Problem: LALR