Compiler Design

Finite Automata & DFA Construction


Finite Automata are the mathematical engines driving lexical scanners. The lexer uses these state transitions to recognize token patterns.

NFA vs DFA

An automaton is deterministic if each input symbol uniquely determines the next state.

NFA (Non-deterministic)

  • Can have multiple transitions from a single state on the same input symbol.
  • Supports epsilon (ε) transitions (moves that consume no character).
  • Easy to construct from regular expressions but harder to implement.

DFA (Deterministic)

  • Exactly one transition for each state-character pair.
  • No epsilon (ε) transitions.
  • Directly implementable in code using efficient 2D lookup tables.

Thompson's Construction (Regex to NFA)

Thompson's Construction is an algorithm that converts any regular expression into an equivalent NFA. It builds states recursively:

1. Base Character (a):
q0 --a--> q1
2. Concatenation (r s):

Directly chain the accept state of r to the start state of s via an epsilon transition.

[Start_r] ... [Accept_r] --ε--> [Start_s] ... [Accept_s]
3. Union / Alternation (r | s):

Branch out to two separate NFA machines and merge them with ε-transitions.

          --ε--> [Start_r] ... [Accept_r] --ε-->
        /                                        \
  q_start                                          q_accept
        \                                        /
          --ε--> [Start_s] ... [Accept_s] --ε-->
4. Kleene Star (r*):

Allow looping back to the start state and bypassing the machine completely.

          +-----------------ε-----------------+
          |                                   v
  q_start --ε--> [Start_r] ... [Accept_r] --ε--> q_accept
                   ^                   |
                   +--------ε----------+

Algorithm & Solved Problems

Dive into detailed pages to see how NFAs are converted into DFAs, how DFAs are minimized, and view a complete step-by-step solved problem: