Compiler Design

Symbol Table


Symbol Table

A Symbol Table is a critical data structure created and maintained by the compiler to store detailed information about every identifier (such as variable names, function names, classes, and labels) encountered in the source program.

It is used and updated by virtually every phase of the compiler. The scanner creates initial entries, the parser adds structural context, the semantic analyzer computes and validates types, and the code generator reads addresses to emit machine instructions.

Information Stored in a Symbol Table Entry

Field NameData TypePurpose / ContentsExample Value
NameStringThe identifier as written in the source code.'totalAmount'
KindEnumVariable, Function, Parameter, Type, Label, etc.Variable
TypeType DescriptorData type: int, float, array, pointer, struct, signature.int[]
Scope LevelIntegerNesting depth: 0 = global, 1 = function, 2 = block.2
Storage ClassEnumauto, static, extern, register (C); local, global.auto
Memory AddressOffset / AddressAbsolute memory address or stack frame offset.offset 16
Array DimensionsInteger ArrayFor array types: dimensions and bounds.[3][4]
Function InfoRecordReturn type, parameters count, parameter types list.int, 2 params
Line Number(s)Integer ListLine where declared and lines where referenced.declared: 12

Usage Across Compiler Phases

  • Lexical Analyzer: Looks up character sequences. If it finds a new identifier, it inserts it and returns a generic identifier token.
  • Syntax Analyzer: Recognizes declarations and triggers table insertions.
  • Semantic Analyzer: Adds type descriptors, checks variable initialization, detects duplicate declarations, and resolves scoping.
  • Intermediate Code Gen: Looks up symbols to find offsets/temporaries, creating correct references in Three-Address Code.
  • Code Generator: Uses offset values to emit load/store instructions referencing relative stack frame positions or registers.

Symbol Table Operations

Adds a new identifier to the symbol table with its associated attributes.

  • Invoked when a declaration statement is recognized.
  • Must verify that no duplicate identifier exists in the current scope level (throws duplicate declaration error if found).

Searches for an identifier and returns its entry (or null if not found).

  • Invoked when an identifier is referenced in an expression or statement.
  • Must search from the innermost active scope outward to the global scope.
  • This is the most critical operation to optimize for speed.

Removes identifiers when they go out of scope.

  • Invoked when exiting a block or procedure.
  • In block-structured compilers, this is implemented efficiently by popping tables off a scope stack.

Modifies an existing symbol's attribute.

  • Invoked when additional information (like an inferred type or an allocated offset) becomes available during later compilation phases.

Data Structures for Implementation

Compilers choose data structures based on program scale and language scoping rules:

Data StructureInsertLookupDeleteBest Use Case
Linear ListO(1)O(n)O(n)Tiny programs (< 20 identifiers). No lookup overhead.
Hash Table (Chaining)O(1)*O(1)*O(1)*Standard production compilers (GCC, LLVM). Amortized constant time.
BST (Balanced)O(log n)O(log n)O(log n)When sorted/alphabetical output is needed (e.g. cross-referencers).
Scope Stack + HashO(1)*O(k)*O(1)*Block-structured languages supporting nested scoping.

* Amortized/average case. k = scope nesting depth.

1. Simple ASCII Sum Hash (mod TABLE_SIZE):

int hash(char *name, int size) {
    int h = 0;
    while (*name) {
        h += *name++;
    }
    return h % size;
}

2. Polynomial Rolling Hash (Better distribution, fewer collisions):

int rollingHash(char *name, int size) {
    int h = 0;
    while (*name) {
        h = (h * 31 + *name++) % size;
    }
    return h;
}