Compiler Design

Symbol Table

A Symbol Table is a critical data structure created and maintained by the compiler to store detailed information about every identifier (such as variable names, function names, classes, and labels) encountered in the source program.

It is used and updated by virtually every phase of the compiler. The scanner creates initial entries, the parser adds structural context, the semantic analyzer computes and validates types, and the code generator reads addresses to emit machine instructions.

Information Stored in a Symbol Table Entry

Field Name	Data Type	Purpose / Contents	Example Value
Name	String	The identifier as written in the source code.	'totalAmount'
Kind	Enum	Variable, Function, Parameter, Type, Label, etc.	Variable
Type	Type Descriptor	Data type: int, float, array, pointer, struct, signature.	int[]
Scope Level	Integer	Nesting depth: 0 = global, 1 = function, 2 = block.	2
Storage Class	Enum	auto, static, extern, register (C); local, global.	auto
Memory Address	Offset / Address	Absolute memory address or stack frame offset.	offset 16
Array Dimensions	Integer Array	For array types: dimensions and bounds.	[3][4]
Function Info	Record	Return type, parameters count, parameter types list.	int, 2 params
Line Number(s)	Integer List	Line where declared and lines where referenced.	declared: 12

Usage Across Compiler Phases

Lexical Analyzer: Looks up character sequences. If it finds a new identifier, it inserts it and returns a generic identifier token.
Syntax Analyzer: Recognizes declarations and triggers table insertions.
Semantic Analyzer: Adds type descriptors, checks variable initialization, detects duplicate declarations, and resolves scoping.
Intermediate Code Gen: Looks up symbols to find offsets/temporaries, creating correct references in Three-Address Code.
Code Generator: Uses offset values to emit load/store instructions referencing relative stack frame positions or registers.

Symbol Table Operations

Adds a new identifier to the symbol table with its associated attributes.

Invoked when a declaration statement is recognized.
Must verify that no duplicate identifier exists in the current scope level (throws duplicate declaration error if found).

Searches for an identifier and returns its entry (or null if not found).

Invoked when an identifier is referenced in an expression or statement.
Must search from the innermost active scope outward to the global scope.
This is the most critical operation to optimize for speed.

Removes identifiers when they go out of scope.

Invoked when exiting a block or procedure.
In block-structured compilers, this is implemented efficiently by popping tables off a scope stack.

Modifies an existing symbol's attribute.

Invoked when additional information (like an inferred type or an allocated offset) becomes available during later compilation phases.

Data Structures for Implementation

Compilers choose data structures based on program scale and language scoping rules:

Data Structure	Insert	Lookup	Delete	Best Use Case
Linear List	O(1)	O(n)	O(n)	Tiny programs (< 20 identifiers). No lookup overhead.
Hash Table (Chaining)	O(1)*	O(1)*	O(1)*	Standard production compilers (GCC, LLVM). Amortized constant time.
BST (Balanced)	O(log n)	O(log n)	O(log n)	When sorted/alphabetical output is needed (e.g. cross-referencers).
Scope Stack + Hash	O(1)*	O(k)*	O(1)*	Block-structured languages supporting nested scoping.

* Amortized/average case. k = scope nesting depth.

1. Simple ASCII Sum Hash (mod TABLE_SIZE):

int hash(char *name, int size) {
    int h = 0;
    while (*name) {
        h += *name++;
    }
    return h % size;
}

2. Polynomial Rolling Hash (Better distribution, fewer collisions):

int rollingHash(char *name, int size) {
    int h = 0;
    while (*name) {
        h = (h * 31 + *name++) % size;
    }
    return h;
}

Evaluation Order & Dependency Graphs Scope Management in Detail