Compiler Design
Introduction to Semantic Analysis
What is Semantic Analysis?
Semantic Analysis is the third major phase of a compiler, coming after Lexical Analysis and Syntax Analysis. While Lexical Analysis deals with tokens (is each word valid?) and Syntax Analysis deals with grammar (are words arranged in a grammatical structure?), Semantic Analysis deals with meaning — does the program actually make sense logically and type-theoretically?
Consider this analogy: in English, the sentence "Colorless green ideas sleep furiously" is syntactically correct but semantically nonsensical. Similarly, int x = "hello"; is syntactically correct C code but semantically wrong because a string literal cannot be assigned to an integer variable. Semantic analysis catches exactly such errors.
Key Insight
Semantic analysis is the phase where the compiler builds most of the symbol table and uses it. Because the program's structure is already known at this point, the compiler can usually report a semantic error and keep analyzing to find more, whereas lexical and syntax errors need explicit error-recovery techniques before analysis can proceed.
Position in the Compiler Pipeline
The table below places Semantic Analysis within the standard six-phase compiler structure:
| Phase # | Phase Name | Input | Output | Key Tool |
|---|---|---|---|---|
| 1 | Lexical Analysis | Source code (characters) | Token stream | Finite Automaton / DFA |
| 2 | Syntax Analysis | Token stream | Parse tree / AST | Push-down automaton / Parser |
| 3 | Semantic Analysis | Parse tree + Symbol Table | Annotated AST | Attribute Grammar / Type Checker |
| 4 | Intermediate Code Gen | Annotated AST | Three-address code (TAC) | SDT Schemes |
| 5 | Code Optimization | TAC | Optimized TAC | Data Flow Analysis |
| 6 | Code Generation | Optimized TAC | Target machine code | Register allocator |
What Does Semantic Analysis Check?
Semantic analysis performs checking at multiple levels to ensure the source code conforms to the language definition:
Type Checking: Ensures that every operator receives operands of compatible types. For example:
- Arithmetic operators (+, -, *, /) must have numeric operands.
- Array indexing (arr[i]) requires the index i to be an integer.
- The type checker infers or checks the type of every expression recursively.
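The recursive checking described above can be sketched as a small function over an expression tree. This is a minimal illustration, not a real compiler's implementation: the node classes (Num, Str, Var, BinOp, Index), the type names, and the rule that mixing int and float yields float are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical AST node classes, defined only for this illustration.
@dataclass
class Num:
    value: object      # an int or float literal

@dataclass
class Str:
    value: str         # a string literal

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str            # one of "+", "-", "*", "/"
    left: object
    right: object

@dataclass
class Index:
    array: object      # expression expected to have type ("array", elem_type)
    index: object      # expression expected to have type "int"

class TypeCheckError(Exception):
    pass

NUMERIC = {"int", "float"}

def check(node, env):
    """Return the type of `node`; `env` maps variable names to types."""
    if isinstance(node, Num):
        return "float" if isinstance(node.value, float) else "int"
    if isinstance(node, Str):
        return "string"
    if isinstance(node, Var):
        if node.name not in env:
            raise TypeCheckError(f"undeclared identifier {node.name!r}")
        return env[node.name]
    if isinstance(node, BinOp):
        # Check both operands first, then the operator itself.
        lt, rt = check(node.left, env), check(node.right, env)
        if lt not in NUMERIC or rt not in NUMERIC:
            raise TypeCheckError(
                f"operator {node.op!r} needs numeric operands, got {lt} and {rt}")
        # Assumed widening rule: int combined with float gives float.
        return "float" if "float" in (lt, rt) else "int"
    if isinstance(node, Index):
        at, it = check(node.array, env), check(node.index, env)
        if not (isinstance(at, tuple) and at[0] == "array"):
            raise TypeCheckError(f"cannot index a value of type {at}")
        if it != "int":
            raise TypeCheckError(f"array index must be int, got {it}")
        return at[1]   # the element type
    raise TypeCheckError(f"unknown node {node!r}")
```

With env = {"arr": ("array", "float"), "i": "int"}, calling check(Index(Var("arr"), Var("i")), env) returns "float", while check(BinOp("+", Num(1), Str("hello")), env) raises TypeCheckError.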
Scope Resolution: Guarantees that every identifier is declared before it is used. It checks:
- Is the variable or function visible in the current lexical scope?
- Does an identifier conflict with a duplicate declaration in the same scope?
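One common way to answer both questions is a symbol table organized as a stack of scopes. The sketch below assumes a block-structured language where inner declarations may shadow outer ones; the class and method names (SymbolTable, declare, lookup) are hypothetical.

```python
class ScopeError(Exception):
    pass

class SymbolTable:
    """A stack of dictionaries; the last one is the innermost scope."""

    def __init__(self):
        self.scopes = [{}]              # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise ScopeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_):
        current = self.scopes[-1]
        # Only the current scope is checked, so shadowing an outer name is allowed.
        if name in current:
            raise ScopeError(f"duplicate declaration of {name!r} in the same scope")
        current[name] = type_

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise ScopeError(f"{name!r} is not declared in any visible scope")
```

A checker would call enter_scope and exit_scope as it walks into and out of each block, declare at every declaration, and lookup at every use.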
Function Call Checking: Verifies that each function call supplies the number of arguments required by the function's parameter list, and that each argument's type is compatible with the corresponding parameter type in the function signature.
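As a rough sketch, a call check can compare the argument types at the call site against a table of signatures. The SIGNATURES table, its entries, and the single allowed implicit conversion (int to float) are assumptions for illustration; real languages define their own compatibility rules.

```python
class CallError(Exception):
    pass

# Hypothetical signature table: name -> (parameter types, return type).
SIGNATURES = {
    "pow": (["float", "float"], "float"),
    "strlen": (["string"], "int"),
}

def compatible(arg_type, param_type):
    # Assumed rule: exact match, or implicit widening from int to float.
    return arg_type == param_type or (arg_type == "int" and param_type == "float")

def check_call(name, arg_types, signatures=SIGNATURES):
    """Return the call's result type, or raise CallError."""
    if name not in signatures:
        raise CallError(f"call to undeclared function {name!r}")
    params, return_type = signatures[name]
    if len(arg_types) != len(params):
        raise CallError(
            f"{name} expects {len(params)} argument(s), got {len(arg_types)}")
    for position, (arg, param) in enumerate(zip(arg_types, params), start=1):
        if not compatible(arg, param):
            raise CallError(
                f"argument {position} of {name}: expected {param}, got {arg}")
    return return_type
```

Here check_call("pow", ["int", "float"]) returns "float" because the int argument widens, while check_call("strlen", ["int"]) raises CallError.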
Control Flow Checking: Ensures that control flow statements are placed in valid contexts:
- break and continue statements must appear only inside loops (in C-like languages, break is also valid inside a switch).
- return statements must return values matching the function's declared return type.
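These context rules can be checked by carrying a little state down the tree: the current loop nesting depth and the enclosing function's return type. The sketch below uses a hypothetical tuple encoding for statements and models only while loops (a C-style switch is not modeled). It collects errors in a list instead of stopping at the first one.

```python
def check_control_flow(stmt, loop_depth=0, return_type=None, errors=None):
    """Walk a statement tree and return a list of error messages.

    Hypothetical statement encoding:
      ("break",)  ("continue",)
      ("return", value_type)          # use "void" for a bare return
      ("while", body)
      ("block", [stmt, ...])
      ("func", declared_return_type, body)
    """
    if errors is None:
        errors = []
    kind = stmt[0]
    if kind in ("break", "continue"):
        if loop_depth == 0:
            errors.append(f"'{kind}' outside of a loop")
    elif kind == "return":
        value_type = stmt[1]
        if return_type is None:
            errors.append("'return' outside of a function")
        elif value_type != return_type:
            errors.append(
                f"return type mismatch: expected {return_type}, got {value_type}")
    elif kind == "while":
        check_control_flow(stmt[1], loop_depth + 1, return_type, errors)
    elif kind == "block":
        for child in stmt[1]:
            check_control_flow(child, loop_depth, return_type, errors)
    elif kind == "func":
        # A function body starts with no enclosing loop of its own.
        check_control_flow(stmt[2], 0, stmt[1], errors)
    return errors
```

For a function declared to return int whose body contains a stray break followed by a return of a string, the checker reports both problems in one pass.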
Initialization Checking: Compilers report an error or issue a warning when a variable may be read before it has been assigned a value.
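A simplified version of this check tracks the set of variables that are definitely assigned at each point. The sketch below assumes a hypothetical statement encoding with only assignments, reads, and if/else. A variable counts as assigned after an if only when both branches assign it. Loops and other control flow are left out.

```python
def check_init(stmts, assigned=None, warnings=None):
    """Return (definitely_assigned_names, warnings) for a statement list.

    Hypothetical statement encoding:
      ("assign", target, [names read on the right-hand side])
      ("read", name)
      ("if", then_stmts, else_stmts)
    """
    assigned = set() if assigned is None else set(assigned)   # work on a copy
    warnings = [] if warnings is None else warnings           # shared across calls
    for stmt in stmts:
        kind = stmt[0]
        if kind == "assign":
            _, target, reads = stmt
            for name in reads:
                if name not in assigned:
                    warnings.append(f"'{name}' may be read before assignment")
            assigned.add(target)
        elif kind == "read":
            if stmt[1] not in assigned:
                warnings.append(f"'{stmt[1]}' may be read before assignment")
        elif kind == "if":
            then_set, _ = check_init(stmt[1], assigned, warnings)
            else_set, _ = check_init(stmt[2], assigned, warnings)
            # Only names assigned on every path are definitely assigned.
            assigned = then_set & else_set
    return assigned, warnings
```

If b is assigned in the then-branch only, a later read of b produces a warning, because the else path reaches that read without assigning it.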
Overloading Resolution: In languages that support overloading (like C++ or Java), the compiler uses argument types at the call site to determine exactly which overloaded function version to invoke.
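The idea can be sketched as ranking each candidate by how many implicit conversions it needs and picking the unique cheapest one. This is a deliberately simplified model: the OVERLOADS table, the cost function, and the sum-of-costs ranking are assumptions for illustration, and the actual rules in C++ and Java are considerably more detailed.

```python
class OverloadError(Exception):
    pass

# Hypothetical overload sets: name -> list of parameter-type tuples.
OVERLOADS = {
    "print": [("int",), ("float",), ("string",)],
    "max": [("int", "int"), ("float", "float")],
}

def conversion_cost(arg_type, param_type):
    """0 for an exact match, 1 for int -> float widening, None if not allowed."""
    if arg_type == param_type:
        return 0
    if arg_type == "int" and param_type == "float":
        return 1
    return None

def resolve(name, arg_types, overloads=OVERLOADS):
    """Return the parameter tuple of the selected overload."""
    candidates = []
    for params in overloads.get(name, []):
        if len(params) != len(arg_types):
            continue
        costs = [conversion_cost(a, p) for a, p in zip(arg_types, params)]
        if None not in costs:
            candidates.append((sum(costs), params))
    if not candidates:
        raise OverloadError(f"no matching overload for {name}{tuple(arg_types)}")
    best = min(cost for cost, _ in candidates)
    winners = [params for cost, params in candidates if cost == best]
    if len(winners) > 1:
        raise OverloadError(f"ambiguous call to {name}{tuple(arg_types)}")
    return winners[0]
```

With these tables, resolve("max", ["int", "float"]) selects ("float", "float"), since the ("int", "int") version would need a float-to-int conversion that this model does not allow.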