Compiler Design

Introduction to Semantic Analysis


What is Semantic Analysis?

Semantic Analysis is the third major phase of a compiler, coming after Lexical Analysis and Syntax Analysis. Lexical Analysis deals with tokens (is each word valid?) and Syntax Analysis deals with grammar (are the words arranged in a grammatical structure?). Semantic Analysis deals with meaning: does the program obey the language's rules for types, scopes, and declarations, so that it actually makes sense?

Consider this analogy: in English, the sentence "Colorless green ideas sleep furiously" is syntactically correct but semantically nonsensical. Similarly, int x = "hello"; is syntactically correct C code but semantically wrong, because the type of a string literal is incompatible with an integer variable. Semantic analysis catches exactly such errors.

Key Insight

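Semantic analysis is the phase where the compiler builds most of the symbol table and uses it. A semantic error usually leaves the syntax tree intact, so the compiler can report it and keep analyzing the rest of the program to find more errors. Lexical and syntax errors, by contrast, disrupt the token stream or the parse itself, and the compiler needs explicit error recovery before it can continue.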

Position in the Compiler Pipeline

The table below places Semantic Analysis within the standard six-phase compiler structure:

| Phase # | Phase Name | Input | Output | Key Tool |
|---|---|---|---|---|
| 1 | Lexical Analysis | Source code (characters) | Token stream | Finite Automaton / DFA |
| 2 | Syntax Analysis | Token stream | Parse tree / AST | Push-down automaton / Parser |
| 3 | Semantic Analysis | Parse tree + Symbol Table | Annotated AST | Attribute Grammar / Type Checker |
| 4 | Intermediate Code Gen | Annotated AST | Three-address code (TAC) | SDT Schemes |
| 5 | Code Optimization | TAC | Optimized TAC | Data Flow Analysis |
| 6 | Code Generation | Optimized code | Target machine code | Register allocator |

What Does Semantic Analysis Check?

Semantic analysis performs checking at multiple levels to ensure the source code conforms to the language definition:

Type Checking: Ensures that every operator receives operands of compatible types. For example:

  • Arithmetic operators (+, -, *, /) must have numeric operands.
  • Array indexing (arr[i]) requires the index i to be an integer.
  • The type checker infers or checks types of every expression recursively.
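
The recursive checking described above can be sketched as follows. This is a minimal illustration, not any particular compiler's implementation: the node classes (Num, Str, Var, BinOp, Index), the SemanticTypeError exception, the tuple encoding of array types, and the int/float promotion rule are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical AST nodes for a toy expression language (not from a real compiler).
@dataclass
class Num:
    value: object        # an int or float literal

@dataclass
class Str:
    value: str           # a string literal

@dataclass
class Var:
    name: str            # looked up in the type environment

@dataclass
class BinOp:
    op: str              # one of "+", "-", "*", "/"
    left: object
    right: object

@dataclass
class Index:
    array: object        # expression that should have an array type
    index: object        # expression that should have type "int"

class SemanticTypeError(Exception):
    """Raised when an expression violates one of the typing rules."""

NUMERIC = {"int", "float"}

def type_of(node, env):
    """Return the type of `node`, checking its sub-expressions recursively."""
    if isinstance(node, Num):
        return "int" if isinstance(node.value, int) else "float"
    if isinstance(node, Str):
        return "string"
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        left, right = type_of(node.left, env), type_of(node.right, env)
        if left not in NUMERIC or right not in NUMERIC:
            raise SemanticTypeError(
                f"'{node.op}' needs numeric operands, got {left} and {right}")
        # Assumed promotion rule: int op int is int, anything else is float.
        return "int" if left == right == "int" else "float"
    if isinstance(node, Index):
        array_type = type_of(node.array, env)
        index_type = type_of(node.index, env)
        # Array types are encoded as ("array", element_type) in this sketch.
        if not (isinstance(array_type, tuple) and array_type[0] == "array"):
            raise SemanticTypeError(f"cannot index a value of type {array_type}")
        if index_type != "int":
            raise SemanticTypeError(f"array index must be int, got {index_type}")
        return array_type[1]
    raise SemanticTypeError(f"unknown node {node!r}")
```

For instance, type_of(BinOp("+", Num(1), Num(2.5)), {}) yields "float", while type_of(BinOp("+", Num(1), Str("hello")), {}) raises SemanticTypeError, mirroring the int x = "hello"; example.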

Scope and Declaration Checking: Guarantees that every identifier is declared before it is used. It checks:

  • Is the variable or function visible in the current lexical scope?
  • Does an identifier conflict with a duplicate declaration in the same scope?
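
Both questions above are commonly answered with a stack of scopes. The sketch below assumes a hypothetical SymbolTable class and ScopeError exception, and a language in which an inner scope may shadow an outer declaration; real symbol tables store much more than a type name.

```python
class ScopeError(Exception):
    """Raised for an undeclared identifier or a duplicate declaration."""

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                  # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})              # e.g. on entering a block or function

    def exit_scope(self):
        self.scopes.pop()                   # declarations of the block disappear

    def declare(self, name, type_name):
        current = self.scopes[-1]
        if name in current:                 # only the innermost scope is checked
            raise ScopeError(f"duplicate declaration of '{name}' in the same scope")
        current[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outward, so inner names shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise ScopeError(f"'{name}' is used but not declared in any visible scope")
```

Checking only the innermost scope in declare is what lets an inner block redeclare a name, while lookup walking outward is what makes outer declarations visible inside nested blocks.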

Function Call Checking: Verifies that each function call supplies as many arguments as the function's parameter list declares, and that each argument's type is compatible with the corresponding parameter type in the function signature.
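
A minimal sketch of this check is shown below. The check_call and compatible helpers, the layout of the signatures dictionary, and the rule that an int argument may be passed where a float is expected are all assumptions for illustration; actual compatibility rules depend on the language.

```python
def compatible(arg_type, param_type):
    """Assumed rule: identical types match, and int may widen to float."""
    return arg_type == param_type or (arg_type == "int" and param_type == "float")

def check_call(name, arg_types, signatures):
    """Return a list of error messages for one call (empty if the call is valid).

    `signatures` maps a function name to (parameter_types, return_type).
    """
    if name not in signatures:
        return [f"call to undeclared function '{name}'"]
    param_types, _return_type = signatures[name]
    # Arity check first: comparing types position by position is meaningless otherwise.
    if len(arg_types) != len(param_types):
        return [f"'{name}' expects {len(param_types)} argument(s), got {len(arg_types)}"]
    errors = []
    for position, (arg, param) in enumerate(zip(arg_types, param_types), start=1):
        if not compatible(arg, param):
            errors.append(f"argument {position} of '{name}': expected {param}, got {arg}")
    return errors
```

Returning a list instead of raising on the first problem reflects the earlier point that semantic errors can be collected while analysis continues.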

Control Flow Checking: Ensures that control flow statements are placed in valid contexts:

  • break and continue statements must appear only inside loops (in C-like languages, break is also valid inside a switch).
  • return statements must return values matching the function's declared return type.
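
One way to sketch both rules is to walk the statements while tracking how many loops enclose the current position. The tuple encoding of statements and the check_statements function below are hypothetical, and the sketch ignores switch statements and implicit conversions on return.

```python
def check_statements(stmts, return_type, loop_depth=0):
    """Return error messages for misplaced break/continue and mismatched returns.

    Assumed statement encoding (hypothetical):
      ("break",), ("continue",), ("return", type_name),
      ("while", body_statements), ("if", body_statements)
    """
    errors = []
    for stmt in stmts:
        kind = stmt[0]
        if kind in ("break", "continue"):
            if loop_depth == 0:
                errors.append(f"'{kind}' outside of a loop")
        elif kind == "while":
            # Entering a loop body: break and continue become legal inside it.
            errors += check_statements(stmt[1], return_type, loop_depth + 1)
        elif kind == "if":
            # A conditional does not change the loop nesting depth.
            errors += check_statements(stmt[1], return_type, loop_depth)
        elif kind == "return":
            if stmt[1] != return_type:
                errors.append(
                    f"return of type {stmt[1]} in a function declared to return {return_type}")
    return errors
```

Passing loop_depth down the recursion is the key design choice: the same break statement is valid or invalid depending only on the context inherited from its ancestors.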

Initialization Checking: Many compilers report an error or issue a warning when a variable may be read before it has been assigned a value.
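
A simplified version of this analysis tracks the set of variables that are definitely assigned at each point. The statement encoding and the check_init function below are assumptions for the sketch; it handles only assignments and if/else, treating a variable as assigned after a branch only when both arms assign it.

```python
def check_init(stmts, assigned=frozenset()):
    """Return (variables definitely assigned afterwards, list of warnings).

    Assumed statement encoding (hypothetical):
      ("assign", target, [names read on the right-hand side])
      ("if", then_statements, else_statements)
    """
    assigned = set(assigned)
    warnings = []
    for stmt in stmts:
        if stmt[0] == "assign":
            _, target, reads = stmt
            for name in reads:
                if name not in assigned:
                    warnings.append(f"'{name}' may be read before it is assigned")
            assigned.add(target)
        elif stmt[0] == "if":
            _, then_body, else_body = stmt
            then_set, then_warnings = check_init(then_body, assigned)
            else_set, else_warnings = check_init(else_body, assigned)
            warnings += then_warnings + else_warnings
            # Only variables assigned on BOTH paths are definitely assigned.
            assigned = then_set & else_set
    return assigned, warnings
```

For example, if y is assigned only in the then-branch of an if with an empty else-branch, a later read of y produces a warning, because one path reaches the read without assigning y.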

Overloading Resolution: In languages that support overloading (like C++ or Java), the compiler uses argument types at the call site to determine exactly which overloaded function version to invoke.
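
The idea can be illustrated with a toy resolver. The actual rules in C++ and Java are considerably more elaborate; the resolve_overload function, the OverloadError exception, and the two-step policy below (prefer an exact match, otherwise accept a unique candidate reachable by int-to-float widening) are assumptions chosen only to show how argument types select a candidate.

```python
class OverloadError(Exception):
    """Raised when no overload matches or the call is ambiguous."""

def resolve_overload(name, arg_types, overloads):
    """Pick the parameter-type tuple of the overload that the call should invoke.

    `overloads` maps a function name to a list of parameter-type tuples.
    """
    arg_types = tuple(arg_types)
    # Step 0: only overloads with the right number of parameters are candidates.
    candidates = [sig for sig in overloads.get(name, []) if len(sig) == len(arg_types)]
    # Step 1: an exact match on every argument type wins outright.
    for sig in candidates:
        if sig == arg_types:
            return sig
    # Step 2 (assumed rule): allow int arguments to widen to float parameters.
    widened = [
        sig for sig in candidates
        if all(a == p or (a == "int" and p == "float") for a, p in zip(arg_types, sig))
    ]
    if len(widened) == 1:
        return widened[0]
    if not widened:
        raise OverloadError(f"no overload of '{name}' accepts {arg_types}")
    raise OverloadError(f"call to '{name}' with {arg_types} is ambiguous")
```

With overloads {"show": [("int",), ("string",)]}, a call with a string argument selects ("string",), while a call that fits two candidates equally well after widening is rejected as ambiguous rather than resolved arbitrarily.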