Compiler Design
Target Code Generation
The code generator takes the intermediate representation (IR) and produces the final target machine code. The output must preserve the semantics of the original program while making efficient use of the target machine's resources, such as registers, memory, and the instruction set.
The code generator requires two main inputs:
- Intermediate Representation (IR): Usually Three-Address Code or a DAG.
- Symbol Table: Provides information about the runtime addresses, data types, and sizes of variables.
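To make the two inputs concrete, here is a minimal sketch of how they might be represented. The names `Quad`, `SymbolInfo`, and `operand_address`, the `FP` base register, and the offsets are all illustrative assumptions, not part of any particular compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quad:
    """One three-address instruction: result = arg1 op arg2."""
    op: str
    arg1: str
    arg2: str
    result: str


@dataclass(frozen=True)
class SymbolInfo:
    """What the generator looks up for a variable (fields are illustrative)."""
    type_name: str
    size: int    # size in bytes
    offset: int  # hypothetical offset from a frame pointer FP


# Three-address code for: x = y + z
ir = [Quad("+", "y", "z", "x")]

# Assumed layout: three 4-byte ints stored back to back.
symbol_table = {
    "x": SymbolInfo("int", 4, 0),
    "y": SymbolInfo("int", 4, 4),
    "z": SymbolInfo("int", 4, 8),
}


def operand_address(name: str) -> str:
    """Render a variable as a base+offset operand using the symbol table."""
    info = symbol_table[name]
    return f"{info.offset}(FP)"
```

With this layout, `operand_address("y")` yields `4(FP)`, which is the kind of runtime address the generator would place in an emitted instruction.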
The generator can output code in several formats, depending on the compiler's design:
- Absolute Machine Code: Can be loaded directly into memory and executed immediately. Fast startup, but all addresses are fixed (no linking).
- Relocatable Machine Code: Object files whose addresses are not yet final and whose external symbols may still be unresolved. A linker combines them later, which allows separate compilation.
- Assembly Language: Easier to debug, but requires an additional assembly pass to convert to binary.
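The difference between relocatable and absolute code can be sketched with a toy linking step. This is a deliberately simplified model, not a real object-file format: the `link` function, the word-list encoding, and the addresses are assumptions made for illustration.

```python
def link(code, relocations, symbol_addresses):
    """Patch each relocation slot with the final address of its symbol."""
    image = list(code)  # copy so the object code itself is left untouched
    for index, symbol in relocations:
        image[index] = symbol_addresses[symbol]
    return image


# Hypothetical relocatable object: address slots hold 0 as a placeholder.
object_code = ["LOAD", 0, "STORE", 0]
# Each entry says: the word at this index needs the address of this symbol.
relocations = [(1, "y"), (3, "x")]

# Addresses chosen at link time (made-up values).
absolute = link(object_code, relocations, {"x": 0x1000, "y": 0x1004})
```

The relocatable form carries placeholders plus a list of slots to patch. After linking, `absolute` contains fixed addresses and corresponds to the absolute form described above.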
Instruction selection involves choosing the appropriate target machine instructions for each IR operation. The quality of this mapping significantly impacts performance.
Example for `x = y + z`:
- Naive approach: `LOAD y, R0` → `ADD z, R0` → `STORE R0, x`
- Better approach: If the CPU supports memory-to-memory addition, one instruction could suffice (e.g., `ADD y, z, x`).
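The two mappings for `x = y + z` can be sketched as a small selection function. The function names and the `has_mem_to_mem_add` flag are hypothetical. A real instruction selector would typically match patterns against a machine description rather than branch on one flag.

```python
def select_naive(dst, lhs, rhs):
    """Load/operate/store sequence that works on a simple register machine."""
    return [f"LOAD {lhs}, R0", f"ADD {rhs}, R0", f"STORE R0, {dst}"]


def select(dst, lhs, rhs, has_mem_to_mem_add=False):
    """Pick the cheapest known encoding of dst = lhs + rhs."""
    if has_mem_to_mem_add:
        # Assumed three-operand memory-to-memory add on the target.
        return [f"ADD {lhs}, {rhs}, {dst}"]
    return select_naive(dst, lhs, rhs)


naive_code = select("x", "y", "z")
better_code = select("x", "y", "z", has_mem_to_mem_add=True)
```

Here `naive_code` holds three instructions and `better_code` holds one, mirroring the two approaches above.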
Modern CPUs execute instructions in a pipeline. Instruction scheduling reorders independent instructions to avoid pipeline hazards (stalls) and maximize throughput, without changing the program's semantics.
Example: Moving a slow memory load ahead of an unrelated operation, so the load's latency overlaps with useful work instead of stalling the CPU.
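The effect of such a reordering can be estimated with a toy pipeline model. This sketch assumes a single-issue, in-order machine where a load takes 3 cycles and an add takes 1. The `Instr` type, the latencies, and the register names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    dest: str
    srcs: tuple
    latency: int  # cycles from issue until dest is available


def independent(a, b):
    """True if neither instruction reads or overwrites the other's result."""
    return a.dest not in b.srcs and b.dest not in a.srcs and a.dest != b.dest


def total_cycles(program):
    """Cycle at which the last result is ready on a single-issue, in-order pipeline."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0    # earliest slot in which the next instruction may issue
    finish = 0
    for ins in program:
        # Stall until every source operand has been produced.
        issue = max([cycle] + [ready.get(s, 0) for s in ins.srcs])
        done = issue + ins.latency
        ready[ins.dest] = done
        finish = max(finish, done)
        cycle = issue + 1
    return finish


load = Instr("LOAD a, R1", "R1", (), 3)
use = Instr("ADD R1, R1, R2", "R2", ("R1",), 1)
other = Instr("ADD R3, R4, R5", "R5", ("R3", "R4"), 1)

unscheduled = [other, load, use]  # load issued late, so `use` waits on it longer
scheduled = [load, other, use]    # load hoisted; `other` fills part of the wait

cycles_before = total_cycles(unscheduled)
cycles_after = total_cycles(scheduled)
```

Because `load` and `other` are independent, swapping them preserves semantics. In this model the reordered sequence finishes one cycle earlier (4 cycles instead of 5), since the unrelated add executes while the load is still in flight.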