Compiler Design
Bootstrapping & Cross Compilers
In this chapter, we explore how compilers compile themselves, and how to construct compilers that generate machine code for architectures different from the ones they are running on.
The Bootstrapping Paradox
Bootstrapping is the process of writing a compiler for a language in that same language. It presents a classic chicken-and-egg paradox: how do you compile a compiler written in language X when no compiler for X exists yet?
To compile the compiler source (written in X), you need a working binary compiler for X, but that binary is exactly what you are trying to build. The usual way out is to bootstrap in stages:
- Write a compiler for a small subset X₀ of the language in assembly or machine code.
- Use that subset compiler to compile a compiler for the full language X that is written in X₀.
- You now have a running compiler for X. From here, you can rewrite the compiler in full X and compile it with itself. This state is called self-hosting.
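The three stages above can be modeled by tracking only which languages each compiler reads, emits, and is written in. In this minimal sketch, `Compiler`, `compile_with`, and the string `"machine code"` are hypothetical names; no real translation happens, the code only checks which compilations are possible at each stage.

```python
from dataclasses import dataclass

MACHINE = "machine code"  # hypothetical label for directly executable code


@dataclass(frozen=True)
class Compiler:
    source: str  # language the compiler accepts
    target: str  # language the compiler emits
    impl: str    # language the compiler itself is written in


def compile_with(program: Compiler, tool: Compiler) -> Compiler:
    """Translate `program`'s source code with `tool`, which must already run."""
    if tool.impl != MACHINE:
        raise ValueError("the tool must already exist as machine code")
    if program.impl != tool.source:
        raise ValueError(f"tool reads {tool.source}, program is written in {program.impl}")
    # Only the implementation language changes; what the program does is unchanged.
    return Compiler(program.source, program.target, tool.target)


# Stage 0: a compiler for the subset X0, written by hand in machine code/assembly.
stage0 = Compiler(source="X0", target=MACHINE, impl=MACHINE)

# Stage 1: a full-X compiler written in X0, compiled by stage 0.
stage1 = compile_with(Compiler("X", MACHINE, "X0"), stage0)

# Stage 2: the compiler rewritten in full X and compiled by itself (self-hosting).
stage2 = compile_with(Compiler("X", MACHINE, "X"), stage1)

print(stage1)  # Compiler(source='X', target='machine code', impl='machine code')
print(stage2)  # Compiler(source='X', target='machine code', impl='machine code')
```

The checks in `compile_with` encode the paradox: a compiler written in X can only be compiled by a tool that already reads X and already runs, which is why stage 0 has to be written by hand.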
T-Diagrams (Tombstone Diagrams)
T-Diagrams provide a formal visual notation for describing compilers. A compiler is defined by three languages:
- Source Language (S): The input language (left).
- Target Language (T): The output language (right).
- Implementation Language (I): The language the compiler is written in (bottom).
+-----------------+
|  S  ------->  T |
+-----+     +-----+
      |  I  |
      +-----+
Combination Rule (Compiler cascade):
If you have a compiler translating S -> T written in M:
Compiler 1: [S -> T] written in M
And a compiler/processor translating M -> U written in H:
Compiler 2: [M -> U] written in H
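Provided Compiler 2 can actually run, you can feed Compiler 1's source code (written in M) into Compiler 2, which translates it into U, resulting in:
[S -> T] written in U.
The resulting compiler still translates S to T; only the language it is implemented in has changed.
Cross Compilers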
A Cross Compiler is a compiler that runs on one host machine H, but generates machine instructions for a different target machine T (where H ≠ T).
Why are they necessary?
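They are essential for embedded systems and IoT devices, such as microcontrollers, that are too slow or have too little memory to run a full compiler suite themselves, and for platforms such as mobile phones, where software is normally built on a separate development machine.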
Example
Compiling C code on your x86-64 Intel laptop to generate ARM binary instructions to deploy on a Raspberry Pi or an iOS/Android smartphone.
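The T-diagram combination rule explains where a cross compiler like the one in this example comes from. This sketch assumes a C-to-ARM compiler whose source is written in C, plus a native x86-64 C compiler already running on the laptop; `Compiler`, `cascade`, and `is_cross` are hypothetical helpers that track languages only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str  # S: language the compiler reads
    target: str  # T: language the compiler emits
    impl: str    # I: language it is written in (for a binary, the machine it runs on)


def cascade(c1: Compiler, c2: Compiler) -> Compiler:
    """Combination rule: compile c1's source (written in M) with c2 (M -> U)."""
    if c1.impl != c2.source:
        raise ValueError(f"c2 reads {c2.source}, but c1 is written in {c1.impl}")
    return Compiler(c1.source, c1.target, c2.target)


def is_cross(c: Compiler) -> bool:
    """For a compiled binary: True when the machine it runs on (H)
    differs from the machine it generates code for (T)."""
    return c.impl != c.target


# Hypothetical inputs: a C -> ARM compiler written in C, and a native x86-64 C compiler.
c_to_arm_src = Compiler(source="C", target="ARM", impl="C")
native_cc = Compiler(source="C", target="x86-64", impl="x86-64")

cross_cc = cascade(c_to_arm_src, native_cc)
print(cross_cc)             # Compiler(source='C', target='ARM', impl='x86-64')
print(is_cross(cross_cc))   # True: runs on x86-64, emits ARM
print(is_cross(native_cc))  # False: runs on and emits x86-64
```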
The Canadian Cross is a complex but standard build configuration in systems engineering involving three different machines:
- Build Machine (A): Where the compiler is actually built.
- Host Machine (B): Where the compiler executable will run.
- Target Machine (C): Where the binary outputs generated by the compiler will run.
This configuration is commonly used when building GCC toolchains for new target architectures that have no native compiler of their own.
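GCC's configure script names these three machines with its `--build`, `--host`, and `--target` options. The hypothetical `classify` function below labels each combination, roughly following the terms in GCC's documentation on configuration terminology (where "crossback" is only a proposed name for a rare case). The GNU-style triples are example values chosen for illustration.

```python
def classify(build: str, host: str, target: str) -> str:
    """Label a build/host/target combination (hypothetical helper)."""
    if build == host == target:
        return "native"          # built, run, and targeting the same machine
    if build == host:
        return "cross"           # only the target differs
    if host == target:
        return "crossed native"  # a native compiler built on another machine
    if build == target:
        return "crossback"       # only the host differs
    return "Canadian cross"      # build, host, and target all differ


# Example GNU-style triples, used purely as labels here.
LINUX = "x86_64-linux-gnu"
WINDOWS = "x86_64-w64-mingw32"
ARM = "arm-none-eabi"

print(classify(LINUX, LINUX, LINUX))  # native
print(classify(LINUX, LINUX, ARM))    # cross
print(classify(LINUX, WINDOWS, ARM))  # Canadian cross
```

The checks run from most to least specific, so the final `return` is reached only when all three machines differ, which is exactly the Build (A) / Host (B) / Target (C) setup described above.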