Compiler Design

Directed Acyclic Graphs (DAGs)

A Directed Acyclic Graph (DAG) is a data structure used to represent basic blocks, optimize code, and detect common subexpressions. It is a more condensed version of an Abstract Syntax Tree (AST).

To build a DAG for a block of code, follow these steps:

Initialize Leaves: Create a leaf node for each unique variable or constant that appears as an initial value (operand).
Process Statements: For each statement of the form x = y op z:
- Search to see if a node labelled op with children y and z already exists.
- If YES: Attach x as an alias (or identifier) to that existing node.
- If NO: Create a new interior node for op with children y and z, and attach x as its label.
Cleanup: At the end of the block, interior nodes without live variable labels (those not used later) can be safely deleted.

Example Trace:

Code: 
1. a = b + c
2. b = a - d
3. c = b + c
4. d = a - d

DAG Behavior:
- 'a - d' is computed in step 2 and stored in b.
- In step 4, 'a - d' is required again.
- Instead of creating a new node, 'd' points to the existing node for 'a - d' (which is b).

Uses of DAG in Code Optimization

DAGs are instrumental in performing local optimizations within a basic block.

Because a DAG shares nodes for identical expressions, it automatically identifies and eliminates redundant calculations.

Before:
a = b + c
d = b + c

After (via DAG):
'b + c' is one shared node. Computed only once!

Nodes in the DAG that are not reachable from any live output variable (variables used outside the block) are considered dead and are stripped out during generation.

Performing a topological sort on the DAG provides a valid and often optimal order for evaluating operations, which minimizes register pressure.

By analyzing the DAG structure, the compiler can make better decisions about which computed values should be kept in fast CPU registers versus being spilled to memory.

Intermediate Code Generation Introduction to Code Optimization