Compiler Design

Loop Optimizations


Loop Optimizations

Loops account for the majority of execution time in a typical program. Therefore, compilers spend significant effort optimizing the instructions inside and around loops.

Loop Unrolling reduces the overhead of loop control (branch instructions, condition checks, and counter updates) by replicating the loop body multiple times and adjusting the iteration step accordingly.

Before Unrolling

for (i = 0; i < 100; i++) {
    a[i] = b[i] + c[i];
}

After Unrolling (Factor 4)

for (i = 0; i < 100; i += 4) {
    a[i]   = b[i]   + c[i];
    a[i+1] = b[i+1] + c[i+1];
    a[i+2] = b[i+2] + c[i+2];
    a[i+3] = b[i+3] + c[i+3];
}

Advantages: Reduces branch overhead (25 iterations instead of 100), increases Instruction Level Parallelism (ILP), and improves cache locality.

Disadvantages: Increases code size (could cause cache misses) and requires remainder handling if the count isn't divisible by the factor.

Loop Jamming merges two or more loops that have the SAME loop bounds and no conflicting data dependencies into a SINGLE loop.

Before Jamming

for (i = 0; i < N; i++) {
    a[i] = b[i] * 2;
}
for (i = 0; i < N; i++) {
    c[i] = a[i] + 5;
}

After Jamming

for (i = 0; i < N; i++) {
    a[i] = b[i] * 2;
    c[i] = a[i] + 5;
}

Advantages: Reduces loop overheads, improves cache reuse, and enables further optimization across the fused body.

Moves computations that produce the exact same result every iteration OUTSIDE the loop body so they are only calculated once.

Induction Variables: Variables that change by a constant amount each iteration. The compiler identifies and eliminates redundant ones.

Reduction in Strength: Replacing an expensive operation (like multiplication) with a cheaper one (like addition) inside the loop.

// Instead of:  j = i * 4  (where i increments by 1)
// We use:      j = j + 4  (cheaper addition)