Compiler Design
Loop Optimizations
Loop Optimizations
Loops account for the majority of execution time in a typical program. Therefore, compilers spend significant effort optimizing the instructions inside and around loops.
Loop Unrolling reduces the overhead of loop control (branch instructions, condition checks, and counter updates) by replicating the loop body multiple times and adjusting the iteration step accordingly.
Before Unrolling
for (i = 0; i < 100; i++) {
a[i] = b[i] + c[i];
}After Unrolling (Factor 4)
for (i = 0; i < 100; i += 4) {
a[i] = b[i] + c[i];
a[i+1] = b[i+1] + c[i+1];
a[i+2] = b[i+2] + c[i+2];
a[i+3] = b[i+3] + c[i+3];
}Advantages: Reduces branch overhead (25 iterations instead of 100), increases Instruction Level Parallelism (ILP), and improves cache locality.
Disadvantages: Increases code size (could cause cache misses) and requires remainder handling if the count isn't divisible by the factor.
Loop Jamming merges two or more loops that have the SAME loop bounds and no conflicting data dependencies into a SINGLE loop.
Before Jamming
for (i = 0; i < N; i++) {
a[i] = b[i] * 2;
}
for (i = 0; i < N; i++) {
c[i] = a[i] + 5;
}After Jamming
for (i = 0; i < N; i++) {
a[i] = b[i] * 2;
c[i] = a[i] + 5;
}Advantages: Reduces loop overheads, improves cache reuse, and enables further optimization across the fused body.
Moves computations that produce the exact same result every iteration OUTSIDE the loop body so they are only calculated once.
Induction Variables: Variables that change by a constant amount each iteration. The compiler identifies and eliminates redundant ones.
Reduction in Strength: Replacing an expensive operation (like multiplication) with a cheaper one (like addition) inside the loop.
// Instead of: j = i * 4 (where i increments by 1) // We use: j = j + 4 (cheaper addition)