Three-Address Code: Compiler Optimization Techniques Explained
Understanding Intermediate Code in Modern Compilers
When programmers write source code, they prioritize readability and functionality over raw efficiency. This is where compilers step in, transforming human-friendly code into optimized machine instructions. Three-address code serves as a crucial intermediate representation (IR) that bridges high-level logic and low-level execution. Unlike abstract syntax trees, three-address code provides a linear, assembly-like structure where each instruction contains at most one operator and three operands. For example, x = y + z * w decomposes into:
t1 = z * w
t2 = y + t1
x = t2
This decomposition enables powerful machine-independent optimizations before final code generation. Industry compilers like GCC and LLVM leverage such IRs to perform transformations that would be impractical directly on source code or syntax trees.
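The decomposition above can be sketched as a recursive lowering pass. In this minimal Python sketch, the tuple-based expression format and the `t1, t2, …` temp-naming scheme are illustrative assumptions, not any real compiler's IR:

```python
import itertools

def lower(expr, code, temps):
    """Recursively flatten a nested (op, left, right) tuple into
    three-address instructions, returning the operand holding the result."""
    if isinstance(expr, str):              # leaf: a variable name is already an operand
        return expr
    op, left, right = expr
    a = lower(left, code, temps)
    b = lower(right, code, temps)
    temp = f"t{next(temps)}"               # fresh temporary for this subexpression
    code.append(f"{temp} = {a} {op} {b}")
    return temp

code, temps = [], itertools.count(1)
code.append(f"x = {lower(('+', 'y', ('*', 'z', 'w')), code, temps)}")
print("\n".join(code))
# t1 = z * w
# t2 = y + t1
# x = t2
```

Each operator gets its own instruction and its own temporary, which is exactly what makes later per-instruction analysis tractable.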
How Three-Address Code Enables Optimization
Three-address code’s simplicity creates optimization opportunities through its explicit sequencing and reduced operational complexity. Key characteristics include:
- Explicit temporary variables for complex expression evaluation
- Symbolic labels for control flow (e.g., if x < 10 goto L3)
- Backpatching to resolve forward jumps during single-pass generation
- Machine agnosticism, allowing optimizations to be applied before any target-specific code generation
During IR generation, compilers replace source variables with symbol table pointers. Conditional jumps use labels like L1, which are later resolved to actual addresses. The video reveals a critical insight: "Backpatching allows compilers to generate jump targets in a single pass by maintaining a 'to-fix' list." This technique is foundational to efficient compilation.
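As a rough illustration of that "to-fix" list, the sketch below (the data layout and helper names are hypothetical) emits jumps with an unknown target and patches them all once the label's address becomes known:

```python
code = []      # instruction list; jump targets start out as None
to_fix = {}    # label -> indices of jump instructions awaiting that label

def emit_jump(label, op="goto"):
    """Emit a forward jump whose target address is not yet known."""
    to_fix.setdefault(label, []).append(len(code))
    code.append([op, label, None])          # placeholder target

def place_label(label):
    """Resolve a label and backpatch every jump recorded for it."""
    addr = len(code)                        # label refers to the next instruction
    for i in to_fix.pop(label, []):
        code[i][2] = addr

emit_jump("L1")                 # forward jump: L1 not yet placed
code.append(["z = 0"])
place_label("L1")               # L1 resolves to address 2; jump is patched
code.append(["i = i + 1"])
print(code[0])
# ['goto', 'L1', 2]
```

The generator never has to rescan earlier instructions: it only touches the indices recorded on the to-fix list, which is what makes single-pass generation possible.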
Basic Blocks: The Units of Local Optimization
Compilers partition three-address code into basic blocks—sequential instructions with single entry and exit points. Control flow only enters at the start and exits via jumps or fall-through at the end. Consider this loop example:
L1: if x > y goto L2
z = 0
goto L3
L2: z = 1
L3: i = i + 1
if i < 10 goto L1
Basic blocks identified:
- L1: if x > y goto L2 (conditional test block)
- z = 0 and goto L3 (fall-through branch block)
- L2: z = 1 (taken-branch assignment block)
- L3: i = i + 1 and if i < 10 goto L1 (loop increment block)
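The standard leader-based partitioning that produces these blocks can be sketched as follows; instructions are plain strings and labels/jumps are detected textually, which is a simplification of real IR handling:

```python
def basic_blocks(instrs):
    """Split a list of three-address instructions into basic blocks using
    the leader rule: the first instruction, any labeled instruction, and
    any instruction following a jump each start a new block."""
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ":" in ins:                      # labeled instruction is a leader
            leaders.add(i)
        if "goto" in ins and i + 1 < len(instrs):
            leaders.add(i + 1)              # instruction after a jump is a leader
    cuts = sorted(leaders) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(cuts, cuts[1:])]

tac = ["L1: if x > y goto L2", "z = 0", "goto L3",
       "L2: z = 1", "L3: i = i + 1", "if i < 10 goto L1"]
for block in basic_blocks(tac):
    print(block)
# ['L1: if x > y goto L2']
# ['z = 0', 'goto L3']
# ['L2: z = 1']
# ['L3: i = i + 1', 'if i < 10 goto L1']
```

Each resulting block has a single entry (its leader) and a single exit, so an optimizer can reason about it in isolation.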
Why basic blocks matter:
- Enable local optimizations without considering entire program flow
- Allow instruction reordering within blocks
- Form nodes in control flow graphs (CFGs) for global analysis
- Simplify dead code elimination and register allocation
Optimization Techniques in Action
Within basic blocks, compilers apply these key optimizations:
Dead Code Elimination
Instructions assigning values never used later are removed. For example:
a = 5 // Never referenced elsewhere
b = x * 2
→ The a=5 assignment is deleted.
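A local pass for this can be sketched as a backward liveness scan over one block; the (target, operands) instruction shape and the live_out parameter are assumptions for illustration:

```python
def eliminate_dead(block, live_out):
    """Walk a basic block backward, keeping an assignment only if its
    target is live (read later in the block, or live on block exit)."""
    live = set(live_out)
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)            # this instruction defines the target
            live.update(operands)           # reads keep their sources alive
        # else: dead assignment, dropped
    return kept[::-1]

block = [("a", ["5"]), ("b", ["x", "2"])]
print(eliminate_dead(block, live_out={"b"}))
# [('b', ['x', '2'])]  — the a = 5 assignment is removed
```

Scanning backward is the natural direction here: whether an assignment is dead depends only on what is read after it.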
Common Subexpression Elimination
Repeated calculations are identified and cached:
t1 = x + y
z = t1 * 3
t2 = x + y // Duplicate
w = t2 / 4
→ Transforms to:
t1 = x + y
z = t1 * 3
w = t1 / 4 // Reuses t1
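One way to sketch this transformation is a table of available expressions keyed by (op, operand, operand). Real compilers typically use value numbering over a normalized IR, and this toy version assumes no operand is reassigned within the block:

```python
def cse(block):
    """Local common subexpression elimination over (target, op, a, b)
    tuples: the first computation of an expression is kept, and later
    duplicates are replaced by a rename to the earlier result."""
    available = {}                  # (op, a, b) -> name already holding the value
    out, rename = [], {}
    for target, op, a, b in block:
        a, b = rename.get(a, a), rename.get(b, b)
        key = (op, a, b)
        if key in available:
            rename[target] = available[key]   # reuse the cached result
        else:
            available[key] = target
            out.append((target, op, a, b))
    return out

block = [("t1", "+", "x", "y"), ("z", "*", "t1", "3"),
         ("t2", "+", "x", "y"), ("w", "/", "t2", "4")]
print(cse(block))
# [('t1', '+', 'x', 'y'), ('z', '*', 't1', '3'), ('w', '/', 't1', '4')]
```

The rename map is what lets `w = t2 / 4` become `w = t1 / 4` without a separate rewriting pass.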
Constant Folding and Propagation
Compilers precompute constant expressions:
x = 3
y = x * 5 // 15
z = y + 2 // 17
→ Becomes:
x = 3
y = 15
z = 17
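A single forward pass can combine both steps: known constants are propagated into operands, and any instruction whose operands are all constants is folded at compile time. The instruction tuples below are an illustrative encoding, not a real IR:

```python
def fold(block):
    """Constant propagation plus folding over (target, op, a, b) tuples,
    where op '=' is a plain copy. Only + and * are handled in this sketch."""
    consts, out = {}, []
    for target, op, a, b in block:
        a = consts.get(a, a)                # propagate known constants
        b = consts.get(b, b)
        if op == "=" and isinstance(a, int):
            consts[target] = a              # record x = 3 as a known constant
        elif isinstance(a, int) and isinstance(b, int):
            a = {"+": a + b, "*": a * b}[op]  # fold at "compile time"
            consts[target] = a
            op, b = "=", None
        out.append((target, op, a, b))
    return out

block = [("x", "=", 3, None), ("y", "*", "x", 5), ("z", "+", "y", 2)]
print(fold(block))
# [('x', '=', 3, None), ('y', '=', 15, None), ('z', '=', 17, None)]
```

Note how folding `y = x * 5` into `y = 15` immediately enables folding `z = y + 2` in the same pass — optimizations often feed each other this way.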
Algebraic Simplification
Mathematical identities simplify operations:
a = b + 0 → a = b
c = d * 1 → c = d
e = f * 2 → e = f + f // Faster on some architectures
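These rewrite rules can be sketched as simple pattern matches applied to one instruction at a time:

```python
def simplify(target, op, a, b):
    """Apply the algebraic identities listed above to a single
    (target, op, a, b) instruction; '=' denotes a plain copy."""
    if op == "+" and b == 0:
        return (target, "=", a, None)       # a + 0 -> a
    if op == "*" and b == 1:
        return (target, "=", a, None)       # a * 1 -> a
    if op == "*" and b == 2:
        return (target, "+", a, a)          # a * 2 -> a + a (strength reduction)
    return (target, op, a, b)               # no rule applies

print(simplify("a", "+", "b", 0))   # ('a', '=', 'b', None)
print(simplify("e", "*", "f", 2))   # ('e', '+', 'f', 'f')
```

Real compilers apply such rules only when they are provably safe for the type involved (e.g., floating-point identities need extra care).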
Advanced Techniques: DAGs and Peephole Optimization
Beyond basic blocks, compilers deploy:
Directed Acyclic Graphs (DAGs):
- Represent expressions while identifying common subexpressions
- Enable optimal instruction sequencing during code regeneration
Peephole Optimization:
A "sliding window" examines short instruction sequences to:
- Remove redundant jumps (e.g., goto L1 where L1 itself holds goto L2 → direct goto L2)
- Simplify sequences (e.g., x = x + 0 → eliminated)
- Replace slow instructions with faster equivalents (e.g., mul x, 8 → left_shift x, 3)
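A toy peephole pass, assuming textual instructions and a precomputed map from each label to the instruction stored there, might look like:

```python
def peephole(instrs, labels):
    """Tiny peephole pass: drop x = x + 0 no-ops and thread a jump whose
    target is itself an unconditional jump (one step of jump threading)."""
    out = []
    for ins in instrs:
        parts = ins.split()
        if ins.endswith("+ 0") and parts[0] == parts[2]:
            continue                         # x = x + 0 -> eliminated
        if ins.startswith("goto "):
            target = parts[1]
            if labels.get(target, "").startswith("goto "):
                ins = labels[target]         # goto L1 where L1: goto L2 -> goto L2
        out.append(ins)
    return out

labels = {"L1": "goto L2"}                   # instruction found at label L1
print(peephole(["x = x + 0", "goto L1"], labels))
# ['goto L2']
```

Production peephole optimizers run patterns like these repeatedly until no window changes, since one rewrite frequently exposes another.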
Why Optimization Isn't Perfect
Despite sophisticated techniques:
- Full optimization is computationally infeasible (many subproblems are NP-hard or even undecidable)
- Compilers prioritize safe transformations preserving program behavior
- Trade-offs exist between compilation time and optimization depth
- Target-specific optimizations occur later in the pipeline
As the video emphasizes: "Optimization is a misnomer—compilers improve code, but 'optimal' is unattainable in practice." The key is achieving significant efficiency gains without altering functionality.
Practical Implications for Developers
- Write clear code first: Modern compilers handle low-level optimizations effectively
- Avoid premature optimization: Focus on algorithm efficiency rather than micro-optimizations
- Understand compiler flags: Use settings like -O2 (GCC) or /O2 (MSVC) to enable IR optimizations
- Profile before optimizing: Target bottlenecks revealed by profiling tools
Key Takeaways
- Three-address code simplifies complex expressions into single-operation instructions
- Basic blocks enable localized, machine-independent optimizations
- Dead code elimination and constant propagation reduce runtime overhead
- DAGs expose redundancy within blocks, while peephole techniques clean up short instruction sequences in the generated code
- Compilers balance optimization depth against compilation speed
Crucially, these transformations occur transparently—developers get faster code without modifying source. When writing loops or conditionals, which optimization technique do you suspect provides the most significant performance lift in your projects? Share your observations below.