Computer Organization and Architecture
Memory Hierarchy
Module VII: Memory Hierarchy. This module examines how memory subsystems are structured to mitigate the speed gap between high-performance processors and slow storage components. Learners will explore the principle of locality, analyze the architectural traits of SRAM, DRAM, and physical disks, and study cache mapping topologies. Additionally, this module details cache miss penalties and the address translation mechanics of Virtual Memory.
1. Memory Hierarchy Structure: SRAM, DRAM, and Disk
An ideal memory subsystem would be infinitely fast, infinitely large, and cost next to nothing. In practice, smaller memories are fast but expensive per byte, while larger memories are cheap but slow. To reconcile these constraints, systems arrange storage into a Memory Hierarchy: small SRAM caches sit closest to the processor, DRAM serves as main memory, and disk provides the largest and slowest level.
The Principle of Locality
Hierarchies work because programs exhibit locality: at any given moment they touch only a small part of their code and data, and they do so in two distinct ways:
Temporal Locality
(Locality in Time) If a specific location is referenced, it is highly likely to be accessed again in the near future (e.g., loop counters, induction variables).
Spatial Locality
(Locality in Space) If a specific location is referenced, nearby memory addresses are highly likely to be accessed soon afterward (e.g., the elements of an array traversed in order).
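As a rough illustration of both kinds of locality, the sketch below counts how many block fetches a toy one-block buffer would need for different access patterns. The function name `count_block_fetches`, the 64-byte block size, and the 4-byte element size are assumptions chosen for the example, not properties of any particular machine.

```python
def count_block_fetches(addresses, block_size=64):
    """Count fetches for a toy buffer that keeps only the last block used.

    Hypothetical model: a new fetch is needed whenever the address falls
    outside the block that is currently held.
    """
    fetches = 0
    current_block = None
    for addr in addresses:
        block = addr // block_size
        if block != current_block:
            fetches += 1
            current_block = block
    return fetches


# Spatial locality: 256 consecutive 4-byte elements share 64-byte blocks.
sequential = [i * 4 for i in range(256)]
# No spatial locality: a 64-byte stride lands in a new block every time.
strided = [i * 64 for i in range(256)]
# Temporal locality: the same location (e.g. a loop counter) reused 100 times.
repeated = [128] * 100

sequential_fetches = count_block_fetches(sequential)  # 16 blocks cover 1024 bytes
strided_fetches = count_block_fetches(strided)        # one fetch per access: 256
repeated_fetches = count_block_fetches(repeated)      # a single fetch: 1
```

The sequential walk needs one fetch per 16 accesses, while the strided walk gains nothing from fetching whole blocks.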
2. Cache Memory and Placement Strategies
A Cache is a small, high-speed memory placed between the CPU and Main Memory. A Hit means the requested block was found in the cache, while a Miss means the block must be fetched from main memory.
Block-Placement Strategies
A. Direct-Mapped Cache
Each main memory block maps to exactly one predetermined slot inside the cache array.
Pros: Exceptionally fast access, low multiplexer overhead.
Cons: Prone to thrashing (repeated evictions when blocks that map to the same index are accessed alternately).
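A direct-mapped lookup can be sketched by splitting an address into tag, index, and block offset. The helper name `split_address` and the geometry used here (16-byte blocks, 8 slots) are assumptions picked only to keep the numbers small.

```python
def split_address(addr, block_size=16, num_slots=8):
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    Assumed geometry: block_size bytes per block, num_slots cache slots.
    """
    offset = addr % block_size          # byte position inside the block
    block_number = addr // block_size   # which memory block holds the byte
    index = block_number % num_slots    # the one slot this block may occupy
    tag = block_number // num_slots     # identifies which block is in the slot
    return tag, index, offset


a = split_address(0x0040)  # (0, 4, 0)
b = split_address(0x00C0)  # (1, 4, 0)
```

Both addresses share index 4 but carry different tags, so in this geometry each one evicts the other. That is the index collision behind thrashing.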
B. Fully Associative Cache
A main memory block can sit inside any available slot within the cache matrix.
Pros: Eliminates structural index collisions entirely.
Cons: Requires a hardware comparator for every cache entry so that all tags can be checked in parallel; high transistor count and latency.
C. Set-Associative Cache
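A compromise that groups cache blocks into distinct Sets. A memory block maps to a single set, but can be placed in any block slot within that set. With n slots per set the cache is called n-way set-associative; a direct-mapped cache is the 1-way case, and a fully associative cache is the case of a single set.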
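The three placement strategies can be compared with one small simulator in which the number of ways is a parameter. This is a minimal sketch under stated assumptions: the class name `SetAssociativeCache`, the 8-block / 16-byte geometry, and the LRU replacement policy are choices made for the example (the text above does not specify a replacement policy).

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy n-way set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, num_blocks=8, ways=2, block_size=16):
        if num_blocks % ways != 0:
            raise ValueError("num_blocks must be a multiple of ways")
        self.ways = ways
        self.block_size = block_size
        self.num_sets = num_blocks // ways
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block_number = addr // self.block_size
        set_index = block_number % self.num_sets
        tag = block_number // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used tag
        lines[tag] = True
        self.misses += 1
        return False


def run(ways, trace):
    """Replay a trace and return (hits, misses) for the given associativity."""
    cache = SetAssociativeCache(num_blocks=8, ways=ways)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses


# Two blocks that collide in an 8-slot direct-mapped cache, accessed alternately.
trace = [0x0040, 0x00C0] * 4
direct_mapped = run(1, trace)      # (0, 8): every access misses
two_way = run(2, trace)            # (6, 2): only the two cold misses remain
fully_associative = run(8, trace)  # (6, 2)
```

On this trace, two ways are already enough to remove the conflict misses, which is why moderate associativity is a common compromise.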
3. Handling Cache Misses & Miss Penalty Reduction
To evaluate the overall performance cost of cache misses, engineers combine Hit Time, Miss Rate, and Miss Penalty into Average Memory Access Time (AMAT):
AMAT = Hit Time + (Miss Rate × Miss Penalty)
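A short worked example may help make the three terms concrete. The numbers below (a 1-cycle hit, a 5% miss rate, a 100-cycle penalty, and the second-level figures) are illustrative assumptions, and `amat` is a hypothetical helper name.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty.

    All times must use the same unit (cycles here); miss_rate is a
    fraction between 0 and 1.
    """
    return hit_time + miss_rate * miss_penalty


# Single cache level: 1 + 0.05 * 100, about 6 cycles per access.
single_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

# Two levels: the L1 miss penalty is itself the AMAT of the L2 cache.
# 1 + 0.05 * (10 + 0.2 * 100), about 2.5 cycles per access.
l2_time = amat(hit_time=10, miss_rate=0.2, miss_penalty=100)
two_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_time)
```

Under these assumed figures, adding a second cache level reduces the effective miss penalty seen by the first level, which is one common way to cut AMAT.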
Cache Write Policies
- Write-Through: The processor writes the modified data to both the cache block and main memory. A Write Buffer is often used so that the pipeline does not stall while the memory write completes.
- Write-Back: The processor writes only to the cache block. Main memory is updated only when that block is evicted to make room. A Dirty Bit on each block records whether it has been modified.
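The difference in memory traffic between the two policies can be sketched with a toy model. Everything here is an assumption for illustration: the function name `count_memory_writes`, a cache that holds a single block, 16-byte blocks, and a final flush of any dirty block when the trace ends.

```python
def count_memory_writes(store_addresses, policy, block_size=16):
    """Count main-memory writes for a toy cache that holds one block.

    policy is "write-through" or "write-back". A write-through write sends
    one word to memory; a write-back write sends a whole block on eviction.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    memory_writes = 0
    resident_block = None
    dirty = False
    for addr in store_addresses:
        block = addr // block_size
        if block != resident_block:
            # Replacing the resident block: write it back only if modified.
            if policy == "write-back" and dirty:
                memory_writes += 1
            resident_block = block
            dirty = False
        if policy == "write-through":
            memory_writes += 1   # every store also goes to main memory
        else:
            dirty = True         # remember the modification in the dirty bit
    if policy == "write-back" and dirty:
        memory_writes += 1       # assumed final flush of the last dirty block
    return memory_writes


# Ten stores to one block, then ten stores to another.
stores = [0x100] * 10 + [0x200] * 10
through = count_memory_writes(stores, "write-through")  # 20
back = count_memory_writes(stores, "write-back")        # 2
```

Repeated stores to the same block cost write-back a single block transfer, at the price of tracking the dirty bit and paying for the write at eviction time.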
4. Virtual Memory and Address Translation
Virtual Memory is an architectural abstraction that uses main memory as a cache for the disk. The virtual address space is divided into fixed-size Virtual Pages, each of which maps to a Physical Page Frame in RAM. When a requested page isn't in RAM, a Page Fault occurs and the operating system must load the page from disk.
Translation Mechanics
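Virtual Address Layout: Virtual Page Number (VPN) followed by the Page Offset.
Physical Address Layout: Physical Page Number (PPN) followed by the same Page Offset, which translation leaves unchanged.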
- The Page Table: An in-memory indexing array maintained by the OS. The VPN acts as a lookup index, yielding the associated PPN alongside bits like `Valid`, `Dirty`, and Permissions.
- Translation Lookaside Buffer (TLB): Since Page Tables live in memory, address translation would normally require an extra memory access. The TLB is a tiny, highly associative hardware cache inside the CPU that remembers recent VPN-to-PPN mappings, typically resolving more than 95% of translations without touching the page table.
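The translation path described above can be sketched in a few lines: split the virtual address into a virtual page number (VPN) and an offset, try the TLB, fall back to the page table, and raise a page fault if the entry is not valid. The 4 KiB page size, the dictionary-based page table, the unbounded TLB, and the names `translate` and `PageFault` are all simplifying assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, so the offset is the low 12 bits


class PageFault(Exception):
    """Raised when the requested virtual page is not resident in RAM."""


def translate(vaddr, page_table, tlb):
    """Return (physical_address, "tlb hit" or "tlb miss").

    page_table maps VPN -> {"valid": bool, "ppn": int or None}.
    tlb maps VPN -> PPN; here it is an unbounded dict, unlike real hardware.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        return tlb[vpn] * PAGE_SIZE + offset, "tlb hit"
    entry = page_table.get(vpn)          # the extra memory access
    if entry is None or not entry["valid"]:
        raise PageFault(f"VPN {vpn:#x} is not resident")
    tlb[vpn] = entry["ppn"]              # remember the mapping for next time
    return entry["ppn"] * PAGE_SIZE + offset, "tlb miss"


page_table = {
    0x2: {"valid": True, "ppn": 0x7},
    0x3: {"valid": False, "ppn": None},  # page currently on disk
}
tlb = {}

first = translate(0x2ABC, page_table, tlb)   # (0x7ABC, "tlb miss")
second = translate(0x2010, page_table, tlb)  # (0x7010, "tlb hit")
```

The page offset passes through unchanged; only the page number is replaced. A call such as `translate(0x3000, page_table, tlb)` raises `PageFault` in this sketch, standing in for the point where the OS would load the page from disk.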