Computer Organization and Architecture

Memory Hierarchy

Module VII: Memory Hierarchy. This module examines how memory subsystems are structured to mitigate the speed gap between high-performance processors and slow storage components. Learners will explore the principle of locality, analyze the architectural traits of SRAM, DRAM, and physical disks, and study cache mapping topologies. Additionally, this module details cache miss penalties and the address translation mechanics of Virtual Memory.

1. Memory Hierarchy Structure: SRAM, DRAM, and Disk

An ideal memory subsystem would be infinitely fast, infinitely large, and cost next to nothing. In practice, small memories are fast but expensive per bit, while large memories are cheap per bit but slow. To balance this trade-off, systems organize storage into layers called the Memory Hierarchy, with the fastest and smallest level closest to the CPU.

CPU Registers

Caches (L1, L2, L3): SRAM - Fast, Small, High Cost

Main Memory: DRAM - Moderate Speed/Size/Cost

Secondary Storage: Magnetic / SSD - Slow, Huge, Cheap

The Principle of Locality

Hierarchies work efficiently because software programs consistently manifest locality, reusing code and data segments via two distinct behaviors:

Temporal Locality

(Locality in Time) If a specific location is referenced, it is highly likely to be accessed again in the near future (e.g., loop counters, induction variables).

Spatial Locality

(Locality in Space) If a specific location is referenced, adjacent memory addresses are highly likely to be accessed soon afterward (e.g., sequential arrays).
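The effect of spatial locality can be illustrated with a small sketch that counts how often consecutive accesses move to a different memory block. The block size, element size, and matrix dimensions below are assumed values chosen for illustration, and the array is assumed to be stored in row-major order starting at address 0.

```python
# Assumed parameters for illustration only.
BLOCK_SIZE = 64   # bytes per cache block
ELEM_SIZE = 8     # bytes per array element
ROWS, COLS = 64, 64

def address(row, col):
    """Byte address of element (row, col) in a row-major array at base 0."""
    return (row * COLS + col) * ELEM_SIZE

def block_changes(order):
    """Count accesses that land in a different block than the previous one."""
    changes = 0
    previous_block = None
    for row, col in order:
        block = address(row, col) // BLOCK_SIZE
        if block != previous_block:
            changes += 1
        previous_block = block
    return changes

# Row-major traversal walks memory sequentially (good spatial locality).
row_major = [(r, c) for r in range(ROWS) for c in range(COLS)]
# Column-major traversal jumps a full row ahead on every access.
col_major = [(r, c) for c in range(COLS) for r in range(ROWS)]

print(block_changes(row_major))  # 512: one new block every 8 elements
print(block_changes(col_major))  # 4096: every access touches a new block
```

Under these assumptions the sequential traversal reuses each fetched block for eight consecutive elements, while the strided traversal never does.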

2. Cache Memory and Placement Strategies

A Cache is a small, high-speed memory placed between the CPU and Main Memory that holds copies of recently used blocks. A Hit means the requested block was found in the cache, while a Miss means the block must first be fetched from the next level down (ultimately main memory).

Block-Placement Strategies

A. Direct-Mapped Cache

Each main memory block maps to exactly one predetermined slot inside the cache array.

Cache Slot Index = (Block Address) % (Number of Cache Blocks)

Pros: Very fast access; each lookup needs only one tag comparison and no way-selection multiplexer.

Cons: Prone to thrashing (repeated evictions when frequently used blocks collide on the same index).
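The index formula and the thrashing behavior can be sketched with a tiny simulator. This is a simplified model, not a description of real hardware: the cache size of 8 blocks is an assumed value, and each slot stores the whole block address, whereas a real cache would store only the tag bits.

```python
NUM_BLOCKS = 8  # assumed cache size in blocks

def simulate_direct_mapped(block_addresses, num_blocks=NUM_BLOCKS):
    """Return (hits, misses) for a sequence of block addresses."""
    slots = [None] * num_blocks  # which block each slot currently holds
    hits = misses = 0
    for block in block_addresses:
        index = block % num_blocks  # Cache Slot Index formula
        if slots[index] == block:
            hits += 1
        else:
            misses += 1
            slots[index] = block  # evict whatever was there before
    return hits, misses

# Blocks 0 and 8 both map to slot 0, so they keep evicting each other.
thrashing = simulate_direct_mapped([0, 8, 0, 8, 0, 8])
# Blocks 0 and 1 map to different slots and coexist.
friendly = simulate_direct_mapped([0, 1, 0, 1, 0, 1])

print(thrashing)  # (0, 6)
print(friendly)   # (4, 2)
```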

B. Fully Associative Cache

A main memory block can be placed in any available slot of the cache.

Pros: Eliminates conflict misses caused by index collisions.

Cons: Every lookup must compare the requested tag against all stored tags in parallel, which requires one comparator per block and raises transistor count and latency.

C. Set-Associative Cache

A middle ground between the two extremes that groups cache blocks into distinct Sets. A memory block maps to exactly one set, but can be placed in any slot within that set. A cache with n slots per set is called n-way set-associative.

Cache Set Index = (Block Address) % (Number of Sets in Cache)
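The three placement strategies can be compared with one parameterized sketch: one way per set behaves like a direct-mapped cache, and a single set behaves like a fully associative cache. The LRU replacement policy used here is an assumption made for illustration; the text above does not specify a replacement policy.

```python
from collections import OrderedDict

def simulate_set_associative(block_addresses, num_sets, ways):
    """Return (hits, misses); each set evicts its least recently used block."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]  # Cache Set Index formula
        if block in lines:
            hits += 1
            lines.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict least recently used (assumed policy)
            lines[block] = True
    return hits, misses

trace = [0, 8, 0, 8, 0, 8]
direct = simulate_set_associative(trace, num_sets=8, ways=1)
two_way = simulate_set_associative(trace, num_sets=4, ways=2)
fully = simulate_set_associative(trace, num_sets=1, ways=8)

print(direct)   # (0, 6): blocks 0 and 8 collide in a one-way set
print(two_way)  # (4, 2): both blocks fit in the same two-way set
print(fully)    # (4, 2)
```

All three configurations hold 8 blocks in total; only the grouping into sets differs.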

3. Handling Cache Misses & Miss Penalty Reduction

To quantify the overall performance cost of cache misses, engineers combine Hit Time, Miss Rate, and Miss Penalty into the Average Memory Access Time (AMAT):

AMAT = Hit Time + (Miss Rate × Miss Penalty)
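A short worked example of the AMAT formula follows. The cycle counts and miss rates are made-up values for illustration. The two-level case applies the same formula recursively, treating the L1 miss penalty as the AMAT of an assumed L2 cache; that is a common extension rather than something stated in the formula above.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

# Single-level cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
single_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

# Halving the miss rate roughly halves the miss contribution.
lower_miss_rate = amat(hit_time=1, miss_rate=0.025, miss_penalty=100)

# Two-level sketch: an L1 miss costs the AMAT of L2 (10-cycle hit, 20% miss rate).
l2_amat = amat(hit_time=10, miss_rate=0.20, miss_penalty=100)
two_level = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_amat)

print(single_level)     # about 6 cycles
print(lower_miss_rate)  # about 3.5 cycles
print(two_level)        # about 2.5 cycles
```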

Cache Write Policies

Write-Through: The processor writes the modified data to both the cache block and main memory. A Write Buffer is often used so the pipeline does not stall while waiting for memory.
Write-Back: The processor writes only to the cache block. Main memory is updated only when that block is evicted to make room. A Dirty Bit per block records whether the block has been modified.
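The difference in memory traffic between the two policies can be sketched by counting writes that reach main memory. This is a simplified model under stated assumptions: a direct-mapped cache, a block allocated in the cache on every store miss, and all dirty blocks flushed at the end of the trace. The function name and trace are hypothetical.

```python
def count_memory_writes(store_blocks, num_blocks, policy):
    """Count main-memory writes for a trace of stores, given as block addresses."""
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy: " + policy)
    slots = [None] * num_blocks    # block held by each slot
    dirty = [False] * num_blocks   # Dirty Bit per slot (write-back only)
    memory_writes = 0
    for block in store_blocks:
        index = block % num_blocks
        if slots[index] != block:
            if dirty[index]:
                memory_writes += 1  # write the evicted dirty block back
                dirty[index] = False
            slots[index] = block
        if policy == "write-through":
            memory_writes += 1      # every store also goes to memory
        else:
            dirty[index] = True     # defer the memory update
    memory_writes += sum(dirty)     # assumed final flush of dirty blocks
    return memory_writes

# 100 stores to block 3, then one store to block 11 (same slot in an 8-block cache).
trace = [3] * 100 + [11]
print(count_memory_writes(trace, 8, "write-through"))  # 101
print(count_memory_writes(trace, 8, "write-back"))     # 2
```

Repeated stores to the same block are where write-back saves the most traffic, since only the final state of the block reaches memory.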

4. Virtual Memory and Address Translation

Virtual Memory is an architectural abstraction that uses main memory as a cache for the disk. The virtual address space is divided into fixed-size Virtual Pages, which are mapped to Physical Page Frames in RAM. When a requested page is not in RAM, a Page Fault occurs and the operating system must load the page from disk.

Translation Mechanics

Virtual Address Layout: [ Virtual Page Number (VPN) | Page Offset ]

↓ Memory Management Unit (MMU) (TLB / Page Table Lookup)

Physical Address Layout: [ Physical Page Number (PPN) | Page Offset ]
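The address layouts can be sketched with bit operations: the page offset passes through translation unchanged, and only the page number is replaced. The 4 KiB page size and the example PPN are assumed values for illustration.

```python
PAGE_SIZE = 4096                          # assumed 4 KiB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for 4 KiB pages

def split_virtual_address(virtual_address):
    """Return (VPN, page offset) for a virtual address."""
    vpn = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    return vpn, offset

def join_physical_address(ppn, offset):
    """Combine a physical page number with the unchanged page offset."""
    return (ppn << OFFSET_BITS) | offset

vpn, offset = split_virtual_address(0x12345)
print(hex(vpn), hex(offset))  # 0x12 0x345

# Suppose the MMU maps VPN 0x12 to PPN 0x7 (a made-up mapping).
physical = join_physical_address(0x7, offset)
print(hex(physical))          # 0x7345
```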

The Page Table: An in-memory array maintained by the OS. The VPN acts as the lookup index, yielding the associated PPN along with bits such as `Valid`, `Dirty`, and Permissions.
Translation Lookaside Buffer (TLB): Since page tables live in memory, every address translation would otherwise require at least one extra memory access. The TLB is a small, highly associative hardware cache inside the CPU that holds recent VPN-to-PPN mappings, and it typically resolves more than 95% of translations without accessing the page table.
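The lookup order described above (TLB first, then the page table, then a page fault) can be sketched as follows. This is a software model with hypothetical names and made-up table contents; it assumes 4 KiB pages and an unbounded TLB, whereas a real TLB has a small fixed number of entries and needs a replacement policy.

```python
PAGE_SIZE = 4096   # assumed 4 KiB pages
OFFSET_BITS = 12

class PageFault(Exception):
    """Raised when the requested page is not resident in RAM."""

# Hypothetical page table: VPN -> entry with a Valid bit and a PPN.
page_table = {
    0x12: {"valid": True, "ppn": 0x7},
    0x13: {"valid": False, "ppn": None},  # page currently on disk
}
tlb = {}  # recent VPN -> PPN mappings (unbounded in this sketch)
stats = {"tlb_hits": 0, "tlb_misses": 0}

def translate(virtual_address):
    """Translate a virtual address to a physical address."""
    vpn = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    if vpn in tlb:
        stats["tlb_hits"] += 1
        ppn = tlb[vpn]
    else:
        stats["tlb_misses"] += 1
        entry = page_table.get(vpn)  # the extra memory access
        if entry is None or not entry["valid"]:
            raise PageFault(hex(virtual_address))
        ppn = entry["ppn"]
        tlb[vpn] = ppn               # remember the mapping for next time
    return (ppn << OFFSET_BITS) | offset

first = translate(0x12345)   # TLB miss, page table lookup
second = translate(0x12ABC)  # same page, TLB hit
print(hex(first), hex(second))  # 0x7345 0x7abc
print(stats)                    # {'tlb_hits': 1, 'tlb_misses': 1}
```

Calling translate(0x13000) in this model raises PageFault, because the entry for VPN 0x13 has its Valid bit cleared.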
