Computer Organization and Architecture
Building a Datapath
Module V: Building a Datapath. This module uncovers the structural blueprint of the processor core. It explores combinational vs. sequential logic elements, control unit design, instruction execution pathways for R-type, load, and branch instructions, and the motivations behind transitioning to multicycle datapath designs.
1. Single-Cycle Datapath & Logic Building Blocks
To construct a functional processor core, hardware designers categorize circuit components into two fundamental logic classes:
Combinational Elements
Logic units that hold no internal state. Outputs depend strictly on current inputs and settle after a short propagation delay (e.g., ALU, multiplexers, sign-extend hardware).
Sequential Elements
Logic units with built-in storage that preserve state across clock edges. Synchronized by an internal system clock (e.g., Register File, Instruction Memory, Data Memory).
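The distinction can be illustrated with a minimal Python sketch. The names `mux2` and `Register` are hypothetical and chosen only for this example; real hardware would be described in an HDL rather than modeled this way.

```python
def mux2(select, in0, in1):
    """Combinational: the output is a pure function of the current inputs."""
    return in1 if select else in0


class Register:
    """Sequential: the stored value changes only when clock_edge() is called."""

    def __init__(self):
        self.value = 0   # state visible on the output
        self._next = 0   # value waiting on the data input

    def set_input(self, data):
        self._next = data  # the output does not change yet

    def clock_edge(self):
        self.value = self._next  # state updates on the clock edge


reg = Register()
reg.set_input(mux2(1, 5, 9))  # the mux selects in1 (9) from its inputs alone
before = reg.value            # still 0: no clock edge has occurred
reg.clock_edge()
after = reg.value             # 9 after the clock edge
```

The multiplexer has no memory of earlier calls, while the register keeps its old value until the clock edge arrives.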
Essential Memory Components
- Instruction Memory: Holds executable binary instructions. Given a 32-bit address (PC), outputs the corresponding instruction word.
- Register File: High-speed multi-ported storage hosting 32 general-purpose registers. Supports two simultaneous register reads and one register write per clock cycle.
- Data Memory: State element that holds program data. It is read or written at an address computed by the ALU.
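As a rough behavioral sketch of the first two components, assuming word-aligned addresses and a Python list standing in for memory (the class and function names are illustrative, not from the module):

```python
class RegisterFile:
    """32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Both source registers are read at the same time.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, data, reg_write):
        # The write happens only when RegWrite is asserted.
        # Register 0 ($zero) is hardwired to 0 in MIPS.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = data & 0xFFFFFFFF


def fetch(instruction_memory, pc):
    """Return the 32-bit word at byte address pc (assumed word aligned)."""
    return instruction_memory[pc // 4]
```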
2. Main Control Unit & ALU Signal Design
The Control Unit acts as the central orchestrator. It takes the 6-bit Opcode field (bits 31-26) and decodes it to set binary control lines that dictate how multiplexers and memory modules route data.
Central Control Signal Matrix
| Signal | Asserted (1) | De-asserted (0) |
|---|---|---|
| RegDst | Write to rd (bits 15-11) | Write to rt (bits 20-16) |
| ALUSrc | ALU operand is sign-extended immediate | ALU operand from Register File (rt) |
| MemtoReg | Write Data Memory output to register | Write ALU output to register |
| RegWrite | Enable register write | Inhibit register write |
| MemRead | Initiate Data Memory read | Inhibit memory read |
| MemWrite | Initiate Data Memory write | Inhibit memory write |
| Branch | Branch if ALU Zero flag set | PC advances to PC + 4 |
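The table can be read as a lookup from opcode to signal values. The sketch below assumes the standard MIPS opcodes for R-type (0x00), lw (0x23), sw (0x2B), and beq (0x04), and sets don't-care signals to 0; the per-instruction values follow the usual textbook single-cycle design and are not listed in this module.

```python
OPCODE_RTYPE, OPCODE_LW, OPCODE_SW, OPCODE_BEQ = 0x00, 0x23, 0x2B, 0x04

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch")

# One row per instruction class, in the order given by SIGNALS.
CONTROL_TABLE = {
    OPCODE_RTYPE: (1, 0, 0, 1, 0, 0, 0),
    OPCODE_LW:    (0, 1, 1, 1, 1, 0, 0),
    OPCODE_SW:    (0, 1, 0, 0, 0, 1, 0),  # RegDst, MemtoReg are don't-cares
    OPCODE_BEQ:   (0, 0, 0, 0, 0, 0, 1),  # RegDst, MemtoReg are don't-cares
}


def main_control(instruction):
    """Decode bits 31-26 of a 32-bit instruction into control signals."""
    opcode = (instruction >> 26) & 0x3F
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))
```

For example, `main_control(0x8D28FFFC)` (an lw encoding) returns a dictionary with ALUSrc, MemtoReg, RegWrite, and MemRead set to 1.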
Two-Level ALU Decoding Strategy
To keep the main control block simple, MIPS uses a hierarchical decoding scheme:
- Level 1: Main Control Unit evaluates the opcode and outputs a 2-bit ALUOp field
- Level 2: ALU Control unit processes ALUOp (2 bits) plus the 6-bit Function Field (bits 5-0) to generate the 3-bit ALU control code
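A minimal sketch of the second level follows. The ALUOp encodings (00 for lw/sw, 01 for beq, 10 for R-type) and the 3-bit control codes are assumptions taken from the common textbook design; this module does not list them.

```python
# Assumed 3-bit ALU control codes (common textbook values).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# MIPS function-field values for the R-type operations covered here.
FUNCT_TO_ALU = {
    0x20: ALU_ADD,  # add
    0x22: ALU_SUB,  # sub
    0x24: ALU_AND,  # and
    0x25: ALU_OR,   # or
    0x2A: ALU_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Combine the 2-bit ALUOp with the 6-bit function field."""
    if alu_op == 0b00:   # lw/sw: add base and offset
        return ALU_ADD
    if alu_op == 0b01:   # beq: subtract to compare
        return ALU_SUB
    if alu_op == 0b10:   # R-type: the function field decides
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("unsupported ALUOp")
```

Only the R-type case looks at the function field, which is why the main control unit never needs to see it.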
3. Instruction Execution Pathways
A single-cycle datapath completes an entire instruction cycle within one extended clock period. Different instruction types follow distinct data paths through the core logic.
R-Type Instruction Flow
Example: add $rd, $rs, $rt
- Fetch: Instruction Memory outputs the instruction word stored at the PC address
- Decode: Register File reads contents of source registers rs and rt
- Execute: ALU receives both register outputs and computes the arithmetic result
- Write-Back: ALU output is routed back to Register File and saved in destination register rd
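The four steps above can be traced in a short sketch. It assumes a dictionary keyed by byte address as instruction memory and a 32-entry list as the register file; `run_add` is a hypothetical helper that handles only the add instruction.

```python
def run_add(pc, instruction_memory, regs):
    """Execute one R-type add in a single step and return the next PC."""
    word = instruction_memory[pc]            # Fetch
    rs = (word >> 21) & 0x1F                 # Decode: field extraction
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    a, b = regs[rs], regs[rt]                # Decode: register read
    result = (a + b) & 0xFFFFFFFF            # Execute: 32-bit ALU add
    regs[rd] = result                        # Write-Back to rd
    return pc + 4


# 0x012A4020 encodes add $t0, $t1, $t2 (rd=8, rs=9, rt=10).
regs = [0] * 32
regs[9], regs[10] = 3, 4
next_pc = run_add(0, {0: 0x012A4020}, regs)  # regs[8] becomes 7
```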
Load Word Instruction Flow
Example: lw $rt, offset($rs)
- Fetch & Decode: Core fetches instruction and reads base address register rs
- Address Calculation: Sign-extend hardware converts 16-bit immediate into 32-bit value. ALU adds this extended offset to the base address from rs
- Memory Read: Computed address is passed to Data Memory, which outputs the data word at that location
- Write-Back: Data word is routed to Register File and saved into register rt
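The load path can be sketched the same way. The memories are modeled as dictionaries keyed by byte address, which is an assumption of this example, and `run_lw` is a hypothetical helper.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def run_lw(pc, instruction_memory, data_memory, regs):
    """Execute one lw in a single step and return the next PC."""
    word = instruction_memory[pc]                    # Fetch
    rs = (word >> 21) & 0x1F                         # Decode
    rt = (word >> 16) & 0x1F
    offset = sign_extend16(word & 0xFFFF)            # Sign-extend immediate
    address = (regs[rs] + offset) & 0xFFFFFFFF       # Address calculation
    data = data_memory[address]                      # Memory read
    regs[rt] = data                                  # Write-Back to rt
    return pc + 4


# 0x8D28FFFC encodes lw $t0, -4($t1) (rt=8, rs=9, offset=-4).
regs = [0] * 32
regs[9] = 0x100
next_pc = run_lw(0, {0: 0x8D28FFFC}, {0xFC: 42}, regs)  # regs[8] becomes 42
```

Note that the destination is rt rather than rd, which is why RegDst is de-asserted for loads.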
Branch If Equal Instruction Flow
Example: beq $rs, $rt, offset
- Fetch & Decode: Core fetches instruction and reads contents of registers rs and rt
- Comparison: ALU subtracts rt from rs. If values match, result is zero and the Zero Flag is asserted
- Target Calculation: Separate adder logic calculates the branch target address: Target Address = (PC + 4) + (Sign-Extended Offset × 4)
- PC Update: If Branch AND Zero evaluates true, PC updates to branch target; otherwise PC advances to PC + 4
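A small sketch of the PC selection logic, following the target formula and the Branch AND Zero condition described above. The function name `next_pc_beq` and its parameters are illustrative.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def next_pc_beq(pc, rs_value, rt_value, offset16, branch=1):
    """Return the next PC for beq given the two register values."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0   # ALU subtract, Zero flag
    pc_plus_4 = pc + 4
    # Separate adder: (PC + 4) + (sign-extended offset * 4)
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return target if (branch and zero) else pc_plus_4
```

With `pc = 0x1000` and an offset field of 3, equal registers give a next PC of 0x1010, while unequal registers fall through to 0x1004.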
4. Transitioning to Multicycle Approach
The Inefficiency of Single-Cycle Datapaths
While straightforward to design, single-cycle implementations suffer from severe performance limitations. The clock period must stretch to accommodate the longest possible path through the entire circuit, which is that of the Load Word (lw) instruction. This forces faster instructions (like jumps or simple additions) to wait for the same extended clock window, wasting time on every instruction that finishes early.
Critical Bottleneck:
The critical path through the datapath (often dominated by memory access time) determines the minimum clock period for the entire system, so every instruction takes as long as the slowest one.
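To make the bottleneck concrete, the sketch below sums per-unit delays along each instruction's path. The delay values are hypothetical round numbers chosen for illustration, not measurements from this module.

```python
# Assumed unit delays in picoseconds (illustrative values only).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional units each instruction class passes through.
PATHS = {
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq":    ["instr_mem", "reg_read", "alu"],
}

path_delay = {name: sum(DELAY_PS[unit] for unit in units)
              for name, units in PATHS.items()}

# The single-cycle clock must cover the slowest path.
clock_period_ps = max(path_delay.values())
slowest = max(path_delay, key=path_delay.get)
```

Under these assumed delays the lw path needs 800 ps, so a beq that could finish in 500 ps still occupies a full 800 ps cycle.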
The Multicycle Design Philosophy
To resolve this bottleneck, engineers transition to a Multicycle Datapath. This design breaks a single instruction into smaller, discrete functional execution steps. Each step completes within a much shorter, highly optimized clock period.
Key Architectural Changes:
Resource Sharing
Instead of duplicating expensive hardware blocks, multicycle designs share a single memory module for both instructions and data, and a single ALU for PC updates, branch evaluation, and data operations.
Internal Buffers
Temporary, non-user-visible registers placed between major logic blocks (Instruction Register, Memory Data Register, A, B, ALUOut). These buffers save intermediate results on clock boundaries.
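The role of the shared memory and the internal buffers can be sketched by stepping an lw through a multicycle sequence. The five-step breakdown below follows the common textbook design and is an assumption here; local variables stand in for the IR, A, B, ALUOut, and MDR registers, and `multicycle_lw` is a hypothetical helper.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def multicycle_lw(pc, memory, regs):
    """Run one lw as five clocked steps; return (next_pc, cycles_used)."""
    # Cycle 1: fetch from the single shared memory; the ALU computes PC + 4.
    ir = memory[pc]
    pc = pc + 4
    # Cycle 2: decode and latch register values into A and B.
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]  # B is latched but not used by lw
    # Cycle 3: the same ALU computes the address, latched in ALUOut.
    alu_out = (a + sign_extend16(ir & 0xFFFF)) & 0xFFFFFFFF
    # Cycle 4: read the same memory again, latch the word in MDR.
    mdr = memory[alu_out]
    # Cycle 5: write MDR into the register file.
    regs[rt] = mdr
    return pc, 5


# One memory holds both the instruction (address 0) and the data (0xFC).
regs = [0] * 32
regs[9] = 0x100
next_pc, cycles = multicycle_lw(0, {0x0: 0x8D28FFFC, 0xFC: 42}, regs)
```

Each buffer carries a value from one cycle to the next, which is what lets a single memory and a single ALU be reused within one instruction.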
Multicycle Advantage:
By breaking instructions into steps and letting each instruction use only as many clock cycles as it needs, multicycle designs avoid charging every instruction the full lw delay, which reduces wasted time on shorter instructions.
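As a worked example, the average number of cycles per instruction can be estimated from per-class cycle counts and an instruction mix. Both the cycle counts (a common textbook breakdown) and the mix percentages are assumptions for illustration only.

```python
# Assumed cycles per instruction class in a multicycle design.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix in percent (sums to 100).
MIX_PERCENT = {"lw": 25, "sw": 10, "R-type": 52, "beq": 11, "j": 2}

total_weighted = sum(MIX_PERCENT[name] * CYCLES[name] for name in CYCLES)
average_cpi = total_weighted / 100  # weighted average cycles per instruction
```

With this mix the average works out to 4.12 cycles per instruction. Whether that beats a single-cycle design depends on how much shorter the multicycle clock period is than the single-cycle one.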