Computer Organization and Architecture

Building a Datapath


Module V: Building a Datapath. This module uncovers the structural blueprint of the processor core. It explores combinational vs. sequential logic elements, control unit design, instruction execution pathways for R-type, load, and branch instructions, and the motivations behind transitioning to multicycle datapath designs.


1. Single-Cycle Datapath & Logic Building Blocks

To construct a functional processor core, hardware designers categorize circuit components into two fundamental logic classes:

Combinational Elements

Logic units that hold no internal state. Outputs depend strictly on the current inputs and settle after a short propagation delay (e.g., ALU, multiplexers, sign-extend hardware).

Sequential Elements

Logic units with built-in storage that hold their state between clock edges and update it only on a clock edge. Synchronized by the system clock (e.g., Register File, Instruction Memory, Data Memory).
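As a rough illustration of the distinction, the Python sketch below models two combinational elements as pure functions and one sequential element as an object whose state changes only when its clock edge is triggered. The names mux2, sign_extend16, and Register are illustrative and are not taken from any hardware library.

```python
def mux2(select, in0, in1):
    """Combinational 2-to-1 multiplexer: output depends only on current inputs."""
    return in1 if select else in0


def sign_extend16(imm16):
    """Combinational sign extension of a 16-bit value to a 32-bit pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


class Register:
    """Sequential element: stores a value and updates it only on a clock edge."""

    def __init__(self, value=0):
        self.value = value        # state visible on the output
        self.next_value = value   # value waiting on the input

    def set_input(self, value):
        # Changing the input does not change the stored state.
        self.next_value = value & 0xFFFFFFFF

    def clock_edge(self):
        # The stored state changes only here.
        self.value = self.next_value


# Example: a program counter advancing by 4 on one clock edge.
pc = Register(0x00400000)
pc.set_input(pc.value + 4)
before = pc.value   # still the old value, no edge yet
pc.clock_edge()
after = pc.value    # updated after the edge
```

Calling mux2 or sign_extend16 twice with the same arguments always gives the same result, whereas the value read from pc depends on how many clock edges have occurred.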

Essential Memory Components

  • Instruction Memory: Holds executable binary instructions. Given a 32-bit address (PC), outputs the corresponding instruction word.
  • Register File: High-speed multi-ported storage holding 32 general-purpose registers. It supports two simultaneous reads (the source registers) and one write per clock cycle.
  • Data Memory: Secondary state block for reading or writing application data using calculated address pointers.
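The register file's port behavior can be sketched as follows. This is a minimal model, assuming the write port is gated by an enable signal and that register 0 always reads as zero, as it does in MIPS; the class and method names are hypothetical.

```python
class RegisterFile:
    """Sketch of a 32-entry register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Two read ports: both register numbers are looked up together.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def write(self, write_enable, write_reg, data):
        # One write port, applied on the clock edge only when enabled.
        # Register 0 ($zero) is hardwired to 0 in MIPS, so writes to it are dropped.
        if write_enable and (write_reg & 0x1F) != 0:
            self.regs[write_reg & 0x1F] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(1, 9, 7)        # enabled write to register 9
rf.write(0, 10, 99)      # write disabled, register 10 is unchanged
rf.write(1, 0, 123)      # ignored, register 0 stays 0
values = rf.read(9, 10)
```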

2. Main Control Unit & ALU Signal Design

The Control Unit acts as the central orchestrator. It takes the 6-bit Opcode field (bits 31-26) and decodes it to set binary control lines that dictate how multiplexers and memory modules route data.

Central Control Signal Matrix

Signal      Asserted (1)                               De-asserted (0)
RegDst      Write to rd (bits 15-11)                   Write to rt (bits 20-16)
ALUSrc      ALU operand is sign-extended immediate     ALU operand from Register File (rt)
MemtoReg    Write Data Memory output to register       Write ALU output to register
RegWrite    Enable register write                      Inhibit register write
MemRead     Initiate Data Memory read                  Inhibit memory read
MemWrite    Initiate Data Memory write                 Inhibit memory write
Branch      Branch if ALU Zero flag set                PC advances to PC + 4
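The decoding step can be sketched as a lookup from opcode to the seven signals in the matrix. The opcode values are the standard MIPS encodings (0x00 for R-type, 0x23 for lw, 0x2B for sw, 0x04 for beq). Showing don't-care signals as 0 and the names CONTROL_TABLE and main_control are assumptions made for this illustration.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead", "MemWrite", "Branch")

# One row per supported opcode, in the same order as SIGNALS.
CONTROL_TABLE = {
    0x00: (1, 0, 0, 1, 0, 0, 0),  # R-type: write ALU result to rd
    0x23: (0, 1, 1, 1, 1, 0, 0),  # lw: write memory data to rt
    0x2B: (0, 1, 0, 0, 0, 1, 0),  # sw: RegDst and MemtoReg are don't-care (shown as 0)
    0x04: (0, 0, 0, 0, 0, 0, 1),  # beq: RegDst and MemtoReg are don't-care (shown as 0)
}


def main_control(instruction):
    """Decode bits 31-26 of a 32-bit instruction word into control signals."""
    opcode = (instruction >> 26) & 0x3F
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


lw_signals = main_control(0x8D28FFFC)   # lw $t0, -4($t1)
add_signals = main_control(0x012A4020)  # add $t0, $t1, $t2
beq_signals = main_control(0x112A0003)  # beq $t1, $t2, 3
```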

Two-Level ALU Decoding Strategy

To keep the main control block simple, MIPS uses a hierarchical decoding scheme:

  1. Level 1: Main Control Unit evaluates the opcode and outputs a 2-bit ALUOp field
  2. Level 2: ALU Control unit processes ALUOp (2 bits) plus the 6-bit Function Field (bits 5-0) to generate the 3-bit ALU control code
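A minimal sketch of the second decoding level is shown below. It assumes the common textbook assignment of 3-bit ALU codes (000 AND, 001 OR, 010 add, 110 subtract, 111 set-on-less-than) and the usual ALUOp convention (00 for loads and stores, 01 for beq, 10 for R-type). The funct values are the standard MIPS encodings, and the function and table names are hypothetical.

```python
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Function field (bits 5-0) to ALU operation, used only for R-type instructions.
FUNCT_TO_ALU = {
    0x20: ALU_ADD,  # add
    0x22: ALU_SUB,  # sub
    0x24: ALU_AND,  # and
    0x25: ALU_OR,   # or
    0x2A: ALU_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Combine the 2-bit ALUOp with the 6-bit function field into a 3-bit ALU code."""
    if alu_op == 0b00:      # lw / sw: address calculation always adds
        return ALU_ADD
    if alu_op == 0b01:      # beq: comparison always subtracts
        return ALU_SUB
    if alu_op == 0b10:      # R-type: the function field selects the operation
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")
```

With this split, the main control unit never inspects the function field, which keeps its logic dependent on the opcode alone.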

3. Instruction Execution Pathways

A single-cycle datapath completes an entire instruction cycle within one extended clock period. Different instruction types follow distinct data paths through the core logic.

R-Type Instruction Flow

Example: add $rd, $rs, $rt

  1. Fetch: Instruction Memory pulls the arithmetic command using the PC address
  2. Decode: Register File reads contents of source registers rs and rt
  3. Execute: ALU receives both register outputs and computes the arithmetic result
  4. Write-Back: ALU output is routed back to Register File and saved in destination register rd
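The four steps above can be traced in a small Python sketch. It models instruction memory as a dictionary and the register file as a list, both simplifications chosen for illustration, and handles only add. The encoding 0x012A4020 is add $t0, $t1, $t2 (registers 8, 9, and 10).

```python
def decode_r_type(instr):
    """Split a 32-bit R-type word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11
        "funct": instr & 0x3F,           # bits 5-0
    }


def step_add(pc, imem, regs):
    """Execute one add instruction in a single step and return the next PC."""
    instr = imem[pc]                           # 1. Fetch
    fields = decode_r_type(instr)
    a = regs[fields["rs"]]                     # 2. Decode: read rs and rt
    b = regs[fields["rt"]]
    result = (a + b) & 0xFFFFFFFF              # 3. Execute: 32-bit wrap-around add
    regs[fields["rd"]] = result                # 4. Write-back to rd
    return pc + 4


imem = {0x00400000: 0x012A4020}  # add $t0, $t1, $t2
regs = [0] * 32
regs[9], regs[10] = 7, 5
next_pc = step_add(0x00400000, imem, regs)
```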

Load Word Instruction Flow

Example: lw $rt, offset($rs)

  1. Fetch & Decode: Core fetches instruction and reads base address register rs
  2. Address Calculation: Sign-extend hardware converts 16-bit immediate into 32-bit value. ALU adds this extended offset to the base address from rs
  3. Memory Read: Computed address is passed to Data Memory, which outputs the data word at that location
  4. Write-Back: Data word is routed to Register File and saved into register rt
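A comparable sketch for the load path follows, again using dictionaries for the memories as a simplifying assumption. The word 0x8D28FFFC encodes lw $t0, -4($t1), so the example also exercises sign extension of a negative offset.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if (imm16 & 0x8000) else imm16


def step_lw(pc, imem, regs, dmem):
    """Execute one lw instruction in a single step and return the next PC."""
    instr = imem[pc]                              # 1. Fetch
    rs = (instr >> 21) & 0x1F                     #    Decode: base register
    rt = (instr >> 16) & 0x1F                     #    destination register
    offset = sign_extend16(instr & 0xFFFF)        # 2. Sign-extend the immediate
    address = (regs[rs] + offset) & 0xFFFFFFFF    #    ALU adds base + offset
    data = dmem[address]                          # 3. Memory read
    regs[rt] = data                               # 4. Write-back to rt
    return pc + 4


imem = {0x00400000: 0x8D28FFFC}     # lw $t0, -4($t1)
dmem = {0x10010004: 0xCAFE}
regs = [0] * 32
regs[9] = 0x10010008                # base address in $t1
next_pc = step_lw(0x00400000, imem, regs, dmem)
```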

Branch If Equal Instruction Flow

Example: beq $rs, $rt, offset

  1. Fetch & Decode: Core fetches instruction and reads contents of registers rs and rt
  2. Comparison: ALU subtracts rt from rs. If values match, result is zero and the Zero Flag is asserted
  3. Target Calculation: Separate adder logic calculates the branch target address:
    Target Address = (PC + 4) + (Sign-Extended Offset × 4)
  4. PC Update: If Branch AND Zero evaluates true, PC updates to branch target; otherwise PC advances to PC + 4
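The PC selection logic can be written out directly from the target formula. The sketch below is illustrative only; the function name next_pc and its parameters are assumptions, and the Branch control signal defaults to 1 because the instruction is taken to be beq.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if (imm16 & 0x8000) else imm16


def next_pc(pc, rs_val, rt_val, offset16, branch=1):
    """Return the PC that follows a beq at address pc."""
    # Comparison: the ALU subtracts and raises Zero when the result is 0.
    zero = int(((rs_val - rt_val) & 0xFFFFFFFF) == 0)
    # Target calculation: (PC + 4) + (sign-extended offset * 4).
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    # PC update: take the branch only when Branch AND Zero is true.
    return target if (branch and zero) else pc_plus_4


taken = next_pc(0x00400000, 5, 5, 3)           # equal operands, forward branch
not_taken = next_pc(0x00400000, 5, 6, 3)       # operands differ
backward = next_pc(0x00400000, 1, 1, 0xFFFE)   # offset of -2 words
```

Because the offset counts words relative to PC + 4, an offset of 3 lands 16 bytes past the branch itself, and an offset of -2 lands 4 bytes before it.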

4. Transitioning to Multicycle Approach

The Inefficiency of Single-Cycle Datapaths

While straightforward to design, single-cycle implementations suffer from severe performance limitations. The clock period must stretch to accommodate the longest possible path through the entire circuit, which is typically that of the Load Word (lw) instruction. This forces faster instructions (like jumps or simple additions) to occupy the same extended clock window, leaving parts of the hardware idle for much of each cycle.

Critical Bottleneck:

The critical path through the datapath (often dominated by memory access time) determines the minimum clock period for the entire system, so every instruction takes as long as the slowest one.
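To make the bottleneck concrete, the sketch below adds up unit delays along each instruction's path. The delay values are hypothetical round numbers chosen for illustration and are not measurements of any real design.

```python
# Hypothetical delays in picoseconds for each functional unit.
UNIT_DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units each instruction class passes through in a single-cycle datapath.
PATHS = {
    "R-type": ("instr_mem", "reg_read", "alu", "reg_write"),
    "lw": ("instr_mem", "reg_read", "alu", "data_mem", "reg_write"),
    "sw": ("instr_mem", "reg_read", "alu", "data_mem"),
    "beq": ("instr_mem", "reg_read", "alu"),
}

# Latency of each class is the sum of the delays along its path.
latency_ps = {name: sum(UNIT_DELAY_PS[u] for u in units) for name, units in PATHS.items()}

# The single clock period must cover the slowest class.
clock_period_ps = max(latency_ps.values())
slowest = max(latency_ps, key=latency_ps.get)
```

Under these assumed delays, lw needs 800 ps while beq needs only 500 ps, yet both are allotted the full 800 ps period.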

The Multicycle Design Philosophy

To resolve this bottleneck, engineers transition to a Multicycle Datapath. This design breaks a single instruction into smaller, discrete functional execution steps. Each step completes within one much shorter clock period, which is set by the slowest step rather than by the slowest instruction.

Key Architectural Changes:

Resource Sharing

Instead of duplicating expensive hardware blocks, multicycle designs share a single memory module for both instructions and data, and a single ALU for PC updates, branch evaluation, and data operations.

Internal Buffers

Temporary, non-user-visible registers placed between major logic blocks (Instruction Register, Memory Data Register, A, B, ALUOut). These buffers save intermediate results on clock boundaries.
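The role of these buffers can be seen by stepping one lw through a multicycle sequence. The five-step breakdown below follows the usual textbook presentation and is an assumption, not something specified in this module. A single dictionary stands in for the shared instruction and data memory.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if (imm16 & 0x8000) else imm16


def run_lw_multicycle(pc, mem, regs):
    """Step one lw through five cycles, returning the buffer state and cycle count."""
    s = {"PC": pc, "IR": 0, "MDR": 0, "A": 0, "B": 0, "ALUOut": 0}

    # Cycle 1, fetch: shared memory supplies the instruction, ALU computes PC + 4.
    s["IR"] = mem[s["PC"]]
    s["PC"] = (s["PC"] + 4) & 0xFFFFFFFF

    # Cycle 2, decode: latch register operands; ALU precomputes a branch target.
    rs, rt = (s["IR"] >> 21) & 0x1F, (s["IR"] >> 16) & 0x1F
    imm = sign_extend16(s["IR"] & 0xFFFF)
    s["A"], s["B"] = regs[rs], regs[rt]
    s["ALUOut"] = (s["PC"] + (imm << 2)) & 0xFFFFFFFF

    # Cycle 3, address calculation: the same ALU is reused for base + offset.
    s["ALUOut"] = (s["A"] + imm) & 0xFFFFFFFF

    # Cycle 4, memory read: the same memory is reused, now for data.
    s["MDR"] = mem[s["ALUOut"]]

    # Cycle 5, write-back: the buffered data word goes to register rt.
    regs[rt] = s["MDR"]
    return s, 5


mem = {0x00400000: 0x8D28FFFC, 0x10010004: 0xCAFE}  # lw $t0, -4($t1) plus one data word
regs = [0] * 32
regs[9] = 0x10010008
state, cycles = run_lw_multicycle(0x00400000, mem, regs)
```

Each buffer carries a value from one cycle to the next, which is what allows the single ALU and the single memory to be reused for different purposes within the same instruction.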

Multicycle Advantage:

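By breaking instructions into steps and letting each instruction use only as many clock cycles as it needs, multicycle designs stop charging every instruction for the full lw path. Shorter instructions such as branches finish in fewer cycles, which can improve overall throughput, and sharing functional units reduces hardware cost.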
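A small calculation shows how per-instruction cycle counts combine into an average. The cycle counts follow the usual textbook multicycle breakdown, and the instruction mix is a hypothetical workload invented for this example.

```python
# Clock cycles each instruction class needs in a textbook multicycle design (assumed).
CYCLES = {"R-type": 4, "lw": 5, "sw": 4, "beq": 3}

# Hypothetical instruction mix, in percent of executed instructions.
MIX_PERCENT = {"R-type": 50, "lw": 25, "sw": 10, "beq": 15}

# Total cycles spent per 100 instructions, then the average per instruction (CPI).
cycles_per_100 = sum(CYCLES[name] * MIX_PERCENT[name] for name in CYCLES)
average_cpi = cycles_per_100 / 100
```

Execution time per instruction is the average CPI multiplied by the clock period, so how much a multicycle design gains depends on both the instruction mix and how short the per-step clock period is relative to the single-cycle period.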