Computer Organization and Architecture

Building a Datapath

Module V: Building a Datapath. This module uncovers the structural blueprint of the processor core. It explores combinational vs. sequential logic elements, control unit design, instruction execution pathways for R-type, load, and branch instructions, and the motivations behind transitioning to multicycle datapath designs.

1. Single-Cycle Datapath & Logic Building Blocks

To construct a functional processor core, hardware designers categorize circuit components into two fundamental logic classes:

Combinational Elements

Logic units that hold no internal state. Their outputs depend only on the current inputs and settle after a short propagation delay (e.g., ALU, multiplexers, sign-extend hardware).

Sequential Elements

Logic units with built-in storage that hold state between clock edges. Their contents are updated in step with the system clock (e.g., Register File, Instruction Memory, Data Memory).
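The difference between the two classes can be sketched in a few lines of Python. This is only an illustrative model: the names mux2, Register, set_input, and clock_edge are hypothetical and do not come from the module. The multiplexer is a pure function of its inputs, while the register changes its visible value only when a clock edge is simulated.

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """Combinational: the output is a pure function of the current inputs."""
    return in1 if sel else in0


class Register:
    """Sequential: the stored value changes only on a (simulated) clock edge."""

    def __init__(self) -> None:
        self.q = 0  # value currently visible at the output
        self.d = 0  # value waiting at the input

    def set_input(self, value: int) -> None:
        self.d = value & 0xFFFFFFFF  # keep to 32 bits

    def clock_edge(self) -> None:
        self.q = self.d  # state is updated here and nowhere else


pc = Register()
pc.set_input(mux2(0, 4, 100))  # the mux selects in0 = 4 right away
before_edge = pc.q             # 0: the register has not been clocked yet
pc.clock_edge()
after_edge = pc.q              # 4: the new value appears after the edge
```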

Essential Memory Components

Instruction Memory: Holds executable binary instructions. Given a 32-bit address (PC), outputs the corresponding instruction word.
Register File: High-speed multi-ported storage holding 32 general-purpose registers. It has two read ports and one write port, so two source registers can be read and one destination register written in the same clock cycle.
Data Memory: State element that holds application data. It is read or written at an address computed by the ALU.
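A minimal sketch of the register file's interface, assuming a simple Python model (the class and method names are hypothetical). It also assumes the standard MIPS convention, not stated above, that register 0 always reads as zero.

```python
from typing import Tuple


class RegisterFile:
    """Hypothetical model: 32 registers, two read ports, one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rs: int, rt: int) -> Tuple[int, int]:
        # Both read ports are served at the same time.
        return self.regs[rs], self.regs[rt]

    def write(self, reg_write: bool, rd: int, data: int) -> None:
        # The write happens only when RegWrite is asserted.
        # Assumption: register 0 is hardwired to zero, as in MIPS.
        if reg_write and rd != 0:
            self.regs[rd] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(True, 8, 42)    # RegWrite = 1: register 8 becomes 42
rf.write(False, 9, 7)    # RegWrite = 0: register 9 is left unchanged
rf.write(True, 0, 99)    # writes to register 0 are ignored
values = rf.read(8, 9)   # (42, 0)
```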

2. Main Control Unit & ALU Signal Design

The Control Unit acts as the central orchestrator. It takes the 6-bit Opcode field (bits 31-26) and decodes it to set binary control lines that dictate how multiplexers and memory modules route data.

Central Control Signal Matrix

Signal	Asserted (1)	De-asserted (0)
RegDst	Write to rd (bits 15-11)	Write to rt (bits 20-16)
ALUSrc	ALU operand is sign-extended immediate	ALU operand from Register File (rt)
MemtoReg	Write Data Memory output to register	Write ALU output to register
RegWrite	Enable register write	Inhibit register write
MemRead	Initiate Data Memory read	Inhibit memory read
MemWrite	Initiate Data Memory write	Inhibit memory write
Branch	Branch if ALU Zero flag set	PC advances to PC + 4
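The matrix above can be read as a lookup table indexed by opcode. The sketch below is one way to model it in Python. The opcode values are the standard MIPS encodings (0x00 for R-type, 0x23 for lw, 0x2B for sw, 0x04 for beq), and the per-instruction settings follow the usual textbook single-cycle design; neither is listed in this module, so treat both as assumptions. None marks a don't-care value, and the names CONTROL and decode_control are hypothetical.

```python
from typing import Dict, Optional

Signals = Dict[str, Optional[int]]

# None = don't care (the signal's value does not affect the result).
CONTROL: Dict[int, Signals] = {
    0x00: {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,
           "MemRead": 0, "MemWrite": 0, "Branch": 0},           # R-type
    0x23: {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,
           "MemRead": 1, "MemWrite": 0, "Branch": 0},           # lw
    0x2B: {"RegDst": None, "ALUSrc": 1, "MemtoReg": None, "RegWrite": 0,
           "MemRead": 0, "MemWrite": 1, "Branch": 0},           # sw
    0x04: {"RegDst": None, "ALUSrc": 0, "MemtoReg": None, "RegWrite": 0,
           "MemRead": 0, "MemWrite": 0, "Branch": 1},           # beq
}


def decode_control(instruction: int) -> Signals:
    """Extract the opcode (bits 31-26) and look up its control signals."""
    opcode = (instruction >> 26) & 0x3F
    return CONTROL[opcode]


add_signals = decode_control(0x012A4020)  # add $t0, $t1, $t2
lw_signals = decode_control(0x8D280004)   # lw $t0, 4($t1)
```

Note that every instruction that writes a register asserts RegWrite, and only lw asserts both MemRead and MemtoReg.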

Two-Level ALU Decoding Strategy

To keep the main control block simple, MIPS uses a hierarchical decoding scheme:

Level 1: Main Control Unit evaluates the opcode and outputs a 2-bit ALUOp field
Level 2: ALU Control unit processes ALUOp (2 bits) plus the 6-bit Function Field (bits 5-0) to generate the 3-bit ALU control code
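The second decoding level can be sketched as a small function. The 3-bit codes used here (010 add, 110 subtract, 000 AND, 001 OR, 111 set-on-less-than) and the ALUOp meanings (00 for lw/sw, 01 for beq, 10 for R-type) follow a common textbook assignment; this module does not list them, so they are assumptions, as is the name alu_control.

```python
# Function-field values (bits 5-0) for a few R-type instructions,
# mapped to assumed 3-bit ALU control codes.
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Level 2 decoder: combine the 2-bit ALUOp with the 6-bit funct field."""
    if alu_op == 0b00:   # lw / sw: add base and offset, funct ignored
        return 0b010
    if alu_op == 0b01:   # beq: subtract to compare, funct ignored
        return 0b110
    if alu_op == 0b10:   # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("unsupported ALUOp value")


code_for_add = alu_control(0b10, 0b100000)  # 0b010
code_for_beq = alu_control(0b01, 0b000000)  # 0b110
```

The main control unit never looks at the function field; it only needs to tell the ALU control unit which of the three cases applies.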

3. Instruction Execution Pathways

A single-cycle datapath completes an entire instruction cycle within one extended clock period. Different instruction types follow distinct data paths through the core logic.

R-Type Instruction Flow

Example: add $rd, $rs, $rt

Fetch: Instruction Memory returns the instruction stored at the address held in the PC
Decode: Register File reads contents of source registers rs and rt
Execute: ALU receives both register outputs and computes the arithmetic result
Write-Back: ALU output is routed back to Register File and saved in destination register rd
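The four steps above can be traced in a short Python sketch. The field positions are the standard MIPS R-type layout; the function names and the plain list used as a register file are hypothetical, and overflow handling is left out for brevity.

```python
from typing import Dict, List


def decode_r_type(instr: int) -> Dict[str, int]:
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11
        "shamt": (instr >> 6) & 0x1F,    # bits 10-6
        "funct": instr & 0x3F,           # bits 5-0
    }


def execute_add(instr: int, regs: List[int]) -> int:
    fields = decode_r_type(instr)                   # decode
    a, b = regs[fields["rs"]], regs[fields["rt"]]   # register read
    result = (a + b) & 0xFFFFFFFF                   # execute (overflow ignored here)
    regs[fields["rd"]] = result                     # write-back to rd
    return result


regs = [0] * 32
regs[9], regs[10] = 5, 7                 # $t1 = 5, $t2 = 7
fields = decode_r_type(0x012A4020)       # add $t0, $t1, $t2
result = execute_add(0x012A4020, regs)   # $t0 (register 8) becomes 12
```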

Load Word Instruction Flow

Example: lw $rt, offset($rs)

Fetch & Decode: Core fetches instruction and reads base address register rs
Address Calculation: Sign-extend hardware converts 16-bit immediate into 32-bit value. ALU adds this extended offset to the base address from rs
Memory Read: Computed address is passed to Data Memory, which outputs the data word at that location
Write-Back: Data word is routed to Register File and saved into register rt
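The address calculation is the step that most often causes confusion, because the 16-bit offset is signed. The sketch below models it in Python under simplifying assumptions: data memory is a dictionary keyed by byte address, and the helper names sign_extend16 and load_word are hypothetical.

```python
from typing import Dict, List


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def load_word(instr: int, regs: List[int], data_memory: Dict[int, int]) -> int:
    rs = (instr >> 21) & 0x1F                    # base register
    rt = (instr >> 16) & 0x1F                    # destination register
    offset = sign_extend16(instr & 0xFFFF)       # 16-bit immediate, sign-extended
    address = (regs[rs] + offset) & 0xFFFFFFFF   # ALU: base + offset
    regs[rt] = data_memory[address]              # memory read, then write-back
    return address


regs = [0] * 32
regs[9] = 0x1000                           # $t1 holds the base address
data_memory = {0x0FFC: 0xDEADBEEF}
# lw $t0, -4($t1): the immediate field is 0xFFFC, which sign-extends to -4
address = load_word(0x8D28FFFC, regs, data_memory)
```

Here the computed address is 0x0FFC, and register 8 ($t0) receives the word stored there.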

Branch If Equal Instruction Flow

Example: beq $rs, $rt, offset

Fetch & Decode: Core fetches instruction and reads contents of registers rs and rt
Comparison: ALU subtracts rt from rs. If values match, result is zero and the Zero Flag is asserted
Target Calculation: Separate adder logic calculates the branch target address:
Target Address = (PC + 4) + (Sign-Extended Offset × 4)
PC Update: If Branch AND Zero evaluates true, PC updates to branch target; otherwise PC advances to PC + 4
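A minimal Python sketch of the PC update logic described above. The function names are hypothetical, and the register file is modeled as a plain list. The offset counts instructions rather than bytes, which is why it is shifted left by two (multiplied by 4) before being added to PC + 4.

```python
from typing import List


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc: int, instr: int, regs: List[int], branch: int) -> int:
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    zero = ((regs[rs] - regs[rt]) & 0xFFFFFFFF) == 0   # ALU subtract, Zero flag
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Separate adder: (PC + 4) + (sign-extended offset x 4)
    target = (pc_plus_4 + (sign_extend16(instr & 0xFFFF) << 2)) & 0xFFFFFFFF
    return target if (branch and zero) else pc_plus_4


regs = [0] * 32
regs[9] = regs[10] = 7
beq = 0x112A0003                              # beq $t1, $t2, 3
taken = next_pc(0x00400000, beq, regs, 1)     # 0x00400010 (0x00400004 + 12)
regs[10] = 8
not_taken = next_pc(0x00400000, beq, regs, 1) # 0x00400004
```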

4. Transitioning to Multicycle Approach

The Inefficiency of Single-Cycle Datapaths

While straightforward to design, single-cycle implementations suffer from severe performance limitations. The clock period must stretch to accommodate the longest possible path through the entire circuit, which in this design belongs to the Load Word (lw) instruction. Faster instructions (like jumps or simple additions) are forced to occupy the same extended clock window, wasting processing time.

Critical Bottleneck:

The critical path through the datapath (often dominated by memory access time) determines the minimum clock period for the entire system, so every instruction takes as long as the slowest one.
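To make the bottleneck concrete, the sketch below adds up the delay along each instruction's path. The latencies (200 ps for each memory access and for the ALU, 100 ps for each register file access) are illustrative values assumed for this example, not measurements from the module, and the names LATENCY_PS and PATHS are hypothetical.

```python
# Assumed, illustrative unit latencies in picoseconds.
LATENCY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional units each instruction class passes through, in order.
PATHS = {
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq": ["instr_mem", "reg_read", "alu"],
}

path_delay = {
    name: sum(LATENCY_PS[unit] for unit in units)
    for name, units in PATHS.items()
}
clock_period = max(path_delay.values())   # set by the slowest path (lw)
idle_time = {name: clock_period - delay for name, delay in path_delay.items()}
```

Under these assumptions lw needs 800 ps, so the clock period is 800 ps, and a beq that needs only 500 ps leaves the hardware idle for 300 ps of every cycle it occupies.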

The Multicycle Design Philosophy

To resolve this bottleneck, engineers transition to a Multicycle Datapath. This design breaks each instruction into a series of smaller execution steps, and each step completes in one much shorter clock cycle.

Key Architectural Changes:

Resource Sharing

Instead of duplicating expensive hardware blocks, multicycle designs share a single memory module for both instructions and data, and a single ALU for PC updates, branch evaluation, and data operations.

Internal Buffers

Temporary, non-user-visible registers placed between major logic blocks (Instruction Register, Memory Data Register, A, B, ALUOut). These buffers hold intermediate results from one clock cycle to the next.

Multicycle Advantage:

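By breaking instructions into steps and letting each instruction take only as many clock cycles as it needs, multicycle designs avoid charging every instruction the full latency of the slowest one. The actual gain depends on the instruction mix and on how evenly the work is divided among the steps.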
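The effect of variable cycle counts can be estimated with a short calculation. The per-instruction cycle counts below follow the usual textbook multicycle MIPS design, and the instruction mix is a hypothetical workload; neither comes from this module, so both are assumptions.

```python
# Assumed clock cycles per instruction class in a textbook multicycle design.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix (fractions of all executed instructions).
MIX = {"lw": 0.25, "sw": 0.10, "R-type": 0.52, "beq": 0.11, "j": 0.02}

# Average cycles per instruction, weighted by how often each class occurs.
average_cpi = sum(MIX[name] * CYCLES[name] for name in CYCLES)

# If every instruction had to take as many cycles as the slowest one.
worst_case_cpi = max(CYCLES.values())
```

With this mix the average works out to about 4.12 cycles per instruction, compared with 5 if every instruction were forced to take as long as lw.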
