Computer Organization and Architecture

Performance Analysis

Module II: Performance Analysis. This module focuses on the principles and metrics used to assess, measure, and analyze computer system performance. Learners will examine the distinctions between system latency and execution throughput, demystify the multi-variable CPU performance equation, study instruction-level evaluation standards (CPI and IPC), and analyze historical hardware-oriented metrics like MIPS, MOPS, and MFLOPS alongside their engineering limitations.

1. Defining Performance: Latency vs. Throughput

To accurately evaluate a computer system's processing capabilities, engineers differentiate between two primary definitions of performance:

Response Time (Latency)

The total time required to complete a single task, measured from submission to completion. This is the primary metric of concern for individual desktop or mobile users.

Throughput

The number of tasks completed per unit of time. This is the primary metric of concern for administrators of data centers, servers, and cloud infrastructure.
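The difference between the two definitions can be illustrated with a minimal sketch. It assumes a hypothetical server whose workers are always busy and whose tasks are independent and take the same time; the function name and the numbers are illustrative only.

```python
def throughput(workers, task_latency_s):
    """Tasks completed per second, assuming every worker is always busy
    and each task takes task_latency_s seconds (idealized assumption)."""
    return workers / task_latency_s


TASK_LATENCY_S = 2.0  # hypothetical: each task takes 2 seconds

one_worker = throughput(1, TASK_LATENCY_S)    # 0.5 tasks per second
four_workers = throughput(4, TASK_LATENCY_S)  # 2.0 tasks per second

# Adding workers raises throughput, but each individual task
# still takes TASK_LATENCY_S seconds: latency is unchanged.
print(one_worker, four_workers, TASK_LATENCY_S)
```

Under these assumptions, quadrupling the number of workers quadruples throughput while the response time of any single task stays the same.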

The Reciprocal Relationship

Computer performance is mathematically defined as the reciprocal of execution time. For a given machine A:

Performance_A = 1 / Execution Time_A

When comparing two distinct processing configurations, Machine A is described as n times faster than Machine B if:

( Performance_A / Performance_B ) = ( Execution Time_B / Execution Time_A ) = n
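As a worked example of the ratio above, the following sketch computes n for two hypothetical machines. The execution times (10 s and 15 s) are made-up values chosen for illustration.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_a_s, time_b_s):
    """How many times faster machine A is than machine B."""
    return time_b_s / time_a_s


# Hypothetical measurements: the same program takes 10 s on A and 15 s on B.
time_a, time_b = 10.0, 15.0

n = speedup(time_a, time_b)
print(n)  # 1.5, so A is 1.5 times faster than B

# The same ratio falls out of the performance definition.
ratio = performance(time_a) / performance(time_b)
print(abs(ratio - n) < 1e-12)
```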

2. Components of Execution Time: User vs. System

When a program runs, its total elapsed time (Wall-Clock Time) covers everything from start to finish: processor execution, I/O waits, operating system overhead, and any time the processor spends running other programs. To isolate processor performance, engineers track CPU Execution Time, which is partitioned into two components:

User CPU Time: The processor time spent executing the application's own instructions in user mode.
System CPU Time: The processor clock time consumed by the underlying Operating System kernel executing system resource tasks on behalf of the application (e.g., memory allocation, context switching, basic device driver calls).
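One way to observe these components is sketched below using Python's standard os.times() call, which reports the user and system CPU time of the current process. The workload function is a hypothetical stand-in, and the timer resolution on some systems is coarse, so small values may read as zero.

```python
import os
import time


def measure(fn):
    """Run fn once and report user CPU, system CPU, and wall-clock time."""
    cpu_start = os.times()
    wall_start = time.perf_counter()
    fn()
    wall_end = time.perf_counter()
    cpu_end = os.times()
    return {
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
        "wall": wall_end - wall_start,
    }


def workload():
    """Hypothetical workload: some computation followed by a wait."""
    total = sum(i * i for i in range(200_000))  # user-mode computation
    time.sleep(0.05)  # waiting adds to wall-clock time, not CPU time
    return total


result = measure(workload)
cpu_time = result["user"] + result["system"]
print(result, cpu_time)
```

Because of the sleep, the wall-clock figure should exceed the sum of user and system CPU time, which is the gap the text describes.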

3. Cycles Per Instruction (CPI) and Instructions Per Cycle (IPC)

Processors are driven by a hardware clock characterized by its Clock Rate (measured in hertz, typically GHz) or, equivalently, its Clock Period (measured in nanoseconds or picoseconds). How fast code runs depends on the number of clock cycles the processor needs to execute each instruction.

Cycles Per Instruction (CPI)

Different classes of instructions require different numbers of clock cycles:

Memory-access instructions (e.g., lw, sw) typically consume more cycles because they must go through the memory interface.
Basic arithmetic and logic instructions (e.g., add, or) complete in fewer cycles because they use only the ALU's combinational logic.

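The Average CPI for a program with n instruction classes is the frequency-weighted sum of the per-class CPIs, where Instruction Frequency_i is the fraction of executed instructions that belong to class i:

CPI_avg = ∑_{i=1..n} (CPI_i × Instruction Frequency_i)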
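The weighted sum can be sketched as follows. The instruction classes, per-class CPIs, and frequencies are hypothetical values chosen for illustration, not measurements of any real processor.

```python
def average_cpi(instruction_mix):
    """Frequency-weighted average CPI.

    instruction_mix maps a class name to (cpi, frequency), where
    frequency is the fraction of executed instructions in that class.
    """
    total_frequency = sum(freq for _, freq in instruction_mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(cpi * freq for cpi, freq in instruction_mix.values())


# Hypothetical instruction mix (assumed values).
mix = {
    "alu":    (1, 0.50),
    "load":   (5, 0.20),
    "store":  (3, 0.10),
    "branch": (2, 0.20),
}

cpi_avg = average_cpi(mix)
print(round(cpi_avg, 2))  # 0.5 + 1.0 + 0.3 + 0.4 = 2.2
```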

Instructions Per Cycle (IPC)

IPC is the strict mathematical inverse of CPI:

IPC = 1 / CPI

A higher IPC means the core completes more instructions per clock cycle on average. An IPC above 1 requires hardware that can issue and complete several instructions in the same cycle, which is why IPC is a key figure of merit for modern superscalar and out-of-order execution cores.
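A short sketch of the CPI and IPC relationship follows. The cycle and instruction counts are hypothetical, standing in for values that would normally come from hardware performance counters.

```python
def ipc_from_cpi(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


def ipc_from_counts(instructions_retired, clock_cycles):
    """IPC computed directly from counter values."""
    return instructions_retired / clock_cycles


# A core averaging 2 cycles per instruction completes 0.5 instructions per cycle.
print(ipc_from_cpi(2.0))  # 0.5

# Hypothetical counter readings: 8 million instructions in 4 million cycles.
print(ipc_from_counts(8_000_000, 4_000_000))  # 2.0, i.e. CPI = 0.5
```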

4. The Fundamental CPU Performance Equation

The CPU execution time of a program is the product of three factors. They appear as separate terms in the equation but are interdependent in practice, since a design change that improves one factor can worsen another:

CPU Time = Instruction Count × CPI × Clock Cycle Time

Alternatively, using Clock Rate (f = 1 / Clock Cycle Time):
CPU Time = (Instruction Count × CPI) / Clock Rate
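The equation can be exercised with a small sketch. The two machines and all their parameters are hypothetical values picked so that the arithmetic is easy to follow.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = (Instruction Count x CPI) / Clock Rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_period(instruction_count, cpi, clock_cycle_time_s):
    """Equivalent form using the clock cycle time instead of the rate."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical machine A: 1e9 instructions, CPI 2.0, 2 GHz clock.
time_a = cpu_time(1e9, 2.0, 2e9)    # 1.0 s

# Hypothetical machine B: more instructions, lower CPI, 3 GHz clock.
time_b = cpu_time(1.2e9, 1.5, 3e9)  # about 0.6 s

print(time_a, time_b)

# A 2 GHz clock has a 0.5 ns period; both forms agree.
print(cpu_time_from_period(1e9, 2.0, 0.5e-9))
```

Machine B executes 20% more instructions yet finishes sooner, because its lower CPI and higher clock rate outweigh the larger instruction count.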

Architectural Dependencies Matrix

This equation demonstrates that performance is not governed by clock speed alone. Each layer of the hardware-software stack influences one or more of the three factors:

System Layer	Affects Instruction Count?	Affects CPI?	Affects Clock Rate?
Algorithm Design	Yes	Yes	No
Programming Language	Yes	Yes	No
Compiler Optimization	Yes	Yes	No
Instruction Set Architecture	Yes	Yes	Yes
Microarchitecture Design	No	Yes	Yes
Silicon VLSI Technology	No	No	Yes

5. Hardware-Oriented Metrics & Their Limitations

Historically, computer vendors marketed processors using rate-based hardware metrics rather than measured execution time.

MIPS (Millions of Instructions Per Second)

MIPS is the rate at which a computer executes instructions when running a given program:

MIPS = Instruction Count / (Execution Time × 10⁶) = Clock Rate / (CPI × 10⁶)

Significant Limitations of MIPS:

ISA Dependence: MIPS cannot be used to compare processors with different instruction sets. A CISC machine that does more work per instruction may post a lower MIPS score than a RISC machine even when it finishes the program sooner.
Program Variance: MIPS varies widely across workloads on the same machine because the instruction mix changes the average CPI.
Inverse Scaling Risk: MIPS can move in the opposite direction from real performance. A less efficient compiler might generate many more fast, simple instructions, raising the MIPS rating while increasing total execution time.
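The inverse scaling risk can be made concrete with a sketch that compares two hypothetical compilers targeting the same 1 GHz machine. The instruction counts and CPI values are invented for illustration.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = (Instruction Count x CPI) / Clock Rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, execution_time_s):
    """MIPS = Instruction Count / (Execution Time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


CLOCK_RATE_HZ = 1e9  # hypothetical 1 GHz machine

# Compiler A (assumed): fewer, more complex instructions.
ic_a, cpi_a = 5e6, 2.0
# Compiler B (assumed): many more simple, fast instructions.
ic_b, cpi_b = 12e6, 1.0

time_a = cpu_time(ic_a, cpi_a, CLOCK_RATE_HZ)  # about 0.010 s
time_b = cpu_time(ic_b, cpi_b, CLOCK_RATE_HZ)  # about 0.012 s

mips_a = mips(ic_a, time_a)  # about 500
mips_b = mips(ic_b, time_b)  # about 1000

print(f"A: {time_a * 1e3:.1f} ms at {mips_a:.0f} MIPS")
print(f"B: {time_b * 1e3:.1f} ms at {mips_b:.0f} MIPS")
```

Compiler B's code earns twice the MIPS rating but takes longer to run, which is why execution time, not MIPS, is the reliable measure.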

MOPS

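Millions of Operations Per Second: Counts low-level hardware operations rather than complete machine instructions. Because the definition of an "operation" is not standardized, MOPS figures are comparable only when the same definition is used.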

MFLOPS

Millions of Floating-Point Operations Per Second: Rates scientific computing performance by counting only the floating-point operations a program executes.

MFLOPS = FP Count / (Time × 10⁶)

MFLOPS measures floating-point capability only. It counts explicit floating-point operations and ignores the integer index updates and loop-control instructions that surround them, so the rating depends heavily on how much floating-point work the program contains.
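A minimal sketch of the MFLOPS calculation follows. It assumes a hypothetical dot product over two vectors of one million elements, counted as one multiply and one add per element, and an assumed measured time of 4 ms.

```python
def mflops(fp_operation_count, execution_time_s):
    """MFLOPS = FP Count / (Time x 10^6)."""
    return fp_operation_count / (execution_time_s * 1e6)


# Hypothetical kernel: dot product of two vectors of length n.
n = 1_000_000
fp_ops = 2 * n             # one multiply and one add per element (assumed count)
execution_time_s = 0.004   # assumed measured time: 4 ms

rating = mflops(fp_ops, execution_time_s)
print(round(rating))  # 500

# The loop's index increments and branch instructions consume part of the
# 4 ms but do not appear anywhere in fp_ops.
```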
