Machine Learning

Unit 5: Trends and Applications / Image Recognition


2. Image Recognition

Image recognition is a computer vision task that uses machine learning to identify and locate objects, people, or actions in images.

Traditional Pipeline

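  • Manual Feature Extraction: Hand-designed descriptors such as SIFT, SURF, and HOG encode local structure (edges, corners, gradient orientations).
  • Classification: The resulting feature vectors are passed to a classifier such as an SVM or Random Forest.
  • Limitation: Hand-engineered features are domain-specific and generalize poorly to conditions they were not designed for.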
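
The two stages above can be sketched roughly as follows. This is a minimal illustration, not real HOG or SIFT: it builds a simplified gradient-orientation histogram over the whole image and uses a nearest-centroid rule as a stand-in for the SVM or Random Forest. The function names and the toy images are hypothetical.

```python
import math

def orientation_histogram(img, bins=4):
    """Simplified HOG-like descriptor: magnitude-weighted histogram of
    unsigned gradient orientations over the whole image."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]    # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]    # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]

def nearest_centroid(feature, centroids):
    """Stand-in classifier: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))

# Toy 6x6 images: one vertical edge, one horizontal edge.
vertical = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
horizontal = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]

# "Training": one example per class, so each centroid is just that descriptor.
centroids = {
    "vertical_edge": orientation_histogram(vertical),
    "horizontal_edge": orientation_histogram(horizontal),
}

# A slightly blurred vertical edge as the query image.
query = [[0, 0, 0.1, 0.9, 1, 1] for _ in range(6)]
prediction = nearest_centroid(orientation_histogram(query), centroids)
print(prediction)
```

Everything the classifier sees is decided by `orientation_histogram`; if the hand-written descriptor misses what distinguishes the classes, no classifier downstream can recover it.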

Deep Learning (CNNs)

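  • Convolutional Layers: Learn filters that detect local features; early layers respond to low-level patterns (edges), deeper layers to more complex ones.
  • Pooling Layers: Reduce spatial dimensions, which cuts computation and adds some tolerance to small shifts.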
  • Fully Connected Layers: Combine features for final Softmax classification.
  • Advantage: Learns features automatically from raw pixels.
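
To make the layer roles above concrete, here is a minimal forward pass in plain Python: one convolution, ReLU, max pooling, a fully connected layer, and softmax. It is a sketch under simplifying assumptions (single channel, one filter, hand-set weights that are purely illustrative); a real CNN learns its weights by backpropagation and would normally be built with a deep learning framework.

```python
import math

def conv2d(img, kernel):
    """Valid (no padding) 2D cross-correlation with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]

def dense(vec, weights, bias):
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# 6x6 input containing a vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-set vertical-edge filter; in a trained CNN these values are learned.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

fmap = relu(conv2d(image, kernel))          # 4x4 feature map
pooled = max_pool(fmap)                     # 2x2 after pooling
flat = [v for row in pooled for v in row]   # flatten to a 4-vector

# Hypothetical weights for a 2-class fully connected layer.
weights = [[0.5, 0.5, 0.5, 0.5],
           [-0.5, -0.5, -0.5, -0.5]]
bias = [0.0, 0.0]
probs = softmax(dense(flat, weights, bias))
print(pooled, probs)
```

The 6x6 input shrinks to 4x4 after the unpadded 3x3 convolution and to 2x2 after pooling, which is the reduction in spatial dimensions the pooling bullet refers to.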

Image Preprocessing Basics

  • Resizing: Standardize image dimensions (e.g., 224x224).
  • Normalization: Scale pixel values (0-255) down to [0,1] or [-1,1].
  • Data Augmentation: Flip, rotate, or crop training images to artificially increase dataset diversity and prevent overfitting.
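
The three steps above can be sketched in plain Python on a tiny grayscale image. The helper names are hypothetical, and nearest-neighbour resizing is an assumption made for brevity; image libraries typically offer better interpolation.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def normalize(img, low=0.0, high=1.0):
    """Map 8-bit pixel values (0-255) linearly onto [low, high]."""
    return [[low + (p / 255.0) * (high - low) for p in row] for row in img]

def random_hflip(img, rng, p=0.5):
    """With probability p, mirror the image left to right."""
    return [row[::-1] for row in img] if rng.random() < p else img

raw = [[0, 255],
       [128, 64]]

resized = resize_nearest(raw, 4, 4)   # standardize dimensions (4x4 here, 224x224 in practice)
unit = normalize(resized)             # pixel values in [0, 1]
signed = normalize(resized, -1, 1)    # pixel values in [-1, 1]

rng = random.Random(0)                # seeded so the sketch is repeatable
augmented = random_hflip(unit, rng)   # augmentation: applied to training images only
```

Resizing and normalization are applied identically to training and test images, whereas the random flip belongs only in the training path.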

Deep Dive: Famous CNN Architectures

Architecture   Year   Key Innovation
AlexNet        2012   ReLU activation, Dropout, GPU training (sparked the deep learning boom).
VGGNet         2014   Very deep networks using only small 3x3 filters.
GoogLeNet      2014   Inception module (parallel convolutions of different sizes), 1x1 convolutions.
ResNet         2015   Skip/residual connections that mitigate the vanishing gradient problem in ultra-deep networks (150+ layers).
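
The skip connection listed for ResNet can be sketched as y = relu(F(x) + x). The example below is a simplified illustration only: it uses small dense layers in place of the convolutions and batch normalization of an actual ResNet block, and all names and weights are hypothetical.

```python
def relu(vec):
    return [max(0.0, v) for v in vec]

def linear(vec, weights):
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two linear layers with a ReLU between."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])   # the skip connection adds x back

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) = 0, so the block simply passes x through.
print(residual_block(x, zeros, zeros))   # prints [1.0, 2.0, 3.0]
```

Because the identity path bypasses the weights, a block can fall back to passing its input through unchanged, and gradients have a direct route to earlier layers. This is one common explanation of why very deep residual networks remain trainable.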