Machine Learning

Unit 5: Trends and Applications / Image Recognition

2. Image Recognition

Image recognition is a computer vision task in which machine learning models identify and detect objects, people, or actions in images.

Traditional Pipeline

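Manual Feature Extraction: Hand-designed descriptors such as SIFT, SURF, and HOG encode local image structure (keypoints, corners, gradient orientations).
Classification: The resulting feature vectors are passed to a classifier such as an SVM or Random Forest.
Limitation: Manual feature engineering is domain-specific and fragile; descriptors tuned for one kind of image often transfer poorly to another.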
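
To make the feature-extraction step concrete, here is a minimal sketch of a hand-crafted descriptor. It is a simplified, HOG-style histogram of gradient orientations computed over the whole image, not the full HOG algorithm (no cells, blocks, or block normalization). The function name and the toy images are assumptions made for illustration; in a real pipeline the resulting vector would be fed to a classifier such as an SVM.

```python
import numpy as np

def orientation_histogram(image, n_bins=9):
    """Simplified HOG-style descriptor (illustrative, not full HOG):
    one magnitude-weighted histogram of gradient orientations."""
    image = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(image)              # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy images: a vertical edge and a horizontal edge.
vertical_edge = np.zeros((8, 8))
vertical_edge[:, 4:] = 1.0
horizontal_edge = vertical_edge.T

v_feat = orientation_histogram(vertical_edge)
h_feat = orientation_histogram(horizontal_edge)
```

The two edges produce histograms that peak in different bins, which is the kind of separation a downstream classifier relies on. It also hints at the limitation noted above: the descriptor only captures what its designer chose to measure.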

Deep Learning (CNNs)

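Convolutional Layers: Learn filters that detect local features; early layers respond to low-level patterns such as edges, deeper layers to more complex shapes.
Pooling Layers: Reduce spatial dimensions (e.g., 2x2 max pooling with stride 2 halves height and width).
Fully Connected Layers: Combine the extracted features and produce class probabilities through a final Softmax.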
Advantage: Learns features automatically from raw pixels.
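
The layer types above can be sketched as a single forward pass in NumPy. This is a toy illustration under stated assumptions, not a trainable network: the kernel is a fixed Sobel filter rather than a learned one, the fully connected weights are random, and the three output classes are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation, which deep learning libraries call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    z = z - np.max(z)            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(image, kernel, weights, bias):
    features = max_pool2d(relu(conv2d(image, kernel)))   # conv -> ReLU -> pool
    logits = weights @ features.ravel() + bias           # fully connected layer
    return softmax(logits)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                               # stand-in grayscale image
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])                    # Sobel filter (fixed, not learned)
weights = rng.normal(size=(3, 9))                        # 8x8 -> conv 6x6 -> pool 3x3 = 9 features
bias = np.zeros(3)
probs = forward(image, kernel, weights, bias)
```

In a real CNN the kernel and the fully connected weights would be learned by backpropagation, and each convolutional layer would hold many filters instead of one.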

Image Preprocessing Basics

Resizing: Standardize image dimensions (e.g., 224x224).
Normalization: Scale pixel values (0-255) down to [0,1] or [-1,1].
Data Augmentation: Flip, rotate, or crop training images to artificially increase dataset diversity and reduce overfitting.
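
The three preprocessing steps can be sketched with NumPy alone. This is a minimal illustration, assuming 8-bit images stored as height x width x channels arrays; the helper names are hypothetical, resizing uses nearest-neighbor sampling for brevity (image libraries usually offer smoother interpolation), and the augmentation is limited to horizontal flips and 90-degree rotations.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize by sampling source rows and columns."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

def normalize(image, symmetric=False):
    """Scale 0-255 pixel values to [0, 1], or to [-1, 1] if symmetric."""
    scaled = image.astype(np.float32) / 255.0
    return scaled * 2.0 - 1.0 if symmetric else scaled

def augment(image, rng):
    """Random horizontal flip followed by a random multiple-of-90-degree rotation.
    Assumes a square image so the shape is unchanged."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    k = int(rng.integers(0, 4))
    return np.rot90(image, k)

rng = np.random.default_rng(42)
raw = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)   # stand-in photo

resized = resize_nearest(raw, 224, 224)
unit_range = normalize(resized)                  # values in [0, 1]
signed_range = normalize(resized, symmetric=True)  # values in [-1, 1]
augmented = augment(unit_range, rng)
```

Augmentation is normally applied only to training images and re-sampled every epoch, so the model rarely sees exactly the same input twice.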

Deep Dive: Famous CNN Architectures

Architecture	Year	Key Innovation
AlexNet	2012	ReLU activation, Dropout, GPU training (sparked the deep learning boom).
VGGNet	2014	Very deep networks using only small 3x3 filters.
GoogLeNet	2014	Inception module (parallel convolutions of different sizes), with 1x1 convolutions for dimensionality reduction.
ResNet	2015	Skip/residual connections that ease gradient flow and mitigate the vanishing gradient problem in very deep networks (up to 152 layers on ImageNet).
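
The skip connection in the ResNet row can be sketched in a few lines. This is a simplified illustration, assuming dense layers in place of the convolutions and batch normalization of a real ResNet block; the function and variable names are hypothetical. The block computes y = ReLU(F(x) + x), so even when the learned part F contributes nothing, the input still passes through.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two dense layers standing in
    for the convolutions of a real ResNet block."""
    fx = relu(x @ w1) @ w2       # the residual branch F(x)
    return relu(fx + x)          # the skip connection adds the input back

x = np.array([1.0, 2.0, 3.0, 4.0])
zeros = np.zeros((4, 4))

# With all-zero weights F(x) = 0, so each block acts as the identity
# on this non-negative input, no matter how many blocks are stacked.
out = x
for _ in range(50):
    out = residual_block(out, zeros, zeros)
```

A plain 50-layer stack with zero weights would output all zeros. The residual form lets each block start near the identity and learn only a correction, which is one intuition for why very deep residual networks are easier to train.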
