Machine Learning
Unit 5: Trends and Applications / Image Recognition
2. Image Recognition
Image Recognition is a field of computer vision that uses machine learning to identify objects, people, or actions in images and, in detection tasks, to locate where they appear.
Traditional Pipeline
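- Manual Feature Extraction: Using hand-designed algorithms such as SIFT, SURF, or HOG to describe local image structure (keypoints, edges, gradient orientations) as feature vectors.
- Classification: Passing those feature vectors to a classical classifier such as an SVM or Random Forest.
- Limitation: Hand-engineered features are domain-specific and fragile; they must be redesigned for each new task and often fail under conditions their designers did not anticipate.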
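To make manual feature extraction concrete, here is a minimal sketch of a HOG-style descriptor in plain Python. It is a simplification under stated assumptions: the input is a small grayscale image given as a list of rows, gradients use central differences, and the whole image is treated as a single cell. Real HOG also splits the image into cells and blocks, normalizes per block, and interpolates between bins. The function name `orientation_histogram` is illustrative, not taken from any library.

```python
import math


def orientation_histogram(img, n_bins=9):
    """Simplified, single-cell HOG-style descriptor (illustrative only).

    Each interior pixel votes into an unsigned-orientation bin (0-180 degrees),
    weighted by its gradient magnitude. Cells, block normalization and bin
    interpolation from real HOG are deliberately omitted.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients in x and y.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation: fold angles into [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += mag
    # L2-normalize so overall contrast does not dominate the feature vector.
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


# Usage: a vertical edge (dark left, bright right).
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
features = orientation_histogram(vertical_edge)  # all weight in bin 0
```

A vertical edge puts all of its weight in the 0-degree bin, while a horizontal edge lands in the 90-degree bin. Fixed-length vectors like these are what the traditional pipeline would hand to an SVM or Random Forest.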
Deep Learning (CNNs)
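- Convolutional Layers: Learn filters that detect local features automatically; early layers respond to low-level patterns such as edges, deeper layers to textures, parts, and whole objects.
- Pooling Layers: Reduce spatial dimensions (e.g., 2x2 max pooling halves height and width), cutting computation and adding some tolerance to small shifts.
- Fully Connected Layers: Combine the extracted features into class scores, which a final Softmax turns into probabilities.
- Advantage: Learns features automatically from raw pixels instead of relying on manual feature engineering.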
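The layer types above can be sketched as a tiny forward pass in plain Python. This is an illustrative simplification, not a framework implementation: it assumes a single-channel image, a single filter whose weights are set by hand (a real CNN learns them by backpropagation), stride 1 with no padding, and 2x2 max pooling. All function names and weights are hypothetical.

```python
import math


def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation (stride 1, no padding), as in CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(fmap):
    """Element-wise ReLU on a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling, stride 2: halves height and width (an odd last row/column is dropped)."""
    return [[max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


def dense(features, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: a 6x6 image with a vertical edge between columns 2 and 3.
image = [[1.0 if x >= 3 else 0.0 for x in range(6)] for _ in range(6)]
edge_kernel = [[-1, 0, 1]] * 3  # hand-set vertical-edge detector (hypothetical)
pooled = max_pool2x2(relu(conv2d(image, edge_kernel)))  # [[3.0, 3.0], [3.0, 3.0]]
features = [v for row in pooled for v in row]  # flatten before the dense layer
probs = softmax(dense(features, [[1, 1, 1, 1], [-1, -1, -1, -1]], [0.0, 0.0]))
```

Each stage mirrors one bullet: the convolution responds where the edge is, pooling shrinks the 4x4 map to 2x2, and the dense layer plus Softmax turn the flattened features into class probabilities.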
Image Preprocessing Basics
- Resizing: Standardize image dimensions (e.g., 224x224).
- Normalization: Scale pixel values (0-255) down to [0,1] or [-1,1].
- Data Augmentation: Flip, rotate, or crop training images to artificially increase dataset diversity and reduce overfitting.
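The three preprocessing steps can be sketched on a tiny grayscale image stored as a list of rows. This is a minimal illustration under assumptions: nearest-neighbour resizing (image libraries usually also offer smoother interpolation such as bilinear), a linear mapping of 8-bit values, and a random horizontal flip as the only augmentation. The function names are hypothetical.

```python
import random


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize: each output pixel copies the closest input pixel."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def normalize(img, to_range=(0.0, 1.0)):
    """Linearly map 8-bit values 0-255 onto [lo, hi], e.g. [0, 1] or [-1, 1]."""
    lo, hi = to_range
    return [[lo + (p / 255.0) * (hi - lo) for p in row] for row in img]


def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def augment(img, rng):
    """Randomly flip horizontally with probability 0.5 (training data only)."""
    return hflip(img) if rng.random() < 0.5 else img


# Usage
tiny = [[0, 255], [128, 64]]
resized = resize_nearest(tiny, 4, 4)  # every pixel becomes a 2x2 block
scaled = normalize(resized, (-1.0, 1.0))  # 0 -> -1.0, 255 -> 1.0
train_view = augment(scaled, random.Random(42))
```

Augmentation is applied only when training; at test time images are just resized and normalized the same way.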
Deep Dive: Famous CNN Architectures
| Architecture | Year | Key Innovation |
|---|---|---|
| AlexNet | 2012 | ReLU activation, Dropout, GPU training (sparked DL boom). |
| VGGNet | 2014 | Very deep networks (16-19 layers) built only from stacks of small 3x3 filters. |
| GoogLeNet | 2014 | Inception module (parallel convolutions of different sizes), with 1x1 convs for dimensionality reduction. |
| ResNet | 2015 | Skip/residual connections that make ultra-deep networks (up to 152 layers) trainable by giving gradients a shortcut path, easing the vanishing gradient and degradation problems. |
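ResNet's key idea fits in a few lines. The sketch below assumes a residual block of the form y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x); to keep it short, plain matrix-vector products stand in for the 3x3 convolutions of the original design, batch normalization is omitted, and the function names are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a vector."""
    return [max(0.0, a) for a in v]


def linear(v, weights):
    """Matrix-vector product; stands in for a conv layer in this sketch."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), where F(x) = W2 * ReLU(W1 * x).

    The '+ x' skip connection adds the input back unchanged, so the block only
    has to learn a correction F(x), and gradients get a direct path backward
    through the addition.
    """
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + a for f, a in zip(fx, x)])


# Usage
x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
y_zero = residual_block(x, zeros, zeros)  # F(x) = 0, so y equals x
y_id = residual_block(x, identity, identity)  # F(x) = x, so y = 2x
```

With all-zero weights the block returns its non-negative input unchanged, which is why, in principle, adding more residual blocks need not make a deep network worse: each block can start close to the identity.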