Dimensionality Reduction
Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while preserving as much as possible of the structure and information needed for learning. Machine learning datasets can have hundreds or thousands of features, and not all of them contribute equally to the prediction task.
The Problem: Curse of Dimensionality
As the number of features (dimensions) grows, the feature space becomes increasingly sparse: the number of samples needed to cover it at a fixed density grows exponentially with the number of dimensions. With a limited amount of data, this leads to:
- Overfitting: The model learns noise rather than actual patterns.
- High computational cost: Slow training and inference times.
- Visualization difficulty: Data with more than 3 dimensions cannot be plotted directly.
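One way to see this sparsity is to measure how much the distances from a query point to random points differ as the dimension grows. The sketch below is illustrative only: the helper name `distance_contrast`, the uniform random data, and the chosen dimensions are assumptions, not part of the lesson.

```python
import math
import random

def distance_contrast(n_points, n_dims, seed=0):
    """Relative spread (max - min) / min of the distances from one random
    query point to n_points random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows: the nearest and farthest
# points end up at almost the same distance from the query.
for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(500, d), 3))
```

When the contrast is small, "nearest" neighbors are barely nearer than anything else, which is one reason distance-based models tend to degrade on high-dimensional data.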
Main Approaches
1. Feature Extraction (Projection)
Creates NEW features that are combinations (transformations) of the original features. The original features are projected into a lower-dimensional subspace.
- Principal Component Analysis (PCA): The most widely used technique. An unsupervised method that creates new, mutually orthogonal features (principal components), ordered by how much of the data's variance each one captures.
- Other Advanced Techniques (LDA, t-SNE, Autoencoders): Linear Discriminant Analysis (LDA) for supervised classification, t-SNE for non-linear visualization, and autoencoders for neural network-based compression.
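As a rough illustration of projection, PCA can be sketched with NumPy by centering the data and taking a singular value decomposition. The function name `pca`, the synthetic data, and the choice of two components are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are orthonormal directions, sorted by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = S[:n_components] ** 2 / np.sum(S ** 2)
    Z = Xc @ components.T                    # the new, lower-dimensional features
    return Z, components, explained_ratio

# Synthetic data: the third feature is almost a + b, so two components
# should capture nearly all of the variance.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=200)])

Z, components, ratio = pca(X, 2)
print(Z.shape)       # 200 samples, 2 new features
print(ratio.sum())   # fraction of total variance retained
```

Each column of `Z` is a linear combination of all three original features, which is what distinguishes extraction from selection.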
2. Feature Selection
Selects a SUBSET of the original features without transforming them. Keeps the most relevant features and discards the rest.
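A minimal sketch of selection, assuming a simple filter criterion: rank features by the absolute Pearson correlation with the target and keep the top k. The lesson does not name a specific selection method, so the scoring rule, the helper `select_top_k`, and the synthetic data are all illustrative choices.

```python
import numpy as np

def select_top_k(X, y, k):
    """Return the indices and columns of the k features most correlated with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features have zero variance; give them a score of 0.
    corr = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    keep = np.sort(np.argsort(np.abs(corr))[::-1][:k])
    return keep, X[:, keep]

# Synthetic data: only features 1 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=300)

keep, X_selected = select_top_k(X, y, 2)
print(keep)               # indices of the retained features
print(X_selected.shape)   # same rows, fewer columns
```

The retained columns are copied unchanged from `X`, so they keep their original meaning and units.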