Machine Learning

Dimensionality Reduction

Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while preserving the structure and information needed for learning. Machine learning datasets can have hundreds or thousands of features, and not all of them contribute equally to the prediction task: some are redundant or irrelevant.

The Problem: Curse of Dimensionality

As the number of features (dimensions) grows, the volume of the feature space grows exponentially, so a fixed number of samples covers it ever more sparsely. To generalize well, models can need exponentially more data as dimensions are added. This leads to:

Overfitting: The model fits noise in the training data rather than the underlying patterns.
High computational cost: Training and inference become slower and use more memory.
Visualization difficulty: Data with more than three dimensions cannot be plotted directly.
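One concrete symptom of this sparsity is that distances stop being informative: in high dimensions, the nearest and farthest points from a query end up at almost the same distance. The sketch below illustrates this under assumptions chosen only for the demo (uniform random points in the unit hypercube, Euclidean distance, 500 points). The function name `relative_contrast` is hypothetical; it measures (max distance - min distance) / min distance from a random query point.

```python
import numpy as np

def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as dim grows: "near" and "far" blur together.
for d in (2, 10, 100, 1000):
    print(f"dim={d:5d}  relative contrast={relative_contrast(500, d):.3f}")
```

Distance-based methods such as k-nearest neighbors rely on that contrast, which is one reason they degrade as features are added.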

Main Approaches

1. Feature Extraction (Projection)

Creates NEW features that are combinations (transformations) of the original features. The original features are projected into a lower-dimensional subspace.
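For linear methods such as PCA, this projection is a single matrix product. In the notation below (introduced here for illustration; it is not defined in the original text), X holds n samples with d original features, and W maps them to k < d new features. Each new feature is a weighted sum of the original ones. Non-linear methods such as t-SNE or autoencoders replace the product XW with a non-linear mapping.

```latex
\begin{aligned}
Z &= XW, && X \in \mathbb{R}^{n \times d},\; W \in \mathbb{R}^{d \times k},\; Z \in \mathbb{R}^{n \times k},\; k < d \\
z_{ij} &= \sum_{l=1}^{d} x_{il}\, w_{lj} && \text{(new feature } j \text{ of sample } i\text{)}
\end{aligned}
```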

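Principal Component Analysis (PCA): The most widely used technique. It is unsupervised (it ignores the target) and finds orthogonal directions, the principal components, ordered by how much of the data's variance each one captures; keeping only the first few gives the reduced representation.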
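A minimal PCA sketch in NumPy, assuming the data fits in memory and using the singular value decomposition of the mean-centered data (one standard way to compute PCA; libraries may differ in details such as sign conventions). The function name `pca` and the toy dataset are illustrative only: three features where the third is nearly a copy of the first, so two components should retain almost all the variance.

```python
import numpy as np

def pca(X: np.ndarray, k: int):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns (Z, components, explained_variance_ratio)."""
    X_centered = X - X.mean(axis=0)              # PCA works on mean-centered data
    # Rows of Vt are orthonormal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                          # shape (k, n_features)
    Z = X_centered @ components.T                # new features, shape (n_samples, k)
    variance = S**2 / (X.shape[0] - 1)           # variance along each direction
    ratio = variance[:k] / variance.sum()        # share of total variance kept
    return Z, components, ratio

# Toy data: 3 features, but the third is almost a copy of the first.
rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

Z, components, ratio = pca(X, k=2)
print(Z.shape)            # (200, 2)
print(ratio.round(3))     # first two components keep nearly all the variance
```

The redundant third feature collapses into the first component, which is exactly the kind of redundancy PCA is meant to remove.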
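Other Advanced Techniques (LDA, t-SNE, Autoencoders): Linear Discriminant Analysis (LDA) is supervised and uses class labels to find directions that separate the classes; t-SNE is a non-linear method used mainly for visualization; autoencoders are neural networks that learn a compressed representation of the data.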
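To show what "supervised" means for LDA, here is a sketch of the classic two-class Fisher discriminant, one specific form of LDA (the multi-class version and t-SNE/autoencoders are not shown). It computes the direction w proportional to S_W^-1 (mu1 - mu0), where S_W is the within-class scatter. The function name and the synthetic data are assumptions for illustration: both classes are stretched along feature 0 but separated along feature 1, so PCA (which ignores labels) and LDA pick different directions.

```python
import numpy as np

def fisher_lda_direction(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Two-class Fisher LDA: unit vector w proportional to S_W^{-1} (mu1 - mu0).
    Unlike PCA, it needs the labels y (0 or 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of each class's scatter around its own mean.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
n = 300
# Both classes spread widely along feature 0 but are separated along feature 1.
X0 = rng.normal(size=(n, 2)) * [5.0, 0.5] + [0.0, -1.0]
X1 = rng.normal(size=(n, 2)) * [5.0, 0.5] + [0.0, 1.0]
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

w = fisher_lda_direction(X, y)
# For comparison: PCA's top direction follows the largest spread, ignoring y.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pc1 = Vt[0]
print("LDA direction:", w.round(2))    # expected to point mostly along feature 1
print("PCA direction:", pc1.round(2))  # expected to point mostly along feature 0
```

Projecting onto PCA's first component here would mix the two classes together, while the LDA direction keeps them apart.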

2. Feature Selection

Selects a SUBSET of the original features without transforming them. Keeps the most relevant features and discards the rest.
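As one simple illustration, the sketch below implements a filter-style selector: rank features by the absolute Pearson correlation between each feature and the target, then keep the top k columns unchanged. This is only one possible method, chosen here as an assumption; it captures linear, one-feature-at-a-time relevance and misses interactions. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def select_top_k_by_correlation(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Return the sorted indices of the k columns of X whose absolute
    Pearson correlation with y is largest."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    top = np.argsort(-np.abs(r))[:k]
    return np.sort(top)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
# The target depends only on features 1 and 4; the other columns are pure noise.
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=500)

selected = select_top_k_by_correlation(X, y, k=2)
X_reduced = X[:, selected]     # original columns, not transformed
print(selected, X_reduced.shape)
```

In contrast to the PCA sketch's new features, the columns of `X_reduced` are the original measurements, so they keep their meaning.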
