Feature Subset Selection

Feature subset selection (FSS) is the process of identifying and selecting the most relevant subset of features from the original feature set for building a machine learning model. Unlike feature extraction (e.g., PCA), which builds new features from combinations of the originals, feature selection keeps the original features, so the results stay interpretable.

1. Filter Methods

Filter methods evaluate the relevance of each feature by measuring its statistical relationship with the target variable, independent of any machine learning algorithm. They are fast, scalable, and act as a pre-processing step.

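Pearson Correlation: Measures the linear relationship between a numeric feature and a numeric target. Features with high absolute correlation to the target (e.g., |r| > 0.7, a rule-of-thumb threshold) are considered highly relevant. Features with near-zero correlation are candidates for removal, although a near-zero r rules out only a linear relationship, not a nonlinear one.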
Information Gain: Measures the reduction in entropy (uncertainty) after splitting data on a feature.
Chi-Square Test: Tests statistical independence between a categorical feature and a categorical target.

Disadvantage: Each feature is scored on its own, so filter methods ignore feature interactions (how features work together) and can keep redundant features.
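
The Pearson and Information Gain scores described above can be sketched in plain Python. This is a minimal illustration rather than a library implementation: the helper names (`pearson_r`, `entropy`, `information_gain`), the toy data, and the 0.7 cut-off are assumptions chosen for the example.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "size" tracks the target linearly, "noise" does not.
numeric_features = {
    "size": [1, 2, 3, 4, 5],
    "noise": [2, -1, 0, -1, 2],
}
target = [2, 4, 6, 8, 10]

THRESHOLD = 0.7  # assumed rule-of-thumb cut-off
kept = [
    name
    for name, column in numeric_features.items()
    if abs(pearson_r(column, target)) > THRESHOLD
]
print("kept by Pearson filter:", kept)

# Information gain for two categorical features against a categorical target.
labels = ["no", "no", "yes", "yes"]
ig_useful = information_gain(["a", "a", "b", "b"], labels)   # splits the classes perfectly
ig_useless = information_gain(["a", "b", "a", "b"], labels)  # tells us nothing about the class
print("information gain:", ig_useful, ig_useless)
```

Note that each score looks at one feature at a time, which is exactly why this family of methods cannot see interactions between features.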

2. Wrapper Methods

Wrapper methods use a specific machine learning algorithm as a "black box" to evaluate feature subsets. They search through combinations of features and use model performance (e.g., accuracy on held-out data) to pick the best subset.

Forward Selection:

Start with an empty set: S = {}
For each remaining feature, train a model adding only that feature to S.
Add the feature that gives the highest accuracy improvement to S.
Repeat until performance stops improving.
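
The forward selection steps above can be sketched as a greedy loop around a scoring function. This is an illustrative sketch under stated assumptions: `forward_selection` and `loo_1nn_accuracy` are hypothetical helper names, the "black box" model is a 1-nearest-neighbour classifier scored by leave-one-out accuracy (any model and metric could stand in), and the toy data is made up so that only feature 0 separates the classes.

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `cols`."""
    correct = 0
    for i, row in enumerate(X):
        # Find the closest other sample using only the chosen columns.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((row[c] - X[j][c]) ** 2 for c in cols),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_selection(n_features, score_fn):
    """Greedily add the feature that most improves score_fn(subset)."""
    selected, best_score = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        # Score every candidate subset S + {f} and keep the best one.
        score, feature = max(
            ((score_fn(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [0, 5, 9],
    [1, 1, 0],
    [2, 3, 5],
    [100, 6, 8],
    [101, 2, 1],
    [102, 4, 4],
]
y = [0, 0, 0, 1, 1, 1]

selected, best_score = forward_selection(
    n_features=3,
    score_fn=lambda cols: loo_1nn_accuracy(X, y, cols),
)
print(selected, best_score)
```

Passing the model in as `score_fn` keeps the search independent of the learner, which mirrors the "black box" role the model plays in wrapper methods.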

Backward Elimination:

Start with all features: S = {all features}
For each feature in S, train a model without that feature.
Remove the feature whose removal causes the least drop in accuracy.
Repeat until removing any remaining feature would cause a significant drop in performance.
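
Backward elimination can be sketched the same way, starting from the full set and removing one feature per round. This is a sketch under assumptions, not a reference implementation: the names `backward_elimination` and `loo_1nn_accuracy` are hypothetical, the scorer is a stand-in leave-one-out 1-nearest-neighbour accuracy, and the `tol` parameter is an assumed way of expressing "drops significantly" as a number.

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `cols`."""
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((row[c] - X[j][c]) ** 2 for c in cols),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def backward_elimination(n_features, score_fn, tol=0.0):
    """Greedily drop features while the score falls by at most `tol` per step."""
    selected = list(range(n_features))
    current = score_fn(selected)
    while len(selected) > 1:
        # Score the subset obtained by removing each feature in turn.
        score, to_drop = max(
            ((score_fn([f for f in selected if f != g]), g) for g in selected),
            key=lambda pair: pair[0],
        )
        if current - score > tol:
            break  # even the least harmful removal costs too much
        selected.remove(to_drop)
        current = score
    return selected, current


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [0, 5, 9],
    [1, 1, 0],
    [2, 3, 5],
    [100, 6, 8],
    [101, 2, 1],
    [102, 4, 4],
]
y = [0, 0, 0, 1, 1, 1]

selected, final_score = backward_elimination(
    n_features=3,
    score_fn=lambda cols: loo_1nn_accuracy(X, y, cols),
)
print(selected, final_score)
```

Because the drop is measured against the score from the previous round, a nonzero `tol` lets small losses accumulate over several rounds; comparing against the best score seen so far would be a stricter alternative.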

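Disadvantage: Computationally expensive, especially for datasets with many features. Every step trains one model per candidate feature, so a greedy search over d features can require on the order of d² model fits.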

3. Embedded Methods

Embedded methods perform feature selection automatically as part of the model training process. They strike a balance between the speed of filter methods and the accuracy of wrapper methods.

LASSO (L1 Regularization): Adds a penalty to the loss function proportional to the sum of the absolute values of the coefficients. With a strong enough penalty, the coefficients of irrelevant or redundant features shrink exactly to zero, effectively eliminating them from the model.
Decision Trees / Random Forests: Naturally evaluate feature importance during the construction of the tree splits (using Information Gain or Gini Impurity).
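
To see how an L1 penalty zeroes out coefficients during training, here is a minimal sketch of LASSO fitted by coordinate descent with soft-thresholding. The assumptions are worth stating: the function names, the penalty strength `alpha=0.5`, the fixed iteration count, and the toy data are all made up for illustration; there is no intercept, so the data is assumed to be centered; and production code would normally rely on a tested library solver instead.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1 / (2n)) * sum((y - Xw)^2) + alpha * sum(|w|), without an intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - pred_without_j)
            rho /= n
            z = sum(row[j] ** 2 for row in X) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: y = 2.0 * x0 + 0.1 * x1, so feature 1 is only weakly relevant.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(weights, selected)
```

On this data an unpenalized least-squares fit would give coefficients 2.0 and 0.1. The penalty shrinks the first coefficient and sets the second exactly to zero, so the selection falls out of training itself.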
