Multicollinearity & VIF

In multiple linear regression, multicollinearity occurs when two or more independent variables are highly correlated with each other. The coefficient estimates then become unstable and their standard errors grow, which makes it difficult to determine the individual effect of each feature.

Example: High Correlation

Suppose we are predicting health outcomes using Height and Leg Length. These two features are naturally very highly correlated (r ≈ 0.98), as the correlation matrix below shows.

Feature       Height    Leg Length
Height        1.00      0.98
Leg Length    0.98      1.00

In this scenario, the model cannot reliably separate the impact of Height from that of Leg Length: many different combinations of the two coefficients fit the data almost equally well.
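To make this concrete, the following is a minimal sketch on simulated data, not a real dataset. The numbers used to generate `height`, `leg` and the outcome `y` are assumptions chosen so that the two features correlate at roughly 0.98, and `ols_with_se` is a hypothetical helper name. It compares the standard error of the Height coefficient when Height is the only predictor against the case where Leg Length is added.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated (assumed) data: leg length is roughly half of height plus noise.
height = rng.normal(170, 10, size=n)
leg = 0.5 * height + rng.normal(0, 1, size=n)
# Hypothetical outcome that truly depends on height only.
y = 0.3 * height + rng.normal(0, 5, size=n)


def ols_with_se(X, y):
    """Fit OLS with an intercept; return coefficients and standard errors."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof            # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)   # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))


r = np.corrcoef(height, leg)[0, 1]
_, se_alone = ols_with_se(height, y)
_, se_both = ols_with_se(np.column_stack([height, leg]), y)

print(f"correlation(height, leg) = {r:.3f}")
print(f"SE of height coefficient, height only:  {se_alone[1]:.4f}")
print(f"SE of height coefficient, height + leg: {se_both[1]:.4f}")
```

Index 1 in each standard-error array is the Height coefficient, since index 0 is the intercept. The second standard error should come out several times larger than the first, which is the instability described above.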

Detection: Variance Inflation Factor (VIF)

VIF measures how much the variance of an estimated regression coefficient is inflated by correlation among the predictors, compared with the case where that predictor is uncorrelated with the others. To calculate VIF for feature Xⱼ:

  1. Run a regression of Xⱼ against all other predictors (X₁, X₂, ..., Xⱼ₋₁, Xⱼ₊₁, ...).
  2. Take the R² value (coefficient of determination) from that regression, written R²ⱼ.
  3. Calculate: VIFⱼ = 1 / (1 - R²ⱼ)
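The three steps above can be sketched directly with NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the function name `vif`, the simulated features and the extra `age` column are all hypothetical. Each auxiliary regression includes an intercept.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X, shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Step 1: regress X_j on all other predictors (with an intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        # Step 2: R^2 of that auxiliary regression.
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # Step 3: VIF_j = 1 / (1 - R^2_j).
        out[j] = 1.0 / (1.0 - r2)
    return out


# Simulated (assumed) data: height and leg length are nearly collinear,
# age is generated independently of both.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=500)
leg = 0.5 * height + rng.normal(0, 1, size=500)
age = rng.normal(40, 12, size=500)

X = np.column_stack([height, leg, age])
vifs = vif(X)
for name, value in zip(["height", "leg_length", "age"], vifs):
    print(f"{name:>10}: VIF = {value:.2f}")
```

On data like this, the two collinear features should show large VIFs while the independent feature stays close to 1.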

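Rule of thumb: a VIF of 1 means the feature is uncorrelated with the other predictors. A VIF above 5 is commonly treated as a warning sign, and a VIF above 10 as severe multicollinearity.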
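As a worked example, take the Height and Leg Length correlation of 0.98 from the table in the example. Assuming these are the only two predictors, the R² from regressing one on the other equals the squared correlation, so:

```latex
R_j^2 = r^2 = 0.98^2 = 0.9604
\qquad\Longrightarrow\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{1 - 0.9604} = \frac{1}{0.0396} \approx 25.3
```

This is well above either threshold in the rule of thumb.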

Solutions:

  • Remove one of the highly correlated variables.
  • Use PCA to combine correlated features into uncorrelated principal components.
  • Use Regularization techniques like Ridge Regression (L2) or LASSO (L1).
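Two of these remedies can be sketched in a few lines of NumPy. This is a rough illustration on simulated data, with assumed values throughout: the penalty `lam = 10.0` is arbitrary, `ridge` is a hypothetical helper using the closed-form Ridge solution, and the PCA step is done with a plain SVD rather than a library class.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Simulated (assumed) data with two nearly collinear predictors.
height = rng.normal(170, 10, size=n)
leg = 0.5 * height + rng.normal(0, 1, size=n)
y = 0.3 * height + rng.normal(0, 5, size=n)  # hypothetical outcome

# Standardize predictors and center the target so no intercept is needed.
X = np.column_stack([height, leg])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()


def ridge(Z, yc, lam):
    """Closed-form Ridge solution: (Z'Z + lam * I)^-1 Z'y."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ yc)


beta_ols = ridge(Z, yc, 0.0)     # lam = 0 reduces to ordinary least squares
beta_ridge = ridge(Z, yc, 10.0)  # L2 penalty shrinks the coefficients

# PCA via SVD: project onto the principal axes to get uncorrelated components.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
print("Correlation between principal components:", pc_corr)
```

The Ridge coefficient vector has a smaller norm than the OLS one, and the principal components are uncorrelated up to floating-point error. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.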
