Machine Learning

Unit 5: Trends and Applications / Ensemble Learning

1. Ensemble Learning

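Ensemble learning is a paradigm where multiple models (base learners, often individually weak) are trained and their predictions are combined to produce a final prediction that is more accurate and robust than any single model. It is based on the wisdom of crowds: aggregating many imperfect but diverse predictors tends to cancel out their individual errors.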

A. Bagging (Bootstrap Aggregating)

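Models are trained independently (and so can be trained in parallel) on bootstrap samples: random samples drawn with replacement from the training data, usually the same size as the original set. The final prediction is made by averaging (Regression) or majority voting (Classification).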

Objective: Reduce Variance (Overfitting).
Example: Random Forest.
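OOB (Out-of-Bag) Error: On average, about 36.8% (≈ 1/e) of the training instances are left out of any given bootstrap sample. Each instance can be scored by the models that never saw it, which gives a built-in estimate of validation error without a separate hold-out set.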
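The ~36.8% figure follows from the chance that one instance is missed by all N draws, (1 - 1/N)^N, which approaches 1/e as N grows. A minimal sketch that checks this empirically using only the Python standard library; the function name oob_fraction and the sample size are illustrative choices, not part of the notes:

```python
import random

def oob_fraction(n_samples, seed=0):
    """Draw one bootstrap sample of size n_samples (with replacement) and
    return the share of indices that were never drawn (out-of-bag share)."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples

n = 10_000
print(f"simulated out-of-bag share: {oob_fraction(n):.3f}")
print(f"(1 - 1/N)^N for N = {n}:   {(1 - 1 / n) ** n:.3f}")
```

Both printed values should land close to 1/e ≈ 0.368; the simulated one varies slightly with the seed.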

B. Boosting

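Models are trained sequentially. Each new model focuses on correcting the errors made by the previous models, for example by assigning higher weights to misclassified instances (AdaBoost) or by fitting the remaining errors of the current ensemble (Gradient Boosting).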

Objective: Reduce Bias (Underfitting).
Examples: AdaBoost, Gradient Boosting (e.g., XGBoost).

2. Randomization in ML

Randomization introduces diversity among base learners. If all models make the same errors, combining them gains nothing over a single model. Random Forest uses Data Randomization (Bootstrap) AND Algorithmic Randomization (random feature subset at each node split). ExtraTrees takes this further by also picking random split thresholds.
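Why diversity matters can be illustrated with an idealized simulation: 25 voters that are each right 60% of the time, once with fully independent errors and once as identical copies. This is a sketch under strong assumptions (real base learners are never fully independent), and the function name and parameters are hypothetical:

```python
import random

def majority_vote_accuracy(n_models, p_correct, n_trials, correlated, seed=0):
    """Estimate how often a majority vote is right when each model is
    correct with probability p_correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if correlated:
            # Every model copies the same outcome, so they all err together.
            votes = [rng.random() < p_correct] * n_models
        else:
            # Each model errs independently of the others.
            votes = [rng.random() < p_correct for _ in range(n_models)]
        if sum(votes) > n_models / 2:
            hits += 1
    return hits / n_trials

independent = majority_vote_accuracy(25, 0.6, 20_000, correlated=False)
identical = majority_vote_accuracy(25, 0.6, 20_000, correlated=True)
print(f"independent errors: {independent:.3f}")
print(f"identical models:   {identical:.3f}")
```

With independent errors the vote is noticeably more accurate than any single voter, while identical models stay at the single-model accuracy. Bootstrapping and random feature subsets are ways of pushing real learners toward the first case.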

Deep Dive: AdaBoost (Adaptive Boosting) Algorithm

AdaBoost trains a sequence of weak learners. Misclassified instances receive exponentially higher weights in the next round.

Initialize weights: wᵢ = 1/N for all N instances.
For t = 1 to T:
- Train weak learner hₜ on weighted data.
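- Calculate weighted error eₜ: the sum of the weights of the instances hₜ misclassifies (weights normalized to sum to 1).
- Calculate learner weight (αₜ):
  αₜ = 0.5 × ln((1 - eₜ) / eₜ)
  A learner that beats random guessing (eₜ < 0.5) gets αₜ > 0; the lower its error, the larger its vote.
- Update sample weights: wᵢ ← wᵢ × exp(-αₜ × yᵢ × hₜ(xᵢ)) with labels yᵢ ∈ {-1, +1}, so a weight grows by a factor of exp(αₜ) if the instance is misclassified and shrinks by exp(-αₜ) if it is correct. Then renormalize so the weights sum to 1.
Final Prediction: Weighted vote of all T learners, H(x) = sign(Σₜ αₜ × hₜ(x)).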
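The steps above can be sketched in plain Python for binary labels in {-1, +1}. Decision stumps (one feature, one threshold) are assumed as the weak learner, and the toy data set and all function names are illustrative, not taken from the notes:

```python
import math

def stump_predict(row, j, thr, pol):
    """Predict pol if feature j is <= thr, otherwise -pol."""
    return pol if row[j] <= thr else -pol

def train_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_fit(X, y, n_rounds):
    n = len(X)
    w = [1.0 / n] * n                      # initialize weights to 1/N
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # clamp to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Raise weights of misclassified points, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: positive, then negative, then positive again,
# so no single stump can classify every point correctly.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]

model = adaboost_fit(X, y, n_rounds=50)
predictions = [adaboost_predict(model, row) for row in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("training accuracy:", accuracy)
```

The best single stump misclassifies three of the ten points here, yet the weighted vote can separate all of them, which is the bias reduction that boosting aims for.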

Deep Dive: Gradient Boosting

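Instead of reweighting samples, Gradient Boosting fits each new model to the RESIDUAL ERRORS (actual - predicted) of the current ensemble. For squared-error loss these residuals are exactly the negative gradient of the loss; for other losses the negative gradient (the "pseudo-residuals") takes their place.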

Step 1: Initialize with a constant (e.g., mean).
Step 2: Compute residuals (negative gradient of loss function).
Step 3: Fit a new shallow tree to predict these residuals.
Step 4: Update ensemble: new_pred = old_pred + (learning_rate × new_tree)
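Steps 1 to 4 can be sketched for squared-error loss, where the negative gradient is simply actual - predicted. This minimal version assumes 1-D inputs and single-split regression stumps as the "shallow trees"; the data, function names, and default hyperparameters are illustrative only:

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one mean per side."""
    best = None
    for thr in sorted(set(xs))[:-1]:       # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    return best[1:]

def stump_value(stump, x):
    thr, lmean, rmean = stump
    return lmean if x <= thr else rmean

def gradient_boost_fit(xs, ys, n_rounds=200, learning_rate=0.1):
    base = sum(ys) / len(ys)               # Step 1: constant initial prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]    # Step 2
        stump = fit_stump(xs, residuals)                  # Step 3
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x)   # Step 4
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def gradient_boost_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [float(i) for i in range(10)]
ys = [x ** 2 for x in xs]

model = gradient_boost_fit(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [gradient_boost_predict(model, x) for x in xs])
print("MSE of the constant model:", baseline_mse)
print("MSE after boosting:       ", boosted_mse)
```

A smaller learning_rate makes each tree's correction more cautious, so more rounds are needed to reach the same training error; in practice this trade-off is usually tuned on validation data.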

Popular Implementations: XGBoost (Regularization, fast), LightGBM (Leaf-wise growth), CatBoost (Categorical features).
