Machine Learning

Supervised Learning: Classification / Decision Trees

5. Decision Tree Algorithm

A Decision Tree is a tree-structured classifier where internal nodes represent feature tests, branches represent the outcome of tests, and leaf nodes represent class labels. It creates a model that predicts the value of a target variable by learning decision rules from the features.

Tree Building Algorithm (Top-Down Recursive)

Start with the entire training dataset at the root node.
Calculate the impurity (e.g., Entropy or Gini Index) of the current dataset.
Iterate through all available features and calculate the Information Gain (or reduction in impurity) if the data were split on each feature.
Select the feature that yields the highest Information Gain (or lowest Gini Index) as the decision node.
Split the dataset into subsets based on the selected feature's values.
Recursively repeat steps 2-5 for each subset until a stopping criterion is met (e.g., all instances in a node belong to the same class, or max depth is reached).

Attribute Selection Measures

The key challenge is deciding which feature to split on at each node.

A) Information Gain (ID3 Algorithm)

Based on entropy (measure of impurity/disorder):

Entropy(S) = −Σ pᵢ log₂(pᵢ)

IG(S, A) = Entropy(S) − Σ (|Sᵥ|/|S|) × Entropy(Sᵥ)

B) Gini Index (CART Algorithm)

Measures the probability of a variable being wrongly classified. Lower is better (more pure).

Gini(S) = 1 − Σ pᵢ²

C) Gain Ratio (C4.5 Algorithm)

Normalizes Information Gain to prevent bias toward attributes with many distinct values.

Gain Ratio(S, A) = IG(S, A) / Split Info(S, A)

Tree Pruning

Overgrown trees overfit the training data. Pruning reduces tree complexity:

Pre-pruning (Early Stopping): Stop growing before it perfectly classifies training data (e.g., set max depth).
Post-pruning: Grow full tree, then remove subtrees that do not improve validation accuracy.

Naive Bayes Support Vector Machine (SVM)