4. Naive Bayes Classifier
Naive Bayes is a probabilistic classifier based on Bayes' Theorem with a strong ("naive") independence assumption: all features are assumed to be conditionally independent given the class label.
Bayes' Theorem
P(C|X) = P(X|C) × P(C) / P(X)
- P(C|X): Posterior probability of class C given features X.
- P(X|C): Likelihood — probability of features X given class C.
- P(C): Prior probability of class C.
- P(X): Evidence (normalizing constant, same for all classes).
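As a rough illustration of how these four quantities combine under the naive independence assumption, the sketch below multiplies the prior by one likelihood per feature and then divides by the evidence. The function name `naive_bayes_posterior`, the dictionary layout, and the spam/ham numbers are assumptions made for this example, not part of the notes. Working in log space is a common practical choice to avoid underflow when many small probabilities are multiplied.

```python
import math

def naive_bayes_posterior(priors, likelihoods):
    """Return normalized posteriors P(C|X) for each class.

    priors:      dict mapping class -> P(C)
    likelihoods: dict mapping class -> list of per-feature P(x_i|C)
    (Both layouts are assumptions made for this sketch.)
    """
    # Unnormalized log-score: log P(C) + sum_i log P(x_i|C).
    log_scores = {}
    for c, prior in priors.items():
        factors = [prior] + list(likelihoods[c])
        if any(f == 0 for f in factors):
            log_scores[c] = -math.inf  # a zero factor zeros out the product
        else:
            log_scores[c] = sum(math.log(f) for f in factors)

    best = max(log_scores.values())
    if best == -math.inf:
        raise ValueError("every class has zero probability")

    # Subtract the maximum before exponentiating for numerical stability.
    unnormalized = {c: math.exp(s - best) for c, s in log_scores.items()}
    evidence = sum(unnormalized.values())  # plays the role of P(X)
    return {c: v / evidence for c, v in unnormalized.items()}

# Hypothetical two-feature example (numbers invented for illustration).
posteriors = naive_bayes_posterior(
    priors={"spam": 0.4, "ham": 0.6},
    likelihoods={"spam": [0.5, 0.2], "ham": [0.1, 0.3]},
)
print(posteriors)
```

Because P(X) is the same for every class, the predicted class can be read off the unnormalized scores alone; normalization is only needed when actual probabilities are wanted.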
Types of Naive Bayes
Gaussian
Features are continuous. Assumes each feature follows a normal distribution within each class.
Multinomial
Features are discrete counts. Common for text classification (word frequencies).
Bernoulli
Features are binary (0 or 1). Models presence/absence of features.
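The three variants differ only in how the likelihood P(X|C) is modeled. A minimal sketch of each likelihood, using only the standard library, might look like the following. The function names and parameter layouts are assumptions for illustration; in practice the per-class parameters (mean, variance, probabilities) would be estimated from training data.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Density of a continuous feature value x under a normal model
    with the class's mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def multinomial_likelihood(counts, probs):
    """Product of p_j ** count_j over features (e.g. words).
    The multinomial coefficient is omitted here because it is the
    same for every class and does not affect the comparison."""
    return math.prod(p ** n for n, p in zip(counts, probs))

def bernoulli_likelihood(x, probs):
    """Binary features: contributes p_j if feature j is present,
    (1 - p_j) if it is absent."""
    return math.prod(p if xj else 1 - p for xj, p in zip(x, probs))

# Hypothetical parameter values, chosen only to exercise the functions.
print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
print(multinomial_likelihood([2, 1], [0.5, 0.5]))
print(bernoulli_likelihood([1, 0], [0.8, 0.3]))
```

Note that the Bernoulli model explicitly penalizes absent features through the (1 - p_j) factor, whereas the multinomial model ignores features whose count is 0.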
Solved Example: Weather Dataset
Problem: Predict if the player should play when the weather outlook is Sunny.
(Dataset: 7 total instances. 6 Yes, 1 No. Outlook Sunny appears 2 times in Yes, 1 time in No.)
- Priors: P(Play=Yes) = 6/7 ≈ 0.857, P(Play=No) = 1/7 ≈ 0.143
- Likelihoods: P(Sunny|Yes) = 2/6 ≈ 0.333, P(Sunny|No) = 1/1 = 1.0
- Apply Theorem (ignore evidence):
  - P(Yes|Sunny) ∝ 0.333 × 0.857 ≈ 0.2857
  - P(No|Sunny) ∝ 1.0 × 0.143 ≈ 0.143
- Normalize: Total = 0.2857 + 0.143 = 0.4287
  - P(Yes|Sunny) = 0.2857 / 0.4287 ≈ 0.667 (66.7%)
  - P(No|Sunny) = 0.143 / 0.4287 ≈ 0.333 (33.3%)
Conclusion: Since P(Yes|Sunny) > P(No|Sunny), the player should play.
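The arithmetic in this example can be checked with a short script. The sketch below uses the counts stated in the problem and exact fractions to avoid rounding drift; the variable names are chosen for this illustration only.

```python
from fractions import Fraction

# Counts taken from the dataset description above.
class_counts = {"Yes": 6, "No": 1}   # instances per class
sunny_counts = {"Yes": 2, "No": 1}   # Outlook=Sunny instances per class
total = sum(class_counts.values())   # 7 instances

priors = {c: Fraction(n, total) for c, n in class_counts.items()}
likelihoods = {c: Fraction(sunny_counts[c], class_counts[c]) for c in class_counts}

# Unnormalized scores: P(Sunny|C) * P(C)
scores = {c: likelihoods[c] * priors[c] for c in class_counts}

# Normalize by the evidence P(Sunny) to get proper posteriors.
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)

for c, p in posteriors.items():
    print(f"P({c}|Sunny) = {p} = {float(p):.3f}")
print("Prediction:", prediction)
```

With exact fractions the posteriors come out as 2/3 and 1/3, matching the rounded values 0.667 and 0.333 in the steps above.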
Laplace Smoothing
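If a feature value never appears with a class during training, its estimated likelihood is 0, which zeros out the entire product for that class regardless of the other features. Solution: add a small constant α (usually 1) to every count, and add α × k to the denominator, where k is the number of possible values of the feature, so that the smoothed probabilities still sum to 1: P(x|C) = (count(x, C) + α) / (count(C) + α × k).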
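A minimal sketch of the smoothed estimate is shown below. The function name `smoothed_likelihood` and its parameters are hypothetical, and the example assumes the feature has 3 possible values, which the notes do not state.

```python
def smoothed_likelihood(value_count, class_count, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(feature=value | class).

    value_count: times this value occurs with the class
    class_count: total instances of the class
    n_values:    number of distinct values the feature can take (assumed known)
    alpha:       smoothing constant; alpha=0 gives the unsmoothed estimate
    """
    return (value_count + alpha) / (class_count + alpha * n_values)

# A value never seen in a class of 6 instances, for a feature assumed to
# have 3 possible values: (0 + 1) / (6 + 3) instead of 0.
print(smoothed_likelihood(0, 6, 3))

# The smoothed estimates over all 3 values still sum to 1.
print(sum(smoothed_likelihood(n, 6, 3) for n in (2, 4, 0)))
```

Larger values of α pull the estimates toward a uniform distribution, so α is usually kept small relative to the class counts.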