Machine Learning

Supervised Learning: Classification / Naive Bayes


4. Naive Bayes Classifier

Naive Bayes is a probabilistic classifier based on Bayes' Theorem with a strong ("naive") independence assumption: all features are assumed to be conditionally independent given the class label.

Bayes' Theorem

P(C|X) = [ P(X|C) × P(C) ] / P(X)
  • P(C|X): Posterior probability of class C given features X.
  • P(X|C): Likelihood — probability of features X given class C.
  • P(C): Prior probability of class C.
  • P(X): Evidence (normalizing constant, same for all classes).
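As a minimal sketch of how the theorem is applied to classification, the helper below multiplies each class prior by the per-feature likelihoods and then divides by the evidence. The product over features is where the naive assumption enters: P(X|C) is taken to be P(x₁|C) × P(x₂|C) × ... The function name `posterior` and its dictionary inputs are illustrative assumptions, not part of any library. Working in log space is a common precaution against underflow when many small probabilities are multiplied.

```python
import math

def posterior(priors, likelihoods):
    """Return {class: P(C|X)} for the observed features X.

    priors:      {class: P(C)}
    likelihoods: {class: [P(x_1|C), P(x_2|C), ...]}
    Both inputs are assumed to be already estimated from training data.
    """
    log_scores = {}
    for c, prior in priors.items():
        factors = [prior, *likelihoods[c]]
        if min(factors) <= 0.0:
            # A single zero factor zeros out the whole product.
            log_scores[c] = -math.inf
        else:
            # Naive assumption: log P(X|C) is the sum of log P(x_i|C).
            log_scores[c] = sum(math.log(p) for p in factors)

    best = max(log_scores.values())
    if best == -math.inf:
        raise ValueError("every class has zero probability for these features")

    # Subtract the largest log score before exponentiating for stability.
    unnorm = {c: math.exp(s - best) for c, s in log_scores.items()}
    total = sum(unnorm.values())  # proportional to the evidence P(X)
    return {c: u / total for c, u in unnorm.items()}
```

With made-up inputs such as `posterior({"A": 0.4, "B": 0.6}, {"A": [0.8, 0.5], "B": [0.1, 0.5]})`, class A scores 0.16 and class B scores 0.03 before normalization, so A receives a posterior of roughly 0.84.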

Types of Naive Bayes

Gaussian

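Features are continuous. Each feature is assumed to follow a normal (Gaussian) distribution within each class:

P(xᵢ|C) = (1 / √(2πσ²)) × exp( -(xᵢ - μ)² / (2σ²) )

where μ and σ² are the mean and variance of feature xᵢ, estimated from the training samples of class C.

Multinomial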

Features are discrete counts. Common for text classification (word frequencies).

Bernoulli

Features are binary (0 or 1). Models presence/absence of features.
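The three variants differ only in how the per-feature likelihood P(xᵢ|C) is modelled. The sketch below shows one plausible form of each. The function names and argument layouts are assumptions chosen for illustration, and the parameters (mean, variance, probabilities) are taken as already estimated for a single class.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Density of a continuous feature value x under a normal distribution.

    mean and var are the per-class, per-feature estimates.
    """
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def multinomial_log_likelihood(counts, probs):
    """log P(counts|C), up to a constant that is the same for every class.

    counts[i] is how often item i (e.g. a word) occurs in the sample;
    probs[i] = P(item i | C), assumed positive (e.g. after smoothing).
    """
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

def bernoulli_likelihood(x, probs):
    """P(x|C) for binary features, with probs[i] = P(x_i = 1 | C).

    An absent feature (x_i = 0) contributes a factor of (1 - p_i).
    """
    result = 1.0
    for xi, p in zip(x, probs):
        result *= p if xi == 1 else (1.0 - p)
    return result
```

Note that the Bernoulli model explicitly penalizes absent features through the (1 - p) factor, whereas the multinomial model simply ignores items with a count of zero.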

Solved Example: Weather Dataset

Problem: Predict whether the player should play when the weather outlook is Sunny.
Dataset: 7 instances in total (6 Yes, 1 No). Outlook = Sunny occurs in 2 of the Yes instances and in the 1 No instance.

  1. Priors: P(Play=Yes) = 6/7 ≈ 0.857, P(Play=No) = 1/7 ≈ 0.143
  2. Likelihoods: P(Sunny|Yes) = 2/6 ≈ 0.333, P(Sunny|No) = 1/1 = 1.0
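  3. Apply Bayes' Theorem (the evidence P(Sunny) is the same for both classes, so it can be dropped for now):
    • P(Yes|Sunny) ∝ P(Sunny|Yes) × P(Yes) = (2/6) × (6/7) = 2/7 ≈ 0.2857
    • P(No|Sunny) ∝ P(Sunny|No) × P(No) = (1/1) × (1/7) = 1/7 ≈ 0.1429
  4. Normalize: Total = 2/7 + 1/7 = 3/7 ≈ 0.4286 (this total is the evidence P(Sunny))
    • P(Yes|Sunny) = (2/7) / (3/7) = 2/3 ≈ 0.667 (66.7%)
    • P(No|Sunny) = (1/7) / (3/7) = 1/3 ≈ 0.333 (33.3%)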

Conclusion: P(Yes|Sunny) > P(No|Sunny), so the classifier predicts Play = Yes. The player SHOULD PLAY.
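The steps of the solved example can be checked with a short script. This is only a sketch that hard-codes the counts stated in the problem; the variable names are illustrative. Exact fractions are used so that no rounding error creeps into the intermediate values.

```python
from fractions import Fraction

# Counts taken from the dataset description in the example.
n_yes, n_no = 6, 1            # class counts
sunny_yes, sunny_no = 2, 1    # Sunny occurrences within each class
n_total = n_yes + n_no

# Step 1: priors
prior_yes = Fraction(n_yes, n_total)
prior_no = Fraction(n_no, n_total)

# Step 2: likelihoods
like_yes = Fraction(sunny_yes, n_yes)
like_no = Fraction(sunny_no, n_no)

# Step 3: unnormalized posteriors (evidence ignored)
score_yes = like_yes * prior_yes
score_no = like_no * prior_no

# Step 4: normalize by the evidence P(Sunny)
evidence = score_yes + score_no
post_yes = score_yes / evidence
post_no = score_no / evidence

prediction = "Yes" if post_yes > post_no else "No"
print(f"P(Yes|Sunny) = {post_yes} (about {float(post_yes):.3f})")
print(f"P(No|Sunny)  = {post_no} (about {float(post_no):.3f})")
print("Prediction: Play =", prediction)
```

The script reports posteriors of 2/3 and 1/3, matching the hand calculation.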

Laplace Smoothing

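If a feature value never appears with a class during training, its estimated likelihood is 0, which zeros out the entire product for that class regardless of the other features. Solution: Add a small constant α (usually 1) to every count, and add α × k to the corresponding class total (k = number of distinct values the feature can take) so that the smoothed probabilities still sum to 1.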
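A minimal sketch of the smoothed estimate for a categorical feature is given below. It assumes the common additive form (count + α) / (class total + α × k), where k is the number of distinct values the feature can take. The function and parameter names are hypothetical.

```python
def smoothed_likelihood(count_value_in_class, count_class, n_values, alpha=1.0):
    """Additively smoothed estimate of P(feature = value | class).

    count_value_in_class: training samples of the class with this value
    count_class:          training samples of the class in total
    n_values:             number of distinct values of the feature (k)
    alpha:                smoothing constant; alpha = 1 is Laplace smoothing
    """
    return (count_value_in_class + alpha) / (count_class + alpha * n_values)

# Illustration only: assume Outlook can take 3 values (the example above
# does not list them). The class "No" has 1 sample, and it is Sunny.
p_sunny_no = smoothed_likelihood(1, 1, 3)    # (1 + 1) / (1 + 3) = 0.5
p_unseen_no = smoothed_likelihood(0, 1, 3)   # (0 + 1) / (1 + 3) = 0.25
```

A value never observed with the class now gets a small non-zero likelihood (0.25 here) instead of 0, and the estimates over all three assumed values still sum to 1.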