Machine Learning

2. Logistic Regression

Logistic Regression is a statistical model for binary classification; despite the word "regression" in its name, it predicts a class label rather than a continuous value. It models the probability that an input belongs to the positive class (y = 1) by passing a linear combination of the features through the logistic (sigmoid) function.

The Sigmoid Function

The sigmoid function maps any real number to a value strictly between 0 and 1, which is why its output can be read as a probability:

σ(z) = 1 / (1 + e⁻ᶻ)

Where z = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ (the linear combination of features), with w₀ acting as the bias (intercept) term.
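
As a minimal sketch of the two formulas above, the Python below computes z and σ(z) using only the standard library. The function names, the example weights, and the example input are assumptions made for illustration. The branch on the sign of z is an implementation choice to avoid overflow in e⁻ᶻ for very negative z; it returns the same value as the formula.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, e^(-z) can overflow, so use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """z = w0 + w1*x1 + ... + wn*xn, where weights[0] plays the role of w0."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical weights [w0, w1, w2] and input [x1, x2], chosen only for illustration.
w = [-1.0, 2.0, 0.5]
x = [0.5, 0.0]
z = linear_score(w, x)  # -1 + 2*0.5 + 0.5*0 = 0
p = sigmoid(z)          # sigmoid(0) = 0.5
```

Large positive scores push the output toward 1 and large negative scores push it toward 0, while a score of exactly 0 gives 0.5.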

Decision Boundary

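Logistic regression outputs a probability P(y=1|x). A threshold (usually 0.5) is applied to turn that probability into a class label:

  • If P(y=1|x) ≥ 0.5 → Predict class 1
  • If P(y=1|x) < 0.5 → Predict class 0

Because σ(z) = 0.5 exactly when z = 0, the 0.5 threshold places the decision boundary at w₀ + w₁x₁ + ... + wₙxₙ = 0, which is linear in the features.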
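
The thresholding rule can be sketched in a few lines of Python. The function names and the model parameters below are hypothetical, and the bias is passed separately from the other weights purely for readability.

```python
import math


def predict_proba(weights, bias, x):
    """P(y=1|x) = sigmoid(w·x + b)."""
    z = bias + sum(w_j * x_j for w_j, x_j in zip(weights, x))
    # Plain formula; adequate for the moderate scores used in this example.
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(weights, bias, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


# Hypothetical model with two features: z = x1 - x2.
weights, bias = [1.0, -1.0], 0.0

label_a = predict_class(weights, bias, [2.0, 1.0])  # z = 1, probability above 0.5
label_b = predict_class(weights, bias, [1.0, 2.0])  # z = -1, probability below 0.5
label_c = predict_class(weights, bias, [1.0, 1.0])  # z = 0, probability exactly 0.5
```

With the default threshold, a point exactly on the boundary (z = 0) is assigned to class 1 because the rule uses ≥. Raising the threshold, for example to 0.9, makes the model predict class 1 less often.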

Cost Function & Training

The cost function used is Log Loss (Binary Cross-Entropy):

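J(w) = −(1/m) Σ [y·log(ŷ) + (1−y)·log(1−ŷ)]

Here m is the number of training examples, y ∈ {0, 1} is the true label, ŷ = σ(z) is the predicted probability, and the sum runs over all m examples.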
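
A small Python sketch of this cost function follows. The function name and the clipping constant eps are assumptions: clipping is a common practical safeguard so that log(0) is never evaluated, and it is not part of the formula itself.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy averaged over all examples."""
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Assumed safeguard: keep p inside (0, 1) so math.log never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / m


# Predicting 0.5 for every example costs log(2) regardless of the labels.
uninformed = log_loss([1, 0], [0.5, 0.5])

# A confident correct prediction is cheap; a confident wrong one is expensive.
confident_right = log_loss([1], [0.9])
confident_wrong = log_loss([1], [0.1])
```

The asymmetry between the last two values is the point of the log loss: it penalizes confident mistakes far more heavily than it rewards confident correct predictions.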

Weights are updated using gradient descent to minimize the cost function:

wⱼ = wⱼ − α · (∂J/∂wⱼ)

Gradient Descent Algorithm Steps:
  1. Initialize: Start with zeros or small random values for the weights w and the bias b (the intercept, written w₀ above).
  2. Forward Pass: Compute the linear combination z = w·x + b and pass it through the sigmoid function to get predictions ŷ = σ(z).
  3. Compute Cost: Calculate the Log Loss J(w) to measure how far the predictions are from the actual labels.
  4. Backward Pass (Gradients): Compute the partial derivatives of the cost function with respect to each weight and the bias: ∂J/∂wⱼ = (1/m) Σ (ŷ − y)·xⱼ and ∂J/∂b = (1/m) Σ (ŷ − y).
  5. Update Parameters: Move each parameter in the opposite direction of its gradient: wⱼ = wⱼ − α(∂J/∂wⱼ) and b = b − α(∂J/∂b), where α is the learning rate.
  6. Repeat: Iterate steps 2–5 until convergence (the cost stops decreasing by a meaningful amount) or a maximum number of iterations is reached.
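
The six steps above can be sketched as a short batch gradient descent loop in plain Python. This is an illustrative sketch, not a production implementation: the function names, the default learning rate, the iteration cap, the tolerance tol, the clipping constant eps, and the tiny one-feature dataset are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, learning_rate=0.1, max_iters=1000, tol=1e-9):
    """Batch gradient descent; returns (weights, bias, cost history)."""
    m, n = len(X), len(X[0])
    eps = 1e-15  # assumed clipping constant so log() never receives 0
    w, b = [0.0] * n, 0.0  # Step 1: initialize with zeros
    costs = []
    for _ in range(max_iters):
        # Step 2: forward pass, y_hat = sigmoid(w·x + b) for every example
        y_hat = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        # Step 3: log loss over the whole training set
        cost = -sum(
            yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1.0 - p, eps))
            for yi, p in zip(y, y_hat)
        ) / m
        # Step 6: stop once the cost no longer drops by at least tol
        converged = bool(costs) and costs[-1] - cost < tol
        costs.append(cost)
        if converged:
            break
        # Step 4: gradients (1/m) * sum((y_hat - y) * x_j) and (1/m) * sum(y_hat - y)
        errors = [p - yi for p, yi in zip(y_hat, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        grad_b = sum(errors) / m
        # Step 5: move against the gradient
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b, costs


def predict(w, b, row, threshold=0.5):
    """Class label for one example using the trained parameters."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) >= threshold else 0


# Hypothetical one-feature dataset: small x values are class 0, large ones class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b, costs = train_logistic_regression(X, y)
```

Because the parameters start at zero, every initial prediction is 0.5 and the first recorded cost is log(2); later entries in costs should be smaller. If the cost rises or oscillates instead, the learning rate is usually too large for the data at hand.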

Multinomial Logistic Regression

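For multi-class classification (more than 2 classes), the softmax function is used instead of sigmoid. Each class k has its own weight vector wₖ, and the probability of class k is:

P(y=k|x) = e^(wₖᵀx) / Σⱼ e^(wⱼᵀx)

The sum in the denominator runs over all classes j, so the predicted probabilities add up to 1. The predicted class is the one with the highest probability.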
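
As a rough sketch, softmax can be written in plain Python as shown here. The function names and the weight vectors for three classes are hypothetical. Subtracting the largest score before exponentiating is an added implementation detail: it leaves the result unchanged because the common factor cancels in the ratio, and it prevents overflow for large scores.

```python
import math


def class_scores(W, x):
    """One score w_k^T x per class, where W holds one weight vector per class."""
    return [sum(w_kj * x_j for w_kj, x_j in zip(w_k, x)) for w_k in W]


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weight vectors for 3 classes and 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.0]

probs = softmax(class_scores(W, x))  # scores are [2, 1, -3]
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
```

With only two classes and scores [z, 0], the first softmax output equals σ(z), so binary logistic regression is the two-class special case of this model.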