Machine Learning

2. Reinforcement Learning (RL)

Unlike supervised learning (which learns from labeled data) or unsupervised learning (which finds hidden structure in unlabeled data), Reinforcement Learning learns by trial and error in an interactive environment. It is a core technique behind game-playing AIs (like AlphaGo), and it is also applied in robotics and in parts of self-driving research.

The Agent-Environment Loop

Agent: The learner or decision-maker.
Environment: The world the agent interacts with.
State (S): A specific situation or configuration of the environment.
Action (A): What the agent chooses to do.
Reward (R): The feedback from the environment (positive or negative) based on the action taken.

Goal: Maximize the cumulative reward collected over time, not just the next immediate reward.
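
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library API: the `CorridorEnv` class, its reward values (-0.1 per step, +1.0 at the goal), and the two policies are hypothetical choices made for this example. The `reset`/`step` naming loosely follows a common convention in RL toolkits.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0  # State (S): index of the cell the agent stands on

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action (A): 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Reward (R): assumed values, small cost per step and a bonus at the goal
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the cumulative reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([0, 1])


print("always right:", run_episode(CorridorEnv(), always_right))
print("random:", run_episode(CorridorEnv(), random_policy))
```

The always-right policy reaches the goal in four steps, so it pays the step cost three times before collecting the bonus. The random policy usually wanders longer and ends with a lower total, which is the gap a learning agent tries to close.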

Exploration

Trying new actions to discover better strategies. (Risking current reward for future knowledge.)

Exploitation

Choosing the actions currently believed to yield the highest reward. (Playing it safe, at the risk of missing something better.)
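
One common way to balance the two is the epsilon-greedy rule, which the text above does not name but which fits the description: with a small probability epsilon the agent explores by picking a random action, otherwise it exploits the action with the highest current estimate. The sketch below assumes the estimates are held in a plain list indexed by action; the function name and the example values are illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given a list of estimated action values."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest current estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])


estimates = [0.1, 0.9, 0.3]           # assumed value estimates for 3 actions
rng = random.Random(0)                # seeded for repeatable runs
counts = [0, 0, 0]
for _ in range(1000):
    counts[epsilon_greedy(estimates, epsilon=0.1, rng=rng)] += 1
print(counts)  # action 1 dominates; actions 0 and 2 are still tried occasionally
```

With epsilon set to 0 the agent never explores, and with epsilon set to 1 it never exploits. In practice epsilon is often started high and reduced as the estimates improve.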

Deep Dive: Q-Learning & The Bellman Equation

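Q-Learning is a model-free RL algorithm: it learns directly from experience, without a model of how the environment changes state or assigns rewards. The "Q" stands for Quality: Q(S, A) estimates how good a specific action A is in a specific state S, measured by the cumulative future reward it is expected to lead to.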

It iteratively updates a Q-Table, which stores one Q-value per state-action pair, using an update rule derived from the Bellman Equation:

Q(S, A) ← Q(S, A) + α [ R + γ × max(Q(S', A')) - Q(S, A) ]

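α (Alpha): Learning Rate, between 0 and 1. How much new information overrides old: 0 keeps the old estimate unchanged, 1 replaces it entirely.
γ (Gamma): Discount Factor, between 0 and 1. How much we care about future rewards vs immediate rewards: values near 0 favor immediate rewards, values near 1 weight future rewards almost as heavily.
max(Q(S', A')): The highest estimated Q-value over all actions A' available in the next state S'.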
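
To make the update concrete, here is a minimal sketch of a single Q-Table update in Python. The table layout (a dictionary keyed by `(state, action)`), the state names, and the numbers are assumptions chosen for illustration. The `done` flag, which drops the future term when the next state ends the episode, is a common convention that the formula above does not show.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-Learning update in place and return the new Q(S, A)."""
    # max(Q(S', A')): best estimated value reachable from the next state
    best_next = 0.0 if done else max(
        q_table[(next_state, a)] for a in range(n_actions)
    )
    td_target = reward + gamma * best_next           # R + γ × max(Q(S', A'))
    td_error = td_target - q_table[(state, action)]  # bracketed term in the formula
    q_table[(state, action)] += alpha * td_error
    return q_table[(state, action)]


# Assumed starting values for a tiny two-action problem
q = defaultdict(float)         # unseen state-action pairs default to 0.0
q[("s0", 0)] = 0.5
q[("s1", 0)] = 2.0
q[("s1", 1)] = 1.0

new_value = q_update(q, "s0", 0, reward=1.0, next_state="s1", n_actions=2)
print(round(new_value, 4))  # 0.73
```

Worked by hand: the target is 1.0 + 0.9 × 2.0 = 2.8, the error is 2.8 - 0.5 = 2.3, and the new estimate is 0.5 + 0.1 × 2.3 = 0.73. The estimate moves only a tenth of the way toward the target because α is 0.1.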
