Machine Learning

7. Online Fraud Detection

Online fraud detection uses machine learning to automatically flag fraudulent activity (credit card fraud, account takeover, phishing) in real time. It is essential because fraudsters constantly change their tactics, which quickly makes static rules obsolete.

The Biggest Challenge: Class Imbalance

Fraudulent transactions typically represent 1% or less of all transactions, and often under 0.1%. At a 0.1% fraud rate, a naive model that predicts "Not Fraud" for everything achieves 99.9% accuracy while catching no fraud at all. Accuracy is therefore a misleading metric here; use Precision-Recall curves or F1-Scores instead.
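The accuracy trap can be checked with a few lines of arithmetic. The sketch below uses made-up counts (10,000 transactions, 10 of them fraudulent) and two hypothetical sets of predictions; the helper names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed toy data: 10 fraud cases followed by 9,990 normal transactions.
y_true = [1] * 10 + [0] * 9990

# Model A: always predicts "Not Fraud".
naive = summarize(y_true, [0] * 10000)

# Model B: catches 8 of the 10 fraud cases but raises 12 false alarms.
useful_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 9978
useful = summarize(y_true, useful_pred)

print("naive :", naive)
print("useful:", useful)
```

In this toy setup the naive model scores higher on accuracy (0.999 versus 0.9986) even though its recall and F1 are zero, while the second model reaches a recall of 0.8 at a precision of 0.4.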

Deep Dive: Solutions for Class Imbalance
  • Oversampling (SMOTE): Synthetically generate new minority (fraud) samples to balance the dataset.
  • Undersampling: Randomly reduce the majority class (normal transactions) to match the minority class.
  • Cost-Sensitive Learning: Assign high "Class Weights" to fraud cases, severely penalizing the algorithm when it misclassifies them.
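The three techniques can be sketched in plain Python. This is a simplified illustration, not a production implementation: the function names, the two-feature toy data, and the `k` and `seed` parameters are all assumptions, and `smote_like_oversample` only mimics the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours (it needs at least two minority samples). The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def random_undersample(X, y, majority=0, seed=0):
    """Keep every minority row plus an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = rng.sample(majority_idx, k=min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_like_oversample(X_minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_minority)
        others = [p for p in X_minority if p is not base]
        # Sort the remaining minority points by squared distance to the base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balanced_class_weights(y):
    """weight_c = n_samples / (n_classes * count_c): rare classes get large weights."""
    counts = Counter(y)
    return {c: len(y) / (len(counts) * n) for c, n in counts.items()}


# Assumed toy data: [amount, hour] rows, 995 normal and 5 fraudulent.
normal = [[float(i % 50), float(i % 24)] for i in range(995)]
fraud = [[900.0 + i, 3.0] for i in range(5)]
X, y = normal + fraud, [0] * 995 + [1] * 5

X_u, y_u = random_undersample(X, y)               # 5 normal + 5 fraud rows
synthetic = smote_like_oversample(fraud, n_new=990)  # brings fraud up to 995 rows
weights = balanced_class_weights(y)               # fraud weight is 100.0 here

print(Counter(y_u), len(synthetic), weights)
```

Undersampling throws away most of the normal transactions, oversampling adds interpolated fraud rows that were never observed, and class weights leave the data untouched but change the loss. Which trade-off is acceptable depends on the dataset.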

Machine Learning Approaches

Supervised Learning

Used when rich historical labels are available.

  • Logistic Regression: Fast, highly interpretable baseline.
  • XGBoost / LightGBM: Gradient-boosted decision trees. Widely used and often among the most accurate models for tabular financial data.
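A minimal supervised baseline might look like the sketch below, assuming scikit-learn and NumPy are available. The data is synthetic (5,000 rows, 50 labelled as fraud, with the fraud rows shifted so they are separable), so the numbers it prints say nothing about real transactions. It combines a class-weighted Logistic Regression with average precision, a single-number summary of the Precision-Recall curve.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labelled history: 1% fraud, three numeric features.
rng = np.random.default_rng(0)
n_samples, n_fraud = 5000, 50
y = np.zeros(n_samples, dtype=int)
y[:n_fraud] = 1
X = rng.normal(size=(n_samples, 3))
X[y == 1] += 2.5  # assumption: fraud rows differ from normal rows on average

# Stratify so the rare class appears in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" up-weights errors on the rare fraud class.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Rank test transactions by predicted fraud probability.
fraud_scores = model.predict_proba(X_test)[:, 1]
ap = average_precision_score(y_test, fraud_scores)
baseline = y_test.mean()  # average precision of a random ranking is about this

print(f"average precision: {ap:.3f} (fraud rate in test set: {baseline:.3f})")
```

A gradient-boosted model with a scikit-learn-style interface could be swapped in for `LogisticRegression` without changing the evaluation code.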

Anomaly Detection

Used to catch completely new, unseen fraud patterns.

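  • Isolation Forest: Builds random trees over the data. Anomalies are isolated in fewer splits than normal points, so a short average path length signals an outlier.
  • Autoencoders: Learn to reconstruct normal behavior; a high reconstruction error flags potential fraud.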
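The Isolation Forest idea can be tried without any labels. The sketch below assumes scikit-learn and NumPy and uses invented [amount, hour] data: 500 ordinary transactions plus two hand-placed extreme ones. The `contamination` value is an assumed guess at the share of anomalies, not a recommended setting.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed toy data: ordinary transactions cluster around amount 50 at midday.
rng = np.random.default_rng(42)
normal = rng.normal(loc=[50.0, 12.0], scale=[10.0, 3.0], size=(500, 2))
suspicious = np.array([[950.0, 3.0], [1200.0, 4.0]])  # large amounts at night
X = np.vstack([normal, suspicious])

# No labels are passed to fit(): the forest only sees the feature values.
forest = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
forest.fit(X)

scores = forest.score_samples(X)  # lower score = easier to isolate = more anomalous
labels = forest.predict(X)        # -1 for flagged anomalies, 1 for inliers

print("scores of the two suspicious rows:", scores[-2:])
print("median score of ordinary rows:   ", np.median(scores[:500]))
print("rows flagged as anomalies:       ", int((labels == -1).sum()))
```

The two extreme rows receive much lower scores than the typical transaction. In practice such flags are usually treated as candidates for review, since not every anomaly is fraud.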

Graph Neural Networks (GNNs)

Financial fraud often involves networks of colluding parties, such as money mule networks. GNNs are well suited to detecting these topological anomalies because they analyze the connections between accounts rather than only the features of isolated transactions.
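The core operation inside a GNN layer is neighbourhood aggregation: each node updates its representation using its neighbours' features. The sketch below shows one untrained round of that step in plain Python. It is not a full GNN (no learned weights, no training), and the account names, edges, risk scores, and `self_weight` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transfers between accounts, treated as undirected edges.
edges = [("A", "B"), ("B", "C"), ("C", "A"),
         ("C", "M"), ("X", "M"), ("Y", "M"), ("Z", "M")]

# Assumed per-account risk from a transaction-level model.
# B and M look equally harmless when viewed in isolation.
risk = {"A": 0.05, "B": 0.10, "C": 0.05, "M": 0.10,
        "X": 0.90, "Y": 0.80, "Z": 0.85}


def build_adjacency(edge_list):
    """Map each account to the set of accounts it has transacted with."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def aggregate(features, adjacency, self_weight=0.5):
    """One message-passing round: blend each node's own feature with the
    mean feature of its neighbours."""
    updated = {}
    for node, value in features.items():
        neighbours = adjacency.get(node, ())
        if neighbours:
            mean = sum(features[n] for n in neighbours) / len(neighbours)
        else:
            mean = value  # isolated node: nothing to aggregate
        updated[node] = self_weight * value + (1 - self_weight) * mean
    return updated


adjacency = build_adjacency(edges)
after_one_round = aggregate(risk, adjacency)

print("B:", round(after_one_round["B"], 3))
print("M:", round(after_one_round["M"], 3))
```

Accounts B and M start with the same individual risk, but after one round M scores far higher because three of its four neighbours are high-risk accounts. A trained GNN stacks several such rounds and learns the mixing weights from data.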