Machine Learning
Unit 5: Trends and Applications / Online Fraud Detection
7. Online Fraud Detection
Fraud detection uses machine learning to automatically flag fraudulent activity (credit card fraud, account takeover, phishing) in real time. It is essential because fraudsters constantly change their tactics, which quickly makes static rules obsolete.
The Biggest Challenge: Class Imbalance
Fraudulent transactions typically represent 1% or less of all transactions, and often under 0.1%. At a 0.1% fraud rate, a naive model that predicts "Not Fraud" for everything achieves 99.9% accuracy while catching no fraud at all. Accuracy is therefore a misleading metric here; use precision, recall, the F1 score, or precision-recall curves instead.
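The accuracy trap can be checked with a few lines of arithmetic. The sketch below uses made-up numbers (10,000 transactions, 10 of them fraud) and two hypothetical models: one that never predicts fraud, and one that catches 8 of the 10 frauds at the cost of 12 false alarms. The helper names are illustrative and not taken from any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 as fraud and 0 as not fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: 10 fraud cases followed by 9,990 normal transactions.
y_true = [1] * 10 + [0] * 9990

# Model A: always predicts "Not Fraud".
naive = metrics(y_true, [0] * 10000)

# Model B: catches 8 frauds, misses 2, raises 12 false alarms.
useful_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 9978
useful = metrics(y_true, useful_pred)

print(naive)
print(useful)
```

With these numbers the naive model scores 0.999 accuracy but 0 recall, while the second model scores slightly lower accuracy (0.9986) with precision 0.4, recall 0.8 and F1 of about 0.53. Only the second set of metrics reflects which model is actually doing the job.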
Deep Dive: Solutions for Class Imbalance
| Technique | Description |
|---|---|
| Oversampling (SMOTE) | Create synthetic minority (fraud) samples by interpolating between existing fraud samples and their nearest fraud neighbors, so the classes are better balanced. |
| Undersampling | Randomly drop majority-class (normal) transactions until the class sizes are comparable. Simple, but it discards data. |
| Cost-Sensitive Learning | Assign a higher class weight to fraud cases so that the loss penalizes a missed fraud more heavily than a misclassified normal transaction. |
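The three techniques in the table can be sketched in plain Python. This is a simplified illustration and not a substitute for a tested library: `smote_like` only mimics the interpolation idea behind SMOTE, `random_undersample` assumes a binary problem, and `balanced_class_weights` uses the common inverse-frequency heuristic n_samples / (n_classes * class_count). All function names and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def random_undersample(rows, labels, rng):
    """Keep every minority row and an equally sized random subset of the rest."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    keep = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep += rng.sample(majority_idx, counts[minority])
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


def smote_like(minority_rows, n_new, k, rng):
    """Create n_new points on segments between a minority row and one of its
    k nearest minority neighbours (needs at least two minority rows)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 1,000 rows with two features, the first 10 are fraud.
rng = random.Random(0)
rows = [[float(i), float(i % 7)] for i in range(1000)]
labels = [1 if i < 10 else 0 for i in range(1000)]

weights = balanced_class_weights(labels)
rows_under, labels_under = random_undersample(rows, labels, rng)
fraud_rows = [r for r, y in zip(rows, labels) if y == 1]
synthetic = smote_like(fraud_rows, n_new=20, k=3, rng=rng)

print(weights)
print(Counter(labels_under))
print(len(synthetic))
```

Resampling is normally applied to the training split only, so that evaluation still reflects the real class ratio.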
Machine Learning Approaches
Supervised Learning
Used when rich historical labels are available.
- Logistic Regression: Fast, highly interpretable baseline.
- XGBoost / LightGBM: Gradient-boosted decision tree libraries; typically among the strongest performers on tabular financial data.
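To make the logistic regression baseline concrete, here is a minimal sketch that fits a one-feature model by gradient descent on a class-weighted log loss. The single feature (think of it as a scaled transaction amount), the toy data, the learning rate and the function name `fit_weighted_logreg` are all assumptions for illustration. In practice a library implementation would be used.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_weighted_logreg(xs, ys, class_weight, lr=0.5, epochs=2000):
    """Fit p(fraud | x) = sigmoid(w * x + b) by batch gradient descent.

    Each sample's gradient is scaled by the weight of its class, so errors
    on the rare class count for more.
    """
    w, b = 0.0, 0.0
    total = sum(class_weight[y] for y in ys)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += class_weight[y] * err * x
            grad_b += class_weight[y] * err
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b


# Toy data (hypothetical): 196 normal transactions with small values,
# 4 fraud cases with large values.
xs = [i / 196 for i in range(196)] + [2.0, 2.2, 2.4, 2.6]
ys = [0] * 196 + [1] * 4

# Weight fraud by the inverse class ratio so both classes contribute equally.
class_weight = {0: 1.0, 1: 196 / 4}

w, b = fit_weighted_logreg(xs, ys, class_weight)
p_small = sigmoid(w * 0.2 + b)  # a typical normal-looking value
p_large = sigmoid(w * 2.5 + b)  # a fraud-looking value
print(w, b, p_small, p_large)
```

The learned coefficient `w` is directly readable as "larger values push the prediction toward fraud", which is the interpretability advantage of this baseline.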
Anomaly Detection
Used to catch completely new, unseen fraud patterns.
- Isolation Forest: Partitions the data with random feature splits; anomalies are isolated in fewer splits than normal points.
- Autoencoders: Learn to reconstruct normal behavior; a high reconstruction error flags potential fraud.
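The isolation idea can be demonstrated without any library. The sketch below is a simplified version: it draws random axis-aligned splits on the full dataset and counts how many are needed to isolate a given point, averaged over many random runs. A real Isolation Forest additionally subsamples the data per tree and converts path lengths into a normalized anomaly score. The function names and the toy transactions (amount, hour-like value) are assumptions.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count random splits until `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within this partition can be split.
    spans = []
    for f in range(len(point)):
        values = [row[f] for row in data]
        lo, hi = min(values), max(values)
        if lo < hi:
            spans.append((f, lo, hi))
    if not spans:  # only duplicates remain
        return depth
    f, lo, hi = rng.choice(spans)
    split = rng.uniform(lo, hi)
    # Follow the side of the split that contains the point.
    if point[f] < split:
        side = [row for row in data if row[f] < split]
    else:
        side = [row for row in data if row[f] >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def mean_isolation_depth(point, data, n_trees=100, seed=0):
    """Average isolation depth over n_trees independent random partitionings."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Toy data (hypothetical): 100 ordinary transactions on a small grid,
# plus one transaction with a far larger amount.
inliers = [[10.0 + (i % 10), 1.0 + (i // 10)] for i in range(100)]
outlier = [500.0, 5.0]
data = inliers + [outlier]

outlier_depth = mean_isolation_depth(outlier, data)
inlier_depth = mean_isolation_depth([14.0, 5.0], data)
print(outlier_depth, inlier_depth)
```

The outlier needs far fewer splits on average than the point in the middle of the cluster, which is the signal an Isolation Forest turns into an anomaly score.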
Graph Neural Networks (GNNs)
Financial fraud often involves networks of colluding parties, such as money mule networks. GNNs are well suited to detecting these structural anomalies because they learn from the connections between accounts, not just from the features of isolated transactions.
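The core operation behind this is neighborhood aggregation: each account's representation is updated with information from the accounts it is connected to. The sketch below shows a single round of mean aggregation with no learned parameters. A real GNN would apply learned weight matrices and nonlinearities and stack several such rounds. The account names, the single "previously flagged" feature and the edges are hypothetical.

```python
def aggregate_neighbors(features, edges):
    """One round of mean aggregation over an undirected graph.

    Returns, for each node, its own feature vector concatenated with the
    mean feature vector of its neighbours (zeros if it has none).
    """
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    dim = len(next(iter(features.values())))
    out = {}
    for node, own in features.items():
        nbrs = neighbours[node]
        if nbrs:
            mean = [sum(features[n][d] for n in nbrs) / len(nbrs)
                    for d in range(dim)]
        else:
            mean = [0.0] * dim
        out[node] = own + mean
    return out


# Hypothetical accounts with one feature: 1.0 if previously flagged, else 0.0.
features = {"a": [0.0], "b": [1.0], "c": [1.0], "d": [0.0], "e": [0.0]}

# Hypothetical transfers: "a" transacts with two flagged accounts,
# "d" and "e" only with each other.
edges = [("a", "b"), ("a", "c"), ("d", "e")]

embedded = aggregate_neighbors(features, edges)
print(embedded["a"])  # own feature is clean, neighbour signal is not
print(embedded["d"])
```

Accounts "a" and "d" look identical on their own features, but after aggregation "a" carries a strong signal from its flagged neighbors. A model looking only at isolated transaction features would miss that difference.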