Fraud Detection: Model Architecture

Explore the design of a fraud detection system combining gradient boosted trees, GraphSAGE embeddings, and deterministic rules to achieve fast, scalable, and accurate detection under latency constraints. Understand the role of each component, how they integrate, and how latency budgets are allocated for production-ready fraud scoring.

We'll cover the following...

Gradient boosted trees for tabular scoring
- Why GBTs dominate fraud scoring
GraphSAGE for fraud ring detection
- Serving strategy for graph embeddings
Rule engine and ML hybrid architecture
- Execution order and threshold logic
Latency budget allocation
Summary

The previous lesson established five feature families and an entity graph that capture the behavioral, velocity, device, geographic, and network signals surrounding every transaction. Now the design challenge shifts from what the model sees to how it scores. Given these features, what model architecture can evaluate a transaction in under 50 ms at p95 while catching both individual fraud and coordinated rings? This is exactly the kind of question that surfaces in ML system design interviews, and the answer is not a single model.

Production fraud systems at companies like Stripe and PayPal converge on a three-component architecture. A deterministic rule engine handles known patterns instantly. A gradient boosted tree model performs real-time tabular scoring. A graph neural network (GraphSAGE in this design) detects network-level fraud rings that no single-transaction model can see. This lesson walks through why each component exists, how they compose into a unified scoring pipeline, and how to allocate the latency budget across them.
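To make the composition concrete, here is a minimal sketch of how the three components might fit into one scoring call. Everything in it is an assumption for illustration rather than the lesson's actual design: the names (`apply_rules`, `lookup_graph_embedding`, `score_transaction`), the blocklist, the thresholds, the rules-first ordering, and the choice to feed a precomputed graph embedding into the tabular model as extra features.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

BLOCKED_CARDS = {"card_999"}        # hypothetical blocklist for the rule engine
DECLINE_AT, REVIEW_AT = 0.9, 0.5    # hypothetical score thresholds


@dataclass
class Decision:
    action: str                     # "approve", "review", or "decline"
    source: str                     # which component produced the decision
    score: Optional[float] = None   # model score, if the model was consulted


def apply_rules(txn: Dict) -> Optional[Decision]:
    """Deterministic rules: return a decision, or None to defer to the model."""
    if txn["card_id"] in BLOCKED_CARDS:
        return Decision("decline", "rules")
    return None


def lookup_graph_embedding(store: Dict[str, List[float]],
                           account_id: str, dim: int = 2) -> List[float]:
    """Fetch a precomputed embedding; fall back to zeros for unseen accounts."""
    return store.get(account_id, [0.0] * dim)


def score_transaction(txn: Dict,
                      tabular_features: List[float],
                      embedding_store: Dict[str, List[float]],
                      model: Callable[[List[float]], float]) -> Decision:
    # 1. Rules run first (assumed ordering) and can short-circuit the request.
    ruled = apply_rules(txn)
    if ruled is not None:
        return ruled
    # 2. The graph signal arrives as a lookup, not as online GNN inference.
    embedding = lookup_graph_embedding(embedding_store, txn["account_id"])
    # 3. The tabular model scores the combined feature vector.
    score = model(tabular_features + embedding)
    if score >= DECLINE_AT:
        action = "decline"
    elif score >= REVIEW_AT:
        action = "review"
    else:
        action = "approve"
    return Decision(action, "model", score)
```

Passing `model` as a plain callable keeps the sketch independent of any particular library; in practice it would wrap a trained gradient boosted tree model.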

Gradient boosted trees for tabular scoring

Why GBTs dominate fraud scoring

Deep learning has transformed vision and language, but for tabular fraud data, gradient boosted trees (XGBoost, LightGBM) remain the production workhorse. Three properties explain this dominance.

Single-digit millisecond CPU inference: A trained GBT ensemble evaluates a feature vector in 1–5 ms on commodity CPUs, requiring no GPU infrastructure for serving.
Native handling of heterogeneous features: Categorical merchant category codes, continuous dollar amounts, and binary device-trust flags all feed into tree splits without normalization, and libraries with native categorical support (LightGBM, recent XGBoost releases) also avoid one-hot encoding.
Built-in feature importance for regulatory compliance: Tree-based models produce global feature importance rankings natively, and per-prediction attributions (for example, TreeSHAP) supply the transaction-level explanations that regulators demand for declined transactions.
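The first two properties follow from how a tree ensemble scores a row: each tree is a handful of comparisons, and the ensemble output is a sum of leaf values passed through a sigmoid. The sketch below uses three hand-written one-split trees as a stand-in for a trained model. It is not the XGBoost or LightGBM API, and the feature names, split values, leaf values, and `BASE_MARGIN` are all invented for illustration.

```python
import math
from typing import Callable, Dict, List


def tree_amount(x: Dict) -> float:
    # Continuous feature: a plain threshold split, no scaling needed.
    return 0.8 if x["amount_usd"] > 500.0 else -0.4


def tree_mcc(x: Dict) -> float:
    # Categorical feature: split by set membership on the raw code,
    # with no one-hot encoding. The codes are illustrative values.
    return 0.6 if x["mcc"] in {"5967", "7995"} else -0.2


def tree_device(x: Dict) -> float:
    # Binary feature: used directly as the split condition.
    return -0.7 if x["device_trusted"] else 0.5


TREES: List[Callable[[Dict], float]] = [tree_amount, tree_mcc, tree_device]
BASE_MARGIN = -2.0  # hypothetical prior log-odds of fraud


def predict_proba(x: Dict) -> float:
    """Sum the leaf values of every tree, then map log-odds to a probability."""
    margin = BASE_MARGIN + sum(tree(x) for tree in TREES)
    return 1.0 / (1.0 + math.exp(-margin))


risky = {"amount_usd": 900.0, "mcc": "7995", "device_trusted": False}
safe = {"amount_usd": 20.0, "mcc": "5411", "device_trusted": True}
print(predict_proba(risky), predict_proba(safe))
```

A real ensemble has hundreds of deeper trees, but inference keeps the same shape: a fixed number of comparisons per tree and one addition per leaf. Because the score is a sum of per-tree contributions, it can also be broken down per feature, which is what attribution methods build on.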

Gradient boosted trees build decision trees ...
