Fraud Detection: Model Architecture
Explore the design of a fraud detection system combining gradient boosted trees, GraphSAGE embeddings, and deterministic rules to achieve fast, scalable, and accurate detection under latency constraints. Understand the role of each component, how they integrate, and how latency budgets are allocated for production-ready fraud scoring.
The previous lesson established five feature families and an entity graph that capture the behavioral, velocity, device, geographic, and network signals surrounding every transaction. Now the design challenge shifts from what the model sees to how it scores. Given these features, what model architecture can evaluate a transaction in under 50 ms at p95 while catching both individual fraud and coordinated rings? This is exactly the kind of question that surfaces in ML system design interviews, and the answer is not a single model.
Production fraud systems at companies like Stripe and PayPal commonly converge on a three-component architecture. A deterministic rule engine handles known patterns instantly. A gradient boosted tree model performs real-time tabular scoring. A graph neural network detects network-level fraud rings that no single-transaction model can see. This lesson walks through why each component exists, how they compose into a unified scoring pipeline, and how to allocate the latency budget across them.
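One way the three components might compose is sketched below. This is a minimal illustration under stated assumptions, not the pipeline any particular company runs: the blocklist rule, the weighted-average blend, the thresholds, and every name (BLOCKED_CARDS, Decision, rule_engine, score_transaction) are hypothetical, and the two model scores are passed in as plain callables so the sketch runs without a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical blocklist standing in for the deterministic rule set.
BLOCKED_CARDS = {"card_999"}


@dataclass
class Decision:
    action: str   # "approve", "review", or "decline"
    score: float  # fraud score in [0, 1]
    source: str   # component that produced the decision


def rule_engine(txn: dict) -> Optional[Decision]:
    """Return an immediate decision for a known pattern, else None."""
    if txn["card_id"] in BLOCKED_CARDS:
        return Decision("decline", 1.0, "rules")
    return None


def score_transaction(
    txn: dict,
    gbt_score: Callable[[dict], float],    # stand-in for the GBT model
    graph_score: Callable[[dict], float],  # stand-in for a graph-derived risk score
    graph_weight: float = 0.3,             # assumed blend weight
    review_at: float = 0.6,                # assumed review threshold
    decline_at: float = 0.9,               # assumed decline threshold
) -> Decision:
    # 1. Rules short-circuit: a known pattern never reaches the models.
    hit = rule_engine(txn)
    if hit is not None:
        return hit

    # 2. Blend the per-transaction score with the network-level score.
    #    A weighted average is an assumption made for illustration.
    score = (1.0 - graph_weight) * gbt_score(txn) + graph_weight * graph_score(txn)

    # 3. Map the blended score to an action.
    if score >= decline_at:
        action = "decline"
    elif score >= review_at:
        action = "review"
    else:
        action = "approve"
    return Decision(action, score, "models")
```

Calling score_transaction({"card_id": "card_1"}, lambda t: 0.2, lambda t: 0.1) blends the two scores and approves, while a transaction on a blocked card is declined by the rule engine before either model is invoked.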
Gradient boosted trees for tabular scoring
Why GBTs dominate fraud scoring
Deep learning has transformed vision and language, but for tabular fraud data, gradient boosted trees (XGBoost, LightGBM) remain the production workhorse. Three properties explain this dominance.
Single-digit millisecond CPU inference: A trained GBT ensemble evaluates a feature vector in 1–5 ms on commodity CPUs, requiring no GPU infrastructure for serving.
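Native handling of heterogeneous features: Categorical merchant category codes, continuous dollar amounts, and binary device-trust flags can feed into tree splits without normalization, because a threshold split depends only on how values are ordered, not on their scale. LightGBM and recent XGBoost releases can also split on categorical columns directly when those columns are declared as categorical, which avoids one-hot encoding.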
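Built-in feature importance for regulatory compliance: Tree-based models produce global feature importance rankings natively, which helps meet interpretability requirements from regulators who demand explanations for declined transactions. Explaining an individual decline typically also calls for per-prediction attributions, such as SHAP values computed over the trees.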
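The three properties above can be made concrete with a toy ensemble. The sketch below is a hand-written, two-tree stand-in for a trained model, not output from XGBoost or LightGBM: the tree structure, leaf values, feature names (amount_usd, mcc, device_trusted), and helper names are all hypothetical. It shows that inference is a handful of comparisons per tree, that numeric, categorical, and binary features can be tested in their raw form, and that a split-count importance falls out of the tree structure.

```python
import math
from collections import Counter

# A node is either a leaf {"leaf": value} or a split on one feature.
# Numeric and binary splits carry a "threshold" (take "yes" when x < threshold);
# categorical splits carry a "categories" set (take "yes" when x is in the set).
TREE_1 = {
    "feature": "amount_usd", "threshold": 500.0,
    "yes": {"leaf": -0.8},
    "no": {
        "feature": "mcc", "categories": {"5944", "7995"},  # example category codes
        "yes": {"leaf": 1.2},
        "no": {"leaf": 0.1},
    },
}
TREE_2 = {
    "feature": "device_trusted", "threshold": 0.5,
    "yes": {"leaf": 0.9},   # flag is 0: untrusted device
    "no": {"leaf": -0.6},   # flag is 1: trusted device
}
ENSEMBLE = [TREE_1, TREE_2]


def eval_tree(node: dict, x: dict) -> float:
    """Walk one tree from root to leaf using raw feature values."""
    while "leaf" not in node:
        value = x[node["feature"]]
        if "categories" in node:
            go_yes = value in node["categories"]
        else:
            go_yes = value < node["threshold"]
        node = node["yes"] if go_yes else node["no"]
    return node["leaf"]


def predict_proba(trees: list, x: dict) -> float:
    """Sum the leaf values of all trees, then squash with a sigmoid."""
    margin = sum(eval_tree(tree, x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


def split_counts(trees: list) -> Counter:
    """Count how many splits use each feature (a simple importance measure)."""
    counts = Counter()
    stack = list(trees)
    while stack:
        node = stack.pop()
        if "leaf" in node:
            continue
        counts[node["feature"]] += 1
        stack.extend([node["yes"], node["no"]])
    return counts


risky = {"amount_usd": 900.0, "mcc": "7995", "device_trusted": 0}
safe = {"amount_usd": 20.0, "mcc": "5411", "device_trusted": 1}
print(predict_proba(ENSEMBLE, risky), predict_proba(ENSEMBLE, safe))
print(split_counts(ENSEMBLE))
```

Split count is only one notion of importance. Libraries also report gain-based variants, and a production ensemble would contain hundreds of learned trees rather than two hand-written ones.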
Gradient boosted trees build decision trees ...