Fraud Detection: Data Strategy & Feature Engineering
Explore data strategies and feature engineering essential for real-time fraud detection systems. Understand key feature families such as transaction and behavioral features, device fingerprinting, velocity counters, and graph-based features that reveal coordinated fraud rings. Learn how to address challenges like chargeback lag through tiered labeling and prevent data leakage using temporal holdout methods to ensure accurate offline evaluations and robust production performance.
With the sub-50ms latency constraint, asymmetric cost structure, and rare event framing established in the previous lesson, the central design question becomes concrete: what data and features does the real-time scoring path actually need? In an interview setting, this is where you demonstrate depth. A strong candidate enumerates feature families, justifies each one’s fraud signal, and explains how features are computed within the latency budget. Production fraud detection systems at companies like Stripe and PayPal rely on four core feature families working in concert: transaction features, behavioral features, device fingerprinting, and velocity features. Beyond these tabular signals, graph-based features unlock detection of coordinated fraud rings that no individual-transaction feature can catch. Feature engineering is where most fraud detection interviews are won or lost, because it reveals whether you understand adversarial dynamics. Fraudsters actively adapt to evade the very features you design.
Transaction and behavioral features
Transaction features are the raw attributes available the instant a payment request arrives. These include the transaction amount, currency,
Behavioral features transform raw transaction data into discriminative signals by adding user-level context. They capture what is “normal” for a specific user, so any deviation becomes informative. Examples include average transaction amount over the last 30 days, typical merchant categories, usual transaction times, and the geographic centroid of recent purchases.
Consider a $3,000 purchase at 3 AM from a new merchant category far from the user’s typical location. Each deviation alone might be explainable, but stacking multiple behavioral anomalies produces a strong composite signal. The per-user baselines behind these features are pre-computed in the feature store and retrieved at scoring time as user profile vectors, so the real-time path only has to compare the incoming transaction against them and stays within the latency budget. Cold-start users who lack transaction history require fallback strategies such as population-level baselines or cohort-based profiles.
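To make the stacking of behavioral anomalies concrete, here is a minimal Python sketch of how deviation features might be derived from a pre-computed profile at scoring time. Everything named here is an assumption for illustration: `UserProfile`, `POPULATION_BASELINE`, `behavioral_features`, the baseline values, and the cutoffs (a z-score above 3, a distance above 500 km) are hypothetical and do not describe any particular production system.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical pre-computed profile vector, as fetched from a feature store."""
    avg_amount: float                         # mean amount over the last 30 days
    std_amount: float                         # standard deviation of the amount
    usual_hours: FrozenSet[int]               # hours of day the user normally transacts
    known_categories: FrozenSet[str]          # merchant categories seen before
    centroid: Optional[Tuple[float, float]]   # (lat, lon) of recent purchases, if known


# Assumed population-level fallback for cold-start users (illustrative values only).
POPULATION_BASELINE = UserProfile(
    avg_amount=60.0,
    std_amount=90.0,
    usual_hours=frozenset(range(7, 24)),
    known_categories=frozenset({"grocery", "restaurants", "fuel", "retail"}),
    centroid=None,  # a population has no meaningful geographic centroid
)


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def behavioral_features(amount: float, hour: int, category: str,
                        location: Tuple[float, float],
                        profile: Optional[UserProfile]) -> dict:
    """Compare one transaction against the user's baseline (or the fallback)."""
    used_fallback = profile is None
    p = POPULATION_BASELINE if used_fallback else profile

    zscore = (amount - p.avg_amount) / max(p.std_amount, 1e-6)
    unusual_hour = int(hour not in p.usual_hours)
    new_category = int(category not in p.known_categories)
    # Without a centroid the distance signal is unavailable, so it is zeroed out.
    distance_km = haversine_km(location, p.centroid) if p.centroid is not None else 0.0

    # Assumed cutoffs: each flag is weak alone, the count is the composite signal.
    anomaly_count = (int(zscore > 3.0) + unusual_hour + new_category
                     + int(distance_km > 500.0))
    return {
        "amount_zscore": zscore,
        "unusual_hour": unusual_hour,
        "new_category": new_category,
        "distance_km": distance_km,
        "anomaly_count": anomaly_count,
        "used_fallback": int(used_fallback),
    }


# The $3,000 purchase at 3 AM, scored against an established user's profile.
profile = UserProfile(80.0, 40.0, frozenset(range(8, 23)),
                      frozenset({"grocery", "restaurants"}), (40.7128, -74.0060))
features = behavioral_features(3000.0, 3, "electronics", (34.0522, -118.2437), profile)
```

Passing `None` as the profile exercises the cold-start path: the same transaction is compared against the population baseline, and `used_fallback` is exposed as a feature so a downstream model can treat those scores with less confidence.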
Attention: Cold-start users are disproportionately targeted by fraudsters because newly created accounts lack the behavioral history needed to detect anomalies. Default to stricter decision thresholds, meaning a transaction is flagged or challenged at a lower risk score, for accounts with fewer than 10 transactions.
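One way the cold-start rule could be wired into the decision step is sketched below in Python. Only the 10-transaction cutoff comes from the text above. The two threshold values and the names `decision_threshold` and `should_flag` are assumptions chosen for illustration, and "stricter" is interpreted here as flagging at a lower risk score.

```python
COLD_START_MIN_TXNS = 10      # cutoff taken from the guidance above
DEFAULT_THRESHOLD = 0.80      # assumed risk-score cutoff for established accounts
COLD_START_THRESHOLD = 0.60   # assumed stricter cutoff for accounts with little history


def decision_threshold(txn_count: int) -> float:
    """Return the risk-score cutoff to apply, given the account's transaction count."""
    if txn_count < COLD_START_MIN_TXNS:
        return COLD_START_THRESHOLD
    return DEFAULT_THRESHOLD


def should_flag(risk_score: float, txn_count: int) -> bool:
    """Flag the transaction when the model score reaches the applicable cutoff."""
    return risk_score >= decision_threshold(txn_count)
```

With these assumed values, a score of 0.7 is flagged for an account with 3 transactions but passes for an account with 50. In practice the two cutoffs would be tuned against the asymmetric cost of missed fraud versus false declines rather than fixed by hand.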
The following table summarizes ...