
Fraud Detection: Serving & Monitoring

Explore how to design a fraud detection system that serves real-time predictions within a 50ms latency budget using a three-stage pipeline. Learn strategies for continuous model monitoring to detect adversarial drift despite delayed ground truth labels. Understand the integration of feature retrieval, model scoring, and decision logic with dynamic routing and feedback loops that maintain accuracy and reliability under adversarial conditions.

In the previous lesson, you built the evaluation methodology for fraud detection, including asymmetric cost functions, temporal validation splits, and champion-challenger testing. But a model that scores well offline is worthless if it cannot return a decision before the payment network times out. The operational challenge now shifts to two questions that define Staff+ fraud detection interviews. First, how do you serve fraud predictions in real time, returning a block, pass, or review decision within 50ms of a transaction arriving? Second, how do you detect when adversarial drift has silently degraded the model before chargebacks spike weeks later?

These two questions test end-to-end system thinking across serving infrastructure, decision logic, and operational feedback loops. They form the two design pillars of this lesson: the serving pipeline, which is latency-critical, and the monitoring system, which is reliability-critical.

Real-time scoring pipeline design

The end-to-end pipeline breaks into three sequential stages, each with a strict latency budget, so that the whole request completes within 50ms. Together the three stages get roughly 30ms; the remaining approximately 20ms is reserved for network overhead, serialization, and safety margin. Think of it like a relay race where each runner has a fixed time to complete their leg, and the baton must cross the finish line before the clock runs out.
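The budget arithmetic can be made explicit in code. The sketch below is a minimal illustration, assuming a Python serving process. Only the 50ms total, the ~20ms overhead reserve, and the ~15ms feature-retrieval budget come from this lesson. The model-scoring and decision-logic figures in the example are hypothetical placeholders, and the helper names (`check_allocation`, `run_stage`) are illustrative rather than part of any library.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")

TOTAL_BUDGET_MS = 50.0      # end-to-end budget from the lesson
OVERHEAD_RESERVE_MS = 20.0  # network, serialization, safety margin


def stage_allowance_ms(total_ms: float = TOTAL_BUDGET_MS,
                       reserve_ms: float = OVERHEAD_RESERVE_MS) -> float:
    """Time left for the three pipeline stages after the overhead reserve."""
    return total_ms - reserve_ms


def check_allocation(stage_budgets_ms: Dict[str, float]) -> float:
    """Return unallocated slack in ms; raise if the stages overspend."""
    slack = stage_allowance_ms() - sum(stage_budgets_ms.values())
    if slack < 0:
        raise ValueError(f"stage budgets exceed allowance by {-slack:.1f}ms")
    return slack


def run_stage(name: str, budget_ms: float, fn: Callable[[], T]) -> Tuple[T, bool]:
    """Run one stage and report whether it finished inside its budget.

    This only measures elapsed time after the fact; it does not cancel a
    slow stage. A real service would also need timeouts on I/O calls.
    """
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms <= budget_ms


if __name__ == "__main__":
    # 15ms is from the lesson; the other two numbers are hypothetical.
    example_budgets = {
        "feature_retrieval": 15.0,
        "model_scoring": 10.0,
        "decision_logic": 5.0,
    }
    print(check_allocation(example_budgets))  # prints 0.0
```

Checking the allocation at deploy or config-load time catches a budget that silently overspends. The per-request `run_stage` check is what lets a service emit overrun metrics per stage.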

Feature retrieval (~15ms)

Feature retrieval uses a dual-path architecture that balances precomputation with real-time freshness.

  • Precomputed features come from a low-latency feature store such as Redis or DynamoDB, which holds user historical aggregates, device trust scores, and merchant risk profiles. Batch or micro-batch jobs materialize these values ahead of time, so inference avoids expensive joins.

  • Real-time features are computed by a streaming engine such as Flink or Kafka Streams. Examples include transaction velocity over the last five minutes, session behavioral signals, and other recency-dependent values. These signals are critical for detecting velocity-based attacks, where a stolen card is used dozens of times in rapid succession.
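The two retrieval paths can be sketched in a few lines. This is a minimal, in-process illustration under stated assumptions. The plain `precomputed` dict stands in for a Redis or DynamoDB lookup, and `VelocityCounter` stands in for the five-minute velocity computation that the lesson assigns to Flink or Kafka Streams. Feature names, default values, and helper names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityCounter:
    """Sliding-window transaction count per card (hypothetical stand-in).

    In the architecture above this state lives in the streaming engine,
    not in the serving process. Timestamps are assumed to arrive in order.
    """

    def __init__(self, window_s: float = 300.0) -> None:  # five minutes
        self.window_s = window_s
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, card_id: str, ts: float) -> None:
        self._events[card_id].append(ts)

    def count(self, card_id: str, now: float) -> int:
        q = self._events[card_id]
        # Evict timestamps at or before the start of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        return len(q)


def assemble_features(card_id: str,
                      user_id: str,
                      precomputed: Dict[str, Dict[str, float]],
                      velocity: VelocityCounter,
                      now: float) -> Dict[str, float]:
    """Merge the precomputed and real-time paths into one feature vector.

    A missing feature-store entry falls back to neutral defaults instead of
    failing the request, which keeps the stage inside its latency budget.
    """
    stored = precomputed.get(user_id, {})
    return {
        "user_avg_amount_30d": stored.get("user_avg_amount_30d", 0.0),
        "device_trust_score": stored.get("device_trust_score", 0.5),
        "txn_count_5m": velocity.count(card_id, now),
    }
```

For example, a card with transactions at t=0s, 10s, and 20s shows three events in the window at t=30s but only two at t=305s, because the first event has aged out of the five-minute window. The precomputed path contributes stable context, while the counter captures the burst.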

This split exists because precomputed features provide stable context cheaply, while real-time features capture the fast-moving signals that distinguish a burst of ...