Fraud Detection: Serving & Monitoring

Explore how to design a fraud detection system that serves real-time predictions within a 50ms latency budget using a three-stage pipeline. Learn strategies for continuous model monitoring to detect adversarial drift despite delayed ground truth labels. Understand the integration of feature retrieval, model scoring, and decision logic with dynamic routing and feedback loops that maintain accuracy and reliability under adversarial conditions.

We'll cover the following:

- Real-time scoring pipeline design
- Decision engine routing logic
- Model staleness monitoring
  - Three complementary monitoring signals
    - How the pyramid connects to retraining
- L4, L5, and Staff+ answer depth
- Summary

In the previous lesson, you built the evaluation methodology for fraud detection, including asymmetric cost functions, temporal validation splits, and champion-challenger testing. But a model that scores well offline is worthless if it cannot return a decision before the payment network times out. The operational challenge now shifts to two questions that define Staff+ fraud detection interviews. First, how do you serve fraud predictions in real time, returning a block, pass, or review decision within 50ms of a transaction arriving? Second, how do you detect when adversarial drift has silently degraded the model before chargebacks spike weeks later?

These two questions test end-to-end system thinking across serving infrastructure, decision logic, and operational feedback loops. They form the two design pillars of this lesson: the serving pipeline, which is latency-critical, and the monitoring system, which is reliability-critical.

Real-time scoring pipeline design

The end-to-end pipeline breaks into three sequential stages, each with a strict latency budget. Together the stage budgets account for roughly 30ms of the 50ms limit; the remaining 20ms or so is reserved for network overhead, serialization, and a safety margin. Think of it as a relay race in which each runner has a fixed time to complete a leg, and the baton must cross the finish line before the clock runs out.
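The budget arithmetic can be made concrete with a small sketch. Only the 50ms total and the ~15ms feature-retrieval figure come from this lesson; the `model_scoring` and `decision_logic` numbers, and the helper names, are assumptions chosen so that about 20ms remains in reserve.

```python
TOTAL_BUDGET_MS = 50.0  # end-to-end limit stated in the lesson

# Only the ~15 ms feature-retrieval figure comes from the lesson; the other
# two per-stage numbers are assumed for illustration.
STAGE_BUDGETS_MS = {
    "feature_retrieval": 15.0,
    "model_scoring": 10.0,   # assumption
    "decision_logic": 5.0,   # assumption
}


def reserved_margin_ms(stage_budgets, total_ms=TOTAL_BUDGET_MS):
    """Time left for network overhead, serialization, and safety margin."""
    return total_ms - sum(stage_budgets.values())


def over_budget_stages(measured_ms, stage_budgets=STAGE_BUDGETS_MS):
    """Names of stages whose measured latency exceeded their budget."""
    return [
        name
        for name, spent in measured_ms.items()
        if spent > stage_budgets[name]
    ]


if __name__ == "__main__":
    print(reserved_margin_ms(STAGE_BUDGETS_MS))
    print(over_budget_stages(
        {"feature_retrieval": 18.0, "model_scoring": 7.0, "decision_logic": 4.0}
    ))
```

Tracking each stage against its own budget, rather than only the 50ms total, makes it easier to see which leg of the relay is consuming the reserve.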

Feature retrieval (~15ms)

Feature retrieval uses a dual-path architecture that balances precomputation with real-time freshness.

- Precomputed features: user historical aggregates, device trust scores, and merchant risk profiles are read from a low-latency feature store such as Redis or DynamoDB. These values are materialized by batch or micro-batch jobs, which avoids expensive joins at inference time.
- Real-time features: transaction velocity over the last five minutes, session behavioral signals, and other recency-dependent values are computed by a streaming engine such as Flink or Kafka Streams. These signals are critical for detecting velocity-based attacks, in which a stolen card is used dozens of times in rapid succession.
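One way to keep both paths inside the ~15ms budget is to query them concurrently and fall back to defaults for any path that misses the deadline. The sketch below is illustrative only: `fetch_precomputed` and `fetch_realtime` are hypothetical stand-ins for real store and stream clients, and the feature names and default values are assumptions.

```python
import asyncio

FEATURE_BUDGET_S = 0.015  # ~15 ms feature-retrieval budget from the lesson

# Hypothetical fallback values used when a path misses its deadline.
DEFAULTS = {"txn_count_5m": 0, "device_trust": 0.5}


async def fetch_precomputed(user_id):
    """Stand-in for a key-value read of batch-materialized aggregates."""
    await asyncio.sleep(0)  # a real client would do network I/O here
    return {"device_trust": 0.9, "avg_amount_30d": 42.0}


async def fetch_realtime(user_id):
    """Stand-in for reading streaming aggregates such as 5-minute velocity."""
    await asyncio.sleep(0)  # a real client would do network I/O here
    return {"txn_count_5m": 7}


async def get_features(user_id, precomputed=fetch_precomputed,
                       realtime=fetch_realtime, budget_s=FEATURE_BUDGET_S):
    """Query both paths concurrently; use defaults for any path that is late."""
    tasks = [
        asyncio.create_task(precomputed(user_id)),
        asyncio.create_task(realtime(user_id)),
    ]
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()  # do not let a slow path hold up the decision
    features = dict(DEFAULTS)
    for task in done:
        if task.exception() is None:  # a failed path also falls back to defaults
            features.update(task.result())
    return features


if __name__ == "__main__":
    print(asyncio.run(get_features("user-123")))
```

Whether a late path should fall back to defaults or force the transaction into review is a policy choice; this sketch assumes defaults so that the scoring stage always receives a complete feature vector.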

This split exists because precomputed features provide stable context cheaply, while real-time features capture the fast-moving signals that distinguish a burst of ...
