Feature Stores
Explore the role of feature stores in machine learning systems. Understand how they ensure point-in-time correctness to prevent data leakage, minimize training-serving skew through unified feature definitions, and promote feature reuse via centralized registries. Gain insight into common tools and learn how to effectively discuss feature store architecture in ML system design interviews.
In a fraud detection system at a company like Stripe or PayPal, dozens of models consume overlapping features such as transaction velocity, merchant risk scores, and user behavioral aggregates. Each team that needs these features typically re-implements the computation logic independently. One team writes a Spark job to compute a user’s 30-day transaction count for a credit risk model. Another team implements the same feature in a separate code path for a fraud classifier. The feature values diverge silently because the implementations handle nulls, time zones, or window boundaries differently. Model performance degrades in production, and the root cause is hard to trace because the feature lineage is split across implementations.
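To make the silent divergence concrete, here is a minimal sketch of two hypothetical implementations of the 30-day transaction count. The function names, the sample timestamps, and the choice of boundary rule are all assumptions made for illustration. The only difference between the two is whether a transaction that falls exactly on the window boundary is counted.

```python
from datetime import datetime, timedelta

def txn_count_team_a(txn_times, as_of):
    """Hypothetical team A: closed window [as_of - 30d, as_of]."""
    start = as_of - timedelta(days=30)
    return sum(1 for t in txn_times if start <= t <= as_of)

def txn_count_team_b(txn_times, as_of):
    """Hypothetical team B: half-open window (as_of - 30d, as_of]."""
    start = as_of - timedelta(days=30)
    return sum(1 for t in txn_times if start < t <= as_of)

# Illustrative data: the first transaction sits exactly on the 30-day boundary.
as_of = datetime(2024, 3, 31)
txn_times = [
    datetime(2024, 3, 1),   # exactly 30 days before as_of
    datetime(2024, 3, 15),
    datetime(2024, 3, 30),
]

count_a = txn_count_team_a(txn_times, as_of)  # boundary transaction included
count_b = txn_count_team_b(txn_times, as_of)  # boundary transaction excluded
print(count_a, count_b)
```

Both functions are defensible readings of "30-day transaction count", yet they return different values for the same user. Neither team sees an error, so the mismatch only surfaces later as unexplained model behavior.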
This is the exact problem that feature stores exist to solve. The previous lesson established that production ML systems combine batch, streaming, and request-time pipelines. A feature store is the centralized infrastructure layer where these heterogeneous pipelines converge, providing a unified interface for ingesting, storing, and serving features to both training and inference. It solves three core problems that interviewers expect you to articulate when discussing production ML infrastructure: point-in-time correctness, training-serving skew, and feature reuse.
Let’s examine the first and most critical correctness guarantee a feature store provides.
Point-in-time correctness
Point-in-time correctness is the guarantee that every training example uses only feature values that were actually available at the moment the label event occurred. No future data leaks into the past.
Why does leakage happen without temporal joins?
Consider training a credit default model. Each training example represents a loan application, and the label indicates whether the borrower defaulted. One of the features is the user’s 30-day transaction count. If the training pipeline performs a naive join on user_id without respecting timestamps, it retrieves the transaction count as of today, not as of the loan application date. That count includes post-application transactions, some of which may even reflect the default event itself. The model trains on information it could never access at serving time, so its offline AUC looks inflated and its performance collapses in production. ...
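The difference between the naive join and a temporal join can be sketched with pandas. This is an illustration under stated assumptions, not how any particular feature store implements it: the table layouts, column names (application_ts, feature_ts, txn_count_30d), and values are all hypothetical. The point-in-time version uses pandas.merge_asof with direction="backward", which matches each application to the most recent feature row at or before the application timestamp.

```python
import pandas as pd

# Hypothetical label events: one row per loan application.
applications = pd.DataFrame({
    "user_id": [1, 2],
    "application_ts": pd.to_datetime(["2024-02-01", "2024-02-10"]),
    "defaulted": [1, 0],
})

# Hypothetical feature snapshots: the 30-day transaction count
# as it was computed at feature_ts.
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "feature_ts": pd.to_datetime(
        ["2024-01-15", "2024-03-01", "2024-02-05", "2024-03-01"]
    ),
    "txn_count_30d": [12, 40, 7, 9],
})

# Naive join: take each user's latest value, ignoring the application date.
latest = (
    features.sort_values("feature_ts")
    .groupby("user_id", as_index=False)
    .last()
)
naive = applications.merge(
    latest[["user_id", "txn_count_30d"]], on="user_id", how="left"
)

# Point-in-time join: for each application, use the most recent feature
# row whose timestamp is at or before the application timestamp.
# merge_asof requires both frames to be sorted by their time keys.
pit = pd.merge_asof(
    applications.sort_values("application_ts"),
    features.sort_values("feature_ts"),
    left_on="application_ts",
    right_on="feature_ts",
    by="user_id",
    direction="backward",
)

print(naive[["user_id", "txn_count_30d"]])
print(pit[["user_id", "application_ts", "feature_ts", "txn_count_30d"]])
```

In this toy data, the naive join gives each application the March snapshot, which was computed after the application was submitted. The point-in-time join picks the snapshot that existed on the application date instead.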