Data Privacy, Compliance & ML

Explore essential data privacy and compliance principles in machine learning system design. Learn to navigate GDPR and other regulations, implement machine unlearning methods, apply differential privacy and federated learning, and handle PII securely. This lesson equips you to design ML systems that respect privacy constraints and meet regulatory demands effectively.

We'll cover the following...

The regulatory landscape
Machine unlearning
- Exact unlearning with SISA training
- Approximate unlearning
Differential privacy and DP-SGD
- The privacy budget and DP-SGD mechanics
Federated learning
- FedAvg and real-world deployments
PII handling in features and embeddings
Summary

When a user in Berlin submits a deletion request to your recommendation system, that request doesn’t just wipe a row from a database. It ripples through feature stores, embedding tables, and trained model parameters. If your system wasn’t designed for this from the start, you’re facing a full model retrain that could take days of GPU time and still leave you non-compliant. In MAANG ML system design interviews, privacy has shifted from a footnote to an architectural constraint that shapes every component, from data ingestion to model serving.

This lesson covers five pillars that interviewers expect you to reason about fluently. First, the regulatory landscape that defines what your system must support. Second, machine unlearning techniques that let models “forget” specific users. Third, differential privacy mechanisms that bound individual influence during training. Fourth, federated learning architectures that keep raw data on-device. Fifth, PII handling patterns that protect sensitive information in features and embeddings. Each pillar introduces its own trade-offs: privacy versus utility, compute cost versus compliance speed, and architectural complexity versus model quality.

The regulatory landscape

Three regulations appear repeatedly in ML system design discussions, and each imposes distinct architectural constraints on how you collect, store, and train on user data.

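GDPR (General Data Protection Regulation, EU) is the most impactful for ML systems. Article 17 establishes the right to erasure, commonly called the right to be forgotten, which lets a user demand deletion of their personal data. The article does not mention trained models explicitly, but a conservative design assumes the request can extend to that data’s influence on model parameters. Article 5 enforces data minimization, limiting the features you collect to what is necessary for the stated purpose. Consent requirements constrain how training data is gathered: where consent is the lawful basis for processing, it must be an affirmative opt-in rather than a default.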

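CCPA (California Consumer Privacy Act) grants users opt-out rights: they can direct a business not to sell their personal information and, since the CPRA amendments, not to share it for cross-context behavioral advertising. Training pipelines must honor those flags wherever they consume such data. The act also includes deletion rights similar to GDPR’s, though the enforcement mechanisms differ.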

DMA (Digital Markets Act, EU) targets gatekeeper platforms such as Apple, Google, and Meta. It restricts cross-service data combination, directly affecting how recommendation and ads models aggregate features across products. A designated gatekeeper cannot freely merge a user’s search history with their messaging behavior to build richer embeddings without that user’s consent.

The critical insight for interviews is that a deletion request doesn’t end at the database layer. If a user’s behavioral data influenced gradient updates during training, the model itself retains a trace of that user. This motivates the need for machine unlearning, which we cover next.
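To make the ripple effect concrete, here is a minimal sketch of a deletion fan-out. It assumes in-memory dictionaries as stand-ins for the raw event store, feature store, and embedding table, plus a hypothetical `training_lineage` map recording which users contributed to each model version. None of these names come from a real library; they are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DeletionReport:
    """What a single deletion request touched (hypothetical structure)."""
    user_id: str
    stores_purged: List[str] = field(default_factory=list)
    models_needing_unlearning: List[str] = field(default_factory=list)


def process_deletion(user_id: str,
                     stores: Dict[str, dict],
                     training_lineage: Dict[str, Set[str]]) -> DeletionReport:
    """Purge a user from every data store, then flag affected models.

    stores: store name -> dict keyed by user_id (stand-ins for real systems).
    training_lineage: model version -> user_ids present in its training set.
    """
    report = DeletionReport(user_id=user_id)

    # Step 1: remove the user's rows from every storage layer.
    for name, store in stores.items():
        if store.pop(user_id, None) is not None:
            report.stores_purged.append(name)

    # Step 2: deleting rows does not touch trained weights, so record
    # every model version whose training data included this user.
    for model_version, trained_users in training_lineage.items():
        if user_id in trained_users:
            report.models_needing_unlearning.append(model_version)

    return report


stores = {
    "raw_events": {"u1": ["click", "view"], "u2": ["view"]},
    "feature_store": {"u1": {"ctr_7d": 0.12}, "u2": {"ctr_7d": 0.05}},
    "embedding_table": {"u1": [0.1, -0.3], "u2": [0.4, 0.2]},
}
training_lineage = {
    "recsys-v1": {"u1", "u2"},
    "recsys-v2": {"u2"},
}

report = process_deletion("u1", stores, training_lineage)
print(report.stores_purged)              # ['raw_events', 'feature_store', 'embedding_table']
print(report.models_needing_unlearning)  # ['recsys-v1']
```

The second list is the part a database-only deletion misses: `recsys-v1` still carries the user's influence in its parameters and needs retraining or unlearning. A production system would also need the lineage records themselves to be handled in a privacy-preserving way, which this sketch ignores.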

The following table summarizes the key provisions and their ML system impact.

Comparison of Data Privacy Regulations and Their Impact on ML Systems

Regulation	Jurisdiction	Right to Deletion	Data Minimization Requirement	Consent Model	Key ML System Impact
GDPR	European Union (EU)	Yes, via Article 17	Yes, via Article 5	Explicit opt-in	Must support model unlearning and audit trails
CCPA	California, USA	Yes	Limited	Opt-out model	Must honor deletion and opt-out in training pipelines
DMA	EU gatekeeper platforms	Not defined by the DMA; GDPR applies alongside it	Implicit via cross-service restrictions	Consent required to combine data across services	Cannot combine cross-service user embeddings without consent
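One way to carry the table into a pipeline is as a small policy lookup that a training-data filter consults. The sketch below is a simplified illustration under stated assumptions, not legal guidance: the `UserRecord` fields, the `regime` labels, and both helper functions are hypothetical, and real eligibility rules have many more conditions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class ConsentModel(Enum):
    OPT_IN = "opt_in"    # data usable only after the user affirmatively agrees
    OPT_OUT = "opt_out"  # data usable until the user objects


# Simplified mapping from the "Consent Model" column of the table above.
CONSENT_MODEL = {
    "GDPR": ConsentModel.OPT_IN,
    "CCPA": ConsentModel.OPT_OUT,
}


@dataclass
class UserRecord:
    user_id: str
    regime: str                     # "GDPR" or "CCPA" in this sketch
    opted_in: bool = False
    opted_out: bool = False
    deletion_requested: bool = False


def eligible_for_training(user: UserRecord) -> bool:
    """Return True if this user's data may enter a training set."""
    if user.deletion_requested:
        return False  # deletion overrides any consent state
    if CONSENT_MODEL[user.regime] is ConsentModel.OPT_IN:
        return user.opted_in
    return not user.opted_out


def can_combine_services(combination_consent: Set[str],
                         services: Iterable[str]) -> bool:
    """DMA-style gate: allow a cross-service feature join only if the user
    has consented to combination for every participating service."""
    return all(service in combination_consent for service in services)


users = [
    UserRecord("eu-1", "GDPR", opted_in=True),
    UserRecord("eu-2", "GDPR"),
    UserRecord("ca-1", "CCPA"),
    UserRecord("ca-2", "CCPA", opted_out=True),
    UserRecord("eu-3", "GDPR", opted_in=True, deletion_requested=True),
]
training_ids = [u.user_id for u in users if eligible_for_training(u)]
print(training_ids)  # ['eu-1', 'ca-1']

print(can_combine_services({"search", "maps"}, ["search", "messaging"]))  # False
```

Keeping the policy in data rather than scattering conditionals through the pipeline makes it easier to audit and to update when a regulation or its interpretation changes.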

Machine unlearning