The Standard ML Pipeline
Explore the standard machine learning pipeline by examining its six critical steps from data preparation to model validation and industry standards. Understand how errors arise and learn to identify potential sources of failure such as data quality issues, bias amplification, and security vulnerabilities. This lesson develops foundational knowledge for managing and mitigating risks in ML workflows.
We all know a bit about ML and data science by now, but how exactly do industry professionals turn a dataset into a production-ready application?
This process is called the ML pipeline. While there’s no single standard set of steps, we usually break it down into six (detailed in the graphic below).
In this lesson, we’ll dive into each of these steps and cover the operations typically performed in each one. In the next lesson, we’ll discuss how these operations can become sources of disasters that cause irreversible damage to the pipeline, and therefore to the team and the company.
Data preparation
Once a dataset is acquired, steps are taken to convert the raw data into something that a model can understand. This typically involves feature engineering (i.e., deciding how to break apart or combine columns into more meaningful variables), data cleaning, dimensionality reduction (e.g., principal ...