
The Complete Pipeline

Explore how to combine core components into a complete machine learning training pipeline. Understand the use of factory design patterns for dataset and model management to simplify code extension and dependency handling. Learn to orchestrate tasks, parse arguments, log events, and track experiments in a scalable ML pipeline.

We now have the following components of our pipeline:

  • The pipeline core

    • Argument parsing

    • Artifacts and their versioning

    • Logging

  • The ML library

    • The dataset module

    • The model module

    • Report generation

We need two more pieces of code before we can run the complete pipeline. Both of these conform to the factory design pattern.

The factory design pattern

In software engineering, a design pattern is a reusable solution to a frequently encountered problem. This will become clear when we discuss the factory design pattern and how it applies to datasets and models in our pipeline.

We’ve seen the abstract base class Dataset, from which we derived IrisDataset. We can use this class directly in our code, as shown below.

```python
# Python 3.8
from ml_pipeline.datasets import iris

dataset = iris.IrisDataset("data/iris.csv")
```

But what if we have different datasets in our pipeline? Remember that our goal was to build a pipeline that can extend to other projects, so ...
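A registry-based factory is one common way to handle this. The sketch below is illustrative, not the pipeline's actual code: it defines minimal stand-ins for `Dataset` and `IrisDataset` (which really live in `ml_pipeline.datasets`), plus a hypothetical `WineDataset` and `DatasetFactory`, to show how a string name can be mapped to a concrete dataset class.

```python
# Hypothetical sketch of a dataset factory. Minimal stand-ins for the
# pipeline's Dataset classes are defined here so the example is
# self-contained and runnable.
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Stand-in for the pipeline's abstract Dataset base class."""

    def __init__(self, path: str):
        self.path = path

    @abstractmethod
    def load(self):
        ...


class IrisDataset(Dataset):
    def load(self):
        return f"iris data from {self.path}"


class WineDataset(Dataset):  # hypothetical second dataset
    def load(self):
        return f"wine data from {self.path}"


class DatasetFactory:
    """Maps a string name to a Dataset subclass (factory design pattern)."""

    _registry = {
        "iris": IrisDataset,
        "wine": WineDataset,
    }

    @classmethod
    def create(cls, name: str, path: str) -> Dataset:
        try:
            dataset_cls = cls._registry[name]
        except KeyError:
            raise ValueError(f"unknown dataset: {name!r}") from None
        return dataset_cls(path)


# Calling code picks the dataset by name (e.g. from a parsed argument)
# without importing any concrete dataset class directly.
dataset = DatasetFactory.create("iris", "data/iris.csv")
```

With this shape, supporting a new dataset means writing one subclass and registering it under a name; the code that builds and runs the pipeline never changes, which is exactly the extensibility the factory pattern buys us.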