Building a Machine Learning Pipeline from Scratch/

...

Datasets

Learn how to build the dataset module.

We'll cover the following...

CSVMixin
IrisDataset
The complete code
- Try it yourself

Press + to interact

From these diagrams, we see that the first step in the development of the pipeline is to create an abstract base class called Dataset from which all other dataset classes derive. This base class should have four abstract methods—load, preprocess, feature_engineer, and save—corresponding to the blocks shown above. Note that each of these methods, with the exception of save, also corresponds to a task in the pipeline. The save method is a utility method that we’ll examine later.

The ML project we’re working on is iris classification, so we need a dataset for ...

Introduction

Getting Started

Structuring the ML Pipeline

Directed Acyclic Graphs (DAGs)

The ML Library

Create Your First Data Pipeline with a Dashboard

The Pipeline Core

Extending the Pipeline

Build a News ETL Data Pipeline Using Python and SQLite

Testing

Deployment

Other Considerations

Wrapping Up

Appendix

Final Assessment

Datasets