Pipeline

In this lesson, you'll learn how to use a pipeline to combine different operations.

A full machine learning project involves many steps: data cleaning, data processing, feature transformation, dimension reduction, feature extraction, model building, model training, model evaluation, and so on. From a data flow perspective, the output of one step is often the input of the next. Connecting these steps makes the workflow clearer, reduces rework, and improves efficiency.

sklearn provides a very useful module, pipeline, that allows you to chain multiple estimators into one. This is useful because data processing often follows a fixed sequence of steps, such as feature selection, normalization, and classification. The module is small and contains only a few classes and functions. Let’s see how to use it.
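As a minimal sketch of that fixed sequence (feature selection, normalization, classification), the example below chains three steps with sklearn's Pipeline class. The iris dataset, the step names ("select", "scale", "clf"), the specific estimators, and the parameter values are assumptions chosen for illustration, not something this lesson prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: the iris dataset bundled with sklearn (an assumption).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each step is a (name, estimator) pair. Every step except the last
# must be a transformer; the last step can be any estimator.
pipe = Pipeline(steps=[
    ("select", SelectKBest(score_func=f_classif, k=2)),  # feature selection
    ("scale", StandardScaler()),                         # normalization
    ("clf", LogisticRegression(max_iter=1000)),          # classification
])

# fit() runs fit_transform on each transformer in order and passes the
# result to the next step, then fits the final classifier.
pipe.fit(X_train, y_train)

# predict() and score() apply the already-fitted transformers to new
# data before calling the classifier.
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the whole chain behaves like a single estimator, the transformers are fitted only on the training data, which helps avoid leaking information from the test set. If you don't need custom step names, make_pipeline from the same module builds an equivalent pipeline and names the steps automatically.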
