Pipeline

Explore how to integrate multiple machine learning steps such as feature selection, dimension reduction, and model training into one efficient workflow using Scikit-Learn's Pipeline and FeatureUnion. Understand how chaining transformers and estimators improves clarity and reduces rework in your projects.

A full machine learning project involves many steps: data cleaning, data processing, feature transformation, dimension reduction, feature extraction, model building, model training, model evaluation, and so on. Viewed as a data flow, the output of one step is usually the input of the next. Connecting these steps into a single chain makes the workflow clearer, reduces repeated work, and improves efficiency.
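To see the data-flow problem concretely, here is a minimal sketch of chaining three steps by hand, assuming the Iris dataset and an arbitrary choice of steps (scaling, PCA, logistic regression) purely for illustration. Every intermediate result has to be stored and passed along manually, and the same transforms must be repeated, in the same order, before prediction.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative data; any numeric feature matrix would do
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: scale the features; the scaled output feeds step 2
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Step 2: reduce dimensions; the reduced output feeds step 3
pca = PCA(n_components=2).fit(X_train_scaled)
X_train_reduced = pca.transform(X_train_scaled)

# Step 3: train the model on the final representation
clf = LogisticRegression(max_iter=1000).fit(X_train_reduced, y_train)

# At prediction time every transform must be re-applied by hand, in order
X_test_reduced = pca.transform(scaler.transform(X_test))
accuracy = clf.score(X_test_reduced, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Forgetting one of those manual transform calls, or applying them in the wrong order, silently produces wrong predictions; this is the kind of rework a chained workflow removes.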

scikit-learn (sklearn) provides a very useful module, sklearn.pipeline, that chains multiple estimators into one. This is useful because data processing often follows a fixed sequence of steps, such as feature selection, normalization, and classification. The module is small: its main pieces are the Pipeline and FeatureUnion classes and the make_pipeline and make_union helper functions. Let’s see how to use it.
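As a minimal sketch of that fixed sequence (feature selection, then normalization, then classification), the example below uses Pipeline on the Iris dataset. The dataset, the step names "select", "scale", and "clf", and the choice of SelectKBest, StandardScaler, and LogisticRegression are assumptions made for illustration. Calling fit on the pipeline fits each step in turn and passes its transformed output to the next; score and predict re-apply the fitted transforms automatically.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is a (name, estimator) pair; all but the last must be transformers
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=2)),        # feature selection
    ("scale", StandardScaler()),                     # normalization
    ("clf", LogisticRegression(max_iter=1000)),      # classification
])

# One call fits every step in sequence
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# Fitted steps are reachable by name
print(pipe.named_steps["select"].get_support())

# Nested parameters use the <step>__<param> convention; refit after changing them
pipe.set_params(select__k=3).fit(X_train, y_train)

# make_pipeline builds the same chain and names steps after their classes
auto_pipe = make_pipeline(
    SelectKBest(f_classif, k=2), StandardScaler(), LogisticRegression(max_iter=1000)
)
print(list(auto_pipe.named_steps))  # ['selectkbest', 'standardscaler', 'logisticregression']
```

Because the whole chain behaves like a single estimator, it can be passed to tools such as cross_val_score or GridSearchCV, which then refit the selector and scaler on each training fold instead of on the full dataset.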

Combine features from different spaces

The purpose of FeatureUnion is to combine features from ...
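A minimal sketch of FeatureUnion, assuming the Iris dataset and two illustrative feature sources (a 2-component PCA and a single SelectKBest feature), neither of which comes from the original text. Each transformer receives the same input, and their outputs are concatenated column-wise; the union is itself a transformer, so it can sit inside a Pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Both transformers see the same X; their outputs are stacked side by side
union = FeatureUnion([
    ("pca", PCA(n_components=2)),              # 2 projected features
    ("kbest", SelectKBest(f_classif, k=1)),    # 1 selected original feature
])
X_combined = union.fit_transform(X, y)
print(X_combined.shape)  # (150, 3)

# The union can be used as an ordinary step in a Pipeline
pipe = Pipeline([
    ("features", union),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
train_accuracy = pipe.score(X, y)
print(f"training accuracy: {train_accuracy:.3f}")
```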