Data Preprocessing

In this lesson, we present some useful methods for data preprocessing.

In the real world, data is rarely clean. Expect to spend a lot of time on preprocessing: cleaning, scaling, normalizing, and so on. Data preprocessing may be the most important step in the entire machine learning process. You may have heard the phrase "garbage in, garbage out": if the data quality is poor, no model, however sophisticated, will produce a good result. A commonly quoted rule of thumb is that engineers spend around 70 percent of their time processing data.

The scikit-learn preprocessing package (sklearn.preprocessing) provides several common utility functions and transformer classes that change raw feature vectors into a representation more suitable for downstream estimators.
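As a rough illustration of what a transformer class looks like in practice, here is a minimal sketch. It assumes the package in question is scikit-learn's sklearn.preprocessing and uses StandardScaler as one representative transformer; the small array X is made-up toy data, not taken from this lesson.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (hypothetical values): 3 samples, 2 features
# on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# fit_transform learns each column's mean and standard deviation,
# then rescales the column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # per-feature means after scaling
print(X_scaled.std(axis=0))   # per-feature standard deviations after scaling
```

After scaling, both features are on a comparable scale, which is what many downstream estimators expect. The same fitted scaler can be reused on new data with scaler.transform.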

Note: There are many kinds of preprocessing. This lesson covers some of the most commonly used methods. To learn more, launch the Jupyter notebook at the end of this lesson.
