Handling Outliers

Learn how to handle outliers using Python.

How to handle outliers

There are many ways to handle outliers in a dataset. Here are a few common ones:

  • Ignoring outliers: We can investigate outlier records to determine whether they are genuine observations rather than data-entry or measurement errors. If they are genuine, we can keep them in the dataset for further analysis.

  • Removing outliers: We can drop the records identified as outliers from the dataset. This is appropriate when removing them does not significantly affect the subsequent analysis, for example, when there are only a few of them or they are known to be errors.

  • Imputing outliers: We can replace outlier values with a substitute value, such as the mean or median of the variable. The median is usually the safer choice, because the mean is itself pulled toward the outliers. We generally use this approach when we want to preserve the size and representativeness of the dataset.

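  • Applying a log transformation: We can use this method when the outliers occur in a right-skewed variable. Instead of removing the outliers, we apply a log transformation to the variable that contains them. This compresses large values more than small ones, which makes the distribution more symmetric and closer to normal. Because the logarithm is defined only for positive numbers, this works directly only on strictly positive data.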
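The first three options can be sketched in Python as follows. This is a minimal sketch under several assumptions. Outliers are detected with the common 1.5 × IQR rule, which flags values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR; the lesson itself does not prescribe a detection method. The helper names (iqr_bounds, flag_outliers, remove_outliers, impute_outliers) and the sample data are hypothetical. Only the standard library statistics module is used, and statistics.quantiles requires Python 3.8 or later.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR (assumed IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Ignoring: list the outliers for manual review, leaving the data unchanged."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


def remove_outliers(values, k=1.5):
    """Removing: keep only values inside the fences (the dataset shrinks)."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def impute_outliers(values, k=1.5):
    """Imputing: replace values outside the fences with the median (the size is preserved)."""
    lower, upper = iqr_bounds(values, k)
    median = statistics.median(values)  # barely affected by the outliers themselves
    return [v if lower <= v <= upper else median for v in values]


# Hypothetical sample with one extreme value.
data = [12, 14, 13, 15, 14, 13, 16, 15, 14, 95]

print(flag_outliers(data))    # [95]
print(remove_outliers(data))  # [12, 14, 13, 15, 14, 13, 16, 15, 14]
print(impute_outliers(data))  # [12, 14, 13, 15, 14, 13, 16, 15, 14, 14.0]
```

Here the quartiles are 13 and 15.25, so the fences are 9.625 and 18.625, and only 95 falls outside them. The median is used for imputation because it hardly moves in response to the extreme value. The mean of this sample is 22.1, while the median is 14. In pandas, the same steps could be written with Series.quantile and boolean indexing. Note that pandas' default quantile interpolation differs slightly from the 'exclusive' method used here.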
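The log-transformation option can be sketched as follows. This minimal example assumes a hypothetical right-skewed sample of strictly positive values. It measures skewness with a simple moment-based formula, the mean of cubed z-scores. The helper names skewness and log_transform are illustrative and do not come from a library. The transform preserves the order of the values but shrinks the gap between the extreme value and the rest, so the skewness drops.

```python
import math
import statistics


def skewness(values):
    """Moment-based (population) skewness: the mean of cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(values):
    """Return the natural log of each value; the log is defined only for positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical right-skewed sample: most values are small, one is very large.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 250]
transformed = log_transform(incomes)

print(round(skewness(incomes), 2))      # clearly positive (right-skewed)
print(round(skewness(transformed), 2))  # smaller after the transform
```

If the variable contains zeros, math.log1p, which computes log(1 + x), is a common alternative to math.log.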
