Handling Outliers
Learn how to handle outliers using Python.
How to handle outliers
There are many methods of handling outliers in a dataset. Here are a few of them:
Ignoring outliers: We can investigate outlier records to determine whether they are genuine observations rather than data-entry or measurement errors. If they are genuine, we can keep them in the dataset for further analysis.
Removing outliers: We can remove the records that are considered outliers from a dataset. We perform this operation when removing them doesn't significantly impact further data analysis.
Imputing outliers: We can replace outlier values with a representative value, such as the mean or median of the variable that contains them. Generally, we perform this operation when the goal is to keep every record and maintain the size of the dataset.
Applying log transformation: We can use this method when the outliers exist in a skewed variable, typically one that is right-skewed and has only positive values. Instead of removing outliers, we apply a log transformation to that variable, which compresses large values and makes its distribution closer to normal.
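The removal, imputation, and log transformation methods above can be sketched with pandas and NumPy. This is a minimal illustration, not a definitive recipe: the `income` column and its values are made up, and the 1.5 × IQR rule used to flag outliers is an assumption, since the right detection rule depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric column with a single extreme value.
df = pd.DataFrame({"income": [32, 35, 38, 40, 41, 43, 45, 47, 50, 400]})

# Flag outliers with the 1.5 * IQR rule (an assumed detection rule).
q1 = df["income"].quantile(0.25)
q3 = df["income"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["income"] < lower) | (df["income"] > upper)

# Removing outliers: keep only the rows that fall inside the bounds.
removed = df[~is_outlier]

# Imputing outliers: replace flagged values with the column median.
median = df["income"].median()
imputed = df.copy()
imputed["income"] = imputed["income"].where(~is_outlier, median)

# Log transformation: log1p computes log(1 + x), which is defined for x >= 0.
transformed = df.copy()
transformed["log_income"] = np.log1p(transformed["income"])

print(removed.shape, imputed["income"].max(), transformed["log_income"].round(2).tolist())
```

The median is used for imputation here because, unlike the mean, it is barely affected by the extreme value being replaced. The log-transformed column keeps all ten records but narrows the gap between the extreme value and the rest.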