
Dealing with Outliers

Explore methods to detect and handle different types of outliers in data, including statistical and machine learning techniques. Understand how to test for random responses and apply practical approaches like IQR and KNN with Python code examples. This lesson helps you prepare data effectively to improve model reliability.

Outliers can distort models, bias results, and hide meaningful patterns. In this lesson, we’ll build a toolkit for spotting, analyzing, and handling outliers using both statistical and machine learning-based approaches. Let’s get started.

Outlier detection

You’re asked by an interviewer: What categories of outliers are you familiar with, and what techniques do you use to detect them?

Sample answer

This question assesses whether you understand the various ways outliers manifest in data. Here’s an example answer that covers the key categories of outliers interviewers expect you to know, along with a range of techniques for detecting them:

Categories of outliers

  • Univariate outliers: These are outliers detected in a single variable. For example, in a dataset of ages, an age of 150 would be considered an outlier.

Univariate outlier
  • Multivariate outliers: These are outliers found in combinations of two or more variables. For example, a height and a weight that are each plausible on their own but, taken together, do not follow the general pattern in the data.

Multivariate outlier
  • Contextual outliers: These outliers are abnormal in a specific context. For example, a high temperature reading may be normal in summer but an outlier in winter.

Contextual outlier
  • Collective outliers: These are groups of data points that together form an outlier, even though individual points may not be. For instance, it can be a ...

Collective outliers
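To make the first three categories concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the lesson's own implementation: the function names (`iqr_outliers`, `mahalanobis_outliers`, `contextual_outliers`) are hypothetical, the cutoffs (`k=1.5`, `threshold=3.0`) are common conventions rather than values from this lesson, and all data is made up. The univariate check uses the IQR rule, the multivariate check uses Mahalanobis distance (one possible choice among several), and the contextual check simply applies the IQR rule within each context group.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Univariate: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def mahalanobis_outliers(X, threshold=3.0):
    """Multivariate: flag rows whose Mahalanobis distance from the
    sample mean exceeds `threshold` (an assumed cutoff)."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: diff_i^T * cov_inv * diff_i
    dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    return dist > threshold


def contextual_outliers(values, contexts, k=1.5):
    """Contextual: apply the IQR rule separately within each context."""
    values = np.asarray(values, dtype=float)
    contexts = np.asarray(contexts)
    flags = np.zeros(values.shape, dtype=bool)
    for ctx in np.unique(contexts):
        mask = contexts == ctx
        flags[mask] = iqr_outliers(values[mask], k)
    return flags


# Univariate: made-up ages; only the age of 150 is flagged.
ages = [22, 25, 27, 30, 31, 35, 38, 41, 150]
age_flags = iqr_outliers(ages)

# Multivariate: synthetic, correlated height (cm) and weight (kg),
# plus one tall-but-very-light point that breaks the joint pattern.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
weight = 70 + 0.9 * (height - 170) + rng.normal(0, 4, size=200)
X = np.vstack([np.column_stack([height, weight]), [190.0, 52.0]])
mv_flags = mahalanobis_outliers(X)

# Contextual: made-up temperatures (°C); 30 is normal in summer
# but stands out among the winter readings.
temps = [28, 30, 31, 29, 32, 30, 2, 4, 3, 5, 1, 30]
seasons = ["summer"] * 6 + ["winter"] * 6
ctx_flags = contextual_outliers(temps, seasons)

print("Flagged ages:", [a for a, f in zip(ages, age_flags) if f])
print("Injected height/weight point flagged:", bool(mv_flags[-1]))
print("Flagged temperatures:",
      [(t, s) for t, s, f in zip(temps, seasons, ctx_flags) if f])
```

Note that the winter reading of 30 is not flagged when the IQR rule is applied to all temperatures at once; it only stands out once the season is taken into account, which is what makes it a contextual outlier. Mahalanobis distance estimated from very small samples is bounded, so a fixed cutoff like 3.0 is only meaningful with a reasonable number of rows.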
...