Dealing with Outliers
Work through the outlier detection questions that candidates can expect in technical interviews.
Outliers can distort models, bias results, and hide meaningful patterns. In this lesson, we’ll build a toolkit for spotting, analyzing, and handling outliers using both statistical and machine learning-based approaches. Let’s get started.
Outlier detection
You’re asked by an interviewer: What categories of outliers are you familiar with, and what techniques do you use to detect them?
Sample answer
This question assesses whether you understand the various ways outliers manifest in data. Here’s an example answer that covers the key categories of outliers interviewers expect you to know, along with a range of techniques used to detect them:
Categories of outliers
Univariate outliers: These are outliers detected in a single variable. For example, in a dataset of ages, an age of 150 would be considered an outlier.
Multivariate outliers: These are outliers found in combinations of two or more variables. For example, a person's height and weight may each fall within a normal range on their own, yet the combination may not follow the general pattern in the data.
Contextual outliers: These outliers are abnormal in a specific context. For example, a high temperature reading may be normal in summer but an outlier in winter.
Collective outliers: These are groups of data points that together form an outlier, even though the individual points may not look unusual on their own. For instance, a sudden spike in sales spanning several consecutive days during an otherwise steady period.
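To make the difference between a univariate and a contextual outlier concrete, here is a minimal sketch using only the Python standard library. The temperature readings, the function names, and the threshold of 2.0 are all hypothetical choices made for illustration. The threshold is deliberately low because, with only seven readings per group, a sample z-score cannot get much above 2.

```python
from statistics import mean, stdev

# Hypothetical daily temperature readings in degrees Celsius, grouped by season.
readings = {
    "summer": [29.0, 31.0, 30.0, 32.0, 28.0, 30.5, 31.5],
    "winter": [2.0, 4.0, 3.0, 1.0, 5.0, 3.5, 30.0],
}


def z_scores(values):
    """Return the z-score of each value relative to its own group."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [(v - mu) / sigma for v in values]


def contextual_outliers(groups, threshold=2.0):
    """Flag values that are unusual within their own context (here, season)."""
    flagged = []
    for context, values in groups.items():
        for value, z in zip(values, z_scores(values)):
            if abs(z) > threshold:
                flagged.append((context, value))
    return flagged


def global_outliers(groups, threshold=2.0):
    """Flag values that are unusual when context is ignored."""
    all_values = [v for values in groups.values() for v in values]
    return [v for v, z in zip(all_values, z_scores(all_values)) if abs(z) > threshold]


contextual = contextual_outliers(readings)
ignoring_context = global_outliers(readings)
print(contextual)
print(ignoring_context)
```

In this toy data, the 30.0 reading is flagged only when it is compared against other winter readings. Pooled with the summer values, it looks ordinary, which is why contextual outliers are easy to miss with a single global check.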
Outlier detection techniques
Statistical methods:
Z-score analysis: This technique standardizes the data and flags a point as an outlier when it lies more than a chosen number of standard deviations (commonly 3) from the mean.
Interquartile Range (IQR): Outliers are detected by examining the ...
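The two statistical methods above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a definitive implementation: the ages list is made up, the function names are hypothetical, the z-score cutoff of 3 is a common convention, and the IQR part assumes the widely used 1.5 × IQR fences, which the text above does not spell out.

```python
from statistics import mean, quantiles, stdev

# Hypothetical ages; 150 is the implausible value we hope to catch.
ages = [22, 25, 27, 29, 31, 33, 34, 36, 38, 40,
        41, 43, 45, 47, 49, 52, 55, 58, 61, 150]


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed 1.5 * IQR convention)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


z_flagged = zscore_outliers(ages)
iqr_flagged = iqr_outliers(ages)
print(z_flagged)
print(iqr_flagged)
```

One design point worth mentioning in an interview: the mean and standard deviation are themselves pulled by extreme values, so on very small samples a z-score cutoff of 3 may never trigger. Quartile-based rules are less sensitive to this because quartiles barely move when a single extreme point is added.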