Advanced outlier detection
This lesson covers some advanced methods and techniques for detecting outliers in your data. We will explore how each technique works in detail and walk through some simple code snippets.
Previously in the course, we discussed various methods for detecting and handling outliers, but we relied only on statistical measures to flag them. This section covers more advanced algorithms that can detect these anomalies in datasets more effectively.
DBSCAN
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is a popular clustering method used in machine learning to separate high-density regions of the data from low-density regions. Like K-means, it groups points into clusters, but the number of clusters is not specified in advance. DBSCAN is known for being robust to outliers: points that do not belong to any dense region are labeled as noise instead of being forced into a cluster.
We need to choose two hyperparameters:
- A positive number epsilon: the maximum distance between two samples for one to be considered in the neighborhood of the other.
- A natural number min_samples: the number of samples required in a point's neighborhood for that point to be considered a core point.
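As a minimal sketch of how these two hyperparameters are used in practice, the snippet below runs scikit-learn's DBSCAN on a small synthetic dataset. The data, the variable names, and the values eps=1.0 and min_samples=5 are assumptions chosen for illustration only. In scikit-learn, epsilon is passed as eps, and points that fall in no cluster receive the label -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data (an assumption for this sketch): two tight blobs
# plus three isolated points placed far away from both blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
far_points = np.array([[10.0, -10.0], [-10.0, 10.0], [20.0, 20.0]])
X = np.vstack([blob_a, blob_b, far_points])

# eps is the neighborhood radius (epsilon); min_samples is the number of
# samples needed in that neighborhood for a point to be a core point.
# Both values are illustrative and would need tuning on real data.
model = DBSCAN(eps=1.0, min_samples=5)
labels = model.fit_predict(X)

# DBSCAN marks noise points with the label -1; we treat them as outliers.
outlier_mask = labels == -1
outliers = X[outlier_mask]

# Count clusters, excluding the noise label.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print("clusters found:", n_clusters)
print("outliers:")
print(outliers)
```

A smaller eps or a larger min_samples makes the notion of a dense region stricter, so more points end up labeled as noise. Because eps is a distance, features on very different scales are usually standardized before fitting.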