Anomaly Detection with PyCaret

Learn how to import the necessary libraries and load a dataset for anomaly detection tasks in PyCaret.

Anomaly detection is one of the main tasks in unsupervised machine learning. Its goal is to identify dataset instances that differ significantly from the majority. Those instances are known as outliers. The reasons for detecting them depend on the context and domain of each application. Semi-supervised and fully supervised methods for anomaly detection also exist, but we’ll focus on the unsupervised approach. The local outlier factor (LOF) is one of the main anomaly detection models, and it is defined by the following equation:

$$\text{LOF}_{k}(A)=\frac{\sum_{B\in N_{k}(A)} \frac{\text{lrd}_{k}(B)}{\text{lrd}_{k}(A)}}{|N_{k}(A)|}$$

  • $\text{LOF}_{k}(A)$ is the local outlier factor of the dataset instance $A$.
  • $\text{lrd}_{k}(A)$ is the local reachability density of $A$.
  • $k$ is the number of nearest neighbors to $A$.
  • $N_{k}(A)$ is the set of $k$ nearest neighbors of $A$.

The local outlier factor of an instance is the average local reachability density of its neighbors divided by the local reachability density of the instance itself. Values significantly larger than 1 indicate that the instance lies in a sparser region than its neighbors and is likely an outlier, while values close to or below 1 suggest an inlier.
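To make the formula concrete, here is a minimal pure-Python sketch of the computation. It is an illustration under stated assumptions, not the implementation PyCaret uses. The function name local_outlier_factors and the sample points are hypothetical. The sketch also relies on the standard definition of local reachability density, which this lesson does not spell out: the inverse of the mean reachability distance from an instance to its neighbors, where the reachability distance to a neighbor B is the larger of the actual distance and B's own distance to its k-th nearest neighbor. For simplicity, it keeps exactly k neighbors per instance, breaks distance ties by index, and assumes there are no duplicate points.

```python
import math


def local_outlier_factors(points, k):
    """Return the LOF score of every point (simplified: exactly k neighbors each)."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    # Assumes no duplicate points, so the sum below is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of each neighbor's density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]


# Hypothetical data: four corners of a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = local_outlier_factors(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

With this data, the four corner points each score 1, while the isolated point scores roughly 6, which matches the rule that values well above 1 flag outliers.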

Anomaly detection using PyCaret

As we mentioned earlier, the local outlier factor is a popular anomaly detection model, but numerous others are available as well, including isolation forest, the k-nearest neighbors detector, subspace outlier detection, and the clustering-based local outlier factor. In the rest of this chapter, we’ll see how to train and plot an anomaly detection model using the PyCaret library.

Importing the necessary libraries

We’ll import the libraries necessary for this project: pandas, Matplotlib, Seaborn, and the PyCaret Anomaly Detection module. We’ll also import the get_data() function to load the dataset of our choice. Finally, we’ll set the Matplotlib figure DPI to 300 to get high-quality images for this course.
