Building the Model
Explore how to build anomaly detection models in PyCaret by creating a model, such as the local outlier factor, and assigning its results to a dataset. Understand how to separate inliers from outliers, evaluate model effectiveness through skewness analysis, visualize results with scatter plots and UMAP, and save models for future use. This lesson equips you to handle anomaly detection practically and effectively in Python.
Creating and assigning the model
We use the create_model() function to train a local outlier factor (LOF) model on the Wholesale Customers dataset. We then use the assign_model() function to add an anomaly label and an anomaly score for each instance to the dataset.
| | Channel | Region | Fresh | Milk | Grocery | Frozen | Detergents_Paper | Delicassen | Anomaly | Anomaly_Score |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Retail | Other | 12669 | 9656 | 7561 | 214 | 2674 | 1338 | 0 | 1.107687 |
| 1 | Retail | Other | 7057 | 9810 | 9568 | 1762 | 3293 | 1776 | 0 | 1.027102 |
| 2 | Retail | Other | 6353 | 8808 | 7684 | 2405 | 3516 | 7844 | 0 | 1.398439 |
| 3 | Horeca | Other | 13265 | 1196 | 4221 | 6404 | 507 | 1788 | 0 | 1.200384 |
| 4 | Retail | Other | 22615 | 5410 | 7198 | 3915 | 1777 | 5185 | 0 | 1.164052 |
| 5 | Retail | Other | 9413 | 8259 | 5126 | 666 | 1795 | 1451 | 0 | 1.184313 |
| 6 | Retail | Other | 12126 | 3199 | 6975 | 480 | 3140 | 545 | 0 | 1.130491 |
| 7 | Retail | Other | 7579 | 4956 | 9426 | 1669 | 3321 | 2566 | 0 | 1.013751 |
| 8 | Horeca | Other | 5963 | 3648 | 6192 | 425 | 1716 | 750 | 0 | 1.201904 |
| 9 | Retail | Other | 6006 | 11093 | 1881 | 1159 | 7425 | 2098 | 0 | 1.053333 |
Two columns containing the anomaly label and anomaly score for each instance are added to the dataset. Instances flagged as inliers (Anomaly = 0) have an anomaly score close to ...
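To separate inliers from outliers, you can filter on the binary Anomaly column. The sketch below rebuilds only the Channel, Anomaly, and Anomaly_Score values from the ten rows in the table above so it runs on its own. In practice you would filter the full DataFrame returned by assign_model() in the same way.

```python
import pandas as pd

# Channel, Anomaly, and Anomaly_Score from the ten rows shown in the table;
# with real data, use the full DataFrame returned by assign_model().
results = pd.DataFrame({
    "Channel": ["Retail", "Retail", "Retail", "Horeca", "Retail",
                "Retail", "Retail", "Retail", "Horeca", "Retail"],
    "Anomaly": [0] * 10,
    "Anomaly_Score": [1.107687, 1.027102, 1.398439, 1.200384, 1.164052,
                      1.184313, 1.130491, 1.013751, 1.201904, 1.053333],
})

# Split the rows on the binary label: 0 = inlier, 1 = outlier
inliers = results[results["Anomaly"] == 0]
outliers = results[results["Anomaly"] == 1]

print(len(inliers), len(outliers))  # prints: 10 0

# Range of anomaly scores among the inliers
print(inliers["Anomaly_Score"].min(), inliers["Anomaly_Score"].max())
```

All ten rows shown are inliers, so the outliers frame is empty here. On the full dataset, the same mask picks out the instances that LOF flagged.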