Fighting Label Errors

Learn how to detect and treat label errors with confident learning techniques.

We'll cover the following...

Detect label errors
Robust machine learning models
Integrate cleanlab into an iterative workflow
Key takeaway

The real world is full of imperfect data. If we ignore data quality issues, we risk drawing wrong conclusions and making suboptimal decisions. This course addresses one such issue, duplicate records, but the outcome of entity resolution itself depends on the quality of the data we feed into it.

This lesson introduces confident learning, a robust alternative to standard (or naive) machine learning. Confident learning treats potential data errors as part of the modeling problem so that algorithms can adapt to imperfect data automatically. For example, can we trust that the labels we use for the initial training of our machine learning model are 100% accurate?

Detect label errors

Machine learning algorithms require some labeled examples for initial training. In entity resolution, we select a subset of pairs and assign them to the match or no-match class. Large-scale applications, such as master data management in the enterprise, involve several users reviewing pairs of records. Every such manual intervention is a potential error source. ...
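
To make the idea of a suspicious label concrete, here is a minimal pure-Python sketch of the thresholding idea behind confident learning. It is a simplification for illustration, not cleanlab's implementation. The function name find_label_issues_sketch, the toy labels, and the predicted probabilities are all hypothetical, and we assume the probabilities come from a model that did not see the pair during training (for example, via cross-validation).

```python
from statistics import mean


def find_label_issues_sketch(labels, pred_probs):
    """Return indices of pairs whose given label looks doubtful.

    labels:     list of ints, 0 = no-match, 1 = match (hypothetical encoding)
    pred_probs: list of [p(no-match), p(match)] rows, assumed out-of-sample
    Assumes every class has at least one labeled example.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # among the examples that were labeled k.
    thresholds = [
        mean(p[k] for p, y in zip(pred_probs, labels) if y == k)
        for k in range(n_classes)
    ]

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is unsure, so we do not flag the label
        likely = max(confident, key=lambda k: p[k])
        if likely != y:
            issues.append(i)
    return issues


# Toy data (made up): eight reviewed record pairs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
pred_probs = [
    [0.10, 0.90],
    [0.20, 0.80],
    [0.15, 0.85],
    [0.90, 0.10],
    [0.85, 0.15],
    [0.80, 0.20],
    [0.95, 0.05],  # labeled match, but the model strongly predicts no-match
    [0.10, 0.90],  # labeled no-match, but the model strongly predicts match
]

issues = find_label_issues_sketch(labels, pred_probs)
print(issues)  # [6, 7]
```

The flagged pairs are candidates for a second manual review, not proven errors. The model itself may be wrong about them.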
