Similarity of Numeric Attributes
Explore how to engineer similarity features for numeric data such as amounts, dates, and geocodes in entity resolution. Learn to use RecordLinkage's built-in functions and create custom similarity measures, improving your ability to identify duplicate records efficiently in Python.
Duplicate payments likely have similar amounts and transaction dates, and duplicate locations have similar geocodes. All three are numeric attributes: amounts and dates are scalars, while a geocode is a vector of latitude and longitude. Let’s use a tiny dataset to show how to configure similarity features for numeric attributes using the RecordLinkage API.
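To make the idea concrete, here is a minimal plain-Python sketch (not RecordLinkage code) of how a numeric similarity feature turns a distance into a score between 0 and 1. It assumes decay functions modeled loosely on the `method="linear"` and `method="gauss"` options of RecordLinkage's `Compare.numeric` (and `Compare.geo` for coordinates): `offset` is the distance still treated as a full match, and `scale` is the extra distance at which the score drops to 0.5. The exact formulas are an assumption, so check the library documentation before relying on them. For a geocode, the two coordinate vectors are first reduced to a single distance; the haversine formula used here, along with all function names and sample values, is a hypothetical choice for illustration.

```python
import math


def linear_sim(d: float, offset: float = 0.0, scale: float = 1.0) -> float:
    """Linear decay (assumed form): 1 within `offset`, 0.5 at `offset + scale`, 0 from `offset + 2 * scale` on."""
    d = abs(d)
    if d <= offset:
        return 1.0
    return max(0.0, 1.0 - (d - offset) / (2.0 * scale))


def gauss_sim(d: float, offset: float = 0.0, scale: float = 1.0) -> float:
    """Gaussian decay (assumed form): 1 within `offset`, 0.5 at `offset + scale`, tending to 0 beyond."""
    d = abs(d)
    if d <= offset:
        return 1.0
    return 2.0 ** (-(((d - offset) / scale) ** 2))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical candidate pair of payments: the amounts differ by 0.50.
amount_score = linear_sim(100.50 - 100.00, offset=1.0, scale=5.0)
print(amount_score)  # 1.0, because the gap is inside the 1.00 tolerance

# Hypothetical candidate pair of locations: reduce the vectors to a distance, then decay.
dist_km = haversine_km(52.5200, 13.4050, 52.5205, 13.4060)
geo_score = gauss_sim(dist_km, offset=0.05, scale=0.5)
print(round(dist_km, 3), round(geo_score, 3))
```

The `offset` and `scale` values encode domain knowledge: with the settings above, amounts within 1.00 of each other score as identical and a 6.00 gap halves the score. Linear decay reaches zero at a finite distance, while Gaussian decay only approaches it, which keeps far-apart pairs weakly distinguishable.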
Note that dates can be interpreted either as strings or as numeric values measured in some unit, for example, days. Our focus here is on the numeric interpretation.
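As a minimal sketch of the numeric interpretation, assuming day granularity is enough: subtracting two `datetime.date` objects gives a `timedelta` whose `.days` attribute is the gap in days, and `date.toordinal()` places a single date on a day-count axis. The helpers `days_apart` and `date_step_sim` and the three-day tolerance are hypothetical choices for illustration; a column of such day numbers could instead be compared with a decay function, for example through RecordLinkage's `Compare.numeric`.

```python
from datetime import date


def days_apart(d1: date, d2: date) -> int:
    """Absolute gap between two dates, in days."""
    return abs((d2 - d1).days)


def date_step_sim(d1: date, d2: date, tolerance_days: int = 3) -> float:
    """Step similarity: 1.0 if the dates are at most `tolerance_days` apart, else 0.0."""
    return 1.0 if days_apart(d1, d2) <= tolerance_days else 0.0


# toordinal() maps a date to a day count (0001-01-01 is day 1), so dates become numbers.
print(date(2023, 3, 1).toordinal() - date(2023, 2, 28).toordinal())  # 1
print(days_apart(date(2024, 2, 28), date(2024, 3, 1)))  # 2, because 2024 is a leap year
print(date_step_sim(date(2023, 1, 1), date(2023, 1, 3)))  # 1.0
print(date_step_sim(date(2023, 1, 1), date(2023, 1, 10)))  # 0.0
```

A step function treats a payment dated three days later as a full match and one dated four days later as no match at all; a decay function over the same day gap avoids that hard cutoff.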
Custom comparer
A similarity feature is limited to