Match vs. No-Match
Explore how to distinguish matches from no-matches in customer records by combining similarity rules based on names, addresses, and phone numbers. Understand the use of threshold-based similarity joins and how to handle contradictions using transitive clustering for effective entity resolution.
As humans, we bring intuition, prior experience with similar tasks, or a solid understanding of the data from reviewing it beforehand. So we know (or at least believe we know) how to distinguish a match from a no-match for any pair of customer records. Let’s implement this knowledge by translating it into a policy that combines a few plausible rules.
Below, we define four matching rules. We predict a match if any of the following rules applies:
Rule 1: The similarities of customer names and of streets are both high.
Rule 2: The similarity of customer names is very high and the similarity of addresses is at least moderate.
Rule 3: The phonetic similarities of customer names and of addresses are both very high.
Rule 4: Phone numbers match exactly.
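The four rules above can be sketched as a small Python policy. This is a minimal illustration, not the lesson's actual implementation. It assumes the similarity scores for a pair have already been computed and scaled to the range 0 to 1. The names `PairScores`, `fired_rules`, and `is_match` are hypothetical, as are the numeric thresholds for "moderate," "high," and "very high."

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds; the lesson does not specify numeric values.
MODERATE, HIGH, VERY_HIGH = 0.6, 0.8, 0.9


@dataclass
class PairScores:
    """Precomputed similarity scores in [0, 1] for one pair of customer records."""
    name: float              # string similarity of customer names
    street: float            # string similarity of streets
    address: float           # string similarity of full addresses
    name_phonetic: float     # phonetic similarity of customer names
    address_phonetic: float  # phonetic similarity of addresses
    phone_equal: bool        # True if the phone numbers match exactly


def fired_rules(s: PairScores) -> list[int]:
    """Return the numbers of all rules that apply to the pair."""
    rules = {
        1: s.name >= HIGH and s.street >= HIGH,
        2: s.name >= VERY_HIGH and s.address >= MODERATE,
        3: s.name_phonetic >= VERY_HIGH and s.address_phonetic >= VERY_HIGH,
        4: s.phone_equal,
    }
    return [number for number, applies in rules.items() if applies]


def is_match(s: PairScores) -> bool:
    """Predict a match if at least one rule applies (an OR over AND-rules)."""
    return bool(fired_rules(s))


# Example pair: very similar names, moderately similar addresses.
pair = PairScores(name=0.95, street=0.5, address=0.7,
                  name_phonetic=0.85, address_phonetic=0.6, phone_equal=False)
print(fired_rules(pair), is_match(pair))  # [2] True
```

Returning the list of fired rules, rather than only a boolean, makes it easier to see which rule caused a prediction when reviewing false positives.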
The literature calls such AND/OR combinations of threshold-based rules a ...