All-in indexing

We keep it simple here and add every possible pair of records to the index, which RecordLinkage calls a “full” index.
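To make "every possible pair" concrete, here is a minimal standard-library sketch of what a full index contains when deduplicating a single table. The record ids and the `full_index` helper are hypothetical and only illustrate the idea; in RecordLinkage itself, the equivalent step is calling `full()` on an `rl.Index()` object and then `index(...)` on the data.

```python
from itertools import combinations


def full_index(record_ids):
    """Return every unordered pair of distinct record ids (deduplication case)."""
    return list(combinations(record_ids, 2))


# Hypothetical record ids, used only for illustration.
record_ids = ["r1", "r2", "r3", "r4"]
pairs = full_index(record_ids)

# A full index over n records holds n * (n - 1) / 2 candidate pairs.
n = len(record_ids)
expected_pairs = n * (n - 1) // 2
print(len(pairs), expected_pairs)
```

Because the number of pairs grows quadratically with the number of records, a full index is only practical for small datasets.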


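import recordlinkage as rl

# Compare candidate pairs in parallel on all available cores.
comparer = rl.Compare(n_jobs=-1)
print('Configuring one similarity function per attribute...')
# Names and cities: Jaro-Winkler similarity.
for attribute in ['customer_name_c', 'customer_name_p', 'city_c', 'city_p']:
    comparer.string(left_on=attribute, right_on=attribute, method='jarowinkler', label=attribute + '_score')
# Streets: Damerau-Levenshtein similarity, which counts a transposition as a single edit.
for attribute in ['street_c', 'street_p']:
    comparer.string(left_on=attribute, right_on=attribute, method='damerau_levenshtein', label=attribute + '_score')
# Phone numbers: exact match (1 if equal, 0 otherwise).
comparer.exact(left_on='phone_c', right_on='phone_c', label='phone_c_score')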
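As a rough illustration of what such a configuration produces for one candidate pair, the sketch below builds a two-feature score vector by hand. It is not the library's implementation. It assumes the optimal string alignment variant of Damerau-Levenshtein, normalized by the longer string's length, and the two sample records are invented; RecordLinkage's exact algorithm and normalization may differ.

```python
def osa_distance(a, b):
    """Edit distance with insert, delete, substitute, and adjacent transposition."""
    # d[i][j] holds the number of edits needed to turn a[:i] into b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Two swapped neighbouring characters count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


# Two hypothetical records forming one candidate pair.
left = {"street_c": "mian st", "phone_c": "5551234"}
right = {"street_c": "main st", "phone_c": "5551234"}

features = {
    "street_c_score": similarity(left["street_c"], right["street_c"]),
    "phone_c_score": 1 if left["phone_c"] == right["phone_c"] else 0,
}
print(features)
```

Each candidate pair ends up with one score per configured attribute, and the labels passed to the comparer become the column names of the resulting feature table.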

