Motivating Clustering

Explore how clustering transforms entity resolution from pairwise matching to collective classification. Understand graph-based methods to resolve conflicts, improve prediction accuracy, and handle dependencies between record matches for practical applications.

A typical entity resolution pipeline starts with preprocessing records $\tilde{r}_1=C(r_1),\ldots,\tilde{r}_n=C(r_n)$ individually. Next comes pairwise feature engineering $s_{ij}=F(\tilde{r}_i,\tilde{r}_j)$, followed by pairwise matching $c_{ij}=M(s_{ij})$, where $c=1$ represents a match and $c=0$ otherwise, which makes this a binary classification problem.
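The three stages can be sketched in a few lines of Python. This is only a minimal illustration: the function names `preprocess`, `similarity`, and `match`, the sample records, the single string-similarity feature, and the 0.8 threshold are all assumptions chosen for the example and stand in for $C$, $F$, and $M$.

```python
from difflib import SequenceMatcher
from itertools import combinations


def preprocess(record):
    """C: normalize one record (lowercase, collapse whitespace)."""
    return " ".join(record.lower().split())


def similarity(a, b):
    """F: a single pairwise feature, a string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def match(score, threshold=0.8):
    """M: binary decision, 1 for a match and 0 otherwise (threshold is assumed)."""
    return 1 if score >= threshold else 0


# Hypothetical sample records, for illustration only.
records = ["John  Smith", "john smith", "Jane Doe"]

# Step 1: preprocess every record individually.
cleaned = [preprocess(r) for r in records]

# Steps 2 and 3: score every pair, then classify each pair independently.
predictions = {
    (i, j): match(similarity(cleaned[i], cleaned[j]))
    for i, j in combinations(range(len(cleaned)), 2)
}
```

Note that each pair is classified on its own: the decision for one pair never sees the decisions made for any other pair.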

Collective entity resolution goes beyond pairs and draws on the collective evidence of any number of records. The aim is twofold: to improve classification accuracy and to resolve conflicts between pairwise predictions that would otherwise make the output impractical.
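One common kind of conflict, assumed here for illustration, is a transitivity violation: records A and B are predicted to match, B and C are predicted to match, yet A and C are predicted not to match. The sketch below flags such triples; the helper name `transitivity_conflicts` and the toy predictions are hypothetical.

```python
from itertools import combinations


def transitivity_conflicts(n, matches):
    """Return triples (i, j, k) in which exactly two of the three pairs match.

    `matches` is a set of index pairs (i, j) with i < j that were predicted
    as matches among n records.
    """
    def is_match(a, b):
        return (min(a, b), max(a, b)) in matches

    conflicts = []
    for i, j, k in combinations(range(n), 3):
        # Two matching pairs imply the third should match as well.
        if is_match(i, j) + is_match(j, k) + is_match(i, k) == 2:
            conflicts.append((i, j, k))
    return conflicts


# Toy pairwise output: 0~1 and 1~2 match, but 0~2 does not.
matches = {(0, 1), (1, 2)}
conflicts = transitivity_conflicts(3, matches)
```

Pairwise matching alone cannot rule out such triples, because each pair is classified in isolation.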

Clusters

Let’s reformulate our resolution task as a clustering problem on graphs. Starting from our pairwise predictions, we create a graph where nodes represent records $r_1,\ldots,r_n$ ...