Sciweavers

PVLDB
2010

Evaluating Entity Resolution Results

13 years 2 months ago
Evaluating Entity Resolution Results
Entity Resolution (ER) is the process of identifying groups of records that refer to the same real-world entity. Various measures (e.g., pairwise F1, cluster F1) have been used for evaluating ER results. However, ER measures tend to be chosen in an ad-hoc fashion without careful thought as to what defines a good result for the specific application at hand. In this paper, our contributions are twofold. First, we conduct an analysis on existing ER measures, showing that they can often conflict with each other by ranking the results of ER algorithms differently. Second, we explore a new distance measure for ER (called “generalized merge distance” or GMD) inspired by the edit distance of strings, using cluster splits and merges as its basic operations. A significant advantage of GMD is that the cost functions for splits and merges can be configured, enabling us to clearly understand the characteristics of a defined GMD measure. Surprisingly, a state-of-the-art clustering measur...
David Menestrina, Steven Whang, Hector Garcia-Moli
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where PVLDB
Authors David Menestrina, Steven Whang, Hector Garcia-Molina
Comments (0)