
IJCAI
2003

A Comparison of String Distance Metrics for Name-Matching Tasks

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme, which was developed in the probabilistic record linkage community.
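To make the hybrid idea in the abstract concrete, the following is a minimal Python sketch, not the authors' Java toolkit. It implements the standard Jaro-Winkler similarity and then a simplified "soft" TFIDF score in which tokens count as matching when their Jaro-Winkler similarity exceeds a threshold. The function names, the whitespace tokenizer, the `corpus` argument, the exact weighting formula, and the default `threshold=0.9` are all assumptions made for illustration.

```python
import math
from collections import Counter


def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    # Half the number of matched characters that appear in a different order.
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    sim = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


def tfidf_vector(tokens, doc_freq, n_docs):
    """Unit-length TFIDF weights; this weighting formula is an assumption."""
    counts = Counter(tokens)
    raw = {w: math.log(c + 1) * math.log(n_docs / doc_freq.get(w, 1))
           for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: (v / norm if norm else 0.0) for w, v in raw.items()}


def soft_tfidf(s: str, t: str, corpus, threshold: float = 0.9) -> float:
    """Simplified hybrid score: TFIDF weights with Jaro-Winkler token matching."""
    docs = [name.lower().split() for name in corpus]  # hypothetical tokenizer
    doc_freq = Counter(w for d in docs for w in set(d))
    vs = tfidf_vector(s.lower().split(), doc_freq, len(docs))
    vt = tfidf_vector(t.lower().split(), doc_freq, len(docs))
    score = 0.0
    for w, weight_s in vs.items():
        # Closest token in t under the secondary (Jaro-Winkler) similarity.
        best_sim, best_v = max(((jaro_winkler(w, v), v) for v in vt),
                               default=(0.0, None))
        if best_sim > threshold:
            score += weight_s * vt[best_v] * best_sim
    return score
```

For example, with a small hypothetical corpus such as `["William Cohen", "Will Cohen", "Stephen Fienberg", "Pradeep Ravikumar"]`, calling `soft_tfidf("William Cohen", "Willliam Cohen", corpus)` still scores the misspelled name highly, whereas exact-token TFIDF would give no credit for the first token.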
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where IJCAI
Authors William W. Cohen, Pradeep D. Ravikumar, Stephen E. Fienberg