DEBU
2010

Weighted Set-Based String Similarity

Consider a universe of tokens, each associated with a weight, and a database of strings, each of which can be represented as a subset of these tokens. Given a query string, also represented as a set of tokens, a weighted string similarity query identifies all strings in the database whose similarity to the query exceeds a user-specified threshold. Such queries are useful in applications such as data cleaning and integration, where they find approximate matches in the presence of typographical mistakes, multiple formatting conventions, data transformation errors, and similar inconsistencies. We show that this problem has semantic properties that can be exploited to design index structures that support very efficient algorithms for query answering.
Marios Hadjieleftheriou, Divesh Srivastava
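
To make the query in the abstract concrete, here is a minimal sketch in Python. It rests on two assumptions that the abstract does not state: the similarity measure is weighted Jaccard, and the database is scanned exhaustively. The function names, tokens, and weights are hypothetical, and the brute-force scan is only a baseline for illustration, not the index-based algorithms the paper proposes.

```python
from typing import Dict, FrozenSet, List, Tuple


def weighted_jaccard(a: FrozenSet[str], b: FrozenSet[str],
                     weights: Dict[str, float]) -> float:
    """Weighted Jaccard: weight of shared tokens over weight of all tokens.

    Assumption: every token appears in `weights`. The abstract does not
    name a similarity measure; weighted Jaccard is chosen for illustration.
    """
    inter = sum(weights[t] for t in a & b)
    union = sum(weights[t] for t in a | b)
    return inter / union if union > 0 else 0.0


def similarity_query(query: FrozenSet[str],
                     database: List[FrozenSet[str]],
                     weights: Dict[str, float],
                     threshold: float) -> List[Tuple[FrozenSet[str], float]]:
    """Return every database string whose similarity to the query is
    strictly larger than the threshold, by scanning the whole database."""
    results = []
    for s in database:
        sim = weighted_jaccard(query, s, weights)
        if sim > threshold:
            results.append((s, sim))
    return results


# Hypothetical token weights and token-set database.
weights = {"main": 1.0, "st": 0.5, "street": 0.5, "elm": 2.0}
database = [
    frozenset({"main", "st"}),
    frozenset({"main", "street"}),
    frozenset({"elm", "st"}),
]
query = frozenset({"main", "st"})

matches = similarity_query(query, database, weights, threshold=0.4)
for tokens, sim in matches:
    print(sorted(tokens), round(sim, 3))
```

With these hypothetical weights, the query matches the first two strings and rejects the third, because the only token it shares with the third carries little weight. A scan like this touches every string in the database, which is the cost an index structure is meant to avoid.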
Added 10 Dec 2010
Updated 10 Dec 2010
Type Journal
Year 2010
Where DEBU
Authors Marios Hadjieleftheriou, Divesh Srivastava