
SSDBM
2003
IEEE

Approximate String Joins

String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data, especially for more complex queries involving joins. This is due, for example, to the prevalence of typographical errors in data and to multiple conventions for recording attributes such as name and address. Commercial databases do not support approximate string joins directly, and it is a challenge to implement this functionality efficiently with user-defined functions (UDFs). In this paper, we develop a technique for building approximate string join capabilities on top of commercial databases by exploiting facilities already available in them. At the core, our technique relies on matching short substrings of length q, called q-grams, and taking into account both positions of individual matches and the total number of such matches. Our approach applies to both approximate full string matching and approximate substring matching...
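
The q-gram idea in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation (the abstract describes a technique built on database facilities, and the details are truncated above). The sketch assumes edit distance as the similarity measure, pads each string with `#` and `$` (characters assumed absent from the data), and uses the standard counting argument that one edit operation can affect at most q padded q-grams, so two strings within edit distance k must share at least max(|a|, |b|) - 1 - (k - 1)q q-grams. All function names are hypothetical.

```python
def positional_qgrams(s, q=3):
    """Return (position, q-gram) pairs of s, padded with q-1 sentinels per side."""
    # Assumption: '#' and '$' never occur in the data itself.
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [(i, padded[i:i + q]) for i in range(len(padded) - q + 1)]


def passes_qgram_filters(a, b, k, q=3):
    """Cheap necessary conditions for edit_distance(a, b) <= k.

    A pair that fails cannot be within distance k; a pair that passes
    still has to be verified with a real edit distance computation.
    """
    # Length filter: k edits change the length by at most k.
    if abs(len(a) - len(b)) > k:
        return False
    # Count q-gram pairs that are equal and whose positions differ by at
    # most k (position filter). Counting pairs may overestimate the number
    # of true matches, which keeps the filter free of false negatives.
    grams_b = positional_qgrams(b, q)
    matches = sum(
        1
        for pos_a, gram_a in positional_qgrams(a, q)
        for pos_b, gram_b in grams_b
        if gram_a == gram_b and abs(pos_a - pos_b) <= k
    )
    # Count filter: each edit destroys at most q of the padded q-grams.
    return matches >= max(len(a), len(b)) - 1 - (k - 1) * q


def edit_distance(a, b):
    """Levenshtein distance via the usual row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def approximate_join(left, right, k, q=3):
    """Nested-loop join: filter candidate pairs with q-grams, then verify."""
    return [
        (a, b)
        for a in left
        for b in right
        if passes_qgram_filters(a, b, k, q) and edit_distance(a, b) <= k
    ]


if __name__ == "__main__":
    names_left = ["Srivastava", "Gravano"]
    names_right = ["Shrivastava", "Gravanno", "Koudas"]
    print(approximate_join(names_left, names_right, k=1))
```

In a database setting the positional q-grams would typically live in an auxiliary table and the filters would become join and aggregate conditions. The nested loop here only stands in for that step.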
Divesh Srivastava
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where SSDBM
Authors Divesh Srivastava