Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific compu...
In this paper we present an efficient, scalable and general algorithm for performing set joins on predicates involving various similarity measures like intersect size, Jaccard-coe...
The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The s...
Identification of all objects in a dataset whose similarity is not less than a specified threshold is of major importance for management, search, and analysis of data. Set similari...
There has been considerable interest in similarity join in the research community recently. Similarity join is a fundamental operation in many application areas, such as data inte...