Sciweavers

ICDE
2009
IEEE

Top-k Set Similarity Joins

Download www.cse.unsw.edu.au

Abstract: Similarity join is a useful primitive operation underlying many applications, such as near-duplicate Web page detection, data integration, and pattern recognition. Traditional similarity joins require a user to specify a similarity threshold. In this paper, we study a variant of the similarity join, termed top-k set similarity join. It returns the top-k pairs of records ranked by their similarities, thus eliminating the guesswork users have to perform when the similarity threshold is unknown beforehand. An algorithm, topk-join, is proposed to answer top-k similarity joins efficiently. It is based on the prefix filtering principle and employs tight upper bounding of the similarity values of unseen pairs. Experimental results demonstrate the efficiency of the proposed algorithm on large-scale real datasets.
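To make the problem statement concrete, here is a minimal brute-force sketch of a top-k set similarity join. It is not the paper's topk-join algorithm: it scores every pair and uses no prefix filtering or upper bounding. The choice of Jaccard similarity and the names `jaccard` and `topk_join_bruteforce` are assumptions made for illustration only; the abstract does not fix a similarity function.

```python
import heapq
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def topk_join_bruteforce(records, k):
    """Return the k most similar record pairs as (similarity, i, j) tuples.

    Hypothetical baseline: compares all O(n^2) pairs, so it only
    illustrates what a top-k join returns, not how to compute it efficiently.
    """
    sets = [set(r) for r in records]
    scored = (
        (jaccard(sets[i], sets[j]), i, j)
        for i, j in combinations(range(len(sets)), 2)
    )
    # nlargest returns the k largest tuples in descending order;
    # equal similarities are ordered by the (i, j) indices.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    records = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c", "d"], ["x", "y"]]
    for sim, i, j in topk_join_bruteforce(records, 2):
        print(f"records {i} and {j}: similarity {sim:.2f}")
```

Note that the caller supplies only k and no similarity threshold, which is the point of the top-k formulation. An efficient method would avoid scoring most pairs, for example by bounding the similarity of pairs it has not yet examined.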

Chuan Xiao, Wei Wang 0011, Xuemin Lin, Haichuan Shang

Database | ICDE 2009 | Set Similarity Join | Similarity Join | Similarity Threshold | Similarity Values | Top-k Similarity Join

Related Content

» Semantic link based top-K join queries in P2P networks

» Top-K aggregation queries over large networks

» Top-K Correlation Sub-graph Search in Graph Databases

» Approximate distributed top-k queries

» Top-k Similarity Join over Multi-valued Objects

» Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints

» High Performance Data Mining Using the Nearest Neighbor Join

» Efficient set joins on similarity predicates

» Efficient Set Similarity Joins Using Min-prefixes

Post Info

Added	20 Oct 2009
Updated	20 Oct 2009
Type	Conference
Year	2009
Where	ICDE
Authors	Chuan Xiao, Wei Wang 0011, Xuemin Lin, Haichuan Shang
