Sciweavers

» High Performance Clustering Based on the Similarity Join
CORR
2011
Springer
14 years 4 months ago
Similarity Join Size Estimation using Locality Sensitive Hashing
Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generali...
Hongrae Lee, Raymond T. Ng, Kyuseok Shim
WWW
2008
ACM
15 years 10 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
CLUSTER
2002
IEEE
15 years 2 months ago
Cluster Based Hybrid Hash Join: Analysis and Evaluation
The join is the most important, but also the most time consuming operation in relational database systems. We implemented the parallel Hybrid Hash Join algorithm on a PC-cluster a...
Erich Schikuta, Peter Kirkovits
SIGMOD
2011
ACM
14 years 13 days ago
Llama: leveraging columnar storage for scalable join processing in the MapReduce framework
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted the cluster-based architecture. In this paper, we propose the design of a new clu...
Yuting Lin, Divyakant Agrawal, Chun Chen, Beng Chi...
WIRN
2005
Springer
15 years 3 months ago
Ensembles Based on Random Projections to Improve the Accuracy of Clustering Algorithms
We present an algorithmic scheme for unsupervised cluster ensembles, based on randomized projections between metric spaces, by which a substantial dimensionality reduction is obtai...
Alberto Bertoni, Giorgio Valentini