The join is the most important but also the most time-consuming operation in relational database systems. We implemented the parallel Hybrid Hash Join algorithm on a PC-cluster a...
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different u...
Edward R. Dougherty, Junior Barrera, Marcel Brun, ...
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambi...
We consider the problem of clustering data lying on multiple subspaces of unknown and possibly different dimensions. We show that one can represent the subspaces with a set of pol...
Extracting natural groups from unlabeled data is known as clustering. To improve the stability and robustness of the clustering outputs, clustering ensembles have emerged recent...