Exploiting parallelism to support scalable hierarchical clustering


A distributed memory parallel version of the group average Hierarchical Agglomerative Clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard TREC test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. Results show that our algorithm performs close to the expected O(n^2/p) time on p processors, rather than the worst-case O(n^3/p) time. Furthermore, the O(n^2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies showing that hierarchical algorithms produce significantly tighter clusters in the d...
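As a rough illustration of the ideas named in the abstract, the following Python sketch shows a naive sequential group-average agglomerative clustering loop, plus a helper that splits the rows of an n x n similarity matrix into p contiguous blocks. It is not the authors' implementation: the function names `group_average_hac` and `rows_per_worker`, the between-cluster average used as the merge score, and the block row partition are all assumptions made for this example. The abstract does not describe how the paper actually distributes data or which message passing operations it uses.

```python
from itertools import combinations


def group_average_hac(sim, num_clusters):
    """Naive group-average HAC over a symmetric similarity matrix.

    Assumption: the merge score is the mean pairwise similarity between
    the members of two clusters. This sequential version recomputes
    scores on every pass, so it is far slower than an optimized one.
    """
    clusters = {i: [i] for i in range(len(sim))}
    merges = []
    while len(clusters) > num_clusters:
        best_pair, best_score = None, float("-inf")
        for a, b in combinations(sorted(clusters), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if score > best_score:
                best_pair, best_score = (a, b), score
        a, b = best_pair
        # Merge cluster b into cluster a and record the step.
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((a, b, best_score))
    return sorted(sorted(c) for c in clusters.values()), merges


def rows_per_worker(n, p):
    """Hypothetical block row partition of an n x n matrix over p workers.

    Each worker owns roughly n/p rows of n entries, which is one way to
    arrive at O(n^2/p) memory per node.
    """
    base, extra = divmod(n, p)
    sizes = [base + (1 if r < extra else 0) for r in range(p)]
    starts = [sum(sizes[:r]) for r in range(p)]
    return [range(s, s + k) for s, k in zip(starts, sizes)]


if __name__ == "__main__":
    sim = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    clusters, merges = group_average_hac(sim, 2)
    print(clusters)
    print([list(r) for r in rows_per_worker(10, 3)])
```

In a distributed setting, each worker would search only its own rows for the best local merge candidate, and a reduction across workers would pick the global best. That step is left out here because the abstract gives no detail on it.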

Rebecca Cathey, Eric C. Jensen, Steven M. Beitzel, Ophir Frieder, David A. Grossman


Agglomerative Clustering Algorithm | Clustering Algorithm | Hierarchical Agglomerative Clustering | JASIS 2007 |


» HyPursuit A Hierarchical Network Search Engine that Exploits ContentLink Hypertext Cluster...

» Hierarchical Bloom filter arrays HBA a novel scalable metadata management system for large...

» Scalable computing with parallel tasks

» A new intrusion detection system using support vector machines and hierarchical clustering

» Scalable Approaches for Supporting MPIIO Atomicity

» An Assessment of a Metric Space Database Index to Support Sequence Homology

» Simulative performance analysis of gossip failure detection for scalable distributed syste...

» Equalizer A Scalable Parallel Rendering Framework

Post Info

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2007
Where	JASIS
Authors	Rebecca Cathey, Eric C. Jensen, Steven M. Beitzel, Ophir Frieder, David A. Grossman
