Distributed hierarchical document clustering

This paper investigates the applicability of a distributed clustering technique called RACHET [1] to organizing large sets of distributed text data. Although the authors of RACHET claim that the algorithm generates quality clusters for massive, high-dimensional data sets, it has not yet been evaluated on a well-known academic data set. This paper presents a performance analysis of the algorithm and tests its suitability for distributed document clustering. Three widely known hierarchical algorithms generate local clusters at each distributed repository, and RACHET is then applied to merge the distributed hierarchies of clusters. We run our own tests of the algorithm on standard document corpora [2] using popular cluster evaluation measures [3, 4], and we discuss important implementation details.

KEY WORDS: Distributed hierarchical clustering, document clustering.
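
The two-stage workflow the abstract describes (cluster locally at each repository, then merge the local results) can be illustrated with a minimal sketch. This is not RACHET's actual merging procedure, which the abstract does not detail. It assumes, purely for illustration, that each site runs centroid-linkage agglomerative clustering on small dense document vectors and ships only (centroid, size) summaries to a coordinator, which agglomerates the pooled summaries. The function names `agglomerate`, `cluster_site`, and `merge_sites` and the toy vectors are hypothetical.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(summaries, k):
    """Bottom-up merge of (centroid, size) summaries until k remain.

    Assumption: centroid linkage, i.e. the two clusters with the
    closest centroids are merged at every step.
    """
    clusters = [(list(c), n) for c, n in summaries]
    while len(clusters) > k:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = ([(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)],
                  ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def cluster_site(vectors, k):
    """Local step: cluster one repository's document vectors."""
    return agglomerate([(v, 1) for v in vectors], k)


def merge_sites(site_summaries, k):
    """Global step: pool the per-site summaries and agglomerate them.

    Only centroids and sizes cross site boundaries, never raw documents.
    """
    pooled = [s for site in site_summaries for s in site]
    return agglomerate(pooled, k)


# Toy two-dimensional "document vectors" held at two repositories.
site_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
site_b = [[0.1, 0.9], [1.0, 0.1], [0.0, 0.8]]

local = [cluster_site(site_a, 2), cluster_site(site_b, 2)]
global_clusters = merge_sites(local, 2)
```

On this toy input each site reduces its three documents to two summaries, and the coordinator merges the four summaries into two global clusters of three documents each. Real document collections would use sparse, high-dimensional term vectors and typically a cosine-based distance.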
Debzani Deb, M. Muztaba Fuad, Rafal A. Angryk
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACST