
CIDM
2007
IEEE

Distributed Document Clustering Using Word-clusters

Document clustering has become an increasingly important task in analyzing huge numbers of documents distributed among various sites. The challenge is to analyze this enormous number of extremely high-dimensional distributed documents and to organize them in a way that improves search and knowledge extraction without introducing much extra cost or complexity. This paper presents a distributed document clustering approach called Distributed Information Bottleneck (DIB). DIB adopts a two-stage agglomerative Information Bottleneck (aIB) algorithm to generate local clusters. In the first stage, the dimensionality of the document vectors is significantly reduced by finding word-clusters. These word-clusters are then used to obtain document-clusters in the second stage. DIB then extracts compact but informative local models from these document-clusters and transfers them to a central site. At the global site, the local models that are likely to describe the same document ...
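The local, two-stage aIB step can be illustrated with a small sketch. This is not the DIB implementation. It assumes the standard agglomerative IB merge cost of Slonim and Tishby: merging clusters t_i and t_j loses (p(t_i) + p(t_j)) times the weighted Jensen-Shannon divergence between p(y|t_i) and p(y|t_j). It also assumes a word-clusters-then-document-clusters arrangement. Stage 1 clusters words so that they preserve information about documents. Stage 2 clusters documents using their word-cluster counts. The function names (`aib`, `merge_cost`, `two_stage`) and the document-by-word count matrix input are hypothetical. Every row is assumed to have nonzero mass. The local-model extraction and global merging steps are not shown, since the abstract does not specify them.

```python
import numpy as np


def _kl(p, q):
    """KL divergence D(p || q) in bits, summing only where p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def merge_cost(p_i, p_j, cond_i, cond_j):
    """aIB information loss from merging two clusters (assumed criterion)."""
    total = p_i + p_j
    pi_i, pi_j = p_i / total, p_j / total
    mix = pi_i * cond_i + pi_j * cond_j
    js = pi_i * _kl(cond_i, mix) + pi_j * _kl(cond_j, mix)
    return total * js


def aib(joint, k):
    """Greedy agglomerative IB over the rows of a joint distribution p(x, y).

    Returns k clusters, each a list of row indices. Assumes no all-zero rows.
    """
    joint = joint / joint.sum()
    clusters = [[i] for i in range(joint.shape[0])]
    priors = list(joint.sum(axis=1))             # p(t) for each cluster
    conds = [row / row.sum() for row in joint]   # p(y | t) for each cluster
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                c = merge_cost(priors[a], priors[b], conds[a], conds[b])
                if best is None or c < best[0]:
                    best = (c, a, b)
        _, a, b = best
        # Merge b into a: priors add, conditionals become the weighted mixture.
        total = priors[a] + priors[b]
        conds[a] = (priors[a] * conds[a] + priors[b] * conds[b]) / total
        priors[a] = total
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b], priors[b], conds[b]     # b > a, so index a is unaffected
    return clusters


def two_stage(counts, n_word_clusters, n_doc_clusters):
    """Hypothetical two-stage local clustering on a docs x words count matrix."""
    p_dw = counts / counts.sum()
    # Stage 1: cluster words (rows of p(w, d)) to preserve information about documents.
    word_clusters = aib(p_dw.T, n_word_clusters)
    # Reduce each document to p(d, c) by summing the word columns in each cluster.
    p_dc = np.stack([p_dw[:, ws].sum(axis=1) for ws in word_clusters], axis=1)
    # Stage 2: cluster documents in the reduced word-cluster space.
    doc_clusters = aib(p_dc, n_doc_clusters)
    return word_clusters, doc_clusters
```

On a toy matrix where words 0 and 1 co-occur only in documents 0 and 1, `two_stage(counts, 2, 2)` groups those words together and those documents together. The exhaustive pairwise search is cubic in the number of items and is meant only for illustration.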
Debzani Deb, Rafal A. Angryk
Added 02 Jun 2010
Updated 02 Jun 2010
Type Conference
Year 2007
Where CIDM