Sciweavers

CIKM
2006
Springer

Incremental hierarchical clustering of text documents

Incremental hierarchical text document clustering algorithms are important for organizing documents generated by streaming online sources such as newswire and blogs. However, this area remains relatively unexplored in the text document clustering literature. The popular incremental hierarchical clustering algorithms, Cobweb and Classit, have not been widely applied to text document data. We discuss why these algorithms, in their current form, are not suitable for text clustering, and we propose an alternative formulation that changes the algorithm's underlying distributional assumption to conform to the data. Both the original Classit algorithm and our proposed algorithm are evaluated on Reuters newswire articles and the Ohsumed dataset.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Clustering; I.5.3 [Pattern Recognition]: Clustering; I.2.6 [Artificial Intelligence]: Learning--Concept Learning Ge...
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CIKM
Authors Nachiketa Sahoo, Jamie Callan, Ramayya Krishnan, George T. Duncan, Rema Padman