Incremental hierarchical clustering of text documents

Incremental hierarchical text document clustering algorithms are important for organizing documents generated by streaming online sources, such as newswire and blogs. However, this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, have not been widely used with text document data. We discuss why, in their current form, these algorithms are not suitable for text clustering, and we propose an alternative formulation that changes the algorithm's underlying distributional assumption to conform with the data. Both the original Classit algorithm and our proposed algorithm are evaluated using Reuters newswire articles and the Ohsumed dataset.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Clustering; I.5.3 [Pattern Recognition]: Clustering; I.2.6 [Artificial Intelligence]: Learning--Concept Learning Ge...
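The abstract says Classit's distributional assumption does not fit text data. As an illustration only, the sketch below computes the category utility that Classit is commonly described as using for numeric attributes: each attribute is modeled as normally distributed and scored through the reciprocal of its standard deviation, with an `acuity` floor on that deviation. It is not the authors' proposed formulation. The function names, the toy word-count vectors, and the omission of the constant factor 1/(2√π) (which only rescales the score) are assumptions made for this sketch.

```python
import math
from typing import List, Sequence

# A document is a vector of word counts; a cluster is a list of documents.
Doc = Sequence[float]
Cluster = List[Doc]


def attribute_std(docs: Sequence[Doc], i: int, acuity: float) -> float:
    """Population standard deviation of attribute i, floored at `acuity`."""
    values = [d[i] for d in docs]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Acuity floor: without it, a zero deviation would make 1/sigma infinite.
    return max(math.sqrt(var), acuity)


def classit_category_utility(children: List[Cluster], acuity: float = 1.0) -> float:
    """Gaussian category utility of splitting a parent node into `children`.

    CU = (1/K) * [sum_k P(C_k) * sum_i 1/sigma_ik - sum_i 1/sigma_ip]
    The constant factor 1/(2*sqrt(pi)) is omitted; it only rescales the score.
    """
    parent = [doc for child in children for doc in child]
    n_docs = len(parent)
    n_attrs = len(parent[0])
    parent_term = sum(1.0 / attribute_std(parent, i, acuity) for i in range(n_attrs))
    child_term = 0.0
    for child in children:
        weight = len(child) / n_docs  # P(C_k), the share of documents in child k
        child_term += weight * sum(
            1.0 / attribute_std(child, i, acuity) for i in range(n_attrs)
        )
    return (child_term - parent_term) / len(children)


if __name__ == "__main__":
    # Hypothetical toy data: 3-word vocabulary, two pure clusters of two documents each.
    children = [[[2, 0, 0], [2, 0, 0]], [[0, 0, 3], [0, 0, 3]]]
    print(classit_category_utility(children, acuity=1.0))
    print(classit_category_utility(children, acuity=0.1))
```

On these toy vectors the same two-cluster split scores about 0.17 with `acuity=1.0` but about 9.17 with `acuity=0.1`. Because the word counts inside each small cluster are identical (often zero), every within-cluster deviation falls to the floor, so the score is driven by the chosen acuity rather than by the data. This is one plausible way a Gaussian assumption can sit awkwardly with sparse count data; it is a sketch, not a reproduction of the paper's analysis.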

Nachiketa Sahoo, Jamie Callan, Ramayya Krishnan, George T. Duncan, Rema Padman

CIKM 2006 | Clustering Algorithms | Document Clustering | Information Management | Text Document |

» Concept Chain Based Text Clustering

» A Hierarchical Consensus Architecture for Robust Document Clustering

» Incremental Text Structuring with Online Hierarchical Ranking

» Frequent term-based text clustering

» Biomedical ontology improves biomedical literature clustering performance: a comparison stu...

» Text mining without document context

» Distributed hierarchical document clustering

» An investigation of linguistic features and clustering algorithms for topical document clu...

Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	CIKM
Authors	Nachiketa Sahoo, Jamie Callan, Ramayya Krishnan, George T. Duncan, Rema Padman

Sciweavers
