Concept Chain Based Text Clustering

Unlike the objects handled in familiar clustering tasks, text documents occupy sparse data spaces. A common way of representing a document is as a bag of its component words, but this ignores the semantic relations between words. In this paper, we propose a novel document representation approach that strengthens the discriminative features of document objects. We replace the terms of documents with concepts in WordNet and construct a model named Concept CHain Model (CCHM) for document representation. CCHM is applied in both partitioning and agglomerative clustering analysis. Hierarchical clustering proceeds at different levels of the concept chains. The experimental evaluation on textual data sets demonstrates the validity and efficiency of CCHM. The experimental results with concepts show the superiority of our approach in hierarchical clustering. Keywords: Data Mining, Text Clustering, Information Retrieval
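The abstract does not spell out how CCHM is built, so the following is only a minimal sketch of the general idea of replacing terms with more general concepts before comparing documents. It assumes a tiny hand-written hypernym table in place of WordNet; the names HYPERNYM, concept_chain, to_concepts and cosine are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy hypernym table standing in for WordNet (child -> parent).
HYPERNYM = {
    "beagle": "dog",
    "poodle": "dog",
    "dog": "animal",
    "cat": "animal",
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
}


def concept_chain(term):
    """Return the chain from a term up to its most general ancestor."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def to_concepts(tokens, level):
    """Build a bag of concepts: each token is replaced by the concept
    `level` steps up its chain, capped at the top of the chain."""
    bag = Counter()
    for tok in tokens:
        chain = concept_chain(tok)
        bag[chain[min(level, len(chain) - 1)]] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two Counter-based bags."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


doc_a = ["beagle", "sedan"]
doc_b = ["poodle", "truck"]

# Compare the two documents at increasingly general concept levels.
for level in range(3):
    sim = cosine(to_concepts(doc_a, level), to_concepts(doc_b, level))
    print(level, round(sim, 3))
```

In this toy setting the two documents share no words at level 0, yet they become more similar as terms are lifted to more general concepts. That is the kind of semantic relation a plain bag-of-words representation ignores. A real implementation would also need word sense disambiguation and handling of terms with several hypernyms, both of which this sketch leaves out.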
Shaoxu Song, Jian Zhang, Chunping Li
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where CIS