Comprehensible and Accurate Cluster Labels in Text Clustering

9 years 11 months ago
Comprehensible and Accurate Cluster Labels in Text Clustering
The purpose of text clustering in information retrieval is to discover groups of semantically related documents. Accurate and comprehensible cluster descriptions (labels) let the user comprehend the collection’s content faster and are essential for various document browsing interfaces. The task of creating descriptive, sensible cluster labels is difficult—typical text clustering algorithms focus on optimizing proximity between documents inside a cluster and rely on keyword representation for describing discovered clusters. In the approach called Description Comes First (DCF) cluster labels are as important as document groups—DCF promotes machine discovery of comprehensible candidate cluster labels later used to discover related document groups. In this paper we describe an application of DCF to the k-Means algorithm, including results of experiments performed on the 20-newsgroups document collection. Experimental evaluation showed that DCF does not decrease the metrics used to a...
Jerzy Stefanowski, Dawid Weiss
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2007
Where RIAO
Authors Jerzy Stefanowski, Dawid Weiss
Comments (0)