KES
2010
Springer

W-kmeans: Clustering News Articles Using WordNet

Document clustering is a powerful technique that has been widely used for organizing data into smaller, manageable information kernels. Several approaches have been proposed, but they suffer from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters. We propose enhancing the standard kmeans algorithm with external knowledge from WordNet hypernyms in two ways: enriching the “bag of words” before the clustering process, and assisting the label generation procedure that follows it. Our experiments revealed a significant improvement over standard kmeans on a corpus of news articles drawn from major news portals. Moreover, the cluster labeling process generates useful, high-quality cluster tags.
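The two uses of hypernyms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_HYPERNYMS` map is a hypothetical stand-in for real WordNet lookups, and the hypernym weight of 0.5, the cosine similarity, the first-k centroid initialization, and the "heaviest hypernym" labeling rule are all assumptions chosen to keep the example small and deterministic.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet hypernym lookups (assumption, not real WordNet data).
TOY_HYPERNYMS = {
    "dog": ["canine", "animal"],
    "cat": ["feline", "animal"],
    "car": ["vehicle"],
    "truck": ["vehicle"],
}

def enrich(tokens, hypernyms=TOY_HYPERNYMS, weight=0.5):
    """Build a bag of words and add each token's hypernyms at a reduced weight."""
    bag = Counter()
    for token in tokens:
        bag[token] += 1.0
        for hyper in hypernyms.get(token, []):
            bag[hyper] += weight  # the weight is an assumed tuning parameter
    return bag

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean(bags):
    """Component-wise mean of a non-empty list of bags."""
    total = Counter()
    for bag in bags:
        for term, w in bag.items():
            total[term] += w
    return {term: w / len(bags) for term, w in total.items()}

def kmeans(bags, k, iters=10):
    """Plain k-means over sparse bags; centroids start as the first k documents."""
    centroids = [dict(bag) for bag in bags[:k]]
    assign = []
    for _ in range(iters):
        # Assign each document to its most similar centroid (ties go to the lowest index).
        assign = [max(range(k), key=lambda c: cosine(bag, centroids[c])) for bag in bags]
        for c in range(k):
            members = [bag for bag, a in zip(bags, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign, centroids

def label(centroid, hypernyms=TOY_HYPERNYMS):
    """Label a cluster with the hypernym term that carries the most weight in its centroid."""
    hyper_terms = {h for hs in hypernyms.values() for h in hs}
    candidates = {t: w for t, w in centroid.items() if t in hyper_terms}
    return max(candidates, key=candidates.get) if candidates else None

docs = [["dog", "barks"], ["car", "engine"], ["cat", "purrs"], ["truck", "road"]]
bags = [enrich(doc) for doc in docs]
assign, centroids = kmeans(bags, k=2)
labels = [label(c) for c in centroids]
```

In this toy corpus no two documents share a surface word, so plain bags of words give k-means nothing to group on; the shared hypernyms ("animal", "vehicle") are what link "dog" to "cat" and "car" to "truck", and the same hypernyms then serve as cluster labels. A real system would replace `TOY_HYPERNYMS` with lookups against WordNet itself.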
Christos Bouras, Vassilis Tsogkas
Added 29 Jan 2011
Updated 29 Jan 2011
Type Journal
Year 2010
Where KES
Authors Christos Bouras, Vassilis Tsogkas