Sciweavers

DEXAW
2008
IEEE

Topic Detection by Clustering Keywords

We consider topic detection without any prior knowledge of the category structure or of the possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that the resulting keyword clusters correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms cosine similarity. In particular, a newly proposed term distribution that takes co-occurrence of terms into account gives the best results.
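The two kinds of measure compared in the abstract can be illustrated with a minimal sketch. It assumes each keyword is represented by a discrete probability distribution given as a plain list of floats. The function names and the toy distributions are hypothetical and are not taken from the paper, and the paper's co-occurrence-based term distribution is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(p, q):
    """Cosine of the angle between p and q, treated as plain vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q)


# Hypothetical distributions of three keywords over four context terms.
keyword_a = [0.5, 0.5, 0.0, 0.0]
keyword_b = [0.4, 0.4, 0.1, 0.1]
keyword_c = [0.0, 0.0, 0.5, 0.5]

print(js_divergence(keyword_a, keyword_b), cosine_similarity(keyword_a, keyword_b))
print(js_divergence(keyword_a, keyword_c), cosine_similarity(keyword_a, keyword_c))
```

With base-2 logarithms the Jensen-Shannon divergence lies between 0 (identical distributions) and 1 (distributions with disjoint support), so it can serve directly as a bounded dissimilarity for clustering, whereas cosine similarity has to be converted to a distance first.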
Christian Wartena, Rogier Brussee
Added 29 May 2010
Updated 29 May 2010
Type Conference
Year 2008
Where DEXAW
Authors Christian Wartena, Rogier Brussee