
SIGIR
2010
ACM

Analysis of structural relationships for hierarchical cluster labeling

Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence labeling accuracy. However, most labeling algorithms ignore such structural properties, and therefore the impact of hierarchical structures on labeling accuracy is still unclear. In our work we integrate hierarchical information, i.e. sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, the χ² test, and Information Gain, to make use of those relationships, and we evaluate their impact on four different datasets: the Open Directory Project, Wikipedia, TREC Ohsumed, and the CLEF IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.
Categories and Subject Descriptors: H.3.1 [Content Analysis and Indexing]: Linguistic processing; H.3.3...
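To make the idea of sibling-aware labeling concrete, the following is a minimal sketch, not the authors' formulation. It assumes that each document is a set of terms and that a node's candidate labels are scored by a Pearson χ² test contrasting the node's documents against the documents of its sibling nodes. The names `chi_square` and `label_node` are hypothetical. Parent-child relations, and the other scoring functions named above, are not modeled here.

```python
from collections import Counter
from typing import Dict, List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: node documents containing the term
    b: sibling documents containing the term
    c: node documents without the term
    d: sibling documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def label_node(node_docs: List[Set[str]],
               sibling_docs: List[Set[str]],
               k: int = 3) -> List[str]:
    """Hypothetical labeler: rank a node's terms by chi-square against its siblings."""
    node_df = Counter(t for doc in node_docs for t in doc)    # document frequency in node
    sib_df = Counter(t for doc in sibling_docs for t in doc)  # document frequency in siblings
    n_node, n_sib = len(node_docs), len(sibling_docs)

    scores: Dict[str, float] = {}
    for term, a in node_df.items():
        b = sib_df.get(term, 0)
        # Keep only terms relatively more frequent in the node than in its
        # siblings (cross-multiplied to avoid dividing by zero).
        if a * n_sib <= b * n_node:
            continue
        scores[term] = chi_square(a, b, n_node - a, n_sib - b)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For a node whose documents all mention "heart" while its siblings mention "lung", `label_node` ranks "heart" first, because it separates the node from its siblings. A term such as "patient", which appears on both sides, scores lower. Contrasting against siblings rather than the whole collection is one plausible way to exploit the hierarchy; the paper's actual adaptations may differ.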
Markus Muhr, Roman Kern, Michael Granitzer
Added 06 Dec 2010
Updated 06 Dec 2010
Type Conference
Year 2010
Where SIGIR
Authors Markus Muhr, Roman Kern, Michael Granitzer