Clustering documents in a web directory

13 years 11 months ago
Clustering documents in a web directory
Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents. The worst problem of hierarchical supervised classifiers is their high demand in terms of labeled examples, whose amount is related to the number of topics in the taxonomy. Hence, bootstrapping a huge hierarchy with a proper set of labeled examples is a critical issue. In this paper, we propose some solutions for the bootstrapping problem, implicitly or explicitly using a taxonomy definition: a baseline approach where documents are classified according to class labels, and two clustering approaches, where training is constrained by the a-priori knowledge of the taxonomy structure, both at terminological and topological level. In particular, we propose the TaxSOM model, that clusters a set of documents in a predefined hierarchy of classes, directly exploiting the knowledge of both their topological organization and their lexical ...
Giordano Adami, Paolo Avesani, Diego Sona
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where WIDM
Authors Giordano Adami, Paolo Avesani, Diego Sona
Comments (0)