Towards Structure-sensitive Hypertext Categorization

Abstract. Hypertext categorization is the task of automatically assigning category labels to hypertext units. Like text categorization, it belongs to the area of function learning based on the bag-of-features approach. This scenario faces the problem of a many-to-many relation between websites and their hidden logical document structure. The paper argues that this relation is a prevalent characteristic which interferes with any effort to apply the classical apparatus of categorization to web genres. This is confirmed by a threefold experiment in hypertext categorization. To outline a solution to this problem, the paper sketches an alternative method of unsupervised learning which aims at bridging the gap between statistical and structural pattern recognition (Bunke et al. 2001) in the area of web mining.
Alexander Mehler, Rüdiger Gleim, Matthias Dehmer
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where GFKL
Authors Alexander Mehler, Rüdiger Gleim, Matthias Dehmer