
WWW
2009
ACM

Towards language-independent web genre detection

The term web genre denotes the type of a given web resource, in contrast to the topic of its content. In this research, we focus on recognizing the web genres blog, wiki and forum. We present a set of features that exploit the hierarchical structure of the web page's HTML mark-up and thus, in contrast to related approaches, do not depend on a linguistic analysis of the page's content. Our results show that it is possible to achieve very good accuracy for a fully language-independent detection of structured web genres.
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing--Abstracting methods
General Terms: Algorithms
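To illustrate what features based only on the hierarchy of the HTML mark-up might look like, here is a minimal sketch. It does not reproduce the authors' feature set, which the abstract does not spell out. The names `StructureCollector` and `structural_features` and the four features computed (tag count, nesting depth, most frequent parent/child tag pair, number of distinct tags) are assumptions chosen for illustration. Only Python's standard `html.parser` module is used, and no text content is inspected, so the result does not depend on the page's language.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StructureCollector(HTMLParser):
    """Hypothetical helper: records tag statistics while ignoring all text."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()   # how often each tag occurs
        self.pair_counts = Counter()  # how often each parent/child pair occurs
        self.max_depth = 0            # deepest nesting level seen
        self._stack = []              # currently open elements

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in VOID_TAGS:
            return
        self._stack.append(tag)
        self.max_depth = max(self.max_depth, len(self._stack))
        if len(self._stack) >= 2:
            self.pair_counts["/".join(self._stack[-2:])] += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass


def structural_features(html: str) -> dict:
    """Return illustrative, language-independent features of an HTML page."""
    collector = StructureCollector()
    collector.feed(html)
    collector.close()
    return {
        "total_tags": sum(collector.tag_counts.values()),
        "distinct_tags": len(collector.tag_counts),
        "max_depth": collector.max_depth,
        # A highly repeated parent/child pair hints at list-like page layouts,
        # such as a sequence of posts or comments.
        "max_repeated_pair": max(collector.pair_counts.values(), default=0),
    }


if __name__ == "__main__":
    page = "<html><body><div><p>a</p><p>b</p><br></div></body></html>"
    print(structural_features(page))
```

A feature dictionary of this kind could be fed to any standard classifier to separate blog, wiki and forum pages. Which features and which classifier actually work well is the subject of the paper, not something this sketch establishes.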
Added: 21 Nov 2009
Updated: 21 Nov 2009
Type: Conference
Year: 2009
Where: WWW
Authors: Philipp Scholl, Renato Domínguez García, Doreen Böhnstedt, Christoph Rensing, Ralf Steinmetz