HICSS
2008
IEEE

Using Visual Features for Fine-Grained Genre Classification of Web Pages

The field of automatic genre classification has primarily focused on extracting textual features from documents. The goal of this research is to investigate whether visual features of HTML web pages can improve the classification of fine-grained genres. Intuitively, visual features should help; the challenge is to extract those that capture the layout characteristics of the genres. A corpus of web pages from different e-commerce sites was assembled and manually classified into several genres. Three feature sets were compared: one with only textual features, one with HTML-level features added, and a third with visual features added as well. Our experiments confirm that using HTML features, and particularly URL features, can improve classification beyond using textual features alone. We also show that adding visual features can be useful for further improving fine-grained genre classification.
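
The three-way comparison described in the abstract can be pictured as three cumulative feature sets built for the same page. The following is only a minimal sketch under stated assumptions, not the authors' feature extractor: the function names, the `word:`/`tag:`/`url:`/`visual:` key prefixes, and the example URL are hypothetical, and the visual features are assumed to be precomputed elsewhere (for example by a page renderer) and passed in as a plain dictionary.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class TagAndTextCollector(HTMLParser):
    """Collects start-tag names and text content from an HTML string."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def textual_features(html):
    """Bag-of-words counts over the page text (hypothetical feature choice)."""
    parser = TagAndTextCollector()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    return {f"word:{w}": c for w, c in Counter(words).items()}


def html_features(html, url):
    """Tag counts plus URL path tokens (hypothetical feature choice)."""
    parser = TagAndTextCollector()
    parser.feed(html)
    feats = {f"tag:{t}": c for t, c in Counter(parser.tags).items()}
    for token in re.findall(r"[a-z]+", urlparse(url).path.lower()):
        feats[f"url:{token}"] = 1
    return feats


def build_feature_sets(html, url, visual):
    """Return the three cumulative feature sets for one page.

    `visual` is assumed to be a dict of precomputed layout measurements;
    this sketch does not render the page itself.
    """
    text = textual_features(html)
    text_html = {**text, **html_features(html, url)}
    text_html_visual = dict(text_html)
    for name, value in visual.items():
        text_html_visual[f"visual:{name}"] = value
    return {
        "text": text,
        "text+html": text_html,
        "text+html+visual": text_html_visual,
    }


if __name__ == "__main__":
    page = "<html><body><h1>Red Shoes</h1><p>Add to cart</p><img src='a.png'></body></html>"
    sets = build_feature_sets(
        page,
        "http://shop.example/product/red-shoes",  # hypothetical URL
        {"image_area_ratio": 0.4},                # hypothetical visual feature
    )
    for name, feats in sets.items():
        print(name, len(feats))
```

Each dictionary could then be vectorized and fed to the same classifier, so that any difference in accuracy is attributable to the added feature group rather than to the learning algorithm.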
Ryan Levering, Michal Cutler, Lei Yu
Added 29 May 2010
Updated 29 May 2010
Type Conference
Year 2008
Where HICSS