
IFIP12
2004

Impact on Performance of Hypertext Classification of Selective Rich HTML Capture

Hypertext categorization is the automatic classification of web documents into predefined classes. It poses new challenges for automatic categorization because of the rich information in a hypertext document. Hyperlinks, HTML tags, and metadata all provide rich information for hypertext categorization that is not available in traditional text classification. This paper looks at (i) what representation to use for documents and which extra information hidden in HTML pages to take into consideration to improve the classification task, and (ii) how to deal with the very high number of features of texts. A hypertext dataset and three well-known learning algorithms (Na
Houda Benbrahim, Max Bramer
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where IFIP12
Authors Houda Benbrahim, Max Bramer