Sciweavers

20 search results - page 1 / 4
» Measuring Contribution of HTML Features in Web Document Clus...
Sort
View
CLEIEJ
2008
72views more  CLEIEJ 2008»
13 years 5 months ago
Measuring Contribution of HTML Features in Web Document Clustering
Documents in HTML format have many features to analyze, from the terms in special sections to the phrases that appear in the whole document. However, it is important to decide whi...
Esteban Meneses, Oldemar Rodríguez-Rojas
ECIR
2003
Springer
13 years 6 months ago
Hierarchical Classification of HTML Documents with WebClassII
This paper describes a new method for the classification of a HTML document into a hierarchy of categories. The hierarchy of categories is involved in all phases of automated docum...
Michelangelo Ceci, Donato Malerba
SIGIR
2005
ACM
13 years 11 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
AIRWEB
2006
Springer
13 years 9 months ago
Tracking Web Spam with Hidden Style Similarity
Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...
Tanguy Urvoy, Thomas Lavergne, Pascal Filoche