In this paper, we identify and analyze structural properties which reflect the functionality of a Web site. These structural properties consider the size, the organization, the co...
Abstract. The term Deep Web (sometimes also called Hidden Web) refers to the data content that is created dynamically as the result of a specific search on the Web. In this respec...
In this paper we present CUTER, a system that processes HTML pages in order to extract the useful text from them. The mechanism focuses on HTML pages that include news articl...
George Adam, Christos Bouras, Vassilis Poulopoulos
Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propos...
First-generation Web content encodes information in handwritten (HTML) Web pages. Second-generation Web content generates HTML pages on demand, e.g. by filling in templates with c...
Jacco van Ossenbruggen, Joost Geurts, Frank Cornel...