This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
We introduce a stricter Web community definition to overcome boundary ambiguity of a Web community defined by Flake, Lawrence and Giles [2], and consider the problem of finding co...
Many Web information services utilize techniques of information extraction (IE) to collect important facts from the Web. To create more advanced services, one possible method is t...
In this paper, we present a long-term study of user-centric Web traffic data collected in 2000-2002 and 2005-2006 from two large representative panels of French Internet users. Ou...