This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Web Directories are repositories of Web pages organized in a hierarchy of topics and sub-topics. In this paper, we present DirectoryRank, a ranking framework that orders the pages...
Vlassis Krikos, Sofia Stamou, Pavlos Kokosis, Alex...
In a number of recent studies [4, 8] researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend ...
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...
Existing search engines contain the picture of the Web from the past and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevan...
Web pages are created, modified and removed at unspecified times by their owners. The frequency and extent of changes to Web pages vary across sites and across pages within site...
This article presents the most distinguishing features of the Argentinian web as found in a private sample of almost 10 million web pages from 150.000 sites collected in the early...
Gabriel Tolosa, Fernando Bordignon, Ricardo A. Bae...
In this paper, we propose a novel method for generating personalized page links. The page links which are generated by our proposed method are useful if users look for web pages r...
We consider online algorithms for pull-based broadcast scheduling. In this setting there are n pages of information at a server and requests for pages arrive online. When the serv...
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However,...