Sciweavers

LAWEB
2003
IEEE
13 years 9 months ago
On the Evolution of Clusters of Near-Duplicate Web Pages
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Dennis Fetterly, Mark Manasse, Marc Najork
WIDM
2005
ACM
13 years 10 months ago
DirectoryRank: ordering pages in web directories
Web Directories are repositories of Web pages organized in a hierarchy of topics and sub-topics. In this paper, we present DirectoryRank, a ranking framework that orders the pages...
Vlassis Krikos, Sofia Stamou, Pavlos Kokosis, Alex...
SIGMOD
2005
ACM
13 years 10 months ago
Page Quality: In Search of an Unbiased Web Ranking
In a number of recent studies [4, 8] researchers have found that because search engines repeatedly return currently popular pages at the top of search results, popular pages tend ...
Junghoo Cho, Sourashis Roy, Robert Adams
WISE
2005
Springer
13 years 10 months ago
Extracting Web Data Using Instance-Based Learning
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...
Yanhong Zhai, Bing Liu
WISE
2005
Springer
13 years 10 months ago
Temporal Ranking of Search Engine Results
Existing search engines contain the picture of the Web from the past and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevan...
Adam Jatowt, Yukiko Kawai, Katsumi Tanaka
FIRBPERF
2005
IEEE
13 years 10 months ago
Models of Dynamic Web Content
Web pages are created, modified and removed at unspecified times by their owners. The frequency and extent of changes to Web pages vary across sites and across pages within site...
Mariacarla Calzarossa, Daniele Tessera
LAWEB
2007
IEEE
13 years 10 months ago
Distinctive Features of the Argentinian Web
This article presents the most distinguishing features of the Argentinian web as found in a private sample of almost 10 million web pages from 150,000 sites collected in the early...
Gabriel Tolosa, Fernando Bordignon, Ricardo A. Bae...
ICDE
2007
IEEE
13 years 10 months ago
An Automatic Page Link Generation Method based on Users' Behavior
In this paper, we propose a novel method for generating personalized page links. The page links which are generated by our proposed method are useful if users look for web pages r...
Yu Suzuki, Keigo Nakatani, Kyoji Kawagoe
ESA
2009
Springer
13 years 11 months ago
Minimizing Maximum Response Time and Delay Factor in Broadcast Scheduling
We consider online algorithms for pull-based broadcast scheduling. In this setting there are n pages of information at a server and requests for pages arrive online. When the serv...
Chandra Chekuri, Sungjin Im, Benjamin Moseley
WWW
2010
ACM
13 years 11 months ago
Diversifying web search results
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However,...
Davood Rafiei, Krishna Bharat, Anand Shukla