WWW
2007
ACM

Designing efficient sampling techniques to detect webpage updates

Due to resource constraints, Web archiving systems and search engines usually have difficulty keeping the entire local repository synchronized with the Web. We advance the state of the art of sampling-based synchronization techniques by answering a challenging question: Given a sampled webpage and its change status, which other webpages are also likely to change? We present a study of various downloading granularities and policies, and propose an adaptive model based on the update history and the popularity of the webpages. We run extensive experiments on a large dataset of approximately 300,000 webpages to demonstrate that more updated webpages are most likely to be found in the current or upper directories of the changed samples. Moreover, the adaptive strategies outperform the non-adaptive one in terms of detecting important changes.
Terms: Management, Design, Algorithms, Experimentation
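The directory-level finding in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`directory_of`, `parent_of`, `candidates_after_change`) and the `example.org` URLs are hypothetical, and the sketch assumes that "current directory" means the URL path up to the last slash and "upper directory" means one level above it. Given a sample known to have changed, it lists the other locally stored pages in those two directories as candidates for re-download.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlsplit


def directory_of(url):
    """Return (host, directory path) for a URL, e.g. ('a.org', '/x/y')."""
    parts = urlsplit(url)
    return parts.netloc, dirname(parts.path) or "/"


def parent_of(directory):
    """Return the directory one level up on the same host (root stays root)."""
    host, path = directory
    return host, dirname(path.rstrip("/")) or "/"


def candidates_after_change(changed_sample, repository):
    """List stored pages in the changed sample's current and upper directory.

    Assumption for illustration: `repository` is an iterable of URLs
    already held in the local copy.
    """
    by_dir = defaultdict(list)
    for url in repository:
        by_dir[directory_of(url)].append(url)

    current = directory_of(changed_sample)
    upper = parent_of(current)

    picked = []
    for d in (current, upper):
        for url in by_dir[d]:
            # Skip the sample itself and avoid duplicates when current == upper.
            if url != changed_sample and url not in picked:
                picked.append(url)
    return picked


if __name__ == "__main__":
    repo = [
        "http://example.org/news/2007/a.html",
        "http://example.org/news/2007/b.html",
        "http://example.org/news/index.html",
        "http://example.org/sports/c.html",
    ]
    for url in candidates_after_change(repo[0], repo):
        print(url)
```

An adaptive policy of the kind the abstract mentions would additionally rank these candidates, for example by update history and popularity; that ranking is left out here because the listing gives no details of the model.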
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2007
Where WWW
Authors Qingzhao Tan, Ziming Zhuang, Prasenjit Mitra, C. Lee Giles