AIRS
2008
Springer

News Page Discovery Policy for Instant Crawlers

Many news pages with strict freshness requirements are published on the internet every day. Instant crawlers must download them immediately, or they soon become outdated. In the past, instant crawlers downloaded pages only from a manually compiled list of news websites. Because news websites do not publish news pages exclusively, bandwidth was wasted on downloading non-news pages. This paper proposes a novel approach to discovering news pages, consisting of seed selection and news URL prediction based on user behavior analysis. Empirical studies on a two-month user access log show that our approach outperforms the traditional approach in both precision and recall.
Yong Wang, Yiqun Liu, Min Zhang, Shaoping Ma
Added: 01 Jun 2010
Updated: 01 Jun 2010
Type: Conference
Year: 2008
Where: AIRS
Authors: Yong Wang, Yiqun Liu, Min Zhang, Shaoping Ma