WWW
2008
ACM

Incremental web page template detection

Most template detection methods process web pages in batches, so a newly crawled page cannot be processed until enough pages have been collected. This results in large storage consumption and long delays in refreshing data. In this paper, we present an incremental framework for template detection in which each page is processed as soon as it has been crawled, so no web page needs to be cached. Experiments show that our framework consumes less than 7% of the storage used by traditional methods. Data refreshing is also faster because pages are processed incrementally.
Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval
General Terms: Experimentation, Algorithms
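To make the contrast with batch processing concrete, here is a minimal sketch of what incremental processing without page caching could look like. It is not the authors' algorithm, which the abstract does not describe. It assumes a simple frequency heuristic: a text segment counts as template if it has appeared on enough of the pages seen so far. The class name, the `min_pages` and `min_ratio` thresholds, and the segment representation are all hypothetical.

```python
import hashlib
from collections import defaultdict


class IncrementalTemplateDetector:
    """Illustrative only: flags text segments that recur across pages.

    Keeps one counter per segment fingerprint and never stores a page,
    so each page can be processed the moment it is crawled.
    """

    def __init__(self, min_pages=3, min_ratio=0.5):
        self.min_pages = min_pages    # hypothetical threshold: minimum pages containing the segment
        self.min_ratio = min_ratio    # hypothetical threshold: minimum share of pages seen so far
        self.pages_seen = 0
        self.segment_counts = defaultdict(int)  # fingerprint -> number of pages containing it

    @staticmethod
    def _fingerprint(segment):
        # Store a fixed-size hash instead of the segment text itself.
        return hashlib.sha1(segment.strip().encode("utf-8")).hexdigest()

    def process(self, segments):
        """Update the counters with one page; return its segments judged to be template."""
        self.pages_seen += 1
        # Deduplicate within the page so a segment is counted once per page.
        fingerprints = {self._fingerprint(s): s for s in segments}
        for fp in fingerprints:
            self.segment_counts[fp] += 1
        template = []
        for fp, segment in fingerprints.items():
            count = self.segment_counts[fp]
            if count >= self.min_pages and count / self.pages_seen >= self.min_ratio:
                template.append(segment)
        return template
```

In this sketch the only state that grows is the fingerprint table. The page content can be discarded once `process` returns, which is the property the abstract attributes to the incremental framework.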
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2008
Where WWW
Authors Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo Xu