Sciweavers

» Incremental web page template detection
WWW
2008
ACM
14 years 6 months ago
Incremental web page template detection
Most template detection methods process web pages in batches, so a newly crawled page cannot be processed until enough pages have been collected. This results in large storage c...
Yu Wang, Binxing Fang, Xueqi Cheng, Li Guo, Hongbo...
CIKM
2006
Springer
13 years 9 months ago
A fast and robust method for web page template detection and removal
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...
WWW
2008
ACM
14 years 6 months ago
Web page sectioning using regex-based template
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
Rupesh R. Mehta, Amit Madaan
PKDD
2007
Springer
13 years 11 months ago
Site-Independent Template-Block Detection
Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
Aleksander Kolcz, Wen-tau Yih
SAC
2006
ACM
13 years 11 months ago
Template detection for large scale search engines
Templates in web sites hurt search engine retrieval performance, especially in content relevance and link analysis. Current template removal methods suffer from processing speed ...
Liang Chen, Shaozhi Ye, Xing Li