Sciweavers

80 search results - page 6 / 16
» Web Page Segmentation Based on Gestalt Theory
WEBI
2005
Springer
15 years 2 months ago
A Fuzzy Web Surfer Model
A novel web surfer model, where the transitions between web pages are fuzzy quantities, is proposed in this article. Such a model is appropriate when the links between pages are i...
B. Lakshmi Narayan, Sankar K. Pal
DOCENG
2009
ACM
15 years 3 months ago
Web article extraction for web printing: a DOM+visual based approach
Web Article Extraction for Web Printing: a DOM+Visual based Approach. Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu. HP Laboratories, HPL-2009-185. Article extrac...
Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong...
WWW
2009
ACM
15 years 10 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth
WWW
2005
ACM
15 years 10 months ago
Web data extraction based on partial tree alignment
This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...
Yanhong Zhai, Bing Liu
WWW
2008
ACM
15 years 10 months ago
Recrawl scheduling based on information longevity
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Christopher Olston, Sandeep Pandey