Search Sciweavers | Sciweavers

» Web Page Segmentation Based on Gestalt Theory

WEBI
2005
Springer

87 views, Internet Technology

A Fuzzy Web Surfer Model

15 years 5 months ago

Download www.isical.ac.in

A novel web surfer model, where the transitions between web pages are fuzzy quantities, is proposed in this article. Such a model is appropriate when the links between pages are i...

B. Lakshmi Narayan, Sankar K. Pal

DOCENG
2009
ACM

223 views, Document Analysis

Web article extraction for web printing: a DOM+visual based approach

15 years 6 months ago

Download www.hpl.hp.com

Web Article Extraction for Web Printing: a DOM+Visual based Approach. Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong, Jerry Liu. HP Laboratories, HPL-2009-185. Article extrac...

Ping Luo, Jian Fan, Sam Liu, Fen Lin, Yuhong Xiong...

WWW
2009
ACM

213 views, Internet Technology

Extracting article text from the web with maximum subsequence segmentation

16 years 8 days ago

Download www2009.org

Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...

Jeff Pasternack, Dan Roth

WWW
2005
ACM

135 views, Internet Technology

Web data extraction based on partial tree alignment

16 years 8 days ago

Download www.cs.uic.edu

This paper studies the problem of extracting data from a Web page that contains several structured data records. The objective is to segment these data records, extract data items...

Yanhong Zhai, Bing Liu

WWW
2008
ACM

109 views, Internet Technology

Recrawl scheduling based on information longevity

16 years 8 days ago

Download www2008.org

It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...

Christopher Olston, Sandeep Pandey
