Search Sciweavers | Sciweavers

563 search results - page 28 / 113

» Crawling the web for structured documents

WSDM
2010
ACM

204 views, Data Mining

Learning URL patterns for webpage de-duplication

16 years 27 days ago

Download www.wsdm-conference.org

Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...

Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

WWW
2010
ACM

220 views, Internet Technology

New-web search with microblog annotations

16 years 27 days ago

Download es.csiro.au

Web search engines discover indexable documents by recursively ‘crawling’ from a seed URL. Their rankings take into account link popularity. While this works well, it introduc...

Tom Rowlands, David Hawking, Ramesh Sankaranarayan...

WWW
2005
ACM

144 views, Internet Technology

Finding the boundaries of information resources on the web

15 years 11 months ago

Download www2005.org

In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...

Pavel Dmitriev, Carl Lagoze, Boris Suchkov

ICDAR
2003
IEEE

158 views, Document Analysis

Web Page Summarization for Handheld Devices: A Natural Language Approach

15 years 11 months ago

Download www.cse.salford.ac.uk

Summarization of web pages is a very interesting topic from both academic and commercial points of view. Academically, it is challenging to create a summary of a document (e.g. a w...

Hassan Alam, Rachmat Hartono, Aman Kumar, Ahmad Fu...

ACMICEC
2006
ACM

141 views, ECommerce

From HTML documents to web tables and rules

16 years 19 hours ago

Download www.informatik.uni-freiburg.de

We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...

Kai Simon, Georg Lausen, Harold Boley
