Search Sciweavers | Sciweavers

563 search results - page 3 / 113

» Crawling the web for structured documents

SIGMOD 2006, ACM (Database)

To search or to crawl?: towards a query optimizer for text-centric tasks

Download pages.stern.nyu.edu

Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive...

Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay ...

ADMA 2009, Springer (Data Mining)

Crawling Deep Web Using a New Set Covering Algorithm

Download cs.uwindsor.ca

Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...

Yan Wang, Jianguo Lu, Jessica Chen

JUCS 2008

Structure-Based Crawling in the Hidden Web

Download www.jucs.org

The number of applications that need to crawl the Web to gather data is growing at an ever increasing pace. In some cases, the criterion to determine what pages must be included ...

Márcio L. A. Vidal, Altigran Soares da Silv...

ICDE 2006, IEEE (Database)

Query Selection Techniques for Efficient Crawling of Structured Web Sources

Download research.microsoft.com

The high quality, structured data from Web structured sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are on...

Ping Wu, Ji-Rong Wen, Huan Liu, Wei-Ying Ma

ICDE 2007, IEEE (Database)

DSphere: A Source-Centric Approach to Crawling, Indexing and Searching the World Wide Web

Download www.cc.gatech.edu

We describe DSPHERE - a decentralized system for crawling, indexing, searching and ranking of documents in the World Wide Web. Unlike most of the existing search technologies tha...

Bhuvan Bamba, Ling Liu, James Caverlee, Vaibhav Pa...
