Search Sciweavers | Sciweavers

102 search results - page 16 / 21

» Agent-Based Approach for Web Crawling

WWW
2009
ACM

125 views, Internet Technology

Triplify: light-weight linked data publication from relational databases

16 years 8 months ago

Download www.informatik.uni-leipzig.de

In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relat...

Sören Auer, Sebastian Dietzold, Jens Lehmann,...

SIGIR
2005
ACM

150 views, Information Technology

Server selection methods in hybrid portal search

16 years 1 month ago

Download es.csiro.au

The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...

David Hawking, Paul Thomas

ECIR
2006
Springer

134 views, Information Technology

Automatic Document Organization in a P2P Environment

15 years 9 months ago

Download ir.shef.ac.uk

This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble-based meta methods. We co...

Stefan Siersdorfer, Sergej Sizov

CIKM
2009
ACM

121 views, Information Technology

Graph-based seed selection for web-scale crawlers

16 years 2 months ago

Download clgiles.ist.psu.edu

One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...

Shuyi Zheng, Pavel Dmitriev, C. Lee Giles

KDD
2008
ACM

183 views, Data Mining

De-duping URLs via rewrite rules

16 years 8 months ago

Download research.yahoo.com

A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...

Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
