Sciweavers

38 search results - page 2 / 8
» The indexable web is more than 11.5 billion pages
VLDB
2000
ACM
The Evolution of the Web and Implications for an Incremental Crawler
In this paper we study how to build an effective incremental crawler. The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of ...
Junghoo Cho, Hector Garcia-Molina
ICTIR
2009
Springer
PageRank: Splitting Homogeneous Singular Linear Systems of Index One
The PageRank algorithm is used today within web information retrieval to provide a content-neutral ranking metric over web pages. It employs power method iterations to so...
Douglas V. de Jager, Jeremy T. Bradley
SIGIR
2008
ACM
Classifiers without borders: incorporating fielded text from neighboring web pages
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
Xiaoguang Qi, Brian D. Davison
LREC
2010
DutchParl. The Parliamentary Documents in Dutch
A corpus called DutchParl is created which aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains d...
Maarten Marx, Anne Schuth
WEBDB
2005
Springer
Searching for Hidden-Web Databases
Recently, there has been increased interest in the retrieval and integration of hidden Web data with a view to leverage high-quality information available in online databases. Alt...
Luciano Barbosa, Juliana Freire