Sciweavers

» Crawling the web for structured documents
NSDI
2010
15 years 3 months ago
The Architecture and Implementation of an Extensible Web Crawler
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...
WWW
2009
ACM
16 years 2 months ago
User-centric content freshness metrics for search engines
In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
Ali Dasdan, Xinh Huynh
ADAPTIVE
2007
Springer
15 years 8 months ago
Web Document Modeling
A very common issue of adaptive Web-Based systems is the modeling of documents. Such documents represent domain-specific information for a number of purposes. Application areas su...
Alessandro Micarelli, Filippo Sciarrone, Mauro Mar...
SIGIR
2012
ACM
13 years 4 months ago
Optimizing positional index structures for versioned document collections
Versioned document collections are collections that contain multiple versions of each document. Important examples are Web archives, Wikipedia and other wikis, or source code and ...
Jinru He, Torsten Suel