Existing search engines contain the picture of the Web from the past and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevan...
WormBase (http://wormbase.org), the public database for genomics and biology of Caenorhabditis elegans, has been restructured for stronger performance and expanded for richer biol...
Erich M. Schwarz, Igor Antoshechkin, Carol Bastian...
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
Missing web pages, URIs that return the 404 “Page Not Found” error or the HTTP response code 200 but dereference unexpected content, are ubiquitous in today’s browsing exper...
Martin Klein, Jeffery L. Shipman, Michael L. Nelso...