A collaborative crawler is a group of crawling nodes, each responsible for a specific portion of the Web. We study the problem of collecting geographical...
To maintain interoperability in the Web environment, it is necessary to comply with Web standards. The current specifications of the HTML and XHTML languages define conformance conditions ...
In this paper we describe preliminary work that examines whether statistical properties of the structure of websites can be an informative measure of their quality. We aim to deve...
Vaclav Petricek, Tobias Escher, Ingemar J. Cox, He...
This paper presents a simple and intuitive method for mining search engine query logs to obtain fast query recommendations on a large-scale, industrial-strength search engine. In orde...
Many modern natural language processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search e...