This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on...
Estimating the number of distinct elements in a large multiset has several applications, and hence has attracted active research in the past two decades. Several sampling and sket...
MetaCrystal enables users to visualize and control the degree of overlap between the results returned by different search engines. Several linked overview tools support rapid expl...
—Caching is a popular technique for reducing both server load and user response time in distributed systems. In this paper, we consider the question of whether caching might be e...
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system shoul...