The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
Algorithms in distributed information retrieval often rely on accurate knowledge of the size of a collection. The "multiple capture-recapture" method of Shokouhi et al. ...
The size of the Web as well as user bases of search systems continue to grow exponentially. Consequently, providing subsecond query response times and high query throughput become...
Roi Blanco, Berkant Barla Cambazoglu, Claudio Lucc...
We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to ...
Users of search engines express their needs as queries, typically consisting of a small number of terms. The resulting search engine query logs are valuable resources that can be ...
Milad Shokouhi, Justin Zobel, Seyed M. M. Tahaghog...