Sciweavers

SIGIR
2008
ACM

Generalising multiple capture-recapture to non-uniform sample sizes

Algorithms in distributed information retrieval often rely on accurate knowledge of the size of a collection. The "multiple capture-recapture" method of Shokouhi et al. is one of the more reliable algorithms for determining collection size, but it relies on samples with a uniform number of documents. Such uniform samples are often hard to obtain in a working system. A simple generalisation of multiple capture-recapture does not rely on uniform sample sizes. Simulations show it is as accurate as the original method even when sample sizes vary considerably, making it a useful technique in real tools.
Categories and Subject Descriptors H.3.4 [Information Storage and Retrieval]: Systems and Software--distributed systems
General Terms Experimentation, Measurement
Keywords Size estimation
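The abstract does not state the estimator itself, so the following is only a minimal sketch of one way a capture-recapture size estimate can accommodate samples of different sizes. It assumes each sample is drawn uniformly at random, in which case two samples of sizes m_i and m_j from a collection of N documents are expected to share m_i * m_j / N documents; summing over all pairs of samples and solving for N gives the estimate. The function name `estimate_collection_size` and the simulation parameters are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def estimate_collection_size(samples):
    """Estimate collection size from samples of document ids.

    Samples may have different sizes. Assumes (not confirmed by the
    abstract) that each sample is a uniform random draw, so a pair of
    samples with sizes m_i and m_j is expected to share m_i * m_j / N ids.
    """
    sets = [set(s) for s in samples]
    size_products = 0  # sum of m_i * m_j over all pairs of samples
    duplicates = 0     # observed shared ids, summed over all pairs
    for a, b in combinations(sets, 2):
        size_products += len(a) * len(b)
        duplicates += len(a & b)
    if duplicates == 0:
        raise ValueError("no overlap between samples; cannot estimate size")
    return size_products / duplicates


if __name__ == "__main__":
    # Hypothetical simulation: a collection of 10,000 documents sampled
    # 30 times with sample sizes varying between 50 and 500.
    rng = random.Random(0)
    collection = range(10_000)
    samples = [rng.sample(collection, rng.randint(50, 500)) for _ in range(30)]
    print(round(estimate_collection_size(samples)))
```

When every sample has the same size, the sum of pairwise size products reduces to the number of pairs times the squared sample size, so the sketch treats the uniform-size case as a special case rather than a requirement.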
Paul Thomas
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Where SIGIR
Authors Paul Thomas