Sciweavers

EDBT
2008
ACM

Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic

Estimating the number of distinct elements in a large multiset has several applications and has therefore attracted active research over the past two decades. Several sampling and sketching algorithms have been proposed to solve this problem accurately. The goal in the literature has always been to estimate the number of distinct elements while using minimal resources. However, in some modern applications, the accuracy of the estimate is of primary importance, and businesses are willing to trade more resources for better accuracy. Through our experience building a distinct count system at a major search engine, Ask.com, we reviewed the literature on approximating distinct counts and compared most of the algorithms in it. We deduced that Linear Counting, one of the least used algorithms, has unique and impressive advantages when the accuracy of the distinct count is critical to the business. For other estimators to attain comparable accuracy, they need more space than Linear Co...
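For readers unfamiliar with Linear Counting, the idea can be sketched in a few lines. This is a minimal illustration of the general technique, not the system described in the paper: each element is hashed to one position of an m-bit bitmap, and the distinct count is estimated as -m * ln(V), where V is the fraction of positions still zero. The function name `linear_count`, the parameter `num_bits`, and the choice of SHA-1 as the hash are all assumptions made for this example.

```python
import hashlib
import math


def linear_count(items, num_bits=1 << 16):
    """Estimate the number of distinct items with a Linear Counting bitmap.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # One byte per bitmap position keeps the sketch simple; a real
    # implementation would pack eight positions into each byte.
    bitmap = bytearray(num_bits)
    for item in items:
        # Hash the item and map the hash to a bitmap position.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % num_bits
        bitmap[position] = 1

    zero_bits = num_bits - sum(bitmap)
    if zero_bits == 0:
        # The estimator is undefined once every position is set.
        raise ValueError("bitmap is full; use a larger num_bits")

    # Estimate: -m * ln(fraction of positions that are still zero).
    return -num_bits * math.log(zero_bits / num_bits)


# Usage: 50,000 items drawn from 5,000 distinct values.
estimate = linear_count(i % 5000 for i in range(50000))
print(round(estimate))
```

The bitmap has to be sized in proportion to the expected number of distinct elements, which is the space cost the abstract refers to when it contrasts linear with logarithmic space.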
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi
Added: 08 Dec 2009
Updated: 08 Dec 2009
Type: Conference
Year: 2008
Where: EDBT
Authors: Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi