Sciweavers

CN
2008

The eternal sunshine of the sketch data structure

In recent years there has been significant research on developing compact data structures for summarizing large data streams. One family of such data structures is the so-called sketches. Sketches bear similarities to the well-known Bloom filters [2] and employ hashing techniques to approximate the count associated with an arbitrary key in a data stream using fixed memory resources. One limitation of sketches is that, when used to summarize long data streams, they gradually saturate, resulting in a potentially large error in the estimated key counts. In this work, we introduce two techniques to address this problem, based on the observation that real-world data streams often have many transient keys that appear for short time periods and do not reappear later on. After entering the data structure, these keys contribute to hash collisions and thus reduce the estimation accuracy of sketches. Our techniques use a limited amount of additional memory to detect transient keys and to period...
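The saturation effect described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sketch variant the paper uses, so the code below assumes a count-min style layout (several rows of counters, one hash function per row, minimum taken at query time). The class name `CountMinSketch`, the `demo` function, the table sizes, and the key names are all hypothetical. The authors' techniques for detecting transient keys are not shown, since the listing truncates that part of the abstract.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min style sketch: a fixed table of depth x width counters."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash function per row, obtained by salting the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        # Each key increments exactly one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum over the rows
        # is the tightest available upper bound on the true count.
        return min(
            self.rows[row][self._index(key, row)] for row in range(self.depth)
        )


def demo():
    """Return the estimate for one key before and after a flood of transient keys."""
    sketch = CountMinSketch(width=64, depth=4)
    for _ in range(10):
        sketch.add("persistent-key")
    before = sketch.estimate("persistent-key")

    # Assumed workload: many keys that each appear once and never return.
    for i in range(5000):
        sketch.add(f"transient-{i}")
    after = sketch.estimate("persistent-key")
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("estimate before transient keys:", before)
    print("estimate after transient keys: ", after)
```

In this toy setup the memory stays fixed at 4 x 64 counters, while the one-off keys keep adding to the same counters that the persistent key maps to, so its estimate drifts upward even though its true count is unchanged. That drift is the saturation the abstract refers to.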
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where CN
Authors Xenofontas A. Dimitropoulos, Marc Ph. Stoecklin, Paul Hurley, Andreas Kind