
TKDE
2010

Duplicate-Insensitive Order Statistics Computation over Data Streams

Duplicates in data streams often arise from projecting data onto a subspace and/or from recording the same object multiple times. Without the assumption that observed data elements are unique, many conventional aggregate computation problems need further investigation because they are duplicate-sensitive by nature. In this paper, we present novel, space-efficient, one-scan algorithms that continuously maintain duplicate-insensitive order sketches, so that rank-based queries can be processed approximately with a relative rank error guarantee ε in the presence of data duplicates. Besides being space-efficient, the proposed algorithms are time-efficient and highly accurate. Moreover, our techniques may be applied directly to the heavy hitter problem over distinct elements and to existing fault-tolerant distributed communication techniques. A comprehensive performance study demonstrates that our algorithms can support real-time computation over high-speed data streams.
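The abstract does not describe the order sketch itself, so the following is only a minimal Python sketch of the problem setting, under stated assumptions. It shows what a duplicate-insensitive rank is (the number of distinct values at most x), an assumed reading of a relative rank error guarantee (|r̂ − r| ≤ εr), and a simple duplicate-insensitive summary. The summary is hash-based distinct sampling, a different and well-known technique swapped in for illustration, not the authors' algorithm. All names (`distinct_rank`, `within_relative_rank_error`, `DistinctSampleSketch`, `capacity`) are hypothetical.

```python
import hashlib


def distinct_rank(stream, x):
    """Exact duplicate-insensitive rank: how many *distinct* values are <= x."""
    return sum(1 for v in set(stream) if v <= x)


def within_relative_rank_error(estimated_rank, true_rank, eps):
    """Assumed definition of a relative rank error guarantee: |r_hat - r| <= eps * r."""
    return abs(estimated_rank - true_rank) <= eps * true_rank


def _hash64(value):
    # Deterministic 64-bit hash, so every copy of a value hashes identically.
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _level(h):
    # Number of trailing zero bits: level >= L holds with probability 2**-L.
    return 64 if h == 0 else (h & -h).bit_length() - 1


class DistinctSampleSketch:
    """Hash-based distinct sampling (hypothetical helper, NOT the paper's sketch).

    Invariant: sample == {distinct values seen whose hash level >= self.level}.
    Duplicates never change the state: a repeated value has the same hash,
    and adding it to a set again is a no-op.
    """

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.level = 0
        self.sample = set()

    def insert(self, value):
        if _level(_hash64(value)) < self.level:
            return  # this value (and every duplicate of it) is below the current level
        self.sample.add(value)
        while len(self.sample) > self.capacity:
            # Halve the expected sampling rate and drop values below the new level.
            self.level += 1
            self.sample = {v for v in self.sample if _level(_hash64(v)) >= self.level}

    def estimate_rank(self, x):
        # Each distinct value survives with probability 2**-level, so scale up.
        return sum(1 for v in self.sample if v <= x) * 2 ** self.level


if __name__ == "__main__":
    stream = [5, 3, 5, 1, 3, 3, 9, 7, 1]
    sketch = DistinctSampleSketch(capacity=64)
    for v in stream:
        sketch.insert(v)
    print(distinct_rank(stream, 5), sketch.estimate_rank(5))  # prints: 3 3
```

Because the final sample depends only on the set of distinct values seen, the summary is insensitive to both duplicates and arrival order. Its error, however, is roughly additive in the number of distinct values rather than relative to the queried rank, so small ranks are estimated poorly; a relative rank error guarantee such as the one claimed in the abstract requires a more elaborate structure.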
Added 31 Jan 2011
Updated 31 Jan 2011
Type Journal
Year 2010
Where TKDE
Authors Ying Zhang, Xuemin Lin, Yidong Yuan, Masaru Kitsuregawa, Xiaofang Zhou, Jeffrey Xu Yu