External Memory Stream Sampling

This paper aims to understand the I/O-complexity of maintaining a big sample set, whose size exceeds the internal memory's capacity, on a data stream. We study this topic in a new computation model, named the external memory stream (EMS) model, that naturally extends the standard external memory model to stream environments. A suite of EMS-indigenous techniques is presented to prove matching lower and upper bounds for with-replacement (WR) and without-replacement (WoR) sampling on append-only and time-based sliding window streams, respectively. Our results imply that, compared to RAM, the EMS model is perhaps a more suitable computation model for studying stream sampling, because the new model separates different problems by their hardness in ways that could not be observed in RAM.
Categories and Subject Descriptors F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical Algorithms and Problems
Keywords Stream; Sampling; I/O-Efficient Algorithms; Lower Bound
Xiaocheng Hu, Miao Qiao, Yufei Tao
Added 16 Apr 2016
Updated 16 Apr 2016
Type Conference
Year 2015
Where PODS