The massive data streams observed in network monitoring, data processing and scientific studies are typically too large to store. For many applications over such data, we must ob...
A fundamental problem in data management is to draw and maintain a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With la...
Graham Cormode, S. Muthukrishnan, Ke Yi, Qin Zhang
Complex queries over high speed data streams often need to rely on approximations to keep up with their input. The research community has developed a rich literature on approximat...
Theodore Johnson, S. Muthukrishnan, Irina Rozenbau...
Reservoir sampling is a well-known technique for sequential random sampling over data streams. Conventional reservoir sampling assumes a fixed-size reservoir. There are situation...
Mohammed Al-Kateb, Byung Suk Lee, Xiaoyang Sean Wa...
Reservoir sampling is a well-known technique for random sampling over data streams. In many streaming applications, however, an input stream may be naturally heterogeneous, i.e., c...