We exploit sketch techniques, especially the Count-Min sketch, a memory, and time efficient framework which approximates the frequency of a word pair in the corpus without explic...
We present a fast algorithm for computing approximate quantiles in high speed data streams with deterministic error bounds. For data streams of size N where N is unknown in advanc...
In this paper we consider problems related to the sortedness of a data stream. First we investigate the problem of estimating the distance to monotonicity; given a sequence of len...
In recent years, the proliferation of VOIP data has created a number of applications in which it is desirable to perform quick online classification and recognition of massive voi...
We consider the problem of maintaining aggregates over recent elements of a massive data stream. Motivated by applications involving network data, we consider asynchronous data str...