Duplicate elimination is an important stage in integrating data from multiple sources. The challenges involved are finding a robust deduplication function that can identify when t...
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries...
The long-running nature of continuous queries poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple q...
Mehul A. Shah, Joseph M. Hellerstein, Sirish Chand...
The performance of streaming media servers has been limited due to the dual requirements of high throughput and low memory use. Although disk throughput has been enjoying a 40% an...
Raju Rangaswami, Zoran Dimitrijevic, Edward Y. Cha...
We present a query architecture in which join operators are decomposed into their constituent data structures (State Modules, or SteMs), and dataflow among these SteMs is managed ...
Vijayshankar Raman, Amol Deshpande, Joseph M. Hell...