A plethora of data sources contain data entities that could be ordered according to a variety of attributes associated with the entities. Such orderings result effectively in a ra...
In a visualization system, one of the key issues is to optimize performance and visual fidelity. This is especially critical for large virtual environments where the models do not...
Duplicate elimination is an important stage in integrating data from multiple sources. The challenges involved are finding a robust deduplication function that can identify when t...
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries...
The long-running nature of continuous queries poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple q...
Mehul A. Shah, Joseph M. Hellerstein, Sirish Chand...
The performance of streaming media servers has been limited due to the dual requirements of high throughput and low memory use. Although disk throughput has been enjoying a 40% an...
Raju Rangaswami, Zoran Dimitrijevic, Edward Y. Cha...
We present a query architecture in which join operators are decomposed into their constituent data structures (State Modules, or SteMs), and dataflow among these SteMs is managed ...
Vijayshankar Raman, Amol Deshpande, Joseph M. Hell...
A Web repository is a large special-purpose collection of Web pages and associated indexes. Many useful queries and computations over such repositories involve traversal and navig...
The output of boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it is often impractical to even generate all frequent items...