Many database applications require the analysis and processing of data streams. In such systems, huge amounts of data arrive rapidly and their values change over time. The variati...
Lv-an Tang, Bin Cui, Hongyan Li, Gaoshan Miao, Don...
Distributed Hash Tables (DHTs) provide a scalable solution for data sharing in P2P systems. To ensure high data availability, DHTs typically rely on data replication, yet without ...
We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
The United States Environmental Protection Agency (EPA) prepares a national criteria and hazardous air pollutant (HAP) emission inventory with input from numerous states, local an...
Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, in most of th...
Haixun Wang, Wei Wang, Jiong Yang, Philip S. ...