The success of popular algorithms such as k-means clustering or nearest neighbor searches depends on the assumption that the underlying distance functions reflect domain-specific n...
We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high dimensional data. Our approach is based on Locality Sensitive Hashing (LSH...
There is an ever-increasing need for storing data in smaller and smaller form factors, driven by the ubiquitous use and increased demands of consumer electronics. A new approach...
Many applications are driven by evolving data: patterns in web traffic, program execution traces, network event logs, etc., are often non-stationary. Building prediction...
Shixi Chen, Haixun Wang, Shuigeng Zhou, Philip S. ...