We have been developing a Grid-enabled MPI communication library called GridMPI, which is designed to run on multiple clusters connected to a wide-area network. Some of these clust...
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
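Both fingerprinting paradigms named above can be shown in a few lines. This is a minimal, non-authoritative sketch, not the method of any work listed here. It assumes whitespace tokenization, 3-word shingles, and a 64-bit feature hash taken from the first 8 bytes of MD5; these are illustrative choices. Broder-style shingling compares two documents by the Jaccard resemblance of their shingle sets. Charikar-style simhash compresses a document into one 64-bit fingerprint, so that near-duplicates should differ in only a few bit positions.

```python
import hashlib

BITS = 64  # fingerprint width; matches the 64-bit feature hash below


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word shingles (Broder-style)."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Resemblance of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash64(token: str) -> int:
    # Assumption: first 8 bytes of MD5 as a 64-bit feature hash;
    # any well-mixed 64-bit hash would serve the same purpose.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """Charikar-style simhash; each word occurrence contributes weight 1."""
    counts = [0] * BITS
    for token in text.lower().split():
        h = _hash64(token)
        for i in range(BITS):
            # Vote +1 if bit i of the feature hash is set, otherwise -1.
            counts[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i in range(BITS):
        if counts[i] > 0:  # bit i of the fingerprint follows the majority vote
            fp |= 1 << i
    return fp


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumped over the lazy dog"
    print(jaccard(shingles(doc_a), shingles(doc_b)))  # 4 shared of 10 distinct shingles -> 0.4
    print(hamming(simhash(doc_a), simhash(doc_b)))
```

In a deployed system, documents whose simhash fingerprints lie within a small Hamming distance would be flagged as near-duplicate candidates. The distance threshold is a tuning parameter that this sketch does not fix.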
Traditional workload management methods mainly focus on the current system status, while information about the interaction between queued and running transactions is largely ignore...
Gang Luo, Jeffrey F. Naughton, Curt J. Ellmann, Mi...
This paper addresses the challenging problem of similarity search over widely distributed, ultra-high-dimensional data. One such application is the retrieval of the top-k most similar d...
More and more applications rely heavily on large amounts of data held in distributed storage systems, collected over time or produced by large-scale scientific experiments or simulations. ...