We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
Correlation clustering aims at grouping the data set into correlation clusters such that the objects in the same cluster exhibit a certain density and are all associated to a comm...
We describe a simple randomized construction for generating pairs of hash functions h1, h2 from a universe U to ranges V = [m] = {0, 1, . . . , m - 1} and W = [m] so that for ever...
This paper provides the first explicit construction of extractors which are simultaneously optimal up to constant factors in both seed length and output length. More precisely, fo...
Chi-Jen Lu, Omer Reingold, Salil P. Vadhan, Avi Wi...
Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying character...
Yuanyuan Tian, Richard A. Hankins, Jignesh M. Pate...