A locality-sensitive hash for real vectors

We present a simple and practical algorithm for the c-approximate near neighbor problem (c-NN): given n points $P \subset \mathbb{R}^d$ and radius $R$, build a data structure which, given $q \in \mathbb{R}^d$, can with probability $1 - \delta$ return a point $p \in P$ with $\mathrm{dist}(p, q) \le cR$ if there is any $p^* \in P$ with $\mathrm{dist}(p^*, q) \le R$. For $c = d + 1$, our algorithm deterministically ($\delta = 0$) preprocesses in time $O(nd \log d)$, space $O(dn)$, and answers queries in expected time $O(d^2)$; this is the first known algorithm to deterministically guarantee an $O(d)$-NN solution in constant time with respect to $n$ for all $\ell_p$ metrics. A probabilistic version empirically achieves useful $c$ values ($c < 2$) where $c$ appears to grow minimally as $d \to \infty$. A query time of $O(d \log d)$ is available, providing slightly less accuracy. These techniques can also be used to approximately find (pointers between) all pairs $x, y \in P$ with $\mathrm{dist}(x, y) \le R$ in time $O(nd \log d)$. The key to the algorithm is a locality-sensitive hash: a mapping $h : \mathbb{R}^d \to U$ with the property that $h(x) = h(y)$ i...
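To make the c-NN contract and the role of a hash $h : \mathbb{R}^d \to U$ more concrete, here is a minimal Python sketch. It is not the construction from the paper: it swaps in a generic randomly shifted grid hash, and the names `make_grid_hash`, `build_index`, `query`, and the parameter `cell_width` are all hypothetical, chosen only for illustration. It shows the general pattern of bucketing points by hash value and answering a query by scanning only the query's own bucket.

```python
import math
import random
from collections import defaultdict


def make_grid_hash(d, cell_width, seed=0):
    """Return a hash h: R^d -> tuple of ints.

    ASSUMPTION: a randomly shifted axis-aligned grid, used here only as a
    stand-in for a locality-sensitive hash. It is NOT the paper's hash.
    """
    rng = random.Random(seed)
    shifts = [rng.uniform(0.0, cell_width) for _ in range(d)]

    def h(x):
        # Points in the same grid cell receive the same hash value.
        return tuple(math.floor((xi + si) / cell_width)
                     for xi, si in zip(x, shifts))

    return h


def build_index(points, h):
    """Group the points of P by hash value (one pass over P)."""
    buckets = defaultdict(list)
    for p in points:
        buckets[h(p)].append(p)
    return buckets


def query(buckets, h, q, c, R):
    """Return some p in q's bucket with dist(p, q) <= c*R, else None.

    Only the single bucket h(q) is scanned, so a near point that falls in
    a neighboring cell can be missed; this sketch gives no guarantee.
    """
    for p in buckets.get(h(q), []):
        if math.dist(p, q) <= c * R:
            return p
    return None


# Hypothetical usage with d = 2.
points = [(0.0, 0.0), (5.0, 5.0)]
h = make_grid_hash(d=2, cell_width=1.0)
index = build_index(points, h)
result = query(index, h, (5.0, 5.0), c=2, R=0.1)
```

Any two points sharing a bucket in this sketch are within `cell_width * sqrt(d)` of each other in Euclidean distance, which is what makes a single-bucket scan plausible as an approximate answer. Nearby points split by a cell boundary are the failure case a real locality-sensitive hash has to control.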
Tyler Neylon
Added: 01 Mar 2010
Updated: 02 Mar 2010
Type: Conference
Year: 2010
Where: SODA
Authors: Tyler Neylon