b-Bit minwise hashing

This paper establishes the theoretical framework of b-bit minwise hashing. The original minwise hashing method has become a standard technique for estimating set similarity (e.g., resemblance) with applications in information retrieval, data management, computational advertising, etc. By only storing b bits of each hashed value (e.g., b = 1 or 2), we gain substantial advantages in terms of storage space. We prove the basic theoretical results and provide an unbiased estimator of the resemblance for any b. We demonstrate that, even in the least favorable scenario, using b = 1
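To make the idea of keeping only b bits per hashed value concrete, here is a minimal Python sketch. It is not the paper's exact estimator: the excerpt above does not give the formula, so the sketch uses the common sparse-data simplification, which assumes both sets are small relative to the universe. Under that assumption, two b-bit values match with probability roughly 1/2^b + (1 - 1/2^b) R, where R is the resemblance. The function names `bbit_signature` and `estimate_resemblance` are hypothetical, and salted BLAKE2b hashes stand in for random permutations.

```python
import hashlib


def bbit_signature(items, num_hashes=500, b=2):
    """Return num_hashes values, each the lowest b bits of a min-hash.

    Assumption: a salted BLAKE2b hash stands in for a random permutation.
    Raises ValueError if items is empty.
    """
    mask = (1 << b) - 1
    encoded = [str(x).encode("utf-8") for x in items]
    signature = []
    for i in range(num_hashes):
        # A different salt simulates a different permutation.
        salt = i.to_bytes(8, "little")
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(e, digest_size=8, salt=salt).digest(), "little"
            )
            for e in encoded
        )
        # Store only the lowest b bits of the minimum hashed value.
        signature.append(smallest & mask)
    return signature


def estimate_resemblance(sig_a, sig_b, b=2):
    """Estimate resemblance from two b-bit signatures.

    Assumption: sparse-data simplification, where unrelated b-bit values
    collide by chance with probability 1 / 2**b.
    """
    if not sig_a or len(sig_a) != len(sig_b):
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    p_hat = matches / len(sig_a)
    chance = 1.0 / (1 << b)
    # Invert p = chance + (1 - chance) * R to recover R.
    return (p_hat - chance) / (1.0 - chance)


if __name__ == "__main__":
    set_a = set(range(0, 200))
    set_b = set(range(100, 300))  # true resemblance is 100 / 300
    sig_a = bbit_signature(set_a)
    sig_b = bbit_signature(set_b)
    print(estimate_resemblance(sig_a, sig_b))
```

Because chance collisions are subtracted out, the estimate can come out slightly negative for nearly disjoint sets. A smaller b saves storage per hash but makes each comparison less informative, so more hashes are needed for the same accuracy.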
Ping Li, Arnd Christian König
Added 13 May 2010
Updated 13 May 2010
Type Conference
Year 2010
Where WWW