Parallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF8 to UTF-...
We introduce a new approach to analyzing click logs by examining both the documents that are clicked and those that are bypassed--documents returned higher in the ordering of the ...
Atish Das Sarma, Sreenivas Gollapudi, Samuel Ieong
We present a family of algorithms to uncover tribes--groups of individuals who share unusual sequences of affiliations. While much work inferring community structure describes lar...
One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two o...
The problem of measuring "similarity" of objects arises in many applications, and many domain-specific measures have been developed, e.g., matching text across documents...