To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
In this paper, we introduce a higher-order MRF optimization framework. On the one hand, it is very general; we thus use it to derive a generic optimizer that can be applied to a...
Nikos Komodakis (University of Crete), Nikos Parag...
The Apriori algorithm's frequent itemset approach has become the standard way to discover association rules. However, the computational requirements of the frequent item...
The All Nearest Neighbor (ANN) operation is a commonly used primitive for analyzing large multi-dimensional datasets. Since computing ANN is very expensive, in previous work R*-t...
For a wide variety of classification algorithms, scalability to large databases can be achieved by observing that most algorithms are driven by a set of sufficient statistics that...