We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approxi...
Aris Anagnostopoulos, Andrei Z. Broder, David Carm...
Background: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine learning and data mining methods have be...
Xia Jiang, Richard E. Neapolitan, M. Michael Barma...
GPS devices allow the recording of the movement track of the moving object they are attached to. This data typically consists of a stream of spatio-temporal (x,y,t) points. For applicati...
Nowadays the Web represents a growing collection of an enormous amount of content, where the need for better ways to find and organize the available data is becoming a fundamental...
Francesco Ronzano, Andrea Marchetti, Maurizio Tesc...
We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...