Essentially all data mining algorithms assume that the datagenerating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
We investigate how to organize a large collection of geotagged photos, working with a dataset of about 35 million images collected from Flickr. Our approach combines content analy...
David J. Crandall, Lars Backstrom, Daniel P. Hutte...
This paper presents an interdisciplinary investigation of statistical information retrieval (IR) techniques for protein identification from tandem mass spectra, a challenging probl...
Bioinformatics data is growing at a phenomenal rate. Besides the exponential growth of individual databases, the number of data depositories is increasing too. Because of the comp...
Data stream applications have made use of statistical summaries to reason about the data using nonparametric tools such as histograms, heavy hitters, and join sizes. However, rela...