This article discusses in detail the rating system that won the Kaggle competition "Chess Ratings: Elo vs the rest of the world". The competition provided a historical d...
A good training dataset, representative of the test images expected in a given application, is critical for ensuring good performance of a visual categorization system. Obtaining ...
Aniruddha Kembhavi, Behjat Siddiquie, Roland Miezi...
Data captured from a live cellular network, with real users going about their daily routines, help us understand how users move within the network. Unlike the simulations wi...
Many real-world datasets can be clustered along multiple dimensions. For example, text documents can be clustered not only by topic, but also by the author's gender or sentim...
The method of latent semantic indexing (LSI) is well known for tackling the synonymy and polysemy problems in information retrieval. However, its performance can be ve...
This paper presents a simple new algorithm that performs k-means clustering in one scan of a dataset, while using a fixed-size buffer for points from the dataset. Experiments show...
An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysi...
We present a novel hybrid technique for improving the predictive performance of an online machine learning system, combining advantages from both memory-based and concept-based pr...
Marcus-Christopher Ludl, Achim Lewandowski, Georg ...
A popular model for protecting privacy when person-specific data is released is k-anonymity. A dataset is k-anonymous if each record is identical to at least (k - 1) other records ...
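The k-anonymity condition stated in the abstract above can be checked mechanically. The following is a minimal sketch, not the paper's method: it assumes that "identical" is judged over a chosen set of attributes (often called quasi-identifiers), and the function name `is_k_anonymous`, the attribute names, and the sample records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every record shares its quasi-identifier values
    with at least k - 1 other records (assumed reading of the definition)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Count how many records fall into each group of identical values.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    # Every group must contain at least k records.
    return all(size >= k for size in groups.values())


# Hypothetical sample data: two groups of two records each.
sample = [
    {"zip": "021**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "100**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "100**", "age": "30-39", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(sample, ["zip", "age"], 2)
three_anonymous = is_k_anonymous(sample, ["zip", "age"], 3)
```

With this sample, each combination of `zip` and `age` appears twice, so the check passes for k = 2 and fails for k = 3.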
Data sharing between two organizations is common in many application areas, e.g., business planning or marketing. Useful global patterns can be discovered from the integrated dataset...