We present algorithms for fast quantile and frequency estimation in large data streams using graphics processor units (GPUs). We exploit the high computational power and memory ba...
Naga K. Govindaraju, Nikunj Raghuvanshi, Dinesh Ma...
Duplicate detection is the problem of detecting different entries in a data source representing the same real-world entity. While research abounds in the realm of duplicate detect...
The detection of correlations between different features in a set of feature vectors is a very important data mining task because correlation indicates a dependency between the fe...
We introduce a new data mining problem: mining truth tables in binary datasets. Given a matrix of objects and the properties they satisfy, a truth table identifies a subset of pr...
Clifford Conley Owens III, T. M. Murali, Naren Ram...
We consider the following scenario for a mapping system: given a source schema, a target schema, and a set of value correspondences between these two schemas, generate an executab...