In earlier work we have introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data set...
We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning wher...
Building useful classification models can be a challenging endeavor, especially when training data is imbalanced. Class imbalance presents a problem when traditional classificatio...
Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van H...
Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying character...
Yuanyuan Tian, Richard A. Hankins, Jignesh M. Pate...
The problem of clustering is often addressed with techniques based on a Voronoi partition of the data space. Vector quantization is based on a similar principle, but it is a diffe...