We define and solve the problem of "distribution classification", and, in general, "distribution mining". Given n distributions (i.e., clouds) of multi-dimensi...
Yasushi Sakurai, Rosalynn Chong, Lei Li, Christos ...
: The systematic assessment, storage, and retrieval of data quality scores has proven to be an elusive problem, often tackled only with classifications, questionnaires, and models....
Decision tree induction algorithms scale well to large datasets for their univariate and divide-and-conquer approach. However, they may fail in discovering effective knowledge when...
Giovanni Giuffrida, Wesley W. Chu, Dominique M. Ha...
We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. O...
In this paper we show how frequent sequence mining (FSM) can be applied to data produced by monitoring distributed enterprise applications. In particular we show how we applied FSM...