Most pattern discovery algorithms easily generate very large numbers of patterns, making the results impossible to understand and hard to use. Recently, the problem of instead sel...
Hannes Heikinheimo, Jilles Vreeken, Arno Siebes, H...
—With the vast increase in collection and storage of data, the problem of data summarization is most critical for effective data management. Since much of this data is categorica...
Obtaining high-quality and up-to-date labeled data can be difficult in many real-world machine learning applications, especially for Internet classification tasks like review spam...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model ...
The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC?...
Earl T. Barr, Christian Bird, Peter C. Rigby, Abra...