We propose a new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inlier...
Shohei Hido, Yuta Tsuboi, Hisashi Kashima, Masashi...
Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data m...
It is desirable to find unusual data objects by Ramaswamy et al.'s distance-based outlier definition because only a metric distance function between two objects is required. It...
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no al...
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Gan...
This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments ...