We propose an efficient sampling-based outlier detection method for large high-dimensional data. Our method consists of two phases. In the first phase, we combine a "sampling...
Timothy de Vries, Sanjay Chawla, Pei Sun, Gia Vinh...
Clustering large data sets with high dimensionality is a challenging data-mining task. This paper presents a framework to perform such a task efficiently. It is based on the notio...
Ying Lai, Ratko Orlandic, Wai Gen Yee, Sachin Kulk...
Background: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor cent...
Yuanyuan Ding, Xin Dang, Hanxiang Peng, Dawn Wilki...
The data cube has been playing an essential role in fast OLAP (online analytical processing) in many multi-dimensional data warehouses. However, there exist data sets in applications ...
We consider the problem of approximating a set P of n points in R^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each p...
Dan Feldman, Morteza Monemizadeh, Christian Sohler...