Sciweavers

103 search results - page 2 / 21
» Comparing Massive High-Dimensional Data Sets
COMAD
2008
15 years 1 month ago
Disk-Based Sampling for Outlier Detection in High Dimensional Data
We propose an efficient sampling based outlier detection method for large high-dimensional data. Our method consists of two phases. In the first phase, we combine a "sampling...
Timothy de Vries, Sanjay Chawla, Pei Sun, Gia Vinh...
CIDM
2007
IEEE
15 years 6 months ago
Scalable Clustering for Large High-Dimensional Data Based on Data Summarization
Clustering large data sets with high dimensionality is a challenging data-mining task. This paper presents a framework to perform such a task efficiently. It is based on the notio...
Ying Lai, Ratko Orlandic, Wai Gen Yee, Sachin Kulk...
BMCBI
2007
14 years 11 months ago
Robust clustering in high dimensional data using statistical depths
Background: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor cent...
Yuanyuan Ding, Xin Dang, Hanxiang Peng, Dawn Wilki...
VLDB
2004
ACM
15 years 5 months ago
High-Dimensional OLAP: A Minimal Cubing Approach
Data cube has been playing an essential role in fast OLAP (online analytical processing) in many multi-dimensional data warehouses. However, there exist data sets in applications ...
Xiaolei Li, Jiawei Han, Hector Gonzalez
SODA
2010
ACM
15 years 9 months ago
Coresets and Sketches for High Dimensional Subspace Approximation Problems
We consider the problem of approximating a set P of n points in R^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each p...
Dan Feldman, Morteza Monemizadeh, Christian Sohler...