Sciweavers

103 search results - page 2 / 21
» Comparing Massive High-Dimensional Data Sets
COMAD
2008
13 years 7 months ago
Disk-Based Sampling for Outlier Detection in High Dimensional Data
We propose an efficient sampling-based outlier detection method for large high-dimensional data. Our method consists of two phases. In the first phase, we combine a "sampling...
Timothy de Vries, Sanjay Chawla, Pei Sun, Gia Vinh...
CIDM
2007
IEEE
13 years 12 months ago
Scalable Clustering for Large High-Dimensional Data Based on Data Summarization
Clustering large data sets with high dimensionality is a challenging data-mining task. This paper presents a framework to perform such a task efficiently. It is based on the notio...
Ying Lai, Ratko Orlandic, Wai Gen Yee, Sachin Kulk...
BMCBI
2007
13 years 5 months ago
Robust clustering in high dimensional data using statistical depths
Background: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor cent...
Yuanyuan Ding, Xin Dang, Hanxiang Peng, Dawn Wilki...
VLDB
2004
ACM
13 years 11 months ago
High-Dimensional OLAP: A Minimal Cubing Approach
Data cube has been playing an essential role in fast OLAP (online analytical processing) in many multi-dimensional data warehouses. However, there exist data sets in applications ...
Xiaolei Li, Jiawei Han, Hector Gonzalez
SODA
2010
ACM
14 years 2 months ago
Coresets and Sketches for High Dimensional Subspace Approximation Problems
We consider the problem of approximating a set P of n points in R^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each p...
Dan Feldman, Morteza Monemizadeh, Christian Sohler...