A Distance-Based Packing Method for High Dimensional Data

The Minkowski-sum cost model indicates that balanced data partitioning is not beneficial for high-dimensional data. We therefore study several unbalanced partitioning methods and propose cost models for them based on the Minkowski-sum cost model. Our cost models indicate that the distance to one of the two ends of the data space dominates the expected value under a uniform data distribution. We generalize the studied methods to adapt to the data distribution and propose a new partitioning method, called DD–CSP (Distance-based Distribution–adaptive Cyclic Sliced Partition), for high-dimensional index structures. At each partition, it splits the data from the lower end or the higher end toward the center of the data space, based on a distance cost function. Based on this fact, we propose a data structure called DSR (Dimension–independent Single value Representation), which takes a constant amount of storage to represent MBHs (Minimum Bounding Hyper–cubes) independent of dimension. In our experimental studies, we compare DD–CSP wi...
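As a rough illustration of why an unbalanced split can cost less than a balanced one, the sketch below uses a generic Minkowski-sum style estimate. It is not the cost model proposed in the paper. It assumes a unit data space, cubic range queries whose centers are uniformly distributed in that space, and axis-aligned page regions. The function names and the example numbers (16 dimensions, query side 0.6) are hypothetical choices made for illustration only.

```python
from math import prod

def access_probability(box, query_side):
    """Probability that a cubic query hits `box` (illustrative assumption).

    `box` is a list of (lo, hi) intervals, one per dimension, inside [0, 1].
    The box is enlarged by half the query side in every direction
    (Minkowski sum) and then clipped to the unit data space, because
    query centers are assumed to lie inside that space.
    """
    half = query_side / 2.0
    return prod(
        min(1.0, hi + half) - max(0.0, lo - half)
        for lo, hi in box
    )

def expected_accesses(boxes, query_side):
    """Expected number of page regions touched by one query."""
    return sum(access_probability(b, query_side) for b in boxes)

def split_first_dimension(dims, cut):
    """Two page regions obtained by cutting dimension 0 at `cut`."""
    rest = [(0.0, 1.0)] * (dims - 1)
    return [[(0.0, cut)] + rest, [(cut, 1.0)] + rest]

# Hypothetical setting: 16 dimensions, query side 0.6.
balanced = expected_accesses(split_first_dimension(16, 0.5), 0.6)
unbalanced = expected_accesses(split_first_dimension(16, 0.1), 0.6)
print(balanced, unbalanced)
```

Under these assumptions the region cut close to one end of the data space loses most of its enlarged volume to clipping at the boundary, so the unbalanced split yields a lower expected access count than the balanced one. This is in the spirit of the abstract's observation that the distance to an end of the data space dominates the expected value.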
Tae-wan Kim, Ki-Joune Li
Added 06 Jul 2010
Updated 06 Jul 2010
Type Conference
Year 2003
Where ADC
Authors Tae-wan Kim, Ki-Joune Li