Sciweavers

EDBT
2008
ACM

Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach

Due to the well-known curse of dimensionality, search in a high-dimensional space is considered a "hard" problem. In this paper, a novel symmetrical encoding-based index structure called the EHD-Tree (symmetrical Encoding-based Hybrid Distance Tree) is proposed to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice, according to the dual distances of start-distance and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined into a uniform index key, which is then indexed through a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into a search in a single-dimensional space with the...
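The three steps named in the abstract (cluster, encode by two distances, combine into a one-dimensional key) can be illustrated with a rough sketch. The abstract does not give the actual encoding or key formula, so everything specific here is an assumption made for illustration: the origin is used as the "start" reference point, each cluster is cut into equal-width slices along both distances, the key is `(cluster * slices^2 + code) * scale + centroid_distance`, and a sorted list searched with `bisect` stands in for the B+-tree. The function names (`kmeans`, `slice_of`, `build_index`, `bucket_scan`) are hypothetical and not taken from the paper.

```python
import bisect
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd-style k-means; returns (centroids, cluster id per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda c: dist(p, centroids[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p) for p in points]


def slice_of(value, span, slices):
    """Equal-width slice number of `value` within [0, span] (assumed scheme)."""
    if span == 0:
        return 0
    return min(int(value / span * slices), slices - 1)


def build_index(points, k=3, slices=4, seed=0):
    """Return (sorted list of (key, point index), scale)."""
    centroids, assign = kmeans(points, k, seed=seed)
    start = tuple(0.0 for _ in points[0])  # assumed "start" reference point
    cd = [dist(p, centroids[a]) for p, a in zip(points, assign)]  # centroid-distance
    sd = [dist(p, start) for p in points]                         # start-distance
    scale = math.ceil(max(cd)) + 1  # strictly larger than any centroid-distance

    # Per-cluster radius and start-distance range, used to size the slices.
    stats = {}
    for c in set(assign):
        idx = [i for i, a in enumerate(assign) if a == c]
        stats[c] = (max(cd[i] for i in idx),
                    min(sd[i] for i in idx),
                    max(sd[i] for i in idx))

    entries = []
    for i, c in enumerate(assign):
        radius, sd_lo, sd_hi = stats[c]
        # Partition the cluster twice: once by start-distance, once by centroid-distance.
        code = (slice_of(sd[i] - sd_lo, sd_hi - sd_lo, slices) * slices
                + slice_of(cd[i], radius, slices))
        key = (c * slices * slices + code) * scale + cd[i]  # assumed key layout
        entries.append((key, i))
    entries.sort()  # sorted list as a stand-in for the B+-tree
    return entries, scale


def bucket_scan(entries, scale, bucket):
    """Range scan over one (cluster, code) bucket of the one-dimensional key space."""
    lo = bisect.bisect_left(entries, (bucket * scale, -1))
    hi = bisect.bisect_left(entries, ((bucket + 1) * scale, -1))
    return [i for _, i in entries[lo:hi]]
```

With this layout every point of a given cluster and code falls into one contiguous key range, so a candidate set can be fetched with a single range scan. The pruning rules that decide which ranges a k-NN query must visit are not described in the abstract and are not attempted here.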
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008
Where EDBT
Authors Yi Zhuang, Yueting Zhuang, Qing Li, Lei Chen 0002, Yi Yu