Efficient Bulk Loading of Large High-Dimensional Indexes

Efficient index construction in multidimensional data spaces is important for many knowledge discovery algorithms, because construction times typically must be amortized by performance gains in query processing. In this paper, we propose a generic bulk loading method which allows the application of user-defined split strategies in the index construction. This approach allows the index properties to be adapted to the requirements of a specific knowledge discovery algorithm. Because large data sets do not fit in main memory, our algorithm is based on external sorting. Split decisions can be made according to a sample of the data set which is selected automatically. The sort algorithm is a variant of the well-known Quicksort algorithm, enhanced to work on secondary storage. The index construction has a runtime complexity of O(n log n). We show both analytically and experimentally that the algorithm outperforms traditional index construc...
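The general idea in the abstract (top-down construction driven by a Quicksort-style partial sort, with split decisions taken from a sample) can be illustrated with a small sketch. This is not the authors' algorithm: it runs entirely in main memory, whereas the paper's sort works on secondary storage, and it assumes binary splits at the median. The names `bulk_load`, `choose_split_dim`, and `partition_around_rank`, and the "largest extent" split strategy, are hypothetical choices made for illustration only.

```python
import random


def choose_split_dim(sample):
    """Hypothetical user-defined split strategy: pick the dimension
    with the largest extent among the sampled points."""
    dims = len(sample[0])
    return max(
        range(dims),
        key=lambda d: max(p[d] for p in sample) - min(p[d] for p in sample),
    )


def partition_around_rank(points, lo, hi, k, dim, rng):
    """Quickselect-style partial sort on points[lo:hi] (in place).
    Afterwards every point in points[lo:k] has a dim-coordinate that is
    <= that of every point in points[k:hi]; neither side is fully sorted."""
    while hi - lo > 1:
        pivot = points[rng.randrange(lo, hi)][dim]
        lt, i, gt = lo, lo, hi
        # Three-way partition: [lo,lt) < pivot, [lt,gt) == pivot, [gt,hi) > pivot
        while i < gt:
            v = points[i][dim]
            if v < pivot:
                points[lt], points[i] = points[i], points[lt]
                lt += 1
                i += 1
            elif v > pivot:
                gt -= 1
                points[gt], points[i] = points[i], points[gt]
            else:
                i += 1
        if k < lt:
            hi = lt      # rank k lies in the "smaller" part
        elif k >= gt:
            lo = gt      # rank k lies in the "larger" part
        else:
            return       # rank k lies among the pivot-equal points


def bulk_load(points, capacity, sample_size=32, seed=0):
    """Build a binary partitioning tree top-down (in-memory sketch).
    Leaves hold at most `capacity` points."""
    rng = random.Random(seed)
    points = list(points)

    def build(lo, hi):
        n = hi - lo
        if n <= capacity:
            return {"leaf": True, "points": points[lo:hi]}
        # The split decision is based on a random sample, not the full partition.
        sample = [points[rng.randrange(lo, hi)] for _ in range(min(sample_size, n))]
        dim = choose_split_dim(sample)
        mid = lo + n // 2
        partition_around_rank(points, lo, hi, mid, dim, rng)
        return {
            "leaf": False,
            "dim": dim,
            "split": points[mid][dim],
            "children": [build(lo, mid), build(mid, hi)],
        }

    return build(0, len(points))
```

Each level of the recursion only partitions the data around the split position instead of sorting it completely, which is where the expected O(n log n) total cost comes from in this simplified setting. Swapping `choose_split_dim` for another function is the sketch's stand-in for a user-defined split strategy.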
Christian Böhm, Hans-Peter Kriegel
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1999
Authors Christian Böhm, Hans-Peter Kriegel