Persistent Indexing Technology for Large Sequences

This work has two aspects. The first is a novel persistent index structure for genomic data, a prototype of which has been completed. The second, using this index as an example, is a generic index development framework, which is under construction. We propose a variation of the suffix tree, the Top Compressed Suffix Tree, designed to allow the on-disk construction of indexes over multi-gigabyte sequences. This form of the suffix tree extends the work of Hunt et al. [1] in two ways: it improves the performance of the partitioned construction algorithm when the size of the sequence being indexed is comparable to that of the available main memory, and it provides a compact representation of the index on secondary memory. This work forms part of the GIDOF project, which aims to provide a Generic Index Development and Operation Framework. GIDOF addresses the management of performance-critical parameters, automatic parameter exploration and tuning, and the p...

Robert Japp