MAP: Searching Large Genome Databases

13 years 5 months ago

Download www.cise.ufl.edu

A number of biological applications require comparison of large genome strings. Current techniques suffer from both disk I/O and computational cost because of extensive memory requirements and large candidate sets. We propose an efﬁcient technique for alignment of large genome strings. Our technique precomputes the associations between the database strings and the query string. These associations are used to prune the database-query substring pairs that do not contain similar regions. We use a hash table to compare the unpruned regions of the query and database strings. The cost of the ensuing search is determined by how the hash table is constructed. We present a dynamic strategy that optimizes the random disk I/O needed for accessing the hash table. It also provides the user a coarse grain visualization of the similarity pattern quickly before the actual search. The experimental results show that our technique aligns genome strings up to 97 times faster than BLAST.

Tamer Kahveci, Ambuj K. Singh

Real-time Traffic