Overestimation for Multiple Sequence Alignment
Abstract— Multiple sequence alignment is an important problem in computational biology. A-star is an algorithm that can be used to find exact alignments. We present a simple modification of the A-star algorithm that substantially improves multiple sequence alignment, in both time and memory, at the cost of a small loss in accuracy. The modification consists of overestimating the admissible heuristic. For random sequences of length 250, a typical speedup is 47, with a memory gain of 13 and an error rate of 0.09%. For real sequences, the speedup can be greater than 13,000 and the memory gain greater than 150, with error rates ranging from 0.08% to 0.71% on the sequences we have tested. Overestimation makes it possible to align sequences that cannot be aligned with the exact algorithm.
Tristan Cazenave
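
The idea summarized in the abstract, running A-star with an overestimated heuristic, can be illustrated with a minimal sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the overestimation is done by multiplying the heuristic by a constant `weight`, the example aligns only two sequences rather than many, the cost model is unit gap and mismatch costs, and the function name `astar_align` is hypothetical. With `weight=1.0` the heuristic is admissible and the search is exact; with `weight > 1.0` the returned cost may be higher than the optimum.

```python
import heapq


def astar_align(a, b, weight=1.0, gap=1, mismatch=1):
    """Return (alignment cost, expanded nodes) for aligning strings a and b.

    weight > 1 overestimates the heuristic (an assumed form of overestimation).
    """
    n, m = len(a), len(b)

    def h(i, j):
        # Admissible lower bound: the length difference between the two
        # remaining suffixes must be paid for with gaps.
        return gap * abs((n - i) - (m - j))

    start, goal = (0, 0), (n, m)
    best_g = {start: 0}
    open_heap = [(weight * h(0, 0), 0, start)]  # entries are (f, g, node)
    closed = set()
    expanded = 0

    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # stale entry for an already expanded node
        if node == goal:
            return g, expanded
        closed.add(node)
        expanded += 1

        i, j = node
        successors = []
        if i < n and j < m:  # align a[i] with b[j]
            successors.append(((i + 1, j + 1), 0 if a[i] == b[j] else mismatch))
        if i < n:  # align a[i] with a gap
            successors.append(((i + 1, j), gap))
        if j < m:  # align b[j] with a gap
            successors.append(((i, j + 1), gap))

        for nxt, step in successors:
            new_g = g + step
            if nxt not in closed and new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                f = new_g + weight * h(*nxt)
                heapq.heappush(open_heap, (f, new_g, nxt))

    raise ValueError("goal not reached")
```

Calling `astar_align(a, b)` and `astar_align(a, b, weight=2.0)` on the same pair and comparing the two returned tuples gives a rough view of the trade-off between the number of expanded nodes and the alignment cost. The heuristic used here is much weaker than those used for multiple sequence alignment, so the effect on short strings may be small.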
