Multiple organism gene finding by collapsed Gibbs sampling
The Gibbs sampling method has been widely used for sequence analysis since its successful application to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; however, in all cases the method has been applied to finding short motifs in collections of short sequences (typically fewer than 100 nucleotides long). In this paper we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve gene predictions without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from
Sourav Chatterji, Lior Pachter
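For readers unfamiliar with the motif-finding application mentioned in the abstract, here is a minimal sketch of a site sampler in the style of the original motif-finding Gibbs sampler. It is not the gene-finding method introduced in this paper. The sketch makes several assumptions: sequences are uppercase ACGT strings, each contains exactly one motif occurrence of a known width, the background model is uniform, and each motif column has a symmetric Dirichlet prior expressed as a pseudocount. The function name `site_sampler` and its parameters are hypothetical. The motif matrix is never sampled explicitly. Each update instead uses pseudocount-smoothed counts from the other sequences, which is the posterior predictive with the motif parameters integrated out. This is the sense in which such a sampler is "collapsed".

```python
import random
from typing import List

ALPHABET = "ACGT"  # assumption: uppercase DNA only
BACKGROUND = 0.25  # assumption: uniform background base probability


def site_sampler(seqs: List[str], width: int, iters: int = 500,
                 pseudocount: float = 1.0, seed: int = 0) -> List[int]:
    """Hypothetical sketch: return one motif start position per sequence."""
    rng = random.Random(seed)
    # Random initial motif start in each sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Column counts from every sequence except i, plus pseudocounts
            # (motif matrix integrated out under a Dirichlet prior).
            counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
            for k, other in enumerate(seqs):
                if k == i:
                    continue
                for j in range(width):
                    counts[j][other[starts[k] + j]] += 1
            total = len(seqs) - 1 + pseudocount * len(ALPHABET)
            # Weight of each candidate start: predictive motif probability
            # divided by the background probability of the same window.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= (counts[j][seq[p + j]] / total) / BACKGROUND
                weights.append(w)
            # Resample the start in sequence i in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

A uniform background keeps the sketch short. Practical samplers usually estimate the background from the sequences themselves, because a uniform background tends to favor low-complexity windows.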
