ISMB
1998

Compression of Strings with Approximate Repeats

We describe a model for strings of characters that is loosely based on the Lempel-Ziv model, with the addition that a repeated substring can be an approximate match to the original substring; this is close to the situation of DNA, for example. Typically there are many explanations for a given string under the model, some optimal and many suboptimal. Rather than commit to one optimal explanation, we sum the probabilities over all explanations under the model because this gives the probability of the data under the model. The model has a small number of parameters and these can be estimated from the given string by an expectation-maximization (EM) algorithm. Each iteration of the EM algorithm takes
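The difference between summing over all explanations and committing to a single best one can be illustrated with a small sketch. This is not the authors' model: it is a simplified toy that assumes substitutions only (no insertions or deletions), a uniform choice of repeat source, and hypothetical parameter names `p_start`, `p_end`, and `p_mut` with arbitrary default values. The function `string_probability` and the helper `total_mass` are likewise made up for illustration.

```python
from itertools import product


def string_probability(s, p_start=0.1, p_end=0.2, p_mut=0.1,
                       alphabet="ACGT", combine=sum):
    """Probability of s under a toy approximate-repeat model (an assumption,
    not the paper's model).

    combine=sum adds the probabilities of all explanations;
    combine=max keeps only the single most probable explanation.
    """
    k = len(alphabet)
    base = 1.0   # weight of the prefix with the generator in the base state
    copy = {}    # copy[j]: weight of the prefix with a repeat about to copy s[j]
    for i, ch in enumerate(s):
        # A repeat cannot start at position 0 because nothing precedes it.
        start = p_start if i > 0 else 0.0
        # Explanation 1: ch is a fresh character drawn uniformly.
        to_base = [base * (1.0 - start) / k]
        next_copy = {}
        for j in range(i):
            # Explanation 2: ch is copied from s[j], either by continuing a
            # repeat or by starting a new one at a uniformly chosen source j.
            weight = combine([copy.get(j, 0.0), base * start / i])
            # An approximate match: the copied character may be substituted.
            emit = (1.0 - p_mut) if s[j] == ch else p_mut / (k - 1)
            to_base.append(weight * emit * p_end)            # repeat ends here
            next_copy[j + 1] = weight * emit * (1.0 - p_end)  # repeat continues
        base, copy = combine(to_base), next_copy
    return combine([base, *copy.values()])


def total_mass(n, alphabet="ACGT"):
    """Sum of string_probability over every string of length n (sanity check)."""
    return sum(string_probability("".join(t), alphabet=alphabet)
               for t in product(alphabet, repeat=n))


if __name__ == "__main__":
    text = "ACGTACGTACGA"
    print("all explanations:", string_probability(text))
    print("best explanation:", string_probability(text, combine=max))
```

With `combine=sum` the recursion accumulates every way of explaining each character, so the result is the probability of the string under the toy model, and the values for all strings of a fixed length add up to 1. With `combine=max` the same recursion follows only the most probable explanation, which can never exceed the summed value.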
Lloyd Allison, Timothy Edgoose, Trevor I. Dix
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1998
Where ISMB