
TCBB
2010

VARUN: Discovering Extensible Motifs under Saturation Constraints

Abstract: The discovery of motifs in biosequences is frequently torn between the rigidity of the model on the one hand and the abundance of candidates on the other. In particular, the number of candidate motifs that include wildcards, or "don't cares", grows exponentially with the number of wildcards, and this only gets worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency affords significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The ...
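To make the idea of an extensible motif concrete, the following is a minimal Python sketch of occurrence finding. It assumes a hypothetical representation that is not taken from Varun: a motif is a list of solid blocks, where '.' marks a rigid don't care matching exactly one symbol, and each pair of consecutive blocks is separated by an extensible gap whose length ranges over (lo, hi). Occurrences are counted here as distinct starting positions, which is one possible convention and not necessarily the one used in the paper. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation (not Varun's): solid blocks over the alphabet,
# where '.' is a rigid don't care matching exactly one symbol, and gaps[i] =
# (lo, hi) is the range of extra symbols allowed between blocks[i] and blocks[i+1].


def block_matches(seq: str, pos: int, block: str) -> bool:
    """True if block occurs at seq[pos:], treating '.' as a single-symbol don't care."""
    if pos < 0 or pos + len(block) > len(seq):
        return False
    return all(c == "." or seq[pos + k] == c for k, c in enumerate(block))


def matches_from(seq: str, pos: int, blocks: List[str],
                 gaps: List[Tuple[int, int]], i: int = 0) -> bool:
    """Depth-first search over every allowed stretch of each extensible gap."""
    if not block_matches(seq, pos, blocks[i]):
        return False
    if i == len(blocks) - 1:
        return True
    end = pos + len(blocks[i])
    lo, hi = gaps[i]
    return any(matches_from(seq, end + g, blocks, gaps, i + 1)
               for g in range(lo, hi + 1))


def occurrence_starts(seq: str, blocks: List[str],
                      gaps: List[Tuple[int, int]]) -> List[int]:
    """Start positions where at least one instantiation of the motif occurs."""
    if len(gaps) != len(blocks) - 1 or not all(blocks):
        raise ValueError("need non-empty blocks and one gap range between each pair")
    return [s for s in range(len(seq)) if matches_from(seq, s, blocks, gaps)]


if __name__ == "__main__":
    seq = "ACGTTACGGAC"
    # 'A', then 1 to 2 arbitrary symbols, then 'G'
    print(occurrence_starts(seq, ["A", "G"], [(1, 2)]))  # → [0, 5]
    # Rigid motif A.G: the don't care has fixed length 1
    print(occurrence_starts(seq, ["A.G"], []))  # → [0, 5]
```

The search tries every gap length independently, so its cost grows with the product of the gap ranges. This is adequate for short illustrative motifs, and it hints at the combinatorial growth that the saturation conditions described above are meant to keep in check.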
Alberto Apostolico, Matteo Comin, Laxmi Parida
Added 21 May 2011
Updated 21 May 2011
Type Journal
Year 2010
Where TCBB
Authors Alberto Apostolico, Matteo Comin, Laxmi Parida