Sciweavers

BMCBI
2010

Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated

Background: Large datasets of protein interactions provide a rich resource for the discovery of Short Linear Motifs (SLiMs) that recur in unrelated proteins. However, existing methods for estimating the probability of motif recurrence may be biased by the size and composition of the search dataset, such that p-value estimates from different datasets, or from motifs containing different numbers of non-wildcard positions, are not strictly comparable. Here, we develop more exact methods and explore the potential biases of computationally efficient approximations.

Results: A widely used heuristic for the calculation of motif over-representation approximates motif probability by assuming that all proteins have the same length and composition. We introduce pv, which calculates the probability exactly. Secondly, the recently introduced SLiMFinder statistic Sig accounts for multiple testing (across all possible motifs) in motif discovery. However, it approximates the probability of all other...
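To make the contrast in the abstract concrete, here is a minimal sketch of the difference between the equal-length heuristic and a per-protein calculation. It is not the paper's pv or Sig implementation: the function names are hypothetical, motif positions are treated as independent, and overlapping matches are ignored. The heuristic collapses the dataset to a single per-protein probability and uses a binomial tail, while the per-protein version keeps one probability per protein and sums the exact distribution of the number of proteins containing the motif.

```python
from math import comb

def site_prob(motif, freqs):
    """Probability that the motif matches at one fixed position.
    '.' is a wildcard; freqs maps residue -> background frequency."""
    p = 1.0
    for residue in motif:
        if residue != ".":
            p *= freqs[residue]
    return p

def protein_prob(motif, length, freqs):
    """Probability of at least one match in a protein of the given length,
    assuming (as a simplification) that start positions are independent."""
    n_positions = max(length - len(motif) + 1, 0)
    return 1.0 - (1.0 - site_prob(motif, freqs)) ** n_positions

def binomial_tail(k, n, p):
    """P(at least k of n proteins contain the motif) when every protein
    shares the same probability p (the equal-length heuristic)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def per_protein_tail(k, probs):
    """P(at least k proteins contain the motif) when each protein has its
    own probability; dist[i] is the probability of exactly i hits so far."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for i, d in enumerate(dist):
            nxt[i] += d * (1 - p)      # this protein lacks the motif
            nxt[i + 1] += d * p        # this protein contains the motif
        dist = nxt
    return sum(dist[k:])

# Illustrative (made-up) inputs: uniform background, three protein lengths.
freqs = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}
lengths = [100, 400, 1600]
motif = "P.LP"

mean_len = sum(lengths) // len(lengths)
heuristic = binomial_tail(2, len(lengths), protein_prob(motif, mean_len, freqs))
per_protein = per_protein_tail(2, [protein_prob(motif, n, freqs) for n in lengths])
```

When all proteins have the same length and composition the two tails coincide; they diverge as lengths become more variable, which is the kind of bias the abstract describes.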
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2010
Where BMCBI
Authors Norman E. Davey, Richard J. Edwards, Denis C. Shields