
BMCBI
2008

Subfamily specific conservation profiles for proteins based on n-gram patterns

Background: A new algorithm has been developed for generating conservation profiles that reflect the evolutionary history of the subfamily associated with a query sequence. It is based on n-gram patterns (NP{n,m}), which are sets of n residues and m wildcards in windows of size n+m. The generation of conservation profiles is treated as a signal-to-noise problem: the signal is the count of n-gram patterns in target sequences that are similar to the query sequence, and the noise is the count over all target sequences. The signal is separated from the noise by applying singular value decomposition to sets of target sequences rank-ordered by similarity to the query.

Results: The new algorithm was used to construct 4,248 profiles from 120 randomly selected Pfam-A families. These were compared to profiles generated from multiple alignments using the consensus approach. The two profiles were similar whenever the subfamily associated with the query sequence was well repres...
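To make the NP{n,m} definition and the signal/noise counts concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say where wildcards may fall, so the sketch assumes any m positions in a window of size n+m can be masked, and it uses `*` as the wildcard symbol. It also does not attempt the singular value decomposition step: a plain signal-to-noise ratio stands in for it, purely for illustration. The function names `ngram_patterns`, `pattern_counts`, and `position_scores` are hypothetical.

```python
from collections import Counter
from itertools import combinations

WILDCARD = "*"  # assumed wildcard symbol; the abstract does not specify one


def ngram_patterns(seq, n, m):
    """Yield (start, pattern) for each window of size n+m with m positions masked.

    Assumption: any m positions in the window may be wildcards.
    """
    width = n + m
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for masked_positions in combinations(range(width), m):
            masked = set(masked_positions)
            pattern = "".join(
                WILDCARD if i in masked else residue
                for i, residue in enumerate(window)
            )
            yield start, pattern


def pattern_counts(seqs, n, m):
    """Count, for each pattern, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        counts.update({pattern for _, pattern in ngram_patterns(seq, n, m)})
    return counts


def position_scores(query, similar, all_targets, n, m):
    """Score query positions by a naive signal/noise ratio (stand-in for SVD).

    signal: pattern counts over targets judged similar to the query
    noise:  pattern counts over all targets
    """
    signal = pattern_counts(similar, n, m)
    noise = pattern_counts(all_targets, n, m)
    scores = [0.0] * len(query)
    for start, pattern in ngram_patterns(query, n, m):
        if noise[pattern] == 0:
            continue  # pattern never seen in any target
        ratio = signal[pattern] / noise[pattern]
        for offset, symbol in enumerate(pattern):
            if symbol != WILDCARD:
                scores[start + offset] += ratio
    return scores


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(list(ngram_patterns("ACDE", 2, 1)))
    print(position_scores("ACD", ["ACD"], ["ACD", "ACE"], 2, 1))
```

In this toy setup, positions covered by patterns that occur mostly in the similar targets receive higher scores than positions covered by patterns common to all targets, which is the intuition behind treating profile generation as a signal-to-noise problem.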
John K. Vries, Xiong Liu
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where BMCBI