A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences



Background: We propose a sequence clustering algorithm and compare its partition quality and execution time with those of a popular existing algorithm. The proposed algorithm uses a grammar-based distance metric to partition a set of biological sequences. Each new sequence is compared with the cluster-representative sequences to determine membership; if no suitable cluster is found, a new cluster is created.

Results: The performance of the proposed algorithm is validated by comparison with the popular DNA/RNA sequence clustering approach CD-HIT-EST and with the recently developed algorithm UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. The proposed algorithm maintains a CPU execution time comparable to that of CD-HIT-EST, which is much slower than UCLUST, and has successfully generated clusters with higher statistical accuracy than both CD-HIT-EST an...
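The representative-based procedure described in the Background can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not define the grammar-based metric, so `toy_distance` is a hypothetical stand-in (a Jaccard distance over LZ78-style phrase dictionaries). The `threshold` parameter, the choice of the first member of each cluster as its representative, and the rule of assigning a sequence to the closest qualifying representative are likewise assumptions made for the example.

```python
from typing import Callable, List, Tuple


def lz78_phrases(seq: str) -> set:
    """Return the set of phrases from an LZ78-style parse of seq.

    Used here only as a crude proxy for a sequence's "grammar".
    """
    phrases = set()
    current = ""
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    return phrases


def toy_distance(a: str, b: str) -> float:
    """Hypothetical stand-in metric: Jaccard distance between phrase sets.

    NOT the metric from the paper; 0.0 means identical phrase dictionaries.
    """
    pa, pb = lz78_phrases(a), lz78_phrases(b)
    union = pa | pb
    if not union:
        return 0.0
    return 1.0 - len(pa & pb) / len(union)


def greedy_cluster(
    sequences: List[str],
    distance: Callable[[str, str], float],
    threshold: float,
) -> Tuple[List[str], List[List[str]]]:
    """Assign each sequence to the closest representative within threshold.

    If no representative is close enough, the sequence starts a new cluster
    and becomes that cluster's representative (an assumption of this sketch).
    """
    representatives: List[str] = []
    clusters: List[List[str]] = []
    for seq in sequences:
        best_idx, best_d = None, None
        for i, rep in enumerate(representatives):
            d = distance(seq, rep)
            if d <= threshold and (best_d is None or d < best_d):
                best_idx, best_d = i, d
        if best_idx is None:
            representatives.append(seq)
            clusters.append([seq])
        else:
            clusters[best_idx].append(seq)
    return representatives, clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAC", "TTTTTTTTTT"]
    reps, groups = greedy_cluster(reads, toy_distance, threshold=0.5)
    print(len(reps), "clusters")
```

Because each sequence is compared only with the current representatives rather than with every other sequence, the number of distance evaluations grows with the number of clusters instead of with the square of the input size. The distance function is passed in as an argument so that a real grammar-based metric could replace the stand-in without changing the clustering loop.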

David J. Russell, Samuel F. Way, Andrew K. Benson, Khalid Sayood


16S RDNA Sequences | Accurate Clustering Algorithm | Algorithms | BMCBI 2010 | Business |


Added	28 Feb 2011
Updated	28 Feb 2011
Type	Journal
Year	2010
Where	BMCBI
Authors	David J. Russell, Samuel F. Way, Andrew K. Benson, Khalid Sayood
