Machine Learning in Basecalling: Decoding Trace Peak Behaviour

DNA sequence basecalling is commonly regarded as a solved problem, despite significant error rates being reflected in inaccuracies in databases and genome annotations. These errors commonly arise from an inability to sequence through peak height variations in DNA sequencing traces from the Sanger sequencing method. Recent efforts toward improving basecalling accuracy have taken the form of more sophisticated digital filters and feature detectors. We demonstrate that the variation in peak heights itself encodes novel information which can be used for basecalling. To isolate this information for a clear demonstration, we perform a peculiar blind basecalling experiment using ABI processed output. Using classifiers responding to measurements in the context of the basecalling position, we call bases without reference to the peak heights at the basecalling position itself. Tree classifiers indicate which features are pertinent, and the application of neural nets to these features results...
David Thornley, Stavros Petridis
Added 10 Jun 2010
Updated 10 Jun 2010
Type Conference
Year 2006
Authors David Thornley, Stavros Petridis