EVOW
2004
Springer

Evolving Regular Expression-Based Sequence Classifiers for Protein Nuclear Localisation

A number of bioinformatics tools use regular expression (RE) matching to locate protein or DNA sequence motifs that have been discovered by researchers in the laboratory. For example, patterns representing nuclear localisation signals (NLSs) are used to predict nuclear localisation. NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete. Here we use genetic programming (GP) to generate RE-based classifiers for nuclear localisation. While the approach is a supervised one (with respect to protein location), it is unsupervised with respect to already-known NLSs. It therefore has the potential to discover new NLS motifs. We apply both tree-based and linear GP to the problem. The inclusion of predicted secondary structure in the input does not improve performance. Benchmarking shows that our majority classifiers are competitive with existing tools. The evolved REs are usually "NLS-like" and work is underway to analyse these for novelty.
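As a rough illustration of what an RE-based classifier combined by majority vote might look like, the Python sketch below applies a small set of regular expressions to a protein sequence and predicts nuclear localisation when more than half of them match. The three patterns are hand-written and hypothetical (loosely inspired by the basic, lysine/arginine-rich character of classical NLSs); they are not the evolved expressions from the paper. The function names and the voting rule are likewise assumptions, since the abstract does not specify how the majority classifiers are built.

```python
import re

# Hypothetical, hand-written patterns for illustration only.
# They are NOT the evolved regular expressions reported in the paper.
PATTERNS = [
    re.compile(r"K[KR].[KR]"),              # short basic stretch
    re.compile(r"[KR]{4,}"),                # run of four or more basic residues
    re.compile(r"[KR]{2}.{10,12}[KR]{3}"),  # two basic clusters separated by a spacer
]


def votes(sequence):
    """Return one boolean per pattern: True if the pattern occurs anywhere in the sequence."""
    return [bool(p.search(sequence)) for p in PATTERNS]


def predict_nuclear(sequence):
    """Assumed voting rule: predict 'nuclear' when a strict majority of patterns match."""
    v = votes(sequence)
    return sum(v) * 2 > len(v)


if __name__ == "__main__":
    # Toy sequences: the first embeds the SV40 large T antigen NLS (PKKKRKV),
    # the second contains no lysine or arginine at all.
    for seq in ("MAPKKKRKVEDP", "MGSSLLDAEEGTTS"):
        print(seq, votes(seq), predict_nuclear(seq))
```

In a GP setting, each entry of `PATTERNS` would instead be an individual produced by evolution and scored against proteins of known location; the fixed list here only stands in for that output.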
Amine Heddad, Markus Brameier, Robert M. MacCallum
Type Conference
Year 2004
Where EVOW