
NIPS
2008

Scalable Algorithms for String Kernels with Inexact Matching

We present a new family of linear-time algorithms, based on sufficient statistics, for string comparison with mismatches under the string kernel framework. Our algorithms improve the theoretical complexity bounds of existing approaches while scaling well with the sequence alphabet size, the number of allowed mismatches, and the size of the dataset. In particular, on large alphabets with loose mismatch constraints, our algorithms are several orders of magnitude faster than existing algorithms for string comparison under the mismatch similarity measure. We evaluate our algorithms on synthetic data and on real applications in music genre classification, protein remote homology detection, and protein fold prediction. The scalability of the algorithms allows us to consider complex sequence transformations, modeled using longer string features and larger numbers of mismatches, leading to state-of-the-art performance with significantly reduced running times.
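To make the "mismatch similarity measure" concrete, here is a minimal brute-force sketch. It assumes the standard mismatch(k, m) kernel definition: each string is mapped to counts over all k-mers of the alphabet, where a k-mer is counted once for every substring of length k within Hamming distance m of it, and the kernel is the inner product of these count vectors. This is not the linear-time algorithm of the paper. It enumerates every k-mer of the alphabet, which is exactly the exponential cost that scalable methods avoid. All function names are hypothetical and chosen for illustration only.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmers(s, k):
    """All contiguous substrings of length k, in order of occurrence."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def mismatch_features(s, k, m, alphabet):
    """Explicit feature map (illustrative, exponential in k).

    For every k-mer g over the alphabet, count the substrings of s
    that lie within Hamming distance m of g.
    """
    observed = kmers(s, k)
    feats = {}
    for gamma in product(alphabet, repeat=k):
        g = "".join(gamma)
        count = sum(1 for a in observed if hamming(a, g) <= m)
        if count:
            feats[g] = count
    return feats


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two explicit mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(c * fy.get(g, 0) for g, c in fx.items())


if __name__ == "__main__":
    # With m = 0 this reduces to counting shared exact k-mers.
    print(mismatch_kernel("ABAB", "BABA", k=2, m=0, alphabet="AB"))
    # Allowing one mismatch lets near-identical k-mers contribute too.
    print(mismatch_kernel("ABAB", "BABA", k=2, m=1, alphabet="AB"))
```

The cost of this sketch grows as the alphabet size raised to the power k, which illustrates why large alphabets and loose mismatch constraints are the hard case the abstract highlights.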
Pavel P. Kuksa, Pai-Hsi Huang, Vladimir Pavlovic
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2008
Where NIPS