String kernels that compare the set of all common substrings between two given strings were recently proposed by Vishwanathan & Smola (2004). Surprisingly, these kernels...
Clustering is one of the most widely used statistical tools for data analysis. Among all existing clustering techniques, k-means is a very popular method because of its ease of pr...
Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assu...
Many algorithms have been proposed for the problem of time series classification. However, it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is except...
Xiaopeng Xi, Eamonn J. Keogh, Christian R. Shelton...
Semi-naive Bayesian classifiers seek to retain the numerous strengths of naive Bayes while reducing error by weakening the attribute independence assumption. Backwards Sequential ...