Distance-based methods in machine learning and pattern recognition have to rely on a metric distance between points in the input space. Instead of specifying a metric a priori, we...
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semistructured d...
Abstract— During the last years, high throughput experiments have become very popular. During the analysis of such data the need for a functional grouping of genes arises. In thi...
The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standar...
Traditional bag-of-words model and recent wordsequence kernel are two well-known techniques in the field of text categorization. Bag-of-words representation neglects the word orde...
Lei Zhang, Debbie Zhang, Simeon J. Simoff, John K....