Shannon's Noisy-Channel model, which describes how a corrupted message might be reconstructed, has been the corner stone for much work in statistical language and speech proc...
Clustering on multi-type relational data has attracted more and more attention in recent years due to its high impact on various important applications, such as Web mining, e-comm...
Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, Philip...
Boosting methods are known not to usually overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon i...
Feature selection is often applied to highdimensional data prior to classification learning. Using the same training dataset in both selection and learning can result in socalled ...
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve suffici...