Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised...
We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clu...
Jaak Vilo, Alvis Brazma, Inge Jonassen, Alan J. Ro...
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
The objective of data reduction is to obtain a compact representation of a large data set to facilitate repeated use of non-redundant information with complex and slow learning alg...
We present a principled Bayesian framework for modeling partial memberships of data points to clusters. Unlike a standard mixture model which assumes that each data point belongs ...
Katherine A. Heller, Sinead Williamson, Zoubin Gha...