Abstract. A new constrained model is discussed as a way of efficiently incorporating a priori expert knowledge into a clustering problem on a given set of individuals. The first innova...
Abstract—Gene expression data usually contains a large number of genes, but a small number of samples. Feature selection for gene expression data aims at finding a set of genes ...
The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We pres...
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to avoid the data sparseness problem in spoken language, for which it is difficult to collect traini...
Discovering different types of file resources (such as documentation, programs, and images) in the vast amount of data contained within network file systems is useful for both u...