We investigate the following problem: Given a set of documents of a particular topic or class ?, and a large set ? of mixed documents that contains documents from class ? and othe...
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this pape...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
This paper addresses feature selection techniques for classification of high dimensional data, such as those produced by microarray experiments. Some prior knowledge may be availa...
Supervised clustering is the problem of training a clustering algorithm to produce desirable clusterings: given sets of items and complete clusterings over these sets, we learn ho...
We use Wikipedia articles to semantically inform the generation of query models. To this end, we apply supervised machine learning to automatically link queries to Wikipedia artic...