We apply a new active learning formulation to the problem of learning medical concepts from unstructured text. The new formulation is based on maximizing the mutual information th...
In language modeling for speech recognition, both the amount of training data and the match to the target task impact the goodness of the model, with the trade-off usually favorin...
Marius A. Marin, Sergey Feldman, Mari Ostendorf, M...
PKIP, Patterned Keywords in Phrase, is our feature selection approach to text categorization (TC) for item banks. An item bank is a collection of textual data in which each item c...
Atorn Nuntiyagul, Nick Cercone, Kanlaya Naruedomku...
Abstract. Standard Support Vector Machines (SVM) text classification relies on bag-of-words kernel to express the similarity between documents. We show that a document lattice can ...
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected ba...