A central problem in information retrieval is the automated classification of text documents. While many existing methods achieve good levels of performance, they generally require...
Document classification is challenging because text data are sparse and high-dimensional, and because natural language has complex semantics. The tradi...
Cross-language text classification (CLTC) aims to take advantage of existing training data from one language to construct a classifier for another language. In addition to the expe...
In this paper we generalize the LARS feature selection method to the linear SVM model, derive an efficient algorithm for it, and empirically demonstrate its usefulness as a featur...
The number of features to be considered in a text classification system is given by the size of the vocabulary, which is normally in the range of the tens or hundreds o...
David Vilar, Hermann Ney, Alfons Juan, Enrique Vid...