Online reviews are often accompanied with numerical ratings provided by users for a set of service or product aspects. We propose a statistical model which is able to discover cor...
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web ï...
There are many accurate methods for language identification of long text samples, but identification of very short strings still presents a challenge. This paper studies a languag...
We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is...
Craig Saunders, John Shawe-Taylor, Juho Rousu, S&a...
When linear support vector machines (SVMs) are applied to multi-class text categorization in industry, the size of the linear SVM model is very large, usually greater than several...