The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be o...
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade fi...
Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, ...
This paper has no novel learning or statistics: it is concerned with making a wide class of preexisting statistics and learning algorithms computationally tractable when faced wit...
We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form o...
: Patent classification is a large scale hierarchical text classification (LSHTC) task. Though comprehensive comparisons, either learning algorithms or feature selection strategies...