We investigate the following problem: Given a set of documents of a particular topic or class ?, and a large set ? of mixed documents that contains documents from class ? and othe...
Dataset shift from the training data in a source domain to the data in a target domain poses a great challenge for many statistical learning methods. Most algorithms can be viewed ...
Most text categorization methods require text content of documents that is often difficult to obtain. We consider "Collaborative Text Categorization", where each document...
This paper presents a new approach designed to reduce the computational load of the existing clustering algorithms by trimming down the documents size using fingerprinting methods...
The goal of text categorization is to classify documents into a certain number of pre-defined categories. The previous works in this area have used a large number of labeled train...