This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. ...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...
In many topic identification applications, supervised training labels are indirectly related to the semantic content of the documents being classified. For example, many topical...
Spoken document retrieval (SDR) has been extensively studied in recent years because of its potential use in navigating large multimedia collections in the near future. This paper...
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...
: Feature selection methods are often applied in the context of document classification. They are particularly important for processing large data sets that may contain millions of...
Janez Brank, Dunja Mladenic, Marko Grobelnik, Nata...