We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the ...
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system shoul...
Training a system to recognize handwritten words is a task that requires a large amount of data with their correct transcription. However, the creation of such a training set, inc...
Automatic recognition of named entities such as people, places, organizations, books, and movies across the entire web presents a number of challenges, both of scale and scope. Da...
Casey Whitelaw, Alexander Kehlenbeck, Nemanja Petr...
Current statistical parsers tend to perform well only on their training domain and nearby genres. While strong performance on a few related domains is sufficient for many situatio...