The problem of joint modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned wi...
Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Cov...
We study the prevalent problem when a test distribution differs from the training distribution. We consider a setting where our training set consists of a small number of sample d...
Ruslan Salakhutdinov, Sham M. Kakade, Dean P. Fost...
Handwriting recognition and OCR systems need to cope with a wide variety of writing styles and fonts, many of them possibly not previously encountered during training. This paper d...
In this paper, we propose a new approach for identifying the language type of character images. We do this by classifying individual character images to determine the language bou...