This paper reports a statistical identification technique that differentiates scripts and languages in degraded and distorted document images. We identify scripts and languages th...
This paper presents a language identification technique that detects Latin-based languages in document images without OCR. The proposed technique detects languages through the wo...
A new technique is described that locates content-representing words in a given document image using a representation of character shapes. A character shape code representation define...
Many documents are available to a computer only as images from paper. However, most natural language processing systems expect their input as character-coded text, which may be di...
Implementing word spotting is difficult, and harder still for historical documents, since it requires character recognition and indexing of the...