Searching in Document Images

Searching scanned documents is an important problem in digital libraries. When no OCR is available, the content of scanned images is inaccessible. In this paper, we demonstrate a search procedure that requires no intermediate textual representation. We achieve effective retrieval from document databases by matching at the word level using image features. Word profiles, structural features and transform-domain representations are employed to characterise the word images. A novel partial matching approach based on dynamic time warping (DTW) is proposed to handle word-form variations. With the new partial matching procedure, morphological variants of a word become similar in image space. This is especially useful for grouping similar words together for indexing purposes. We extend our formulation to cross-lingual search with the help of transliteration.
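To make the DTW idea concrete, here is a minimal sketch of how two word images, each reduced to a sequence of per-column feature vectors, could be compared. It is not the procedure from the paper: the function name `dtw_distance`, the Euclidean local cost, and the `free_end` option (which lets one sequence match only a prefix of the other, a simple stand-in for partial matching) are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b, free_end=False):
    """Cumulative DTW cost between two sequences of feature vectors.

    a, b: non-empty sequences of equal-dimension numeric tuples,
          e.g. one profile vector per pixel column of a word image.
    free_end: hypothetical option; if True, one sequence may be
          aligned with only a prefix of the other (trailing columns
          of the longer word are ignored).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b column repeated
                cost[i][j - 1],      # b advances, a column repeated
                cost[i - 1][j - 1],  # both advance
            )

    if free_end:
        # Best alignment in which all of one sequence has been consumed
        # and the other may stop early.
        last_row = min(cost[n][1:])
        last_col = min(row[m] for row in cost[1:])
        return min(last_row, last_col)
    return cost[n][m]


# Toy one-dimensional "profiles": the second word extends the first.
stem = [(0,), (1,), (2,)]
variant = [(0,), (1,), (2,), (5,), (5,)]
full_cost = dtw_distance(stem, variant)
prefix_cost = dtw_distance(stem, variant, free_end=True)
```

In this toy case the full alignment is penalised by the extra trailing columns, while the free-end alignment is not, which is the intuition behind letting a word and its suffixed variant land close together. A real system would use richer features and probably normalise the cost by path length.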
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Authors C. V. Jawahar, Million Meshesha, A. Balasubramanian