
ICDAR
2011
IEEE

Character n-Gram Spotting in Document Images

In this paper, we present a novel approach to searching and retrieving from document image collections without explicit recognition. Existing recognition-free approaches such as word-spotting cannot scale to arbitrarily large vocabularies and document image collections. We put forth a framework that overcomes three issues of word-spotting: i) retrieving word images not labeled during indexing, ii) allowing query and retrieval of morphological variations of words, and iii) scaling retrieval to large collections. We propose a character n-gram spotting framework, in which word images are considered as a bag of visual n-grams. The character n-grams are represented in a visual-feature space and indexed for quick retrieval. In the retrieval phase, the query word is expanded into its constituent n-grams, which are used to query the previously built index. A proposed ranking mechanism combines the retrieval results from the multiple lists corresponding to each n-gram. The appro...
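The retrieval flow described in the abstract (expand the query into n-grams, look each one up in an index, merge the per-n-gram result lists) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: text labels stand in for visual n-gram features, and the function names (`char_ngrams`, `build_index`, `search`), the choice of n = 2 and 3, and the "fraction of query n-grams matched" score are all assumptions, since the abstract does not specify the feature space or the ranking mechanism.

```python
from collections import defaultdict


def char_ngrams(word, n_values=(2, 3)):
    """Return every contiguous character n-gram of `word`, for each n in n_values."""
    grams = []
    for n in n_values:
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def build_index(word_images):
    """Build an inverted index: n-gram -> set of word-image ids containing it.

    `word_images` maps a hypothetical image id to a text label. The label is
    only a stand-in for whatever visual n-gram matching a real system performs.
    """
    index = defaultdict(set)
    for image_id, label in word_images.items():
        for gram in set(char_ngrams(label)):
            index[gram].add(image_id)
    return index


def search(index, query):
    """Rank word images by the fraction of the query's distinct n-grams they contain.

    This simple voting score is an assumed placeholder for the paper's
    ranking mechanism, which the abstract does not detail.
    """
    grams = set(char_ngrams(query))
    if not grams:
        return []
    votes = defaultdict(int)
    for gram in grams:
        # One retrieved list per n-gram; merge the lists by counting hits.
        for image_id in index.get(gram, ()):
            votes[image_id] += 1
    scored = [(image_id, count / len(grams)) for image_id, count in votes.items()]
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

In this sketch, a query such as "retrieval" still returns entries labeled "retrieve" or "retrieving" because they share most of its n-grams, which is the kind of morphological-variation matching the abstract refers to in point ii).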
M. Sudha Praveen, K. Pramod Sankar, C. V. Jawahar
Added 24 Dec 2011
Updated 24 Dec 2011
Type Conference
Year 2011
Where ICDAR