Sciweavers

ICDAR
2011
IEEE

Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method

In this paper, we present a segmentation-free word spotting method that can handle heterogeneous document image collections. We propose a patch-based framework in which each patch is represented by a bag-of-visual-words model built on SIFT descriptors. The feature vectors are then refined by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.
Keywords: Word Spotting, Heterogeneous Document Collections, Dense SIFT Features, Latent Semantic Indexing.
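The pipeline named in the abstract (patch descriptors, bag-of-visual-words histograms, latent semantic indexing, retrieval) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: random vectors stand in for dense SIFT descriptors, the codebook is random rather than learned by clustering, retrieval uses cosine similarity, and the function names (`bovw_histogram`, `fit_lsi`, `project_lsi`, `rank_patches`) are hypothetical.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Count how many descriptors fall on each visual word (nearest codeword)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(float)

def fit_lsi(histograms, n_topics):
    """Truncated SVD of the word-by-patch matrix; returns U_k and singular values."""
    A = histograms.T  # shape (n_words, n_patches)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :n_topics], s[:n_topics]

def project_lsi(histogram, U_k, s_k):
    """Fold a histogram into the latent space: Sigma_k^-1 U_k^T h."""
    return (histogram @ U_k) / np.maximum(s_k, 1e-12)

def rank_patches(query_vec, patch_vecs):
    """Return patch indices sorted by decreasing cosine similarity to the query."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    P = patch_vecs / (np.linalg.norm(patch_vecs, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(P @ q))

# Synthetic stand-ins (assumptions): 128-d vectors in place of SIFT descriptors,
# a random 16-word codebook in place of one learned from training descriptors.
rng = np.random.default_rng(0)
codebook = rng.random((16, 128))
patches = [rng.random((40, 128)) for _ in range(10)]

H = np.stack([bovw_histogram(p, codebook) for p in patches])
U_k, s_k = fit_lsi(H, n_topics=4)
P = np.stack([project_lsi(h, U_k, s_k) for h in H])

# Query with patch 3 and rank all patches against it.
ranking = rank_patches(P[3], P)
```

In a real system the query would be a cropped word image described in the same way as the patches, and the ranked patches would point back to regions of the document pages.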
Added 24 Dec 2011
Updated 24 Dec 2011
Type Journal
Year 2011
Where ICDAR
Authors Marçal Rusiñol, David Aldavert, Ricardo Toledo, Josep Lladós