
ICPR
2008
IEEE

Stop word detection in compressed textual images: An experiment on indic script documents

This work attempts stop word detection for the retrieval of document images in the compressed domain. Algorithms are presented to identify text lines and words, and to cluster similar words so that word occurrence frequencies can be counted. From a corpus of textual images, a list of words with their occurrence frequencies is generated. Because stop words in any language have high occurrence frequencies, they occupy the top positions in the word list sorted by frequency. Experiments were carried out on two major Indic scripts, Devanagari (Hindi) and Bangla. Tests on 150 document images, containing about 12K words in each script, show the promising potential of the proposed approach.
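The frequency-ranking idea above can be illustrated with a minimal Python sketch. It assumes each segmented word image has already been reduced to a fixed-length feature vector (the paper's actual compressed-domain features are not described in this abstract), and it uses a simple greedy "leader" clustering with a distance threshold as a stand-in for the authors' word-clustering algorithm. The names `cluster_words`, `rank_by_frequency`, and `stop_word_candidates` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical: each word image is represented by a fixed-length feature vector.
FeatureVector = Sequence[float]


def euclidean(a: FeatureVector, b: FeatureVector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_words(features: List[FeatureVector], threshold: float) -> List[List[int]]:
    """Greedy leader clustering (an assumed stand-in, not the paper's method):
    each word joins the first cluster whose leader (first member) lies within
    `threshold`; otherwise it starts a new cluster. Returns word indices."""
    leaders: List[FeatureVector] = []
    clusters: List[List[int]] = []
    for i, f in enumerate(features):
        for k, leader in enumerate(leaders):
            if euclidean(f, leader) <= threshold:
                clusters[k].append(i)
                break
        else:
            # No existing cluster is close enough: start a new one.
            leaders.append(f)
            clusters.append([i])
    return clusters


def rank_by_frequency(clusters: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (cluster_id, occurrence_count) pairs sorted by descending count."""
    counts = [(k, len(members)) for k, members in enumerate(clusters)]
    return sorted(counts, key=lambda kc: kc[1], reverse=True)


def stop_word_candidates(
    features: List[FeatureVector], threshold: float, top_n: int
) -> List[Tuple[int, int]]:
    """The most frequent word clusters are treated as stop-word candidates."""
    return rank_by_frequency(cluster_words(features, threshold))[:top_n]


if __name__ == "__main__":
    # Toy 2-D features: three near-identical occurrences of one word,
    # two of another, and one singleton.
    words = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 1.0)]
    print(stop_word_candidates(words, threshold=0.5, top_n=2))
```

On the toy data the script prints `[(0, 3), (1, 2)]`: the cluster with three occurrences ranks first. In practice the distance threshold would have to be tuned to the chosen features and scripts.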
Utpal Garain, Amit Kumar Das
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Where ICPR
Authors Utpal Garain, Amit Kumar Das