Stop word detection in compressed textual images: An experiment on indic script documents

This work addresses stop word detection for the retrieval of document images in the compressed domain. Algorithms are presented that identify text lines and words, and that cluster similar words so that word occurrence frequencies can be counted. From a corpus of textual images, a list of words with their occurrence frequencies is generated. Because stop words in any language occur with high frequency, they occupy the top positions of the word list sorted by frequency. Experiments have been carried out on two major Indic scripts: Devanagari (Hindi) and Bangla. Test results on 150 document images, comprising about 12K words in each script, show the promise of the proposed approach.
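
The pipeline outlined above (find lines, find words, group similar words, rank groups by count) can be illustrated with a minimal sketch. It assumes each page is stored as run-length-encoded rows, each row a list of black runs given as hypothetical `(start_col, length)` pairs, and that some numeric feature vector per word is already available. The run format, the function names, the thresholds, and the greedy leader clustering are illustrative assumptions, not the authors' method; the abstract does not state which compressed format or word features the paper uses.

```python
from typing import List, Sequence, Tuple

# Hypothetical compressed representation: each image row is a list of
# black runs, each run given as (start_col, length).
RLERow = List[Tuple[int, int]]


def row_ink(row: RLERow) -> int:
    """Number of black pixels in a row, read straight from its run lengths."""
    return sum(length for _, length in row)


def segment_lines(rows: Sequence[RLERow], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Horizontal projection on runs: consecutive inked rows form one text line.

    Returns inclusive (top_row, bottom_row) pairs.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(rows):
        inked = row_ink(row) >= min_ink
        if inked and start is None:
            start = y
        elif not inked and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:  # a text line touching the bottom edge
        lines.append((start, len(rows) - 1))
    return lines


def segment_words(rows: Sequence[RLERow], line: Tuple[int, int], min_gap: int) -> List[Tuple[int, int]]:
    """Merge the column intervals covered by runs in a line whenever the white
    gap between them is narrower than min_gap (a column projection computed
    from runs). Returns inclusive (left_col, right_col) word spans."""
    top, bottom = line
    intervals = sorted(
        (s, s + n - 1) for y in range(top, bottom + 1) for s, n in rows[y]
    )
    words: List[List[int]] = []
    for s, e in intervals:
        if words and s - words[-1][1] - 1 < min_gap:
            words[-1][1] = max(words[-1][1], e)
        else:
            words.append([s, e])
    return [(s, e) for s, e in words]


def cluster_words(features: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Greedy leader clustering (a stand-in technique): each word joins the first
    cluster whose leader lies within `threshold` in L1 distance, otherwise it
    starts a new cluster. Returns lists of word indices."""
    leaders: List[Sequence[float]] = []
    clusters: List[List[int]] = []
    for i, f in enumerate(features):
        for k, lead in enumerate(leaders):
            if sum(abs(a - b) for a, b in zip(f, lead)) <= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(f)
            clusters.append([i])
    return clusters


def rank_by_frequency(clusters: List[List[int]]) -> List[List[int]]:
    """Sort clusters by size (occurrence count), largest first; the head of
    this list holds the stop-word candidates."""
    return sorted(clusters, key=len, reverse=True)


if __name__ == "__main__":
    page = [[], [(0, 3), (10, 2)], [(0, 3), (10, 2)], [], [(2, 2)], []]
    for line in segment_lines(page):
        print(line, segment_words(page, line, min_gap=3))
    # Placeholder word features; real features would come from the word images.
    feats = [[1.0, 1.0], [5.0, 5.0], [1.1, 0.9], [1.0, 1.2]]
    print(rank_by_frequency(cluster_words(feats, threshold=0.5)))
```

On the toy page this finds lines `(1, 2)` and `(4, 4)`, splits the first line into word spans `(0, 2)` and `(10, 11)`, and ranks the cluster `[0, 2, 3]` first. Working on run lengths means the projections are computed without decompressing the image, which is the point of operating in the compressed domain; the `min_gap` and `threshold` values here are arbitrary and would likely need tuning per script.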

Utpal Garain, Amit Kumar Das

Computer Vision | ICPR 2008 | Occurrence Frequencies | Stop Words | Word Occurrence Frequencies

Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICPR
Authors	Utpal Garain, Amit Kumar Das

Sciweavers