Sciweavers

ICPR 2008, IEEE

A robust technique for text extraction in mixed-type binary documents

Text extraction from mixed-type documents is a crucial preprocessing stage in applications such as OCR. Unlike most earlier approaches, the present work handles text of varying orientation and size. The technique first identifies marks using contour following, then applies PCA (Principal Component Analysis) to determine the direction of the main axis of each mark. Next, a nearest-neighbor technique finds the shortest distances between marks, and a feature vector built from the calculated mark dimensions and distances is fed into a SOFM (Self-Organizing Feature Map), which defines homogeneous mark clusters. The resulting cluster weights and variances are used to form a set of fuzzy rules, and a fuzzy classification scheme identifies marks as characters or non-characters. The technique correctly and quickly extracts text areas in a variety of mixed-type documents.
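
The geometric steps of the abstract (main-axis direction per mark, shortest distance between marks, and a per-mark feature vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of mark pixel coordinates and centroids as input, and the choice of (length, width, nearest-neighbor distance) as features are all assumptions made for illustration. The abstract only says the features are based on mark dimensions and distances. Contour following, the SOFM, and the fuzzy classifier are omitted.

```python
import math

def main_axis_angle(points):
    """Orientation (radians) of the principal axis of a 2-D point set.

    Uses the closed-form principal direction of the 2x2 covariance matrix,
    which is equivalent to PCA in two dimensions.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)

def nearest_neighbor_distances(centroids):
    """Distance from each centroid to its closest other centroid (brute force).

    Requires at least two centroids.
    """
    result = []
    for i, (xi, yi) in enumerate(centroids):
        result.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(centroids) if j != i
        ))
    return result

def mark_features(points, nn_distance):
    """Hypothetical feature vector: extent along the main axis, extent across it,
    and the nearest-neighbor distance supplied by the caller."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    theta = main_axis_angle(points)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project centered points onto the main axis and onto its perpendicular.
    along = [(x - mx) * ux + (y - my) * uy for x, y in points]
    across = [-(x - mx) * uy + (y - my) * ux for x, y in points]
    return (max(along) - min(along), max(across) - min(across), nn_distance)
```

Because length and width are measured in the mark's own rotated frame, the features do not depend on text orientation, which is the property the abstract emphasizes. Vectors of this kind would then be the input to a clustering stage such as the SOFM.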

Charalambos Strouthopoulos, Athanasios Nikolaidis

Related Content

» Text extraction in complex color documents

» Data Hiding for Binary Documents Robust to Print-Scan, Photocopy, and Geometric Distortions

» Text Extraction from Gray Scale Historical Document Images Using Adaptive Local Connectivi...

» Robust Extraction of Text from Camera Images

» Removing Rule-Lines from Binary Handwritten Arabic Document Images Using Directional Local ...

» Improving binary classification on text problems using differential word features

» Feature diversity in cluster ensembles for robust document clustering

» Video text recognition using feature compensation as category-dependent feature extraction

» Extractive summarisation of legal texts

Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICPR
Authors	Charalambos Strouthopoulos, Athanasios Nikolaidis