Sciweavers

6 search results - page 1 / 2
ICDAR
2007
IEEE
13 years 11 months ago
An Overview of the Tesseract OCR Engine
The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy[1], is described in a comprehensive overview. Emphasis is placed on aspec...
R. Smith
ICDAR
2009
IEEE
13 years 2 months ago
An Open Source Tesseract Based Optical Character Recognizer for Bangla Script
BanglaOCR is currently the only open source optical character recognition (OCR) software for the Bangla (Bengali) script developed by the Center for Research on Bangla Language Pr...
Md. Abul Hasnat, Muttakinur Rahman Chowdhury, Mumi...
ICDAR
2011
IEEE
12 years 4 months ago
Character Enhancement for Historical Newspapers Printed Using Hot Metal Typesetting
We propose a new method for an effective removal of the printing artifacts occurring in historical newspapers which are caused by problems in the hot metal typesetting, a widely...
Iuliu Vasile Konya, Stefan Eickeler, Christoph Sei...
ICDAR
2009
IEEE
13 years 11 months ago
Hybrid Page Layout Analysis via Tab-Stop Detection
A new hybrid page layout analysis algorithm is proposed, which uses bottom-up methods to form an initial data-type hypothesis and locate the tab-stops that were used when the page...
Raymond W. Smith
ICIP
2009
IEEE
13 years 2 months ago
Semantic keyword extraction via adaptive text binarization of unstructured unsourced video
We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes of text in the ...
Michele Merler, John R. Kender