Sciweavers

52 search results for "Representing OCRed documents in HTML"
ICDAR
1997
IEEE
Representing OCRed documents in HTML
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari
WWW
2005
ACM
Using visual cues for extraction of tabular data from arbitrary HTML documents
We describe a method to extract tabular data from web pages. Rather than just analyzing the DOM tree, we also exploit visual cues in the rendered version of the document to extrac...
Bernhard Krüpl, Marcus Herzog, Wolfgang Gatte...
DRR
2010
Efficient automatic OCR word validation using word partial format derivation and language model
In this paper we present an OCR validation module, implemented for the System for Preservation of Electronic Resources (SPER) developed at the U.S. National Library of Medicine. ...
Siyuan Chen, Dharitri Misra, George R. Thoma
ICDAR
2005
IEEE
A Corpus for Comparative Evaluation of OCR Software and Postcorrection Techniques
We describe a new corpus collected for comparative evaluation of OCR-software and postcorrection techniques. The corpus is freely available for academic groups and use. The major ...
Stoyan Mihov, Klaus U. Schulz, Christoph Ringlstet...
JUCS
2011
An OCR Free Method for Word Spotting in Printed Documents: the Evaluation of Different Feature Sets
An OCR-free word spotting method is developed and evaluated under a strong experimental protocol. Different feature sets are evaluated under the same experimental conditions. In ...
Israel Rios, Alceu de Souza Britto Jr., Alessandro...