Correcting OCR text by association with historical datasets

The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting algorithms to generate electronic bibliographic citation data from paper biomedical journal articles. The multi-engine OCR server incorporated in MARS performs well in general, but fares less well with text printed in small or italic fonts. Affiliations are often printed in small italic fonts in the journals processed by MARS. Consequently, although the automatic processes generate much of the citation data correctly, the affiliation field frequently contains incorrect data, which must be manually corrected by verification operators. In contrast, author names are usually printed in large, normal fonts that are correctly converted to text by the OCR server. The National Library of Medicine's MEDLINE® database contains 11 million indexed citations for biomedical journal articles. This paper documents our ...

Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sa

Affiliation | Document Analysis | DRR 2003 | OCR Output | OCR Server |

» A comprehensive evaluation methodology for noisy historical document recognition technique...

» Noninteractive OCR Postcorrection for GigaScale Digitization Projects

» A Complete Optical Character Recognition Methodology for Historical Documents

» Aletheia An Advanced Document Layout and Text GroundTruthing System for Production Enviro...

» The PAGE Page Analysis and GroundTruth Elements Format Framework

» Recognition Driven Page Orientation Detection

» Data association for topic intensity tracking

» Building text features for object image classification

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	DRR
Authors	Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sabir, Dina Demner-Fushman, Scott Straughan, George R. Thoma
