
CBMS
1998
IEEE

Lexicon Assistance Reduces Manual Verification of OCR Output

An OCR system chosen for its high recognition rate and low percentage of false positives also assigns low confidence values to many characters that are actually correct. Human operators must verify every word that contains a low-confidence character. We describe the creation of a lexicon optimized for automatically and selectively resetting confidence values to high, thereby reducing operator verification time. Two word lists, OCR Correct and OCR Incorrect, were extracted from files that had already been processed and verified, and these lists became the standard for comparing candidate lexicons. A lexicon was selected from several candidate word lists maintained by the National Library of Medicine (NLM). In operation for about six months, lexicon-assisted verification has been reducing the number of words requiring operator verification by over 50%.

Background

The Lister Hill National Center for Biomedical Communications, a Research and Development Division of NLM, is developing a system [1] for semi-automated entry of journal ar...
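The verification rule described in the abstract, and the use of the OCR Correct and OCR Incorrect lists to compare candidate lexicons, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 0 to 255 confidence scale, the threshold of 128, case-insensitive matching, and the names `OcrWord`, `needs_verification`, and `score_lexicon` are all hypothetical. The two scores are one plausible reading of "comparing candidate lexicons": how many verified-correct words a lexicon would accept, and how many verified-incorrect words it would wrongly accept.

```python
from dataclasses import dataclass

# Assumed confidence scale and threshold (hypothetical, not from the paper).
LOW_CONFIDENCE_THRESHOLD = 128
HIGH_CONFIDENCE = 255


@dataclass
class OcrWord:
    text: str
    confidences: list[int]  # one confidence value per recognized character


def needs_verification(word: OcrWord, lexicon: set[str],
                       threshold: int = LOW_CONFIDENCE_THRESHOLD) -> bool:
    """Return True if a human operator must still check this word."""
    if all(c >= threshold for c in word.confidences):
        return False  # no low-confidence characters: no verification needed
    # Lexicon assistance (sketch): a low-confidence word found in the lexicon
    # has its confidence values reset to high and skips manual verification.
    if word.text.lower() in lexicon:
        word.confidences = [HIGH_CONFIDENCE] * len(word.confidences)
        return False
    return True


def score_lexicon(lexicon: set[str], ocr_correct: list[str],
                  ocr_incorrect: list[str]) -> tuple[float, float]:
    """Hypothetical comparison of a candidate lexicon against verified lists.

    coverage:     share of OCR Correct words the lexicon accepts (effort saved).
    false_accept: share of OCR Incorrect words it also accepts (errors missed).
    """
    coverage = sum(w.lower() in lexicon for w in ocr_correct) / len(ocr_correct)
    false_accept = sum(w.lower() in lexicon for w in ocr_incorrect) / len(ocr_incorrect)
    return coverage, false_accept


if __name__ == "__main__":
    lexicon = {"journal", "citation", "form"}
    words = [
        OcrWord("journal", [250, 90, 250, 250, 250, 250, 250]),  # correct, low conf
        OcrWord("jouraal", [250, 250, 250, 250, 90, 250, 250]),  # misread, low conf
        OcrWord("the", [250, 250, 250]),                         # all high conf
    ]
    flagged = [w.text for w in words if needs_verification(w, lexicon)]
    print(flagged)  # ['jouraal']
    # "form" stands in for a misread of "from" that happens to be a real word.
    print(score_lexicon(lexicon, ["journal", "citation", "the", "of"],
                        ["jouraal", "form"]))  # (0.5, 0.5)
```

The example shows the trade-off such a comparison would expose: adding words to a lexicon raises coverage of correct words, but any entry that matches a plausible OCR error (here "form" for "from") lets that error bypass verification.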
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1998
Where CBMS
Authors Susan E. Hauser, A. C. Browne, George R. Thoma, Alexa T. McCray