
DIAL
2004
IEEE

Document Style Census for OCR

Four methods of converting paper documents to computer-readable form are compared with regard to hypothetical labor cost: keyboarding, omnifont OCR, style-specific OCR, and style-constrained or style-adaptive OCR. The best choice is determined primarily by (1) the reject rates of the various OCR systems at a given error rate, (2) the fraction of the material that must be labeled for training the system, and (3) the cost of partitioning the material according to style. For large corpora, sampling strategies are proposed both for estimating conversion costs and for taking advantage of style homogeneity.
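As a rough illustration of how the three factors named in the abstract could enter a cost comparison, the sketch below uses a deliberately simple linear model. The model is an assumption made here and is not taken from the paper: rejected characters and training labels are both charged at the manual keying rate, partitioning by style is a one-off charge, and the names `Method`, `labor_cost`, and `cheapest` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Method:
    """One conversion method, described by the three factors in the abstract."""
    name: str
    reject_rate: float       # fraction of characters rejected at the chosen error rate
    labeled_fraction: float  # fraction of the material labeled by hand for training
    partition_cost: float    # one-off cost of sorting the material by style


def labor_cost(method: Method, n_chars: int, key_cost_per_char: float) -> float:
    """Hypothetical labor cost under the assumed linear model.

    Rejected characters and training labels are both keyed by hand, so their
    fractions are summed and capped at 1.0 (everything keyed manually).
    """
    manual_fraction = min(1.0, method.reject_rate + method.labeled_fraction)
    return n_chars * key_cost_per_char * manual_fraction + method.partition_cost


def cheapest(methods: list[Method], n_chars: int, key_cost_per_char: float) -> Method:
    """Return the method with the lowest hypothetical labor cost."""
    return min(methods, key=lambda m: labor_cost(m, n_chars, key_cost_per_char))


# Pure keyboarding fits the same model: every character is "rejected" to a
# human operator, nothing is labeled for training, and nothing is partitioned.
KEYBOARDING = Method("keyboarding", reject_rate=1.0, labeled_fraction=0.0, partition_cost=0.0)
```

Any reject rates, labeled fractions, or partition costs passed to these functions are placeholders chosen by the caller. The point is only that a low reject rate can be outweighed by a large labeling or partitioning burden.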
George Nagy, Prateek Sarkar
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where DIAL
Authors George Nagy, Prateek Sarkar