Sciweavers

DRR
2003
13 years 5 months ago
Information retrieval for OCR documents: a content-based probabilistic correction model
The difficulty with information retrieval for OCR documents lies in the fact that OCR documents comprise a significant number of erroneous words and, unfortunately, most informat...
Rong Jin, ChengXiang Zhai, Alexander G. Hauptmann
DRR
2003
13 years 5 months ago
Document structure analysis algorithms: a literature survey
Document structure analysis can be regarded as a syntactic analysis problem. The order and containment relations among the physical or logical components of a document page can be...
Song Mao, Azriel Rosenfeld, Tapas Kanungo
DRR
2003
13 years 5 months ago
Automated labeling of bibliographic data extracted from biomedical online journals
A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, affiliation and others) from online biomedical journals to p...
Jongwoo Kim, Daniel X. Le, George R. Thoma
DRR
2003
13 years 5 months ago
Correcting OCR text by association with historical datasets
The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting ...
Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sa...