Though it has cost great research efforts for decades, object recognition is still a challenging problem. Traditional methods based on machine learning or computer vision are stil...
Xin-Jing Wang, Ming Liu, Lei Zhang, Yi Li, Wei-Yin...
The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting ...
Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sa...
— We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a glo...
Structured document content reuse is the problem of restructuring and translating data structured under a source schema into an instance of a target schema. A notion closely tied ...
This paper presents a method of automatically creating hypermedia documents from conventional transcriptions of television programs. Using parallel text alignment techniques, the ...