ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Document clustering has been used for better document retrieval, document browsing, and text mining. In this paper, we investigate if biomedical ontology MeSH improves the cluster...
Approximate queries on a collection of strings are important in many applications such as record linkage, spell checking, and Web search, where inconsistencies and errors exist in...
The purpose of authorship search is to identify documents written by a particular author or in a particular style in large document collections. Standard search engines match docum...
The Health Level 7 Clinic Document Architecture (CDA) is an XML-based document markup standard that specifies the hierarchical structure and semantics of “clinical documents” ...