A process that attempts to solve abbreviation ambiguity is presented. Various contextrelated features and statistical features have been explored. Almost all features are domain i...
Adaptive binarization is an important first step in many document analysis and OCR processes. This paper describes a fast adaptive binarization algorithm that yields the same qual...
This paper will describe the creation of an ontology to model information contained in historical documents. It will describe the structure of the RDF/OWL ontology and how the mode...
A significant amount of information is stored in computer systems today, but people are struggling to manage their documents such that the information is easily found. XML is a de...
Search engines present fix-length passages from documents ranked by relevance against the query. In this paper, we present and compare novel, language-model based methods for extr...