XML is an SGML-based language designed for the interchange of documents with more flexible and powerful features than those provided by HTML. It can be considered as an intermedia...
We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...
David W. Embley, Douglas M. Campbell, Randy D. Smi...
We present an elegant and extensible model that is capable of providing semantic interpretations for an unusually wide range of textual tables in documents. Unlike the few existin...
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table– a data structure that better lends itself to high-level...
Delivering tailor-made ERP software requires automation of screen and printed report creation to be cost effective. Screens generated directly from data structures tend to have poo...