We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...
David W. Embley, Douglas M. Campbell, Randy D. Smi...
As the ease of accessing and generating large quantities of information increases, people’s ability to navigate through that information and maintain personal perspective decrea...
We present a document expansion approach that uses Conditional Random Field (CRF) segmentation to automatically extract salient phrases from ad titles. We then supplement the ad d...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...