Text documents often embed data that is structured in nature. By processing a text database with information extraction systems, we can define a variety of structured "relati...
We present the STEX system, a semantic extension of LATEX, that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc docu...
Andrea Kohlhase, Michael Kohlhase, Christoph Lange...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
In this paper we present the design, implementation and evaluation of SOBA, a system for ontology-based information extraction from heterogeneous data resources, including plain t...
Paul Buitelaar, Philipp Cimiano, Anette Frank, Mat...
A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicate...
Diego Arroyuelo, Francisco Claude, Sebastian Manet...