We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during our ongo...
Alistair Willis, David King, David Morse, Anton Di...
A new class of formal languages will be defined the Distributed Index Languages (DI-languages). The grammar-formalism generating the new class - the DI-grammars - cover unbound de...
The problem of document categorization is considered. The set of domains and the keywords specific for these domains is supposed to be selected beforehand as initial data. We apply...
Mikhail Alexandrov, Alexander F. Gelbukh, George L...
Finding relevant publications in the large and rapidly growing body of biomedical literature is challenging. Search queries on PubMed often return thousands of publications and it ...
This paper addresses the problem of extracting information from textual documents, either normal documents or web pages. A new approach for extracting complicate information from ...
Luo Xiao, Dieter Wissmann, Michael Brown, Stefan J...