We propose two methods for constructing automated programs for extracting information from a class of web pages that are very common and of high practical significance: varia...
Machine-generated documents containing semi-structured text are rapidly forming the bulk of the data stored in an organisation. Given a feature-based representation of such data,...
This paper presents an information system for legal professionals that integrates natural language processing technologies such as text classification and summarization. ...
It has long been assumed that the only way to automatically extract similar groups of words from a text collection for which no semantic information exists ...
Image anchor templates are used in document image analysis for document classification, data localization, and other tasks. Current tools allow human operators to mark out small s...