The ADROIT system that we are developing allows automatic discourse analysis of information rich natural language texts extracted directly from the web. We use guidelines and rela...
Machine-generated documents containing semi-structured text are rapidly forming the bulk of data being stored in an organisation. Given a feature-based representation of such data,...
We present an interactive system to query, explore and navigate data according to a hierarchical knowledge model that had been automatically populated from unstructured textual da...
Anni Coden, Igor L. Sominsky, Michael A. Tanenblat...
The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Conte...
Existing Information Extraction systems tend to focus on a tight window of context surrounding the desired information to be extracted. This leads to a number of shortcomings in t...