High-performance information extraction with AliBaba

A wealth of information is available only in web pages, patents, publications, etc. Extracting information from such sources is challenging, due both to the typically complex language processing steps required and to the potentially large number of texts that need to be analyzed. Furthermore, integrating extracted data with other sources of knowledge is often mandatory for subsequent analysis. In this demo, we present the AliBaba system for scalable information extraction from biomedical documents. Unlike many other systems, AliBaba performs both entity extraction and relationship extraction and graphically visualizes the resulting network of interconnected objects. It leverages the PubMed search engine to select relevant documents. The technical novelty of AliBaba is twofold: (a) its ability to automatically learn language patterns for relationship extraction without an annotated corpus, and (b) its high-performance pattern matching algorithm. We show that a simple yet effecti...
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where EDBT
Authors Peter Palaga, Long Nguyen, Ulf Leser, Jörg Hakenberg