Sciweavers

LWA
2008

Rule-Based Information Extraction for Structured Data Acquisition using TextMarker

Information extraction is concerned with locating specific items in (unstructured) textual documents and can be applied, for example, to the acquisition of structured data. The acquired data can then serve as input for mining methods that require structured data, in contrast to other text mining methods that use a bag-of-words approach. This paper presents a semi-automatic approach for structured data acquisition using a rule-based information extraction system. We propose a semi-automatic process model that includes the TEXTMARKER system for information extraction and data acquisition from textual documents. TEXTMARKER applies simple rules for extracting blocks from a given (semi-structured) document, which can then be analyzed further using domain-specific rules. Thus, both low-level and higher-level information extraction are supported. We demonstrate the applicability and benefit of the approach with two case studies from real-world applications.
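The two-level idea in the abstract (simple rules that cut a semi-structured document into blocks, followed by domain-specific rules applied within those blocks) can be illustrated with a minimal sketch. This is plain Python with regular expressions, not the TEXTMARKER rule language, and it is not taken from the paper: the sample document, the `HEADER` pattern, the `DOMAIN_RULES` table, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical semi-structured input; not taken from the paper.
DOCUMENT = """\
Diagnosis:
Fracture of the left wrist
Therapy:
Cast for 6 weeks
"""

# Low-level rule (assumed): a line consisting of a capitalized label
# followed by a colon opens a new block.
HEADER = re.compile(r"^(?P<name>[A-Z][A-Za-z ]*):\s*$")

# Higher-level, domain-specific rules (assumed), keyed by block name.
DOMAIN_RULES = {
    "Therapy": re.compile(r"(?P<duration>\d+)\s+(?P<unit>days|weeks|months)"),
}


def extract_blocks(text):
    """Split the text into named blocks using the low-level header rule."""
    blocks = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group("name")
            blocks[current] = []
        elif current is not None and line.strip():
            blocks[current].append(line.strip())
    return {name: " ".join(lines) for name, lines in blocks.items()}


def extract_record(text):
    """Build a flat record: block texts plus fields found by domain rules."""
    record = {}
    for name, body in extract_blocks(text).items():
        record[name] = body
        rule = DOMAIN_RULES.get(name)
        if rule is None:
            continue
        match = rule.search(body)
        if match:
            for field, value in match.groupdict().items():
                record[f"{name}.{field}"] = value
    return record


if __name__ == "__main__":
    print(extract_record(DOCUMENT))
```

The resulting flat dictionary is one possible form of the structured input that mining methods could consume. A dedicated rule language would typically express the same two stages more declaratively than this sketch does.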
Martin Atzmüller, Peter Klügl, Frank Puppe
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LWA
Authors Martin Atzmüller, Peter Klügl, Frank Puppe