Sciweavers

ICDE
2008
IEEE

An Algebraic Approach to Rule-Based Information Extraction

14 years 5 months ago
An Algebraic Approach to Rule-Based Information Extraction
Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally we validate the potential benefits of our approach by extensive experiments over real-world blog data.
Frederick Reiss, Sriram Raghavan, Rajasekar Krishn
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2008
Where ICDE
Authors Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, Shivakumar Vaithyanathan
Comments (0)