Web information extraction using Markov logic networks

In this paper, we consider the problem of extracting structured data from web pages taking into account both the content of individual attributes as well as the structure of pages and sites. We use Markov Logic Networks (MLNs) to capture both content and structural features in a single unified framework, and this enables us to perform more accurate inference. MLNs allow us to model a wide range of rich structural features like proximity, precedence, alignment, and contiguity, using first-order clauses. We show that inference in our information extraction scenario reduces to solving an instance of the maximum weight subgraph problem. We develop specialized procedures for solving the maximum subgraph variants that are far more efficient than previously proposed inference methods for MLNs that solve variants of MAX-SAT. Experiments with real-life datasets demonstrate the effectiveness of our MLN-based approach compared to existing state-of-the-art extraction methods.
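
To make the idea of combining content and structural features concrete, here is a minimal toy sketch. It is not the authors' model or their specialized maximum weight subgraph procedure: the page segments, the weights, and the two structural clauses (proximity and precedence) are all hypothetical, and inference is done by brute-force enumeration. It only illustrates that, in an MLN, the most likely labeling is the one maximizing the total weight of satisfied ground clauses.

```python
from itertools import product

# Hypothetical page segments, in document order (illustrative only).
nodes = ["Acme Blender 3000", "$49.99", "Free shipping", "Model 1234"]
labels = ["name", "price", None]

# Hypothetical content weights: how strongly a segment's text supports a label.
content_weight = {
    (0, "name"): 2.0,
    (3, "name"): 1.5,
    (1, "price"): 3.0,
    (3, "price"): 0.5,
}
UNSUPPORTED_PENALTY = -1.0  # assumed weight for a label with no content support
PROXIMITY_WEIGHT = 1.0      # assumed clause: name and price are adjacent
PRECEDENCE_WEIGHT = 0.5     # assumed clause: name appears before price


def score(assignment):
    """Total weight of the satisfied (toy) ground clauses for one labeling."""
    total = 0.0
    for i, lab in enumerate(assignment):
        if lab is not None:
            total += content_weight.get((i, lab), UNSUPPORTED_PENALTY)
    if "name" in assignment and "price" in assignment:
        n, p = assignment.index("name"), assignment.index("price")
        if abs(n - p) == 1:
            total += PROXIMITY_WEIGHT
        if n < p:
            total += PRECEDENCE_WEIGHT
    return total


def is_valid(assignment):
    """Each attribute may label at most one segment."""
    return assignment.count("name") <= 1 and assignment.count("price") <= 1


# Brute-force MAP inference: enumerate every valid labeling, keep the best.
candidates = [a for a in product(labels, repeat=len(nodes)) if is_valid(a)]
best = max(candidates, key=score)
best_score = score(best)

for text, lab in zip(nodes, best):
    print(f"{text!r}: {lab}")
print("score:", best_score)
```

Enumeration is exponential in the number of segments, which is why the abstract's reduction to a maximum weight subgraph problem with specialized solvers matters in practice.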

Sandeepkumar Satpal, Sahely Bhadra, Sundararajan Sellamanickam, Rajeev Rastogi, Prithviraj Sen

Inference | Internet Technology | Maximum Weight Subgraph | Structural Features | WWW 2011

» Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS

» Unifying Logical and Statistical AI

» BioSnowball: automated population of Wikis

» Quantifier Scope Disambiguation Using Extracted Pragmatic Knowledge: Preliminary Results

» Logical structure based semantic relationship extraction from semistructured documents

» An empirical study on using hidden markov model for search interface segmentation

» StatSnowball: a statistical approach to extracting entity relationships

» Gene Expression Analysis using Markov Chains extracted from RNNs

Post Info

Added	29 May 2011
Updated	29 May 2011
Type	Journal
Year	2011
Where	WWW
Authors	Sandeepkumar Satpal, Sahely Bhadra, Sundararajan Sellamanickam, Rajeev Rastogi, Prithviraj Sen
