Sciweavers

JMLR
2008

Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies, attempting to do data record detection and attribute labeling in two separate phases. In this paper, we propose an integrated web data extraction paradigm with hierarchical models. The proposed model is called Dynamic Hierarchical Markov Random Fields (DHMRFs). DHMRFs take structural uncertainty into consideration and define a joint distribution of both model structure and class labels. The joint distribution is an exponential family distribution. As a conditional model, DHMRFs relax the independence assumption made in directed models. Since exact inference is intractable, a variational method is developed to learn the model's parameters and to find the MAP model structure and label assignments. We apply DHMRFs to a real-world web data extraction task. Experimental results show that: (1) integrated web data extraction models can achieve significant improvements on both r...
Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2008
Where JMLR
Authors Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen