Sciweavers

CIKM
2011
Springer

Semi-supervised multi-task learning of structured prediction models for web information extraction

Extracting information from web pages is an important problem with several applications, such as improving search results and constructing databases to serve user queries. In this paper we propose a novel structured prediction method that addresses two important aspects of the extraction problem: (1) labeled data is available for only a small number of sites, and (2) a machine-learned global model does not generalize adequately across many websites. For this purpose, we propose a weight-space-based graph regularization method. This method has several advantages. First, it can use unlabeled data to address the limited labeled data problem, and it falls in the class of graph-regularization-based semi-supervised learning approaches. Second, to address the generalization inadequacy of a global model, this method builds a local model for each website. Viewing the problem of building a local model for each website as a task, we learn the models for a collection of sites jointly; t...
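The idea of learning one local model per site while coupling the models through a graph in weight space can be illustrated with a small sketch. This is not the paper's method: it swaps the structured prediction loss for plain least squares, and the function name `fit_graph_regularized_tasks`, the site-similarity matrix `S`, and all hyperparameters are assumptions made for illustration. In a semi-supervised setting, `S` could plausibly be estimated from unlabeled pages, but how the paper builds its graph is not stated in the abstract above.

```python
import numpy as np

def fit_graph_regularized_tasks(X_list, y_list, S, lam=1.0, ridge=1e-3,
                                lr=0.05, n_iter=2000):
    """Jointly fit one linear model per task (one task = one website).

    Minimizes, by gradient descent,
        sum_t 1/(2 n_t) ||X_t w_t - y_t||^2      (per-site squared loss)
      + (ridge/2) sum_t ||w_t||^2                (small L2 penalty)
      + (lam/2) sum_{s<t} S[s,t] ||w_s - w_t||^2 (graph penalty in weight space)
    S is a symmetric, nonnegative similarity matrix (hypothetical input).
    """
    T = len(X_list)
    d = X_list[0].shape[1]
    W = np.zeros((T, d))
    # Graph Laplacian: sum_{s<t} S[s,t] ||w_s - w_t||^2 == trace(W^T L W)
    L = np.diag(S.sum(axis=1)) - S
    for _ in range(n_iter):
        G = np.zeros_like(W)
        for t in range(T):
            X, y = X_list[t], y_list[t]
            if len(y) > 0:  # sites without labels get no data gradient
                G[t] = X.T @ (X @ W[t] - y) / len(y)
        G += ridge * W + lam * (L @ W)
        W -= lr * G
    return W

# Usage: site 0 has labeled data, site 1 has none but is linked to site 0.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X0 = rng.normal(size=(50, 3))
y0 = X0 @ w_true
X1, y1 = np.zeros((0, 3)), np.zeros(0)
S = np.array([[0.0, 1.0], [1.0, 0.0]])
W = fit_graph_regularized_tasks([X0, X1], [y0, y1], S)
```

In this toy setup the unlabeled site inherits a model close to its neighbor's purely through the graph penalty, which is the intuition behind sharing information across sites; with labeled data on both sites, `lam` would trade off local fit against agreement with neighboring sites.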
Added 13 Dec 2011
Updated 13 Dec 2011
Type Conference
Year 2011
Where CIKM
Authors Paramveer S. Dhillon, Sundararajan Sellamanickam, Sathiya Keerthi Selvaraj