DASFAA
2005
IEEE

Automatic Data Extraction from Data-Rich Web Pages

Abstract. Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. In this paper, we propose a novel technique for differentiating the roles of data items in Web pages, which is one of the key problems in our automatic extraction approach. The problem is addressed at three levels (semantic blocks, sections, and data items), and several approaches are proposed to effectively identify the mapping between data items that have the same role. Extensive experiments on real Web sites show that the proposed technique can effectively help extract the desired data with high accuracy in most cases.
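To make "mapping between data items that have the same role" concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' technique. It assumes each page has already been parsed into hypothetical (tag_path, text) pairs, and it treats items found at the same DOM tag path on every page as sharing a role, a simple baseline for template-generated pages. The function name `map_roles`, the input format, and the sample pages are illustrative only.

```python
from collections import defaultdict

# Hypothetical input format: a page is a list of (tag_path, text) pairs,
# where tag_path is the root-to-node DOM path of a text-bearing node.
Page = list[tuple[str, str]]


def map_roles(pages: list[Page]) -> dict[str, list[str]]:
    """Group data items that appear to share a role across pages.

    Assumption: nodes at the same tag path play the same role. This is a
    simple baseline for illustration, not the method proposed in the paper.
    """
    texts_by_path: dict[str, list[str]] = defaultdict(list)
    pages_with_path: dict[str, set[int]] = defaultdict(set)
    for page_no, page in enumerate(pages):
        for tag_path, text in page:
            texts_by_path[tag_path].append(text)
            pages_with_path[tag_path].add(page_no)

    roles: dict[str, list[str]] = {}
    for path, texts in texts_by_path.items():
        # Keep only paths present on every page: stable slots of the template.
        if len(pages_with_path[path]) < len(pages):
            continue
        # Text identical on all pages is template text (e.g. a label), not data.
        if len(set(texts)) == 1:
            continue
        roles[path] = texts
    return roles


if __name__ == "__main__":
    # Two hypothetical pages generated from the same template.
    pages = [
        [("html/body/h1", "Book A"), ("html/body/span", "Price:"), ("html/body/b", "$12")],
        [("html/body/h1", "Book B"), ("html/body/span", "Price:"), ("html/body/b", "$15")],
    ]
    print(map_roles(pages))
    # {'html/body/h1': ['Book A', 'Book B'], 'html/body/b': ['$12', '$15']}
```

The two filters encode a common intuition about data-rich pages built from one template: a data slot recurs on every page while its contents vary. Pure path matching breaks down when optional or repeated fields shift the layout, so this sketch should be read only as an illustration of the mapping problem.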
Dongdong Hu, Xiaofeng Meng
Type Conference
Year 2005
Where DASFAA