Sciweavers

ICDE
2007
IEEE

Collaborative Wrapping: A Turbo Framework for Web Data Extraction

To access data sources on the Web, a crucial step is wrapping, which translates query responses, rendered in textual HTML, back into their relational form. Traditionally, this problem has been addressed with syntax-based approaches for a single source. However, as online databases multiply, we often need to wrap multiple sources, in particular for domain-based integration. Observing that sources in the same domain usually share common fields, we propose a novel wrapping concept, collaborative wrapping, in which multiple sources are extracted concurrently with content-based synchronization to produce consentaneous extractions. Toward this concept, recognizing wrapping as a communication process, we develop the turbo wrapper, based on the insight of turbo codes, a multi-code decoding scheme in information theory. Our experiment shows that the turbo wrapper consistently outperforms baseline single-source methods, is robust, and does benefit from extended scales of source collaboration.
Shui-Lung Chuang, Kevin Chen-Chuan Chang, ChengXiang Zhai
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2007
Where ICDE
Authors Shui-Lung Chuang, Kevin Chen-Chuan Chang, ChengXiang Zhai