
DILS
2009
Springer

Site-Wide Wrapper Induction for Life Science Deep Web Databases

We present a novel approach to automatic information extraction from Deep Web Life Science databases using wrapper induction. Traditional wrapper induction techniques learn wrappers from examples drawn from a single class of Web pages, i.e., pages that are all similar in structure and content. Consequently, traditional wrapper induction targets the understanding of Web pages generated from a database with the same template as the one observed in the example set. However, Life Science Web sites typically contain structurally diverse Web pages from multiple classes, which makes the problem more challenging. Furthermore, we observed that such Life Science Web sites do not merely provide data; they also tend to provide schema information in the form of data labels, giving further cues for solving the Web site wrapping task. Our solution to this novel challenge of Site-Wide
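The abstract does not describe how the authors separate a site's pages into classes, so the following is only a minimal, hypothetical sketch of one generic way to do it before learning a wrapper per class. It assumes each page is summarized by its set of root-to-element tag paths, that two pages belong to the same class when the Jaccard similarity of those sets reaches an assumed threshold of 0.6, and that a simple greedy single-pass grouping is acceptable. The function names (`tag_paths`, `jaccard`, `cluster_pages`) and the example pages are illustrative inventions, not part of the paper.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not remain on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths of an HTML page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
        if tag in VOID_TAGS:
            self.stack.pop()

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Hypothetical structural signature: the set of tag paths in the page."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.6):
    """Greedy grouping: each page joins the first class whose representative
    (the first page placed in it) is at least `threshold` similar."""
    classes = []  # list of (representative_paths, [page_ids])
    for page_id, html in pages.items():
        paths = tag_paths(html)
        for representative, members in classes:
            if jaccard(paths, representative) >= threshold:
                members.append(page_id)
                break
        else:
            classes.append((paths, [page_id]))
    return [members for _, members in classes]


# Hypothetical example: two gene pages sharing a template, one pathway page.
example_pages = {
    "gene1": "<html><body><h1>Gene</h1><table><tr><td>Name</td><td>ABC1</td></tr></table></body></html>",
    "gene2": "<html><body><h1>Gene</h1><table><tr><td>Name</td><td>XYZ2</td></tr></table></body></html>",
    "pathway1": "<html><body><div><ul><li>Step 1</li></ul></div></body></html>",
}

print(cluster_pages(example_pages))  # [['gene1', 'gene2'], ['pathway1']]
```

Each resulting group would then play the role of the "one class of Web pages" that traditional wrapper induction expects. The greedy scheme is chosen only for brevity; it depends on page order, and a real site-wide system would likely need a more robust clustering method and signature.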
Saqib Mir, Steffen Staab, Isabel Rojas
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where DILS
Authors Saqib Mir, Steffen Staab, Isabel Rojas