Sciweavers

ICDE
2008
IEEE

Automatically Extracting Form Labels

14 years 6 months ago
Automatically Extracting Form Labels
We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.
Hoa Nguyen, Eun Yong Kang, Juliana Freire
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2008
Where ICDE
Authors Hoa Nguyen, Eun Yong Kang, Juliana Freire
Comments (0)