Adaptive record extraction from web pages

We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering to obtain data extraction patterns. We compare our method experimentally to the state of the art and show that our approach is very competitive for rigidly structured records (such as product descriptions) and far superior for loosely structured records (such as entries on blogs).
Categories and Subject Descriptors: H.2.4 [Database Management]: Textual Databases; H.3.3 [Information Search and Retrieval]: Clustering
General Terms: Algorithms, Experimentation
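The abstract names two ingredients, a weighted tree matching metric and clustering, without giving details. The sketch below is only an illustration of how such a combination might look and is not the authors' algorithm: it assumes a depth-discounted variant of the classic Simple Tree Matching recurrence and a greedy single-pass clustering step. The names `Node`, `weighted_match`, `similarity`, `cluster`, and the parameters `decay` and `threshold` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal DOM-like node: a tag name and an ordered list of children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def weighted_size(node: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Total weight of a tree; a node at depth d weighs decay**d (assumed scheme)."""
    return decay ** depth + sum(
        weighted_size(c, depth + 1, decay) for c in node.children
    )


def weighted_match(a: Node, b: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Depth-weighted Simple Tree Matching: weight of the best ordered matching."""
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    # Dynamic programme over the two ordered child sequences.
    table = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = weighted_match(a.children[i - 1], b.children[j - 1], depth + 1, decay)
            table[i][j] = max(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1] + pair)
    return decay ** depth + table[m][n]


def similarity(a: Node, b: Node, decay: float = 0.5) -> float:
    """Normalised similarity in [0, 1]; equals 1.0 for identical trees."""
    total = weighted_size(a, decay=decay) + weighted_size(b, decay=decay)
    return 2 * weighted_match(a, b, decay=decay) / total


def cluster(subtrees: list[Node], threshold: float = 0.7, decay: float = 0.5) -> list[list[Node]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters: list[list[Node]] = []
    for tree in subtrees:
        for group in clusters:
            if similarity(group[0], tree, decay) >= threshold:
                group.append(tree)
                break
        else:
            clusters.append([tree])
    return clusters


# Two blog-like entries with slightly different structure, plus a navigation list.
entry_a = Node("div", [Node("h2"), Node("p")])
entry_b = Node("div", [Node("h2"), Node("p"), Node("p")])
nav = Node("ul", [Node("li"), Node("li")])
groups = cluster([entry_a, nav, entry_b])
print([[t.tag for t in g] for g in groups])
```

Discounting deeper nodes is one plausible way to tolerate variation inside loosely structured records while still requiring agreement near the record root. Each resulting cluster could then serve as the basis for an extraction pattern.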
Justin Park, Denilson Barbosa
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2007
Where WWW