Adaptive record extraction from web pages

We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering to obtain data extraction patterns. We compare our method experimentally to the state of the art and show that our approach is very competitive for rigidly structured records (such as product descriptions) and far superior for loosely structured records (such as entries on blogs).
Categories and Subject Descriptors: H.2.4 [Database Management]: Textual Databases; H.3.3 [Information Search and Retrieval]: Clustering
General Terms: Algorithms, Experimentation
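The abstract names two ingredients, a weighted tree matching metric and clustering, without giving their definitions. The sketch below is only an illustration of how such a combination could look, not the authors' algorithm. It assumes a variant of ordered simple tree matching in which a matched node at depth d contributes decay**d, and a greedy threshold clustering of candidate subtrees. The names Node, weighted_match, similarity, and cluster, as well as the decay and threshold parameters, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A simplified DOM node: a tag name plus ordered children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def weighted_size(node, depth=0, decay=0.5):
    # Assumed weighting: a node at depth d contributes decay**d,
    # so structure near the record root counts more than deep detail.
    return decay ** depth + sum(
        weighted_size(c, depth + 1, decay) for c in node.children
    )


def weighted_match(a, b, depth=0, decay=0.5):
    # Ordered tree matching: roots must share a tag, then the child
    # sequences are aligned with an LCS-style dynamic program.
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = weighted_match(
                a.children[i - 1], b.children[j - 1], depth + 1, decay
            )
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + pair)
    return decay ** depth + dp[m][n]


def similarity(a, b, decay=0.5):
    # Normalise the match weight to [0, 1]; identical trees score 1.0.
    total = weighted_size(a, 0, decay) + weighted_size(b, 0, decay)
    return 2 * weighted_match(a, b, 0, decay) / total


def cluster(trees, threshold=0.8):
    # Greedy clustering (an assumption): a tree joins the first cluster
    # whose representative is similar enough, else it starts a new one.
    clusters = []
    for tree in trees:
        for members in clusters:
            if similarity(members[0], tree) >= threshold:
                members.append(tree)
                break
        else:
            clusters.append([tree])
    return clusters


product_a = Node("div", [Node("h2"), Node("span"), Node("img")])
product_b = Node("div", [Node("h2"), Node("span"), Node("img")])
blog_post = Node("div", [Node("h2"), Node("p"), Node("p"), Node("p")])
groups = cluster([product_a, product_b, blog_post])
```

In this toy input the two product subtrees share a cluster and the blog entry forms its own. Each cluster would then stand for one candidate extraction pattern. A lower threshold would be one way to tolerate the structural variation of loosely structured records, though how the paper handles that case is not stated here.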
Justin Park, Denilson Barbosa
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2007
Where WWW
Authors Justin Park, Denilson Barbosa