VLDB
2001
ACM

RoadRunner: Towards Automatic Data Extraction from Large Web Sites

The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
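
The idea of comparing pages and deriving a wrapper from their similarities and differences can be illustrated with a small sketch. This is not the matching algorithm from the paper; it is a deliberately simplified stand-in that aligns two token sequences with Python's standard `difflib.SequenceMatcher`, treats tokens shared by both pages as template and mismatching regions as data fields. The names `tokenize`, `infer_template`, `extract`, `FIELD`, and the sample pages are hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher

FIELD = "#FIELD#"  # hypothetical placeholder marking a data slot in the template
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s][^<]*")  # a tag, or a run of text


def tokenize(html):
    """Split HTML into tag tokens and whitespace-stripped text tokens."""
    return [tok.strip() for tok in TOKEN_RE.findall(html)]


def infer_template(page_a, page_b):
    """Align two pages; shared tokens become template, differences become fields."""
    a, b = tokenize(page_a), tokenize(page_b)
    template = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op == "equal":
            template.extend(a[i1:i2])  # identical in both pages
        else:
            template.append(FIELD)     # pages disagree here: assume a data field
    return template


def extract(template, page):
    """Apply the template to a page and return the text found in each field."""
    tokens = tokenize(page)
    values, pos = [], 0
    for k, item in enumerate(template):
        if item != FIELD:
            if pos >= len(tokens) or tokens[pos] != item:
                raise ValueError(f"page does not fit template at token {pos}")
            pos += 1
            continue
        # A field runs until the next constant template token (or end of page).
        nxt = template[k + 1] if k + 1 < len(template) else None
        end = pos
        while end < len(tokens) and tokens[end] != nxt:
            end += 1
        values.append(" ".join(tokens[pos:end]))
        pos = end
    return values


PAGE_A = "<html><body><b>Title:</b> Dune <b>Author:</b> Frank Herbert</body></html>"
PAGE_B = "<html><body><b>Title:</b> Emma <b>Author:</b> Jane Austen</body></html>"
PAGE_C = "<html><body><b>Title:</b> Ulysses <b>Author:</b> James Joyce</body></html>"

if __name__ == "__main__":
    wrapper = infer_template(PAGE_A, PAGE_B)
    print(wrapper)
    print(extract(wrapper, PAGE_C))
```

The sketch only handles pages that differ in the content of fixed slots. It does not attempt to discover optional or repeated structures, which a wrapper for real data-intensive sites would need.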
Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo
Added 30 Jul 2010
Updated 30 Jul 2010
Type Conference
Year 2001
Where VLDB
Authors Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo