Sciweavers

DEXAW
2004
IEEE

Data Extraction from Web Data Sources

This paper explains the basic data structures used in a new page-analysis technique for creating wrappers (data extractors) for the result pages that web sites produce in response to user queries submitted through web page forms. The key structure, called a tpGrid, is a representation of the web page that is easier to analyse than the raw HTML code. The analysis looks for repetition patterns of sets of tagSets, which are defined in the paper.
Jerome Robinson
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where DEXAW
Authors Jerome Robinson