Sciweavers

15 search results - page 2 / 3
» FiVaTech: Page-Level Web Data Extraction from Template Pages
KDD
2007
ACM
155 views, Data Mining
14 years 5 months ago
Mining templates from search result records of search engines
Metasearch engine, comparison-shopping, and Deep Web crawling applications need to extract search result records enwrapped in result pages returned from search engines in response ...
Hongkun Zhao, Weiyi Meng, Clement T. Yu
KDD
2007
ACM
193 views, Data Mining
14 years 5 months ago
Joint optimization of wrapper generation and template detection
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu
VLDB
2011
ACM
251 views, Database
12 years 11 months ago
Harvesting relational tables from lists on the web
A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more seman...
Hazem Elmeleegy, Jayant Madhavan, Alon Y. Halevy
WWW
2010
ACM
13 years 5 months ago
Exploiting content redundancy for web information extraction
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
WWW
2009
ACM
13 years 11 months ago
News article extraction with template-independent wrapper
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...