Sciweavers

Web page sectioning using regex-based template
SIGMOD
2003
ACM
13 years 10 months ago
Extracting Structured Data from Web Pages
Many web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its b...
Arvind Arasu, Hector Garcia-Molina
PKDD
2007
Springer
13 years 10 months ago
Site-Independent Template-Block Detection
Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
Aleksander Kolcz, Wen-tau Yih
WEBIST
2007
13 years 5 months ago
Logging and Analyzing User's Interactions in Web Portals
Content Management Systems and Web Portal Frameworks are more and more widely adopted in Web development. Those kinds of software often produce Web pages whose layout is divided in...
Gennaro Costagliola, Filomena Ferrucci, Vittorio F...
WISE
2005
Springer
13 years 10 months ago
Extracting Web Data Using Instance-Based Learning
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...
Yanhong Zhai, Bing Liu
DASFAA
2005
IEEE
13 years 6 months ago
Automatic Data Extraction from Data-Rich Web Pages
Extracting data from web pages using wrappers is a fundamental problem arising in a large variety of applications of vast practical interest. In this paper, we propose a...
Dongdong Hu, Xiaofeng Meng