Sciweavers

Search results for: Extracting Structured Data from Web Pages
ICWE 2009, Springer
A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysis
Traditional Web news article content extraction methods are time-consuming and require considerable maintenance because they analyze the layout of news pages to generate the wrapp...
Hao Han, Takehiro Tokuda
WWW 2008, ACM
Web page sectioning using regex-based template
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
Rupesh R. Mehta, Amit Madaan
WWW 2009, ACM
News article extraction with template-independent wrapper
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
ICANN 2005, Springer
Content-Based Retrieval of Web Pages and Other Hierarchical Objects with Self-organizing Maps
We propose a content-based information retrieval (CBIR) method that models known relationships between multimedia objects as a hierarchical tree-structure incorporating additional ...
Mats Sjöberg, Jorma Laaksonen
CIKM 2008, ACM
Dr. Searcher and Mr. Browser: a unified hyperlink-click graph
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink ...
Barbara Poblete, Carlos Castillo, Aristides Gionis