Sciweavers

» A Tree Learning Approach to Web Document Sectional Hierarchy...
SIGMOD
2009
ACM
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha
KDD
2010
ACM
Growing a tree in the forest: constructing folksonomies by integrating structured metadata
Many social Web sites allow users to annotate the content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured ...
Anon Plangprasopchok, Kristina Lerman, Lise Getoor
IPM
2007
Web page title extraction and its application
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Yewei Xue, Yunhua Hu, Guomao Xin, Ruihua Song, Shu...
RULEML
2004
Springer
Rule Learning for Feature Values Extraction from HTML Product Information Sheets
The Web is now a huge information repository with a rich semantic structure that, however, is primarily addressed to human understanding rather than automated processing by a compu...
Costin Badica, Amelia Badica