Sciweavers

» Expected Utility of Content Blocks in Web Content Extraction
WWW
2005
ACM
14 years 5 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
WWW
2011
ACM
12 years 11 months ago
Identifying primary content from web pages and its application to web search ranking
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
Srinivas Vadrevu, Emre Velipasaoglu
JUCS
2008
13 years 5 months ago
Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction
As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...
Jinbeom Kang, Joongmin Choi
APWEB
2004
Springer
13 years 10 months ago
Web Page Fragmentation and Content Manipulation for Constructing Personalized Portals
This paper presents a web page fragmentation technique, which is utilized for extracting specific parts of web pages and building personalized portals using these fragments. It is ...
Ioannis Misedakis, Vaggelis Kapoulas, Christos Bou...
WWW
2005
ACM
14 years 5 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger