Search Sciweavers | Sciweavers

57 search results - page 2 / 12

» Expected Utility of Content Blocks in Web Content Extraction

WWW
2005
ACM

Extracting context to improve accuracy for HTML content extraction

14 years 5 months ago

Download www1.cs.columbia.edu

Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...

Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo

WWW
2011
ACM

Identifying primary content from web pages and its application to web search ranking

12 years 11 months ago

Download www.www2011india.com

Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...

Srinivas Vadrevu, Emre Velipasaoglu

JUCS
2008

Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction

13 years 5 months ago

Download www.jucs.org

Abstract: As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...

Jinbeom Kang, Joongmin Choi

APWEB
2004
Springer

Web Page Fragmentation and Content Manipulation for Constructing Personalized Portals

13 years 10 months ago

Download ru6.cti.gr

This paper presents a web page fragmentation technique, which is utilized for extracting specific parts of web pages and building personalized portals using these fragments. It is ...

Ioannis Misedakis, Vaggelis Kapoulas, Christos Bou...

WWW
2005
ACM

Thresher: automating the unwrapping of semantic content from the World Wide Web

14 years 5 months ago

Download www2005.org

We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...

Andrew Hogue, David R. Karger
