Search Sciweavers | Sciweavers

240 search results - page 2 / 48

» Learning to Extract Content from News Webpages

WSDM
2010
ACM

204 views, Data Mining

Learning URL patterns for webpage de-duplication

13 years 12 months ago

Download www.wsdm-conference.org

Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...

Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

ISMIS
2005
Springer

166 views, Artificial Intelligence

Identifying Content Blocks from Web Documents

13 years 10 months ago

Download clgiles.ist.psu.edu

Intelligent information processing systems, such as digital libraries or search engines, index web-pages according to their informative content. However, web-pages contain several n...

Sandip Debnath, Prasenjit Mitra, C. Lee Giles

UIST
2006
ACM

161 views, Software Engineering

Summarizing personal web browsing sessions

13 years 11 months ago

Download dontcheva.org

We describe a system, implemented as a browser extension, that enables users to quickly and easily collect, view, and share personal Web content. Our system employs a novel intera...

Mira Dontcheva, Steven M. Drucker, Geraldine Wade,...

APWEB
2010
Springer

168 views, Internet Technology

ECON: An Approach to Extract Content from Web News Page

13 years 3 months ago

Download pages.cs.wisc.edu

This paper provides a simple but effective approach, named ECON, to fully-automatically extract content from Web news page. ECON uses a DOM tree to represent the Web news...

Yan Guo, Huifeng Tang, Linhai Song, Yu Wang 0009, ...

CIKM
2008
ACM

194 views, Information Technology

CoreEx: content extraction from online news articles

13 years 7 months ago

Download ilpubs.stanford.edu

We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...

Jyotika Prasad, Andreas Paepcke
