Sciweavers

» Learning to Extract Content from News Webpages
WSDM
2010
ACM
13 years 12 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
ISMIS
2005
Springer
13 years 10 months ago
Identifying Content Blocks from Web Documents
Intelligent information processing systems, such as digital libraries or search engines, index web pages according to their informative content. However, web pages contain several n...
Sandip Debnath, Prasenjit Mitra, C. Lee Giles
UIST
2006
ACM
13 years 11 months ago
Summarizing personal web browsing sessions
We describe a system, implemented as a browser extension, that enables users to quickly and easily collect, view, and share personal Web content. Our system employs a novel intera...
Mira Dontcheva, Steven M. Drucker, Geraldine Wade,...
APWEB
2010
Springer
13 years 3 months ago
ECON: An Approach to Extract Content from Web News Page
This paper provides a simple but effective approach, named ECON, to fully automatically extract content from Web news pages. ECON uses a DOM tree to represent the Web news...
Yan Guo, Huifeng Tang, Linhai Song, Yu Wang 0009, ...
CIKM
2008
ACM
13 years 7 months ago
CoreEx: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke