Sciweavers

40 search results - page 1 / 8
CIKM
2008
Springer
13 years 6 months ago
Coreex: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke
ICWE
2009
Springer
13 years 11 months ago
A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysis
The traditional Web news article contents extraction methods are time-costly and need much maintenance because they analyze the layout of news pages to generate the wrapp...
Hao Han, Takehiro Tokuda
WWW
2009
ACM
14 years 5 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth
WWW
2009
ACM
14 years 5 months ago
Efficient overlap and content reuse detection in blogs and online news articles
The use of blogs to track and comment on real world (political, news, entertainment) events is growing. Similarly, as more individuals start relying on the Web as their primary in...
Jong Wook Kim, Jun'ichi Tatemura, K. Selçuk...
ISMIS
2003
Springer
13 years 10 months ago
MetaNews: An Information Agent for Gathering News Articles on the Web
This paper presents MetaNews, an information gathering agent for news articles on the Web. MetaNews reads HTML documents from online news sites and extracts article information fro...
Dae-Ki Kang, Joongmin Choi