Search Sciweavers | Sciweavers

129 search results - page 1 / 26

» Combining content extraction heuristics: the CombinE system

IIWAS
2008

160 views, Internet Technology

Combining content extraction heuristics: the CombinE system

13 years 6 months ago

Download www.informatik.uni-mainz.de

The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Conte...

Thomas Gottron

WWW
2005
ACM

150 views, Internet Technology

Extracting context to improve accuracy for HTML content extraction

14 years 5 months ago

Download www1.cs.columbia.edu

Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...

Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo

ACL
2012

219 views, Computational Linguistics

Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation

11 years 7 months ago

Download www.comp.nus.edu.sg

An ideal summarization system should produce summaries that have high content coverage and linguistic quality. Many state-of-the-art summarization systems focus on content coverage...

Ziheng Lin, Chang Liu, Hwee Tou Ng, Min-Yen Kan

KDD
2004
ACM

163 views, Data Mining

Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods

14 years 5 months ago

Download www.cs.cmu.edu

We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...

William W. Cohen, Sunita Sarawagi

CIKM
2008
Springer

194 views, Information Technology

CoreEx: content extraction from online news articles

13 years 6 months ago

Download ilpubs.stanford.edu

We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...

Jyotika Prasad, Andreas Paepcke
