Sciweavers

129 search results - page 1 / 26
» Combining content extraction heuristics: the CombinE system
IIWAS
2008
13 years 6 months ago
Combining content extraction heuristics: the CombinE system
The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Conte...
Thomas Gottron
WWW
2005
ACM
14 years 5 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
ACL
2012
11 years 7 months ago
Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
An ideal summarization system should produce summaries that have high content coverage and linguistic quality. Many state-of-the-art summarization systems focus on content coverage...
Ziheng Lin, Chang Liu, Hwee Tou Ng, Min-Yen Kan
KDD
2004
ACM
14 years 5 months ago
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
William W. Cohen, Sunita Sarawagi
CIKM
2008
Springer
13 years 6 months ago
CoreEx: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke