Sciweavers

874 search results - page 60 / 175
» Jedi: Extracting and Synthesizing Information from the Web
SIGMOD
2009
ACM
15 years 4 months ago
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha
CIKM
2008
Springer
14 years 11 months ago
Characterizing and predicting community members from evolutionary and heterogeneous networks
Mining different types of communities from web data has attracted a lot of research effort in recent years. However, none of the existing community mining techniques has taken i...
Qiankun Zhao, Sourav S. Bhowmick, Xin Zheng, Kai Y...
EMNLP
2009
14 years 7 months ago
Toward Completeness in Concept Extraction and Classification
Many algorithms extract terms from text together with some kind of taxonomic classification (is-a) link. However, the general approaches used today, and specifically the methods o...
Eduard H. Hovy, Zornitsa Kozareva, Ellen Riloff
WWW
2010
ACM
15 years 4 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
PRIS
2004
14 years 11 months ago
Learning Text Extraction Rules, without Ignoring Stop Words
Information Extraction (IE) from text/web documents has become an important application area of AI. As the number of web sites and documents has grown dramatically, the users need...
João Cordeiro, Pavel Brazdil