Sciweavers

» Deep web data extraction
WWW
2009
ACM
15 years 4 months ago
News article extraction with template-independent wrapper
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
JCIT
2010
14 years 4 months ago
People Summarization by Combining Named Entity Recognition and Relation Extraction
The two most important tasks in entity information summarization from the Web are named entity recognition and relation extraction. Little work has been done toward an integrated ...
Xiaojiang Liu, Nenghai Yu
LREC
2010
14 years 11 months ago
BlogBuster: A Tool for Extracting Corpora from the Blogosphere
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
Georgios Petasis, Dimitrios Petasis
KDD
2003
ACM
15 years 10 months ago
Mining data records in Web pages
A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the e...
Bing Liu, Robert L. Grossman, Yanhong Zhai
SIGMOD
2000
ACM
15 years 2 months ago
XTRACT: A System for Extracting Document Type Descriptors from XML Documents
XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the...
Minos N. Garofalakis, Aristides Gionis, Rajeev Ras...