Sciweavers

Search results for "Adaptive record extraction from web pages": 609 results, page 31 of 122
WWW
2005
ACM
16 years 2 months ago
METEOR: metadata and instance extraction from object referral lists on the web
The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by ...
Hasan Davulcu, Srinivas Vadrevu, Saravanakumar Nag...
ICDM
2007
IEEE
15 years 8 months ago
Extracting Author Meta-Data from Web Using Visual Features
Enriching digital library’s author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors’ information from their hom...
Shuyi Zheng, Ding Zhou, Jia Li, C. Lee Giles
BMCBI
2008
15 years 2 months ago
PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval
Background: Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships fr...
Jimmy J. Lin
WEBDB
1999
Springer
15 years 6 months ago
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Arnaud Sahuguet, Fabien Azavant
LREC
2010
15 years 3 months ago
BlogBuster: A Tool for Extracting Corpora from the Blogosphere
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
Georgios Petasis, Dimitrios Petasis