Sciweavers

62 search results - page 7 / 13
» Using web page layout for extraction of sender names
ICDM
2007
IEEE
15 years 3 months ago
Extracting Author Meta-Data from Web Using Visual Features
Enriching digital library’s author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors’ information from their hom...
Shuyi Zheng, Ding Zhou, Jia Li, C. Lee Giles
BMCBI
2011
14 years 1 month ago
Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library
Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, an...
Roderic D. M. Page
ACL
2006
14 years 10 months ago
URES: an Unsupervised Web Relation Extraction System
Most information extraction systems either use hand written extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these a...
Binyamin Rosenfeld, Ronen Feldman
WWW
2008
ACM
15 years 10 months ago
Towards a global schema for web entities
Popular entities often have thousands of instances on the Web. In this paper, we focus on the case where they are presented in table-like format, namely appearing with their attri...
Conglei Yao, Yongjian Yu, Sicong Shou, Xiaoming Li
AAAI
2008
14 years 11 months ago
An Unsupervised Approach for Product Record Normalization across Different Web Sites
An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two ta...
Tak-Lam Wong, Tik-Shun Wong, Wai Lam