Sciweavers

1261 search results - page 137 / 253
Extracting Text from PostScript
DRR
2003
14 years 11 months ago
Correcting OCR text by association with historical datasets
The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting ...
Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sa...
WISE
2005
Springer
15 years 3 months ago
Extracting Web Data Using Instance-Based Learning
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...
Yanhong Zhai, Bing Liu
WWW
2008
ACM
15 years 10 months ago
Mining for personal name aliases on the web
We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that conv...
Danushka Bollegala, Taiki Honma, Yutaka Matsuo, Mi...
PVLDB
2010
14 years 8 months ago
Querying Probabilistic Information Extraction
Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone I...
Daisy Zhe Wang, Michael J. Franklin, Minos N. Garo...
WWW
2010
ACM
15 years 4 months ago
Entity relation discovery from web tables and links
The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical type of structured in...
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han,...