Sciweavers

1261 search results - page 50 / 253
» Extracting Text from PostScript
LREC
2010
14 years 11 months ago
Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content
Parallel corpora are indispensable resources for a variety of multilingual natural language processing tasks. This paper presents a technique for fully automatic construction of c...
Yulia Tsvetkov, Shuly Wintner
CIKM
2008
Springer
14 years 11 months ago
Coreex: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke
ICDAR
2003
IEEE
15 years 3 months ago
Proper Names Extraction from Fax Images Combining Textual and Image Features
In the frame of a Unified Messaging System, a crucial task of the system is to provide the user with key information on every message received, like keywords reflecting the object...
Laurence Likforman-Sulem, Pascal Vaillant, Franç...
DAS
2006
Springer
15 years 1 month ago
Segmentation-Driven Recognition Applied to Numerical Field Extraction from Handwritten Incoming Mail Documents
In this paper, we present a method for the automatic extraction of numerical fields (zip codes, phone numbers, etc.) from incoming mail documents. The approach is based o...
Clément Chatelain, Laurent Heutte, Thierry ...
ERCIMDL
2010
Springer
14 years 7 months ago
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)
Extracting titles from a PDF's full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating...
Jöran Beel, Bela Gipp, Ammar Shaker, Nick Fri...