Sciweavers

82 search results - page 2 / 17
» A search engine for imaged documents in PDF files
DAS
2006
Springer
13 years 8 months ago
A System for Converting PDF Documents into Structured XML Format
We present in this paper a system for converting PDF legacy documents into structured XML format. This conversion system first extracts the different streams contained in PDF files...
Hervé Déjean, Jean-Luc Meunier
DAS
2006
Springer
13 years 6 months ago
XCDF: A Canonical and Structured Document Format
Accessing the structured content of a PDF document is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...
Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadja...
ICDIM
2006
IEEE
13 years 10 months ago
A Framework for the Encoding of Multilayered Documents
Electronic publishing of material digitized using imaging and OCR calls for a special delivery format capable of reconstructing original documents in a well-usable electronic form...
Youssef Eldakar, Noha Adly, Magdy Nagi
ERCIMDL
2010
Springer
13 years 1 month ago
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)
Extracting titles from a PDF's full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating...
Jöran Beel, Bela Gipp, Ammar Shaker, Nick Fri...
LREC
2008
13 years 6 months ago
Integration of a Multilingual Keyword Extractor in a Document Management System
In this paper we present a new Document Management System called DrStorage. This DMS is multi-platform, JCR-170 compliant, supports WebDav, versioning, user authentication and aut...
Andrea Agili, Marco Fabbri, Alessandro Panunzi, Ma...