Sciweavers

DAS
2006
Springer
13 years 8 months ago
A System for Converting PDF Documents into Structured XML Format
We present in this paper a system for converting PDF legacy documents into structured XML format. This conversion system first extracts the different streams contained in PDF files...
Hervé Déjean, Jean-Luc Meunier
DAS
2006
Springer
13 years 7 months ago
XCDF: A Canonical and Structured Document Format
Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...
Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadja...
ICDAR
2009
IEEE
13 years 2 months ago
OCD: An Optimized and Canonical Document Format
Revealing and being able to manipulate the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we ...
Jean-Luc Bloechle, Denis Lalanne, Rolf Ingold
ICDAR
2003
IEEE
13 years 10 months ago
Document Transformation System from Papers to XML Data Based on Pivot XML Document Method
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
Yasuto Ishitani
DOCENG
2009
ACM
13 years 11 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan