Sciweavers

54 search results - page 1 / 11
» A System for Converting PDF Documents into Structured XML Fo...
Sort
View
DAS
2006
Springer
14 years 1 months ago
A System for Converting PDF Documents into Structured XML Format
We present in this paper a system for converting PDF legacy documents into structured XML format. This conversion system first extracts the different streams contained in PDF files...
Hervé Déjean, Jean-Luc Meunier
DAS
2006
Springer
13 years 11 months ago
XCDF: A Canonical and Structured Document Format
Accessing the structured content of PDF document is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...
Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadja...
ICDAR
2009
IEEE
13 years 7 months ago
OCD: An Optimized and Canonical Document Format
Revealing and being able to manipulate the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we ...
Jean-Luc Bloechle, Denis Lalanne, Rolf Ingold
ICDAR
2003
IEEE
14 years 2 months ago
Document Transformation System from Papers to XML Data Based on Pivot XML Document Method
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
Yasuto Ishitani
DOCENG
2009
ACM
14 years 3 months ago
Object-level document analysis of PDF files
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
Tamir Hassan