Sciweavers

ICDAR
2009
IEEE

OCD: An Optimized and Canonical Document Format

Revealing and manipulating the structured content of PDF documents is a difficult task that requires pre-processing and reverse-engineering techniques. In this paper, we present OCD, an optimized, easy-to-process, and canonical format for representing structured electronic documents. We describe the system and methods used to reverse engineer PDF documents into the OCD format, as well as the techniques used to optimize it. Finally, we report concrete evaluations of the OCD format's compactness and restructuring performance.
Jean-Luc Bloechle, Denis Lalanne, Rolf Ingold
Added 18 Feb 2011
Updated 18 Feb 2011
Type Journal
Year 2009
Where ICDAR