ICDAR
2009
IEEE

Metadata Extraction from PDF Papers for Digital Library Ingest

In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract basic metadata from these documents. The package is used in combination with a digital library software suite to easily build personal digital libraries. The proposed software is based on a suitable combination of several techniques that include PDF parsing, low-level document image processing, and layout analysis. In addition, we use the information gathered from a widely known citation database (DBLP) to assist the tool in the difficult task of author identification. The system is tested on some paper collections selected from recent conference proceedings.
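The abstract does not say how the DBLP information is applied, so the following is only a minimal sketch of one plausible approach: candidate strings taken from a paper's header lines are checked against a list of known author names. The function names (`normalize`, `split_candidates`, `find_authors`), the in-memory name list standing in for DBLP, and the sample header lines are all assumptions made for illustration; they are not the package's actual interface or method.

```python
import re
import unicodedata


def normalize(name):
    """Lowercase, strip accents, digits, and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^a-z\s]", " ", no_accents.lower())
    return " ".join(letters_only.split())


def split_candidates(line):
    """Split a header line into candidate names on commas, semicolons, and 'and'."""
    parts = re.split(r",|;|\band\b", line, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


def find_authors(header_lines, known_authors):
    """Return known author names that appear in the header lines, in order found.

    known_authors is a hypothetical stand-in for names gathered from a
    citation database such as DBLP.
    """
    lookup = {normalize(author): author for author in known_authors}
    found = []
    for line in header_lines:
        for candidate in split_candidates(line):
            match = lookup.get(normalize(candidate))
            if match is not None and match not in found:
                found.append(match)
    return found


# Hypothetical header lines, as they might come out of a PDF text extractor.
header = [
    "Metadata Extraction from PDF Papers for Digital Library Ingest",
    "Simone Marinai1",          # trailing footnote marker left by extraction
    "Example University",       # made-up affiliation line
]
known = ["Simone Marinai", "Ada Example"]  # made-up stand-in for a DBLP name list
authors = find_authors(header, known)
print(authors)
```

Exact matching after normalization is the simplest choice here. A real system would likely also need to handle initials, name-order variants, and OCR errors, which this sketch does not attempt.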
Simone Marinai
Added 21 May 2010
Updated 21 May 2010
Type Conference
Year 2009
Where ICDAR
Authors Simone Marinai