Automated Template-Based Metadata Extraction Architecture



This paper describes our efforts to develop a toolset and process for automated metadata extraction from large, diverse, and evolving document collections. A number of federal agencies, universities, laboratories, and companies are placing their collections online and making them searchable via metadata fields such as author, title, and publishing organization. Manually creating metadata for a large collection is extremely time-consuming, yet the task is difficult to automate, particularly for collections whose documents vary widely in layout and structure. Our automated process makes many more documents available online than time and cost constraints would otherwise allow. We describe our architecture and implementation and illustrate the effectiveness of the toolset with experimental results on two major collections: DTIC (Defense Technical Information Center) and NASA (National Aeronautics and Space Administration).

Paul Flynn, Li Zhou, Kurt Maly, Steven J. Zeil, Mohammad Zubair


Education | Evolving Document Collections | ICADL 2007 | Metadata | Metadata Extraction


» A Methodology for Readers Emotional State Extraction to Augment Expressions in Speech Synt...

» CIMA Based Remote Instrument and Data Access An Extension into the Australian eScience Env...

» Developing CIMABased Cyberinfrastructure for Remote Access to Scientific Instruments and C...

» CORC Helping Libraries Take a Leading Role in the Digital Age

Post Info

Added	08 Jun 2010
Updated	08 Jun 2010
Type	Conference
Year	2007
Where	ICADL
Authors	Paul Flynn, Li Zhou, Kurt Maly, Steven J. Zeil, Mohammad Zubair
