
ICDAR
2003
IEEE

A Segmentation Method for Bibliographic References by Contextual Tagging of Fields

In this paper, a method based on part-of-speech (PoS) tagging is used to structure bibliographic references. The method operates on a roughly structured ASCII file produced by OCR. Because reference structures are heterogeneous, the method works bottom-up, without an a priori model, gathering structural elements from basic tags into sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to record fields: “authors”, “title”, “conference name”, “date”, etc. Unlabelled tokens are assigned to one field or another, either by applying PoS correction rules or by using a structure model generated from well-detected records. The prototype performs satisfactorily on different record layouts and character-recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of references ar...
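The bottom-up flow in the abstract (tag tokens, map tags to canonical fields, then place unlabelled tokens using a structure model) can be illustrated with a minimal sketch. This is not the authors' implementation: simple regular expressions stand in for the PoS tagger, and the tag rules, the `FIELD_ORDER` structure model, and the gap-filling heuristic are all hypothetical assumptions chosen for illustration.

```python
import re
from itertools import groupby

# Hypothetical tag rules (a stand-in for PoS tagging): name, pattern, field signalled.
TAG_RULES = [
    ("INITIAL", re.compile(r"[A-Z]\."), "authors"),
    ("YEAR", re.compile(r"(19|20)\d{2}"), "date"),
    ("CONF", re.compile(r"ICDAR|Proc|Proceedings|Conference|Workshop"), "conference name"),
]

# Hypothetical structure model: field order assumed to be learned from
# well-detected records.
FIELD_ORDER = ["authors", "title", "conference name", "date"]


def tokenize(text):
    # Keep initials such as "D." as single tokens; drop other punctuation.
    return re.findall(r"[A-Z]\.|[\w-]+", text)


def tag(token):
    # Return the field signalled by the first rule matching the whole token, or None.
    for _name, pattern, field in TAG_RULES:
        if pattern.fullmatch(token):
            return field
    return None


def fill_unlabelled(labels):
    # Assign each run of unlabelled tokens from its labelled neighbours and the model.
    filled = list(labels)
    i = 0
    while i < len(filled):
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < len(filled) and filled[j] is None:
            j += 1
        prev = filled[i - 1] if i > 0 else None
        nxt = filled[j] if j < len(filled) else None
        if prev is None or nxt is None or prev == nxt:
            guess = prev or nxt  # edge run or same field on both sides
        else:
            # If the model expects a field between the neighbours, use the first one;
            # otherwise attach the run to the preceding field.
            a, b = FIELD_ORDER.index(prev), FIELD_ORDER.index(nxt)
            between = FIELD_ORDER[a + 1:b]
            guess = between[0] if between else prev
        for k in range(i, j):
            filled[k] = guess
        i = j
    return filled


def segment(reference):
    tokens = tokenize(reference)
    labels = fill_unlabelled([tag(t) for t in tokens])
    fields = {}
    # Bottom-up: merge consecutive tokens sharing a label into one field.
    for field, group in groupby(zip(labels, tokens), key=lambda p: p[0]):
        words = " ".join(tok for _, tok in group)
        key = field or "unlabelled"
        fields[key] = f"{fields[key]} {words}" if key in fields else words
    return fields


print(segment("Besagni D., Belaïd A. A segmentation method for references. ICDAR 2003"))
# → {'authors': 'Besagni D. Belaïd A.', 'title': 'A segmentation method for references', 'conference name': 'ICDAR', 'date': '2003'}
```

Here the surnames and the title words receive no tag at first; the run between two `authors` tokens stays in `authors`, and the run between `authors` and `conference name` is assigned `title` because the assumed structure model places that field in between.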
Dominique Besagni, Abdel Belaïd, Nelly Benet
Added 04 Jul 2010
Updated 04 Jul 2010
Type Conference
Year 2003
Where ICDAR
Authors Dominique Besagni, Abdel Belaïd, Nelly Benet