
ICDE
2006
IEEE

Segmentation of Publication Records of Authors from the Web

Publication records are often found on authors' personal home pages. If such a record is partitioned into semantic fields (authors, title, date, etc.), the unstructured text becomes structured data that other applications can use. In this paper, we present PEPURS, a publication record segmentation system that adopts a novel "Split and Merge" strategy: a publication record is split into segments; multiple statistical classifiers compute each segment's likelihood of belonging to each field; finally, adjacent segments are merged if they belong to the same field. PEPURS also introduces punctuation marks and their neighboring text as a new feature for distinguishing the different roles the marks play. PEPURS yields high accuracy scores in experiments.
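
The "Split and Merge" idea described in the abstract can be illustrated with a small sketch. This is not the PEPURS implementation: the abstract does not specify the classifiers or features, so the hand-written scoring rules below are a hypothetical stand-in for the statistical classifiers, and the function names (`split_record`, `score_segment`, `merge_segments`, `segment_record`) and the field list are assumptions made for illustration only.

```python
import re

# Hypothetical field inventory; the abstract only names authors, title, date, "etc."
FIELDS = ("date", "venue", "authors", "title")


def split_record(record):
    """Split step: cut after each punctuation mark that is followed by whitespace."""
    parts = re.split(r"(?<=[.,;:])\s+", record.strip())
    return [p for p in parts if p]


def score_segment(segment):
    """Toy stand-in for the statistical classifiers: one score per field."""
    text = segment.rstrip(".,;:")
    words = text.split()
    looks_like_name = 0 < len(words) <= 2 and all(
        re.fullmatch(r"[A-Z][a-z]*", w) for w in words
    )
    return {
        "date": 1.0 if re.fullmatch(r"(19|20)\d{2}", text) else 0.0,
        "venue": 1.0 if re.fullmatch(r"[A-Z]{3,}", text) else 0.0,
        "authors": 0.8 if looks_like_name else 0.0,
        "title": 0.6 if len(words) >= 3 else 0.0,
    }


def merge_segments(labeled):
    """Merge step: join adjacent segments that received the same field label."""
    merged = []
    for field, text in labeled:
        if merged and merged[-1][0] == field:
            merged[-1] = (field, merged[-1][1] + " " + text)
        else:
            merged.append((field, text))
    return merged


def segment_record(record):
    """Split, label each segment with its highest-scoring field, then merge."""
    labeled = []
    for seg in split_record(record):
        scores = score_segment(seg)
        labeled.append((max(scores, key=scores.get), seg))
    return merge_segments(labeled)


record = (
    "W. Zhang, C. Yu, Segmentation of Publication Records "
    "of Authors from the Web, ICDE, 2006."
)
for field, text in segment_record(record):
    print(f"{field}: {text}")
```

In this toy record the period after an initial ("W.") and the comma between fields both trigger a split, but the merge step rejoins the four author fragments because they share a label. That is a simplified view of why the role of each punctuation mark matters; the system described in the abstract models those roles as a feature rather than with fixed rules like these.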
Type Conference
Year 2006
Where ICDE
Authors Wei Zhang, Clement T. Yu, Neil R. Smalheiser, Vetle I. Torvik