Sciweavers

CIKM
2009
Springer

The impact of document structure on keyphrase extraction

13 years 11 months ago
The impact of document structure on keyphrase extraction
Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic approaches have been developed. Typically, candidate phrases are extracted using features such as position or frequency in the document text. Document structure may contain useful information about which parts or phrases of a document are important, but has rarely been considered as a source of information for keyphrase extraction. We address this issue in the context of keyphrase extraction from scientific literature. We introduce a new, large corpus that consists of full-text journal articles, where the rich collection and document structure available at the publishing stage is explicitly annotated. We explore features based on the XML tags contained in the documents, and based on generic section types derived using position and cue words in section titles. For XML tags we find sections, , and title to perform...
Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors Katja Hofmann, Manos Tsagkias, Edgar Meij, Maarten de Rijke
Comments (0)