Extracting Useful Information from the Full Text of Fiction

In this paper, we describe some experiments in large-scale Information Extraction (IE) focusing on book texts. We investigate the scalability of IE techniques to full-sized books, and the utility of IE techniques in extracting useful information from fiction. In particular, we evaluate a variety of Named Entity Recognition (NER) techniques in identifying the central characters in works of fiction. First, we describe the creation of a gold standard for evaluation, which contains ordered lists of characters for a corpus of classic book texts in Project Gutenberg. Second, we describe several approaches to the task of character identification, where our best model achieves an average coverage score of 78.4% across all central characters. Finally, we propose a number of approaches for future work.
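The abstract does not define the coverage score or the models compared, so the following is only a hypothetical sketch of what a simple character-identification baseline and a coverage-style metric could look like. It assumes that candidate characters are capitalized tokens that are not sentence-initial, ranked by frequency, and that coverage is the fraction of gold-standard characters found among the top-ranked candidates. The function names `candidate_characters` and `coverage` and the sample text are illustrative assumptions, not taken from the paper.

```python
import re
from collections import Counter


def candidate_characters(text, top_k=10):
    """Rank capitalized, non-sentence-initial tokens by frequency.

    Assumption: a crude stand-in for an NER system, not the paper's model.
    Capitalized non-names (for example the pronoun "I") would also be counted.
    """
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z]+", sentence)
        # Skip the first token: a sentence-initial capital is not evidence of a name.
        for tok in tokens[1:]:
            if tok[0].isupper():
                counts[tok] += 1
    return [name for name, _ in counts.most_common(top_k)]


def coverage(predicted, gold):
    """Fraction of gold-standard characters present in the predicted list.

    Assumption: the paper's actual coverage score may be defined differently.
    """
    if not gold:
        return 0.0
    found = sum(1 for g in gold if g in predicted)
    return found / len(gold)


# Invented sample text, used only to exercise the two functions.
sample = (
    "The morning was cold. At noon Emma met Harriet in the lane. "
    "Later Emma wrote to Knightley. Then Harriet called on Emma."
)
predicted = candidate_characters(sample, top_k=3)
score = coverage(predicted, ["Emma", "Harriet", "Knightley"])
print(predicted, score)
```

A frequency baseline like this ignores aliases, titles, and multi-word names, which is one reason a real evaluation would need the kind of gold standard the abstract describes.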
Sharon Givon, Maria Milosavljevic
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2007
Where RIAO