Measuring historical word sense variation

We describe here a method for automatically identifying word sense variation in a dated collection of historical books in a large digital library. By leveraging a small set of known translation book pairs to induce a bilingual sense inventory and labeled training data for a WSD classifier, we are able to automatically classify the Latin word senses in a 389 million word corpus and track the rise and fall of those senses over a span of two thousand years. We evaluate the performance of seven different classifiers both in a tenfold test on 83,892 words from the aligned parallel corpus and on a smaller, manually annotated sample of 525 words, measuring both the overall accuracy of each system and how well that accuracy correlates (via mean square error) to the observed historical variation.
Categories and Subject Descriptors H.3.7 [Information Systems: Information Storage and Retrieval]: digital libraries
General Terms Design, Documentation, Performance
Keywords Word sense disambiguat...
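The abstract mentions tracking how senses rise and fall over time and comparing classifier output to observed variation via mean square error. The following is only a minimal sketch of that idea, not the authors' implementation: the function names, the 100-year binning, the two sense labels, and the toy data are all hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict

def sense_proportions(tokens, bin_size=100):
    """Group (year, sense) pairs into time bins and return, per bin,
    the share of tokens assigned to each sense.
    The bin size is an illustrative assumption."""
    counts = defaultdict(Counter)
    for year, sense in tokens:
        bin_start = (year // bin_size) * bin_size  # floor division also handles BCE (negative) years
        counts[bin_start][sense] += 1
    return {
        b: {s: n / sum(c.values()) for s, n in c.items()}
        for b, c in counts.items()
    }

def mean_square_error(predicted, observed, sense):
    """Mean squared difference between predicted and observed proportions
    of one sense, taken over the time bins present in both inputs."""
    bins = sorted(set(predicted) & set(observed))
    if not bins:
        raise ValueError("no time bins shared between predicted and observed")
    errors = [
        (predicted[b].get(sense, 0.0) - observed[b].get(sense, 0.0)) ** 2
        for b in bins
    ]
    return sum(errors) / len(errors)

# Hypothetical toy data: (year, sense) pairs for a single ambiguous word.
gold = [(-50, "speech"), (-20, "speech"), (1200, "prayer"), (1250, "speech")]
auto = [(-50, "speech"), (-20, "prayer"), (1200, "prayer"), (1250, "prayer")]

observed = sense_proportions(gold)
predicted = sense_proportions(auto)
mse = mean_square_error(predicted, observed, "prayer")
```

A lower value means the automatically derived sense curve follows the manually observed one more closely across the shared time bins.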
David Bamman, Gregory Crane
Added 15 Sep 2011
Updated 15 Sep 2011
Type Journal
Year 2011
Where JCDL