Rare Word Translation Extraction from Aligned Comparable Documents

8 years 10 months ago
Rare Word Translation Extraction from Aligned Comparable Documents
We present a first known result of high precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain very high F-Measure between 80% and 98% for recognizing and extracting correct translations for rare terms (from 1 to 5 occurrences). Moreover, we show that our system can be trained on a pair of languages and test on a different pair of languages, obtaining a F-Measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method is therefore even applicable to low languages without training data.
Emmanuel Prochasson, Pascale Fung
Added 23 Aug 2011
Updated 23 Aug 2011
Type Journal
Year 2011
Where ACL
Authors Emmanuel Prochasson, Pascale Fung
Comments (0)