ACL
2011

From Bilingual Dictionaries to Interlingual Document Representations

Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We first use the bilingual dictionary to find candidate document alignments and then use those alignments to learn an interlingual representation. Since the candidate alignments are noisy, we develop a robust learning algorithm for this step. We show that bilingual dictionaries generalize better across domains: our approach outperforms both a word-by-word translation method and Canonical Correlation Analysis (CCA) trained on a different domain.
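The first step described above, using a bilingual dictionary to propose candidate document alignments, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple heuristic in which each source document is translated word by word into a bag of target-language words and paired with the target document of highest cosine similarity. The function names (`translate_bow`, `cosine`, `candidate_alignments`) and the dictionary format (source word mapped to a list of target words) are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def translate_bow(tokens, dictionary):
    """Translate source tokens into a bag of target-language words.

    `dictionary` maps a source word to a list of target words (an assumed
    format). Tokens missing from the dictionary are dropped.
    """
    bow = Counter()
    for tok in tokens:
        for translation in dictionary.get(tok, ()):
            bow[translation] += 1
    return bow


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_alignments(src_docs, tgt_docs, dictionary):
    """Pair each source document with its most similar target document.

    Returns (source index, target index, similarity) triples. Assumes
    `tgt_docs` is non-empty. The pairs are only candidates and may be
    noisy, which is why a robust learner would be needed downstream.
    """
    tgt_bows = [Counter(doc) for doc in tgt_docs]
    pairs = []
    for i, doc in enumerate(src_docs):
        query = translate_bow(doc, dictionary)
        scores = [cosine(query, bow) for bow in tgt_bows]
        best = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, best, scores[best]))
    return pairs
```

The similarity score is kept alongside each pair so that a later learning stage could down-weight or discard low-confidence alignments. How the paper actually handles the noise is not specified in this abstract.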
Added 24 Aug 2011
Updated 24 Aug 2011
Type Journal
Year 2011
Where ACL
Authors Jagadeesh Jagarlamudi, Hal Daumé III, Raghavendra Udupa