Sciweavers

EMNLP
2010

Translingual Document Representations from Discriminative Projections

13 years 8 months ago
Translingual Document Representations from Discriminative Projections
Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both of these variants start with a basic model of documents (PCA and PLSA). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two discriminative variants, OPCA and CPLSA, significantly outperform their corresponding baselines. The largest differences in performance are observed on the task of retrieval when the documents are only comparabl...
John Platt, Kristina Toutanova, Wen-tau Yih
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Where EMNLP
Authors John Platt, Kristina Toutanova, Wen-tau Yih
Comments (0)