ICML
1998
IEEE

Learning a Language-Independent Representation for Terms from a Partially Aligned Corpus

Cross-language latent semantic indexing is a method that learns useful language-independent vector representations of terms through a statistical analysis of a document-aligned text. This is accomplished by taking a collection of, say, English paragraphs and their translations in Spanish and processing them by singular value decomposition to yield a high-dimensional vector representation for each term in the collection. These term vectors have the property that semantically similar terms have vectors with high cosine measure, regardless of their source language. In the present work, we extend this approach to the case in which English-Spanish translations are not available, but instead, translations for documents in both languages are available in a third "bridge" language, say, French. Thus, although no aligned English-Spanish documents are used, our method creates a representation in which English and Spanish terms can be compared. The resulting vector representation of terms can ...
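As a rough illustration of the bridging idea described in the abstract, the following Python sketch builds a tiny term-by-document matrix in which English and Spanish terms never share a document but both co-occur with French terms, then compares terms in a rank-2 latent space. The toy corpus, the helper names (`matvec`, `top_eigenvectors`, `cosine`), the choice of two dimensions, and the use of power iteration in place of a library SVD routine are all assumptions made for this example, not details taken from the paper.

```python
import math
import random

# Hypothetical toy corpus: each document is a text merged with its French
# translation. No document pairs English with Spanish directly.
docs = [
    ["dog", "chien"],    # English + French
    ["cat", "chat"],     # English + French
    ["perro", "chien"],  # Spanish + French
    ["gato", "chat"],    # Spanish + French
]
terms = sorted({t for d in docs for t in d})

# Term-by-document count matrix A (rows: terms, columns: documents).
A = [[float(d.count(t)) for d in docs] for t in terms]


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_eigenvectors(G, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix G, found by power
    iteration with deflation (a dependency-free stand-in for an SVD call)."""
    rng = random.Random(seed)
    n = len(G)
    G = [row[:] for row in G]  # work on a copy
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = matvec(G, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(G, v)))  # Rayleigh quotient
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
        vecs.append(v)
    return vecs


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
n_docs = len(docs)
G = [[sum(row[i] * row[j] for row in A) for j in range(n_docs)] for i in range(n_docs)]
V = top_eigenvectors(G, k=2)

# Term vectors in the latent space: rows of A V_k, which equal U_k S_k.
raw = {t: A[i] for i, t in enumerate(terms)}
latent = {t: [sum(a * x for a, x in zip(A[i], v)) for v in V] for i, t in enumerate(terms)}

print("raw    dog~perro:", cosine(raw["dog"], raw["perro"]))
print("latent dog~perro:", cosine(latent["dog"], latent["perro"]))
print("latent dog~gato: ", cosine(latent["dog"], latent["gato"]))
```

In this toy corpus, `dog` and `perro` have cosine 0 in the raw count space because they share no document, yet their latent vectors have cosine close to 1, while `dog` and `gato` stay close to 0. Scaling term vectors as rows of A V_k is one common convention; other scalings of the singular vectors are also used.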
Michael L. Littman, Fan Jiang, Greg A. Keim
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 1998
Where ICML
Authors Michael L. Littman, Fan Jiang, Greg A. Keim