Using Query-Relevant Documents Pairs for Cross-Lingual Information Retrieval

15 years 11 months ago


The World Wide Web is a natural setting for cross-lingual information retrieval. The European Union is a typical example of a multilingual scenario, where many users have to deal with information published in at least 20 languages. Given queries in some source language and a target corpus in another language, the typical approach consists of translating either the query or the target dataset into the other language. Other approaches use parallel corpora to obtain a statistical dictionary of words among the different languages. In this work, we propose to use a training corpus made up of a set of Query-Relevant Document Pairs (QRDP) in a probabilistic cross-lingual information retrieval approach which is based on the IBM alignment model 1 for statistical machine translation. Our approach has two main advantages over those that use direct translation and parallel corpora: we will not obtain a translation of the query, but a set of associated words which share their meaning in some...
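To make the idea concrete, the following is a minimal sketch of how IBM Model 1 could be trained on query-relevant document pairs and then used to rank documents. It is not the authors' implementation: the function names `train_ibm1` and `score`, the toy Spanish/English data, the omission of the NULL word, and the probability floor for unseen word pairs are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t(query_word | doc_word) with EM, IBM Model 1 style.

    pairs: list of (query_tokens, doc_tokens), where the document is
    assumed relevant to the query (hypothetical input format).
    """
    query_vocab = {w for q, _ in pairs for w in q}
    uniform = 1.0 / len(query_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per document word
        # E-step: distribute each query word over the document words.
        for q, d in pairs:
            for qw in q:
                norm = sum(t[(qw, dw)] for dw in d)
                for dw in d:
                    frac = t[(qw, dw)] / norm
                    count[(qw, dw)] += frac
                    total[dw] += frac
        # M-step: renormalise so that sum over qw of t(qw | dw) equals 1.
        t = defaultdict(
            float, {(qw, dw): c / total[dw] for (qw, dw), c in count.items()}
        )
    return t


def score(query, doc, t, floor=1e-12):
    """Log-probability of the query given the document under Model 1.

    The floor for unseen word pairs is an assumption of this sketch,
    used only to avoid taking log(0).
    """
    logp = 0.0
    for qw in query:
        p = sum(t.get((qw, dw), 0.0) for dw in doc) / len(doc)
        logp += math.log(max(p, floor))
    return logp


# Toy query-relevant document pairs (invented data, English queries
# paired with Spanish "documents").
pairs = [
    (["house"], ["casa"]),
    (["house", "green"], ["casa", "verde"]),
    (["green"], ["verde"]),
]
t = train_ibm1(pairs)
ranked = sorted([["casa"], ["verde"]], key=lambda d: score(["house"], d, t), reverse=True)
```

The learned table does not translate the query. It associates each query word with target-language words that tend to occur in relevant documents, and those associations drive the ranking.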

David Pinto, Alfons Juan, Paolo Rosso
