In this paper we show how to train statistical machine translation systems on reallife tasks using only non-parallel monolingual data from two languages. We present a modificatio...
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus. But word appears to be too fine-grained ...
Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. Since ad-hoc manual translation c...
Prasanth Kolachina, Nicola Cancedda, Marc Dymetman...
The use of domain-specific metadata (subject keywords) is tested for monolingual and bilingual retrieval on the GIRT social science collection. A new technique, Entry Vocabulary Mo...
In TREC-9, we participated in the English-Chinese Cross-Language Information Retrieval (CLIR) track. Our work involved two aspects: finding good methods for Chinese IR, and findin...
Jianfeng Gao, Jian-Yun Nie, Jian Zhang, Endong Xun...