Abstract. We present a systematic comparison of preprocessing techniques for two language pairs: English-Czech and English-Hindi. The two target languages, although both belonging ...
This paper proposes a novel semisupervised word alignment technique called EMDC that integrates discriminative and generative methods. A discriminative aligner is used to find hig...
Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have also diverged significantly especially in ...
The Pangloss Example-Based Machine Translation engine (I'anEI3MT) l is a translation system reql,iring essentially no knowledge of the structure of a language, merely a large...
This paper proposes a novel maximum entropy based rule selection (MERS) model for syntax-based statistical machine translation (SMT). The MERS model combines local contextual info...