In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we invest...
We present a first known result of high precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We in...
Abstract. Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, t...
We combine the strengths of Bayesian modeling and synchronous grammar in unsupervised learning of basic translation phrase pairs. The structured space of a synchronous grammar is ...
Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gi...
We present improvements to a greedy decoding algorithm for statistical machine translation that reduce its time complexity from at least cubic ( ¢¡¤£¦¥¨§ when applied na...