Most current sentence alignment approaches adopt sentence length and cognates as alignment features, and they are mostly trained and tested on documents of the same style...
This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level. However, SMT has advanced from the word-based paradigm to p...
Xinyan Xiao, Deyi Xiong, Min Zhang, Qun Liu, Shoux...
In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-bas...
The output of handwritten word recognizers (HWR) tends to be very noisy due to various factors. In order to compensate for this behaviour, several choices of the HWR must be initi...