Parallel web pages are important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
This paper1 presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applying the Uplug word alignment tool on the corpus and finally produce and evaluat...
In this paper we report on our experiments on aligning names and faces as found in images and captions of online news websites. Developing accurate technologies for linking names a...
Phi The Pham, Marie-Francine Moens, Tinne Tuytelaa...
This paper describes a method of extracting katakana words and phrases, along with their English counterparts from non-aligned monolingual web search engine query logs. The method...