Sciweavers

41 search results for "Large Scale Parallel Document Mining for Machine Translation" (page 1 of 9)
COLING
2010
Large Scale Parallel Document Mining for Machine Translation
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
ACL
2008
Mining Parenthetical Translations from the Web by Word Alignment
Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extrac...
Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Mari...
ACL
2011
A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence s...
Ming Tan, Wenli Zhou, Lei Zheng, Shaojun Wang
LREC
2008
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has...
Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel
AMTA
1998
Springer
Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text
Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restri...
Philip Resnik