Sciweavers

COLING
2010
Large Scale Parallel Document Mining for Machine Translation
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
ACL
2009
Active Learning for Multilingual Statistical Machine Translation
Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneous...
Gholamreza Haffari, Anoop Sarkar
NLE
2007
Segmentation and alignment of parallel text for statistical machine translation
We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a s...
Yonggang Deng, Shankar Kumar, William Byrne
LREC
2008
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has...
Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel
LREC
2010
Enhanced Infrastructure for Creation and Collection of Translation Resources
Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system trainin...
Zhiyi Song, Stephanie Strassel, Gary Krug, Kazuaki...