
LREC
2010

Enhanced Infrastructure for Creation and Collection of Translation Resources

Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at the Linguistic Data Consortium (LDC) to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged approach to parallel text corpus development (acquisition of existing parallel text from known repositories, harvesting and aligning of potential parallel documents from the web, and manual creation of parallel text by professional translators), and describe recent adaptations that have enabled significant expansions in the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC.
Zhiyi Song, Stephanie Strassel, Gary Krug, Kazuaki Maeda
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Zhiyi Song, Stephanie Strassel, Gary Krug, Kazuaki Maeda