LREC
2008

CzEng 0.7: Parallel Corpus with Community-Supplied Translations

This paper describes CzEng 0.7, a new release of a Czech-English parallel corpus that is freely available for research and educational purposes. We provide basic statistics of the corpus and focus on data produced by a community of volunteers. Anonymous contributors manually correct the output of a machine translation (MT) system, producing on average 2,000 sentences a month, 70% of which are correct translations. We compare the utility of community-supplied and professionally translated training data for a baseline English-to-Czech MT system.
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Ondřej Bojar, Miroslav Janíček, Zdeněk Žabokrtský, Pavel Češka, Peter Beňa