ACL
2007

Corpus Effects on the Evaluation of Automated Transliteration Systems

Download aclweb.org

Most current machine transliteration systems employ a corpus of known source-target word pairs to train their system, and typically evaluate their systems on a similar corpus. In this paper we explore the performance of transliteration systems on corpora that are varied in a controlled way. In particular, we control the number and prior language knowledge of the human transliterators used to construct the corpora, and the origin of the source words that make up the corpora. We find that the word accuracy of automated transliteration systems can vary by up to 30% (in absolute terms) depending on the corpus on which they are run. We conclude that at least four human transliterators should be used to construct corpora for evaluating automated transliteration systems, and that although absolute word accuracy metrics may not translate across corpora, the relative rankings of system performance remain stable across differing corpora.
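
To make the corpus dependence of word accuracy concrete, here is a minimal sketch in Python. It assumes word accuracy means the fraction of source words whose system output matches any transliteration accepted by the corpus; that matching rule is an assumption for illustration, not necessarily the exact metric used in the paper. The function name `word_accuracy` and all word strings are hypothetical toy data, not results from the paper.

```python
def word_accuracy(predictions, references):
    """Fraction of source words whose predicted transliteration
    appears among the reference transliterations for that word.

    predictions: dict mapping source word -> system output (str)
    references:  dict mapping source word -> set of accepted outputs
    """
    if not predictions:
        raise ValueError("no predictions to score")
    correct = sum(
        1
        for word, guess in predictions.items()
        if guess in references.get(word, set())
    )
    return correct / len(predictions)


# Hypothetical system output for four source words (toy strings only).
predictions = {"tom": "tam", "sara": "sara", "peter": "piter", "anna": "ana"}

# Toy corpus built by a single transliterator: one variant per word.
corpus_one = {
    "tom": {"tom"},
    "sara": {"sara"},
    "peter": {"peter"},
    "anna": {"ana"},
}

# Toy corpus built by several transliterators: more accepted variants.
corpus_many = {
    "tom": {"tom", "tam"},
    "sara": {"sara", "sarah"},
    "peter": {"peter", "piter", "pitar"},
    "anna": {"ana", "anna"},
}

acc_one = word_accuracy(predictions, corpus_one)
acc_many = word_accuracy(predictions, corpus_many)
print(f"single-transliterator corpus: {acc_one:.2f}")
print(f"multi-transliterator corpus:  {acc_many:.2f}")
```

In this toy setup the same system output scores differently on the two corpora, which is the kind of corpus effect the abstract describes; the size of the gap here is an artifact of the invented data.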

Sarvnaz Karimi, Andrew Turpin, Falk Scholer

ACL 2007 | Computational Linguistics | Human Transliterators | Machine Transliteration Systems | Transliteration Systems

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ACL
Authors	Sarvnaz Karimi, Andrew Turpin, Falk Scholer