COLING
2010

Urdu and Hindi: Translation and sharing of linguistic resources

Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, their vocabularies have diverged significantly, especially in the written form. In this paper we show that we can get reasonable-quality translations (we estimated the Translation Error Rate at 18%) between the two languages even in the absence of a parallel corpus. Linguistic resources such as treebanks, part-of-speech tagged data and parallel corpora with English are limited for both languages. We use the translation system to share linguistic resources between the two languages. We demonstrate improvements on three tasks: statistical machine translation from Urdu to English is improved (0.8 in BLEU score) by using a Hindi-English parallel corpus, Hindi part-of-speech tagging is improved (up to 6% absolute) by using an Urdu part-of-speech corpus, and a Hindi-English word aligner is improved by using a manually word-aligned Urdu-English corpus (up to 9% absolute...
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Karthik Visweswariah, Vijil Chenthamarakshan, Nandakishore Kambhatla