Sciweavers

ACTAC
2008

Sentence Alignment of Hungarian-English Parallel Corpora Using a Hybrid Algorithm

We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The new algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition, pairing the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of finding cognates for the Hungarian-English language pair is extremely low, we use a novel approach that includes Named Entity recognition. Owing to its well-selected anchors, the algorithm was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair.
Key words: sentence segmentation, sentence alignment, hybrid method, length-based alignment, Named Entity recognition, anchor, cognates, dynamic programming
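The general idea of combining a length-based cost with anchor matching inside a dynamic program can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`anchors`, `bead_cost`, `align`), the bead penalties, the `anchor_weight` parameter, and the use of capitalized words and numbers as a crude stand-in for Named Entity recognition are all assumptions made for illustration only.

```python
import re

# Allowed alignment "beads" (source sentences, target sentences) with
# hypothetical penalties that favour plain 1-1 alignments.
BEADS = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.5, (1, 2): 0.5}


def anchors(sentence):
    """Crude stand-in for Named Entity recognition (an assumption):
    capitalized tokens and numbers are treated as anchor candidates."""
    tokens = re.findall(r"\w+", sentence)
    return {t for t in tokens if t[0].isupper() or t.isdigit()}


def bead_cost(src, tgt, anchor_weight=1.0):
    """Length mismatch of the two sentence groups minus a bonus per shared anchor."""
    src_len = sum(len(s) for s in src)
    tgt_len = sum(len(t) for t in tgt)
    length_cost = abs(src_len - tgt_len) / max(src_len, tgt_len, 1)
    src_anchors = set().union(*(anchors(s) for s in src))
    tgt_anchors = set().union(*(anchors(t) for t in tgt))
    return length_cost - anchor_weight * len(src_anchors & tgt_anchors)


def align(src_sents, tgt_sents):
    """Find the cheapest sequence of beads with dynamic programming."""
    n, m = len(src_sents), len(tgt_sents)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + bead_cost(
                    src_sents[i:ni], tgt_sents[j:nj]
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    hu = ["Budapest Magyarország fővárosa.",
          "A város 1873 óta létezik.",
          "Szép hely."]
    en = ["Budapest is the capital of Hungary.",
          "The city has existed since 1873, and it is a nice place."]
    for src_idx, tgt_idx in align(hu, en):
        print(src_idx, "<->", tgt_idx)
```

On this toy input the sketch pairs the first sentences one-to-one and aligns the last two Hungarian sentences with the second English sentence, because the shared anchors "Budapest" and "1873" outweigh the length mismatch. A real system would replace `anchors` with an actual Named Entity recognizer and a statistically motivated length model.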
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2008
Where ACTAC
Authors Krisztina Tóth, Richárd Farkas, András Kocsor