
IICAI
2007

A Java Implementation of an Extended Word Alignment Algorithm Based on the IBM Models

In recent years, statistical word alignment models have been widely used for various Natural Language Processing (NLP) problems. In this paper we describe a platform-independent, object-oriented implementation (in Java) of a word alignment algorithm based on the first three IBM models. This is ongoing work in which we are exploring possible enhancements to the IBM models, especially for related languages such as the Indian languages. We have been able to improve performance by introducing a similarity measure (the Dice coefficient), a list of cognates, and a morphological analyzer. Information about cognates is especially relevant for Indian languages because they contain many borrowed and inherited words shared by more than one language. For our experiments on English-Hindi word alignment, we also tried using a bilingual dictionary to bootstrap the Expectation Maximization (EM) algorithm. After training on 7399 sentence-aligned...
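The core of such an aligner is EM training of the IBM models. As a rough illustration, here is a minimal Java sketch of EM for IBM Model 1 (the simplest of the three models; the distortion and fertility parameters of Models 2 and 3 are omitted), with a NULL source token. It also includes a sentence-level co-occurrence Dice coefficient, 2·C(e,f) / (C(e) + C(f)). The class `IbmModel1Sketch`, its methods, and the toy corpus are hypothetical. This is not the authors' implementation, and the abstract does not say exactly how their Dice coefficient was computed or combined with the EM estimates.

```java
import java.util.*;

/** Minimal IBM Model 1 EM trainer plus a Dice score (hypothetical sketch, not the paper's code). */
public class IbmModel1Sketch {
    static final String NULL = "<NULL>"; // empty source word that can generate any target word

    /** Returns t with t.get(f).get(e) = t(f | e), estimated by EM over sentence pairs. */
    public static Map<String, Map<String, Double>> train(
            List<String[]> src, List<String[]> tgt, int iterations) {
        // Target vocabulary size gives the uniform starting value of t(f | e).
        Set<String> fVocab = new HashSet<>();
        for (String[] fs : tgt) fVocab.addAll(Arrays.asList(fs));
        double uniform = 1.0 / fVocab.size();

        Map<String, Map<String, Double>> t = new HashMap<>();
        for (int iter = 0; iter < iterations; iter++) {
            Map<String, Map<String, Double>> count = new HashMap<>(); // expected count(f, e)
            Map<String, Double> total = new HashMap<>();              // expected count(e)
            for (int s = 0; s < src.size(); s++) {
                String[] es = withNull(src.get(s));
                for (String f : tgt.get(s)) {
                    // E-step: distribute one unit of f over all source words (incl. NULL).
                    double z = 0.0;
                    for (String e : es) z += prob(t, f, e, uniform);
                    for (String e : es) {
                        double c = prob(t, f, e, uniform) / z;
                        count.computeIfAbsent(f, k -> new HashMap<>()).merge(e, c, Double::sum);
                        total.merge(e, c, Double::sum);
                    }
                }
            }
            // M-step: t(f | e) = count(f, e) / count(e).
            Map<String, Map<String, Double>> next = new HashMap<>();
            for (Map.Entry<String, Map<String, Double>> fe : count.entrySet()) {
                for (Map.Entry<String, Double> ec : fe.getValue().entrySet()) {
                    next.computeIfAbsent(fe.getKey(), k -> new HashMap<>())
                        .put(ec.getKey(), ec.getValue() / total.get(ec.getKey()));
                }
            }
            t = next;
        }
        return t;
    }

    /** Current t(f | e); uniform before the first M-step has run. */
    static double prob(Map<String, Map<String, Double>> t, String f, String e, double uniform) {
        if (t.isEmpty()) return uniform;
        return t.getOrDefault(f, Collections.emptyMap()).getOrDefault(e, 0.0);
    }

    /** Sentence-level Dice: 2*C(e,f) / (C(e) + C(f)), counting sentence pairs. */
    public static double dice(List<String[]> src, List<String[]> tgt, String e, String f) {
        int ce = 0, cf = 0, cef = 0;
        for (int s = 0; s < src.size(); s++) {
            boolean hasE = Arrays.asList(src.get(s)).contains(e);
            boolean hasF = Arrays.asList(tgt.get(s)).contains(f);
            if (hasE) ce++;
            if (hasF) cf++;
            if (hasE && hasF) cef++;
        }
        return (ce + cf) == 0 ? 0.0 : 2.0 * cef / (ce + cf);
    }

    /** Prepends the NULL token to a source sentence. */
    static String[] withNull(String[] es) {
        String[] out = new String[es.length + 1];
        out[0] = NULL;
        System.arraycopy(es, 0, out, 1, es.length);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical toy corpus (English -> German), not data from the paper.
        List<String[]> en = List.of(new String[]{"the", "house"},
                new String[]{"the", "book"}, new String[]{"a", "book"});
        List<String[]> de = List.of(new String[]{"das", "Haus"},
                new String[]{"das", "Buch"}, new String[]{"ein", "Buch"});
        Map<String, Map<String, Double>> t = train(en, de, 10);
        // "das" is explained by "the", so "house" should come to prefer "Haus".
        System.out.printf("t(Haus|house) = %.3f%n", t.get("Haus").get("house"));
        System.out.printf("t(das|house)  = %.3f%n", t.get("das").get("house"));
        System.out.printf("Dice(book, Buch) = %.2f%n", dice(en, de, "book", "Buch"));
    }
}
```

On this toy corpus EM resolves the ambiguity that the Dice score alone cannot: "house" co-occurs with both "das" and "Haus" in the same sentence, but because "das" is better explained by "the", probability mass for "house" shifts toward "Haus" over the iterations. A Dice or cognate score of this kind could plausibly be used to initialize or reweight t(f | e), but that combination is an assumption here, not something the abstract specifies.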
G. Chinnappa, Anil Kumar Singh
Type Conference