Curate a transliteration corpus from transliteration/translation pairs

Download wil.csie.cyut.edu.tw

Transliteration of new named entities is important for information retrieval that crosses two or more languages. Rule-based machine transliteration is not satisfactory, since different information sources follow different transliteration standards. To build a statistical machine transliteration module, researchers have to curate a transliteration corpus for the two languages of interest. Since a large number of transliteration/translation pairs can be collected from the Web, a large transliteration training corpus can be curated from these pairs. In this paper, we propose a bi-directional approach to classifying transliteration/translation pairs. Our approach combines forward and backward transliteration to distinguish transliterations from translations. An experiment on English-Chinese transliteration is conducted.
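The abstract does not spell out how the forward and backward scores are computed or combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the Chinese side is already romanized as pinyin, uses hypothetical toy rewrite rules in place of trained transliteration models, and averages the two directional similarity scores against an arbitrary threshold.

```python
from difflib import SequenceMatcher

# Hypothetical toy rewrite rules (assumptions for illustration only);
# a real system would learn these mappings from training data.
FORWARD_RULES = [("ph", "f"), ("c", "k"), ("v", "w"), ("r", "l")]  # English -> pinyin-like
BACKWARD_RULES = [("zh", "j"), ("x", "sh"), ("q", "ch")]           # pinyin -> English-like


def rewrite(text, rules):
    """Lowercase, drop spaces, then apply each rewrite rule in order."""
    text = text.lower().replace(" ", "")
    for src, dst in rules:
        text = text.replace(src, dst)
    return text


def similarity(a, b):
    """String similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a, b).ratio()


def forward_score(english, pinyin):
    """Push the English term toward pinyin and compare with the pinyin."""
    return similarity(rewrite(english, FORWARD_RULES), rewrite(pinyin, []))


def backward_score(english, pinyin):
    """Push the pinyin toward English and compare with the English term."""
    return similarity(rewrite(pinyin, BACKWARD_RULES), rewrite(english, []))


def is_transliteration(english, pinyin, threshold=0.5):
    """Label a pair as a transliteration if the averaged score is high enough.

    The averaging and the 0.5 threshold are assumptions of this sketch.
    """
    score = (forward_score(english, pinyin) + backward_score(english, pinyin)) / 2
    return score >= threshold
```

With these toy rules, a sound-based pair such as ("Obama", "ao ba ma") scores high in both directions, while a meaning-based pair such as ("White House", "bai gong") scores low, which is the distinction the classifier is meant to capture.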

Shih-Hung Wu, Yu-Te Li

Information Retrieval | IRI 2008 | Machine Transliteration | Rule-based Machine Transliteration | Statistic Machine Transliteration |

» Statistical transliteration for English-Arabic cross language information retrieval

» Mining Name Translations from Entity Graph Mapping

» Web-based acquisition of Japanese katakana variants

Post Info

Added	31 May 2010
Updated	31 May 2010
Type	Conference
Year	2008
Where	IRI
Authors	Shih-Hung Wu, Yu-Te Li
