Joint Tokenization and Translation

Because tokenization is ambiguous for many natural languages, such as Chinese and Korean, tokenization errors can introduce translation mistakes in translation systems that rely on 1-best tokenizations. While using lattices to offer more alternatives to translation systems has elegantly alleviated this problem, we take a further step and tokenize and translate jointly. Taking as input a sequence of atomic units that can be combined to form words in different ways, our joint decoder simultaneously produces a tokenization on the source side and a translation on the target side. By integrating tokenization and translation features in a discriminative framework, our joint decoder significantly outperforms baseline translation systems that use 1-best tokenizations and lattices on both Chinese-English and Korean-Chinese tasks. Interestingly, as a tokenizer, our joint decoder achieves significant improvements over monolingual Chinese tokenizers.
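To make the idea of tokenizing and translating in one pass concrete, here is a minimal sketch, not the authors' decoder. It assumes a hypothetical toy lexicon (`TOY_LEXICON`), made-up log-scores, and two hypothetical feature weights (`w_tok`, `w_trans`). It translates monotonically, with none of the reordering or language-model features a real translation system would need. A dynamic program over atomic units picks the word segmentation and the target words that maximize a weighted sum of a tokenization score and a translation score.

```python
# Hypothetical toy model: every entry and score below is invented for illustration.
# source word -> (tokenization log-score, [(target phrase, translation log-score), ...])
TOY_LEXICON = {
    "a":  (-1.0, [("A", -0.5)]),
    "b":  (-1.0, [("B", -0.5)]),
    "c":  (-1.0, [("C", -0.5)]),
    "ab": (-0.5, [("AB", -0.2)]),
    "bc": (-0.7, [("BC", -1.5), ("B-C", -0.1)]),
}


def joint_decode(atoms, lexicon=TOY_LEXICON, w_tok=1.0, w_trans=1.0, max_len=4):
    """Return (score, source tokens, target phrases) for the best joint
    derivation of `atoms`, or None if no tokenization covers the input."""
    n = len(atoms)
    # best[i] holds the best derivation covering the prefix atoms[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [], [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue  # prefix atoms[:start] cannot be tokenized
            word = "".join(atoms[start:end])
            if word not in lexicon:
                continue
            tok_score, options = lexicon[word]
            prev_score, prev_tokens, prev_target = best[start]
            for target, trans_score in options:
                # Linear combination of tokenization and translation features.
                score = prev_score + w_tok * tok_score + w_trans * trans_score
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, prev_tokens + [word], prev_target + [target])
    return best[n]


if __name__ == "__main__":
    print(joint_decode(list("abc")))
    print(joint_decode(list("abc"), w_trans=3.0))
```

With these toy numbers, the default weights select the tokenization `ab | c`, while raising `w_trans` to 3.0 switches the result to `a | bc`. The translation feature changes the source-side segmentation, which is the interaction that a pipeline committed to a 1-best tokenization cannot exploit.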

Xinyan Xiao, Yang Liu, Young-Sook Hwang, Qun Liu, Shouxun Lin

COLING 2010 | Computational Linguistics | Joint Decoder | Tokenization | Translation Systems

» Unsupervised Tokenization for Machine Translation

» Effective Translation Tokenization and Combination for Cross-Lingual Retrieval

» JHUAPL Experiments in Tokenization and Nonword Translation

» Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM ...

» Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Kn...

» Joint Parsing and Translation

» Models of Cooccurrence

» Unsupervised cleansing of noisy text

Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Xinyan Xiao, Yang Liu, Young-Sook Hwang, Qun Liu, Shouxun Lin
