Adapting Chinese Word Segmentation for Machine Translation Based on Short Units

Unlike in most Western languages, words in Chinese text, whether composed of one character or several, are not separated by spaces. Chinese word segmentation is therefore considered an important first step in machine translation (MT), and its performance affects MT results. Many factors affect Chinese word segmentation, including the segmentation standard and the segmentation strategy. The performance of a corpus-based word segmentation model depends heavily on the quality and the segmentation standard of the training corpora. However, we observed that existing manually annotated Chinese corpora tend to have low segmentation granularity and provide poor morphological information because of the present segmentation standards. In this paper, we introduce a short-unit standard of Chinese word segmentation, which is particularly suitable for machine translation, and propose a semi-automatic method of transforming existing corpora into ones that satisfy our standard. We evaluate the usef...

Yiou Wang, Kiyotaka Uchimoto, Jun'ichi Kazama, Canasai Kruengkrai, Kentaro Torisawa

Chinese Word Segmentation | Education | LREC 2010 | Segmentation Standards | Word Segmentation

» Mining Bilingual Data from the Web with Adaptively Learnt Patterns

» Segmentation and alignment of parallel text for statistical machine translation

» Bilingual-LSA Based LM Adaptation for Spoken Language Translation

» Automatic extraction of bilingual terms from a Chinese-Japanese parallel corpus

» A Phrase-Based Statistical Model for SMS Text Normalization

» Discriminative Keyword Selection Using Support Vector Machines

» Compound Nouns in a Unification-Based MT System

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Yiou Wang, Kiyotaka Uchimoto, Jun'ichi Kazama, Canasai Kruengkrai, Kentaro Torisawa
