A Robust Cross-Style Bilingual Sentences Alignment Model

Most current sentence alignment approaches use sentence length and cognates as alignment features, and they are mostly trained and tested on documents of the same style. Because the length distribution, the alignment-type distribution (used by length-based approaches), and the cognate frequency vary significantly across texts of different styles, length-based approaches fail to achieve similar performance when tested on corpora of a different style. The experiments show that the F-measure can drop from 98.2% to 85.6% when a length-based approach is trained on a technical manual and then tested on a general magazine. Since a large percentage of the content words in the source text are translated into their corresponding translation duals to preserve the meaning in the target text, transfer lexicons are usually regarded as more reliable cues for aligning sentences when the alignment task is performed by humans. To enhance the robustness, a robust statistical model ba...
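For readers unfamiliar with the F-measure quoted in the abstract, the following is a minimal sketch of how it is commonly computed for sentence alignment, assuming each alignment is represented as a link between a tuple of source sentence indices and a tuple of target sentence indices. The function name `alignment_f_measure` and the toy data are hypothetical and are not taken from the paper; the paper's exact evaluation protocol may differ.

```python
def alignment_f_measure(predicted, gold):
    """Return (precision, recall, F1) over two collections of alignment links.

    Each link is a pair (source_indices, target_indices), e.g. ((1,), (1, 2))
    for a 1-2 alignment. This representation is an assumption for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # links that match the reference exactly
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the aligner misses the 1-2 alignment and outputs a 1-1 instead.
gold = {((0,), (0,)), ((1,), (1, 2)), ((2,), (3,))}
predicted = {((0,), (0,)), ((1,), (1,)), ((2,), (3,))}
precision, recall, f1 = alignment_f_measure(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

Under this exact-match convention, a wrong alignment type (for example 1-1 instead of 1-2) counts as both a precision error and a recall error, which is one reason a shift in the alignment-type distribution between styles can lower the score.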
Tz-Liang Kueng, Keh-Yih Su
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2002