Fast-Champollion: A Fast and Robust Sentence Alignment Algorithm

Sentence-level aligned parallel texts are important resources for a number of natural language processing (NLP) tasks and applications, such as statistical machine translation and cross-language information retrieval. With the rapid growth of online parallel texts, efficient and robust sentence alignment algorithms are becoming increasingly important. In this paper, we propose a fast and robust sentence alignment algorithm, Fast-Champollion, which combines length-based and lexicon-based algorithms. By optimizing the process of splitting the input bilingual texts into small fragments for alignment, Fast-Champollion, as our extensive experiments show, is 4.0 to 5.1 times as fast as current baseline methods such as Champollion (Ma, 2006) on short texts and about 39.4 times as fast on long texts, while remaining as robust as Champollion.
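To make the "length-based" half of the description concrete, here is a minimal sketch of length-based sentence alignment by dynamic programming. It is not the Fast-Champollion algorithm and is not taken from the paper: the function names, the toy cost function, and the penalty constants are all assumptions chosen for illustration, and no lexicon-based scoring or fragment splitting is attempted.

```python
from typing import List, Optional, Tuple

# Allowed alignment "beads": (source sentences consumed, target sentences consumed).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 3.0   # hypothetical cost for leaving a sentence unaligned (1-0 or 0-1)
MERGE_PENALTY = 0.5  # hypothetical extra cost for 2-1 and 1-2 beads


def bead_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Toy length-mismatch cost: 0 when tgt_len equals ratio * src_len."""
    if src_len == 0 or tgt_len == 0:
        return SKIP_PENALTY
    expected = ratio * src_len
    return abs(tgt_len - expected) / max(tgt_len, expected)


def align_by_length(
    src: List[str], tgt: List[str], ratio: float = 1.0
) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices) using character lengths only."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair by each allowed bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(s_len, t_len, ratio)
                if di + dj == 3:
                    c += MERGE_PENALTY
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Backtrace from the full texts to recover the cheapest bead sequence.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

The table filled here has (n + 1) x (m + 1) cells for texts of n and m sentences, so the work grows with the product of the two lengths. That is one plausible reason why splitting long bilingual texts into small fragments before alignment, as the abstract describes, can pay off most on long inputs.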
Peng Li, Maosong Sun, Ping Xue
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Peng Li, Maosong Sun, Ping Xue