Sciweavers

MT
2007

Automatic extraction of translations from web-based bilingual materials

This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares web-based translation texts of Statistics Canada (StatCan) news releases in the StatCan publication The Daily. The goal is to extract translations for translation memory systems, translation terminology building, cross-language information retrieval, and corpus-based machine translation systems. Three years of officially published statistical news release texts at www.statcan.ca were collected to compose the StatCan Daily data bank. The English and French texts in this collection were first roughly aligned using the Gale-Church statistical algorithm. The boundary markers of text segments and paragraphs were then adjusted, and the Gale-Church algorithm was run a second time to obtain a finer-grained text segment alignment. To detect misaligned areas of texts and to prevent mismatched translation pairs from being selected, key textual and structural...
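To illustrate the length-based alignment step the abstract mentions, here is a minimal Python sketch of a Gale-Church style aligner. It is not the SDTES code. The function names (`match_cost`, `align`) are hypothetical, and the constants (priors, `C`, `S2`) are the values commonly cited from Gale and Church (1993), assumed here rather than taken from this paper. The length normalization uses the mean of the two segment lengths, which is one common variant.

```python
import math

# Assumed match-type priors (source segments, target segments) -> probability,
# as commonly cited from Gale and Church (1993); not taken from the SDTES paper.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # assumed expected number of target characters per source character
S2 = 6.8  # assumed variance of the character-length ratio


def match_cost(len_src, len_tgt, prior):
    """Cost (negative log probability) of pairing two text spans by length."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail probability of a standard normal variable.
    tail = max(math.erfc(abs(delta) / math.sqrt(2.0)), 1e-300)
    return -math.log(tail) - math.log(prior)


def align(src, tgt):
    """Align two lists of segments; returns a list of (src_group, tgt_group)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_src = sum(len(s) for s in src[i:ni])
                len_tgt = sum(len(t) for t in tgt[j:nj])
                total = cost[i][j] + match_cost(len_src, len_tgt, prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align(english_segments, french_segments)` on two lists of strings returns groups of segments judged to be mutual translations. A two-pass scheme like the one described above could run such a function first on paragraphs and then on the sentences inside each aligned paragraph pair, though how SDTES structures its passes is not specified here.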
Qibo Zhu, Diana Zaiu Inkpen, Ash Asudeh
Added 27 Dec 2010
Updated 27 Dec 2010
Type Journal
Year 2007
Where MT
Authors Qibo Zhu, Diana Zaiu Inkpen, Ash Asudeh