COLING
2010

An Empirical Study on Web Mining of Parallel Data

This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract parallel sentences. This paper attempts the much more challenging task of directly searching for high-quality sentence pairs from the Web. We tackle the problem by formulating good search queries using Learning to Rank and by filtering noisy document pairs using IBM Model 1 alignment. End-to-end evaluation shows that the proposed approach significantly improves the performance of statistical machine translation.
Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim