Sciweavers

ACL
1997

An Alignment Method for Noisy Parallel Corpora based on Image Processing Techniques

This paper presents a new approach to the bitext correspondence problem (BCP) for noisy bilingual corpora, based on image processing (IP) techniques. Using any of several methods for estimating the lexical translation probability (LTP) between pairs of source and target words, we can turn a bitext into a discrete gray-level image. We contend that the BCP, seen in this light, bears a striking resemblance to the line detection problem in IP. BCPs, including sentence and word alignment, can therefore benefit from a wealth of effective, well-established IP techniques, including convolution-based filters, texture analysis, and the Hough transform. This paper describes a new program, PlotAlign, that produces a word-level bitext map for noisy or non-literal bitext based on these techniques.
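The idea in the abstract can be illustrated with a minimal Python sketch. It is not PlotAlign and makes no claim about the authors' implementation. It assumes LTP scores are already available as a dictionary keyed by (source word, target word), quantizes them into a small number of gray levels, and then runs a plain Hough transform to find the best-supported straight line in the resulting image. The function names `bitext_image` and `hough_strongest_line`, the `levels` and `n_theta` parameters, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math


def bitext_image(src, tgt, ltp, levels=4):
    """Quantize LTP scores into a discrete gray-level image.

    Rows are target positions, columns are source positions.
    `ltp` maps (source_word, target_word) to a probability in [0, 1];
    missing pairs are treated as 0 (an assumption of this sketch).
    """
    image = []
    for t in tgt:
        row = []
        for s in src:
            p = ltp.get((s, t), 0.0)
            row.append(min(levels - 1, int(p * levels)))
        image.append(row)
    return image


def hough_strongest_line(image, n_theta=180):
    """Return (theta_degrees, rho) of the line with the most weighted votes.

    Lines are in normal form: x*cos(theta) + y*sin(theta) = rho,
    with x the source position and y the target position.
    Each non-zero pixel votes with its gray level as the weight.
    """
    votes = {}
    for y, row in enumerate(image):
        for x, gray in enumerate(row):
            if gray == 0:
                continue
            for k in range(n_theta):
                theta = math.pi * k / n_theta
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(k, rho)] = votes.get((k, rho), 0) + gray
    (k, rho), _ = max(votes.items(), key=lambda kv: kv[1])
    return 180.0 * k / n_theta, rho


if __name__ == "__main__":
    # Toy bitext: word i of the source translates word i of the target,
    # plus one spurious low-probability pair standing in for noise.
    src = ["s%d" % i for i in range(40)]
    tgt = ["t%d" % i for i in range(40)]
    ltp = {(s, t): 0.9 for s, t in zip(src, tgt)}
    ltp[("s0", "t30")] = 0.3
    print(hough_strongest_line(bitext_image(src, tgt, ltp)))
```

In this parameterization a literal, order-preserving translation shows up as the main diagonal of the image, so the strongest line should have a normal angle close to 135 degrees and an offset close to 0. Noisy or non-literal bitext would add off-diagonal pixels, and the voting step is what lets the dominant line survive them.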
Jason S. Chang, Mathis H. M. Chen
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1997
Where ACL
Authors Jason S. Chang, Mathis H. M. Chen