Robust Measurement and Comparison of Context Similarity for Finding Translation Pairs

In cross-language information retrieval, it is often important to align words that are similar in meaning in two corpora written in different languages. Previous research shows that using context similarity to align words is helpful when no dictionary entry is available. We suggest a new method which selects a subset of words (pivot words) associated with a query and then matches these words across languages. To detect word associations, we demonstrate that a new Bayesian method for estimating pointwise mutual information (PMI) provides improved accuracy. In the second step, matching is done in a novel way that calculates the chance of an accidental overlap of pivot words using the hypergeometric distribution. We implemented a wide variety of previously suggested methods. Testing in two conditions, a small comparable corpora pair and a large but unrelated corpora pair, both written in disparate languages, we show that our approach consistently outperforms the other systems.
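
The abstract does not give the matching formula, so the following is only a minimal sketch of how a hypergeometric test for accidental pivot-word overlap might look. It assumes a shared pool of `vocab_size` pivot words, of which `query_pivots` are associated with the query word and `candidate_pivots` with a candidate translation, and it asks how likely an overlap of at least `overlap` words would be if the candidate's pivots were drawn at random. The function name and all parameter names are hypothetical and are not taken from the paper.

```python
from math import comb


def overlap_tail_probability(vocab_size, query_pivots, candidate_pivots, overlap):
    """Return P(X >= overlap) for X ~ Hypergeometric.

    Assumed setup (illustrative, not the authors' exact formulation):
    the candidate's pivot words are a random draw of `candidate_pivots`
    items, without replacement, from `vocab_size` pivot words, of which
    `query_pivots` are "successes" (pivots shared with the query).
    """
    if not 0 <= query_pivots <= vocab_size:
        raise ValueError("query_pivots must lie in [0, vocab_size]")
    if not 0 <= candidate_pivots <= vocab_size:
        raise ValueError("candidate_pivots must lie in [0, vocab_size]")

    lowest = max(overlap, 0)
    highest = min(query_pivots, candidate_pivots)
    total = comb(vocab_size, candidate_pivots)

    # Sum the hypergeometric probability mass for every overlap size
    # from `lowest` up to the largest overlap that is possible.
    favourable = sum(
        comb(query_pivots, i) * comb(vocab_size - query_pivots, candidate_pivots - i)
        for i in range(lowest, highest + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Toy numbers chosen only for illustration.
    p = overlap_tail_probability(vocab_size=1000, query_pivots=20,
                                 candidate_pivots=25, overlap=5)
    print(f"chance of an overlap this large by accident: {p:.3e}")
```

Under these assumptions, a smaller tail probability means the observed overlap is less likely to be accidental, so candidates could be ranked by ascending probability.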

Daniel Andrade, Tetsuya Nasukawa, Jun-ichi Tsujii

COLING 2010 | Comparable Corpora Pair | Computational Linguistics | Corpora Pair | Unrelated Corpora Pair |

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Daniel Andrade, Tetsuya Nasukawa, Jun-ichi Tsujii
