ACL
2011

Collecting Highly Parallel Data for Paraphrase Evaluation

A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.
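The idea of scoring a paraphrase candidate with simple n-gram comparisons can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ngram_overlap`, `adequacy`, `dissimilarity`), the whitespace tokenization, the choice of n up to 4, and the pooling of references are all assumptions made for illustration. Adequacy is approximated here by how many of the candidate's n-grams appear in a set of parallel reference sentences, and dissimilarity by how few of them appear in the source sentence.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(candidate, others, max_n=4):
    """Average, over n = 1..max_n, of the fraction of the candidate's
    n-grams that also occur in at least one sentence of `others`.
    Values of n for which the candidate has no n-grams are skipped."""
    cand = candidate.lower().split()  # assumed: naive whitespace tokenizer
    other_tokens = [s.lower().split() for s in others]
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue
        pooled = set()
        for toks in other_tokens:
            pooled |= ngrams(toks, n)
        scores.append(len(cand_ngrams & pooled) / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0


def adequacy(candidate, references, max_n=4):
    """Hypothetical adequacy proxy: n-gram overlap with parallel references."""
    return ngram_overlap(candidate, references, max_n)


def dissimilarity(candidate, source, max_n=4):
    """Hypothetical dissimilarity proxy: 1 minus n-gram overlap with the source."""
    return 1.0 - ngram_overlap(candidate, [source], max_n)


if __name__ == "__main__":
    source = "a man is slicing a tomato"
    references = ["a man cuts a tomato", "someone is slicing a tomato"]
    candidate = "someone cuts a tomato"
    print("adequacy:", adequacy(candidate, references))
    print("dissimilarity:", dissimilarity(candidate, source))
```

A good paraphrase under this sketch scores high on both measures: it shares many n-grams with independently written references while sharing few with the source sentence it rewrites.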
David Chen, William B. Dolan
Added 23 Aug 2011
Updated 23 Aug 2011
Type Journal
Year 2011
Where ACL