AMTA
2004
Springer

A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings Between Languages

Abstract. We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is sufficiently small that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus. This is shown by creating an increasingly redundant corpus and observing that the information gained converges as redundancy increases.
Katharina Probst, Alon Lavie
Added 30 Jun 2010
Updated 30 Jun 2010
Type Conference
Year 2004
Where AMTA