ACL
2010

The Human Language Project: Building a Universal Corpus of the World's Languages

We present a grand challenge to build a corpus that will include all of the world's languages, in a consistent structure that permits large-scale cross-linguistic processing, enabling the study of universal linguistics. The focal data types, bilingual texts and lexicons, relate each language to one of a set of reference languages. We propose that the ability to train systems to translate into and out of a given language be the yardstick for determining when we have successfully captured a language. We call on the computational linguistics community to begin work on this Universal Corpus, pursuing the many strands of activity described here, as their contribution to the global effort to document the world's linguistic heritage before more languages fall silent.
Steven P. Abney, Steven Bird
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where ACL