A Statistical Model for Lost Language Decipherment

In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps 29 of 30 letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for 60% of the Ugaritic words that have cognates in Hebrew.
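To make the two outputs named above (an alphabetic mapping and word-to-cognate translations) concrete, here is a minimal sketch of how a letter mapping, once known, can be used to propose a cognate. This is not the authors' model: the paper infers mappings and cognates jointly in a non-parametric Bayesian framework, whereas this sketch assumes the mapping is already given and swaps in plain Levenshtein edit distance for cognate matching. The function names (`transliterate`, `best_cognate`), the symbol names `s1`, `s2`, ..., and the toy lexicon are all hypothetical and are not data from the paper.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when letters match)
            ))
        prev = curr
    return prev[-1]


def transliterate(word: List[str], mapping: Dict[str, str]) -> str:
    """Hypothetical helper: rewrite lost-script symbols in the known alphabet.

    Symbols with no mapping become '?', which never matches a real letter.
    """
    return "".join(mapping.get(sym, "?") for sym in word)


def best_cognate(word: List[str], mapping: Dict[str, str],
                 lexicon: List[str]) -> Tuple[str, int]:
    """Hypothetical helper: pick the known-language word closest to the
    transliterated lost word (ties go to the earlier lexicon entry)."""
    guess = transliterate(word, mapping)
    return min(((w, edit_distance(guess, w)) for w in lexicon),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy mapping and lexicon for illustration only.
    mapping = {"s1": "m", "s2": "l", "s3": "k", "s4": "b"}
    lexicon = ["mlk", "bt", "ktb"]
    print(best_cognate(["s1", "s2", "s3"], mapping, lexicon))  # ('mlk', 0)
    print(best_cognate(["s4", "s9"], mapping, lexicon))        # ('bt', 1)
```

The second call shows why a fixed, hand-made mapping is brittle: the unmapped symbol `s9` can only be absorbed as an edit. A model that infers the mapping and the cognate pairs together, as the abstract describes, can instead let evidence from many words fill in such gaps.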

Benjamin Snyder, Regina Barzilay, Kevin Knight

ACL 2010 | Computational Linguistics | Correct Hebrew Cognate | Low-level Character Mappings | Non-parametric Bayesian Framework

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Conference
Year	2010
Where	ACL
Authors	Benjamin Snyder, Regina Barzilay, Kevin Knight

Sciweavers