EMNLP
2004

Learning Hebrew Roots: Machine Learning with Linguistic Constraints

The morphology of Semitic languages is unique in the sense that the major word-formation mechanism is an inherently non-concatenative process of interdigitation, whereby two morphemes, a root and a pattern, are interwoven. Identifying the root of a given word in a Semitic language is an important task, and in some cases a crucial part of morphological analysis. It is also a non-trivial task, which many humans find challenging. We present a machine learning approach to the problem of extracting roots of Hebrew words. Given the large number of potential roots (thousands), we address the problem as one of combining several classifiers, each predicting the value of one of the root's consonants. We show that when these predictors are combined by enforcing some fairly simple linguistic constraints, high accuracy, which compares favorably with human performance on this task, can be achieved.
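The idea of combining per-consonant predictors under constraints can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy scores, the Latin transliteration, and both constraints (the first two root consonants differ, and the root consonants appear in the word in order) are assumptions chosen for illustration. The in-order constraint in particular is a simplification that would fail for roots whose consonants do not all surface in the word.

```python
from itertools import product


def make_constraints(word):
    """Build a validity check for candidate roots of `word`.

    Both constraints are illustrative assumptions, not taken from the paper.
    """
    def is_subsequence(root):
        # True if the root consonants occur in the word in the same order.
        remaining = iter(word)
        return all(consonant in remaining for consonant in root)

    def is_valid(root):
        # Assumed constraint: the first two root consonants are not identical.
        return root[0] != root[1] and is_subsequence(root)

    return is_valid


def combine(scores, is_valid):
    """Pick the highest-scoring root that satisfies the constraints.

    `scores` holds one dict per root position, mapping a consonant to the
    score a (hypothetical) per-position classifier assigned to it.
    """
    best_root, best_score = None, 0.0
    for root in product(*(position.keys() for position in scores)):
        if not is_valid(root):
            continue
        score = 1.0
        for position, consonant in enumerate(root):
            score *= scores[position][consonant]
        if score > best_score:
            best_root, best_score = root, score
    return best_root, best_score


# Toy example: transliterated word "ktbti", with made-up classifier scores.
word = "ktbti"
scores = [
    {"k": 0.7, "t": 0.3},
    {"k": 0.45, "t": 0.4, "b": 0.15},
    {"b": 0.6, "t": 0.4},
]

# Taking each classifier's top choice independently ignores the constraints.
independent = tuple(max(position, key=position.get) for position in scores)

# Combining the classifiers under the constraints gives a different answer.
root, score = combine(scores, make_constraints(word))
print("independent:", independent)
print("constrained:", root)
```

In this toy setting the independent choice repeats the first consonant, which the assumed constraints rule out, so the constrained search falls back to the next best consistent combination. Exhaustive enumeration is used only because the candidate sets here are tiny.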
Ezra Daya, Dan Roth, Shuly Wintner
Type Conference
Year 2004
Where EMNLP