Unsupervised Discovery of Morphemes

We present two methods for unsupervised segmentation of words into morpheme-like units. The model utilized is especially suited for languages with a rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online. In the second method, Maximum Likelihood (ML) optimization is used. The quality of the segmentations is measured using an evaluation method that compares the segmentations produced to an existing morphological analysis. Experiments on both Finnish and English corpora show that the presented methods perform well compared to a current state-of-the-art system.
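To make the MDL idea in the abstract concrete, here is a minimal sketch of a two-part description length: the cost of spelling out a morph lexicon plus the cost of coding the corpus with it. This is an illustration only, not the authors' exact formulation. The function name description_length, the fixed bits_per_char value, and the toy Finnish word list are all assumptions made for the example.

```python
import math
from collections import Counter

def description_length(segmented_corpus, bits_per_char=5.0):
    """Two-part code length in bits for a corpus under one segmentation.

    segmented_corpus: list of words, each word a list of morph strings.
    bits_per_char: assumed flat cost of one letter in the lexicon.
    """
    tokens = [morph for word in segmented_corpus for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Corpus cost: each morph token costs -log2 of its relative frequency.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    # Lexicon cost: spell each distinct morph once, plus one end marker.
    lexicon_cost = sum((len(m) + 1) * bits_per_char for m in counts)
    return corpus_cost + lexicon_cost

# Toy data (assumed): the same four words, unsegmented and segmented.
whole = [["talo"], ["talossa"], ["auto"], ["autossa"]]
split = [["talo"], ["talo", "ssa"], ["auto"], ["auto", "ssa"]]

dl_whole = description_length(whole)
dl_split = description_length(split)
print(dl_split < dl_whole)
```

On this toy data the segmented version comes out shorter, because the shared morphs "talo", "auto" and "ssa" are stored once in the lexicon instead of being spelled out inside every word form. An MDL-style search would prefer whichever segmentation gives the smaller total.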
Mathias Creutz, Krista Lagus
Added 18 Dec 2010
Updated 18 Dec 2010
Type Journal
Year 2002
Where CoRR