We present a novel Evaluation Metric for Morphological Analysis (EMMA) that is both linguistically appealing and empirically sound. EMMA uses a graph-based assignment algorithm, op...
We present two methods for unsupervised segmentation of words into morpheme-like units. The model utilized is especially suited for languages with a rich morphology, such as Finnis...
We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and ...
We present the first results on parsing the SYNTAGRUS treebank of Russian with a data-driven dependency parser, achieving a labeled attachment score of over 82% and an unlabeled a...
Assamese is a morphologically rich, agglutinative and relatively free word order Indic language. Although spoken by nearly 30 million people, very little computational linguistic ...