Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages



We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity are obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) rates, whereas the morph LMs can recognize previously unseen word forms by concatenating morphs. We show that the morph LMs generally outperform the word LMs and that they perform fairly well on OOVs without compromising the accuracy obtained for in-vocabulary words.
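The coverage argument in the abstract can be illustrated with a small sketch. It is not the authors' system: the word list, the morph lexicon, and the helper names `oov_rate` and `can_compose` are all hypothetical, and the morph segmentation is hand-picked rather than learned. The sketch only shows why a test word that is missing from a word vocabulary can still be covered when it is a concatenation of known morphs.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def can_compose(word, morphs):
    """Return True if word is a concatenation of one or more known morphs.

    reachable[i] is True when the prefix word[:i] can be built from morphs.
    """
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in morphs:
                reachable[end] = True
                break
    return reachable[len(word)]


# Toy Finnish-like data, chosen by hand for illustration only.
train_words = ["talo", "talossa", "auto", "autossa"]
morph_lexicon = {"talo", "auto", "ssa", "n"}  # assumed segmentation, not learned
test_words = ["talossa", "auton", "talon", "kissa"]

# Word-level view: any unseen word form is OOV.
word_oov = oov_rate(test_words, set(train_words))

# Morph-level view: a word is uncovered only if no morph sequence spells it.
uncovered = [w for w in test_words if not can_compose(w, morph_lexicon)]
morph_uncovered = len(uncovered) / len(test_words)

print(f"word-level OOV rate:  {word_oov:.2f}")
print(f"morph-level uncovered: {morph_uncovered:.2f} {uncovered}")
```

In this toy setting three of the four test words are OOV for the word vocabulary, while only "kissa" cannot be built from the morph lexicon. A real system would also need an n-gram model over the morph sequences and a way to mark word boundaries, which this sketch leaves out.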

Mathias Creutz, Teemu Hirsimäki, Mikko Kurimo


Computational Linguistics | Morph LMs | NAACL 2007 | Standard Word LMs | Word LMs


» Recognition Performance of a Structured Language Model

» Pronunciation modeling by sharing Gaussian densities across phonetic models

» Handling Out-of-Vocabulary Words and Recognition Errors Based on Word Linguistic Context for...

» Relevance language modeling for speech recognition

» Recovery of Rare Words in Lecture Speech

» SCARF: a segmental conditional random field toolkit for speech recognition

» A Structured Language Model

» Why Is the Recognition of Spontaneous Speech so Hard?

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2007
Where	NAACL
Authors	Mathias Creutz, Teemu Hirsimäki, Mikko Kurimo, Antti Puurula, Janne Pylkkönen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraclar, Andreas Stolcke


Sciweavers
