Using morpheme and syllable based sub-words for Polish LVCSR


Download www-i6.informatik.rwth-aachen.de

Polish is a synthetic language with a high morpheme-per-word ratio. Its high degree of inflection leads to high out-of-vocabulary (OOV) rates and high Language Model (LM) perplexities, which poses a challenge for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Here, the use of morpheme and syllable based units is investigated for building sub-lexical LMs. A different type of sub-lexical unit is proposed, based on combining morphemic or syllabic units with their corresponding pronunciations. Thereby, a set of grapheme-phoneme pairs called graphones is used for building LMs. A relative reduction of 3.5% in Word Error Rate (WER) is obtained with respect to a traditional system based on full words.
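The effect of sub-word units on the OOV rate can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the stem list, the `segment` helper, the toy word lists, and the phoneme strings are hypothetical and are not the segmentation method or data used in the paper.

```python
from typing import List, Set, Tuple

# Hypothetical stem inventory; a real system would rely on a trained
# morphological or syllabic segmenter instead of a hand-written list.
STEMS = ("dom", "kot")

# Illustrative pronunciations only (assumed, not taken from the paper).
PRON = {"dom": "d o m", "+em": "e m"}


def segment(word: str) -> List[str]:
    """Split a word into a stem and a '+suffix' unit using the toy stem list."""
    for stem in STEMS:
        if word.startswith(stem) and len(word) > len(stem):
            return [stem, "+" + word[len(stem):]]
    return [word]


def oov_rate(tokens: List[str], vocab: Set[str]) -> float:
    """Fraction of tokens that are not covered by the vocabulary."""
    return sum(t not in vocab for t in tokens) / len(tokens)


train_words = ["dom", "domu", "kot", "kotem"]
test_words = ["domem", "kotu"]  # inflected forms unseen in training

# Full-word vocabulary versus sub-word vocabulary built from the same data.
word_vocab = set(train_words)
subword_vocab = {u for w in train_words for u in segment(w)}
test_units = [u for w in test_words for u in segment(w)]

word_oov = oov_rate(test_words, word_vocab)
subword_oov = oov_rate(test_units, subword_vocab)

# A graphone pairs each written unit with its pronunciation.
graphones: List[Tuple[str, str]] = [(u, PRON[u]) for u in segment("domem")]

print(word_oov, subword_oov)
print(graphones)
```

In this toy setting both unseen inflected forms are OOV at the word level, while their stem and suffix units are all covered by the sub-word vocabulary. The last step pairs each unit with a pronunciation, which is the general idea behind a graphone.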

M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney


Continuous Speech Recognition | ICASSP 2011 | Signal Processing | Synthetic Language | Word Error Rate |


Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
