Confidence estimation, OOV detection and language ID using phone-to-word transduction and phone-level alignments

Automatic Speech Recognition (ASR) systems continue to make errors during search when handling various phenomena including noise, pronunciation variation, and out-of-vocabulary (OOV) words. Predicting the probability that a word is incorrect can prevent the error from propagating and perhaps allow the system to recover. This paper addresses the problem of detecting errors and OOVs for read Wall Street Journal speech when the word error rate (WER) is very low. It augments a traditional confidence estimate by introducing two novel methods: phone-level comparison using Multi-String Alignment (MSA) and word-level comparison using phone-to-word transduction. We show that features from phone and word string comparisons can be added to a standard maximum entropy framework, thereby substantially improving performance in detecting both errors and OOVs. Additionally, we show an extension to detecting English and accented English for the Language Identification (LID) task.
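As a rough illustration of the phone-level comparison idea, the sketch below computes a normalized edit distance between the phone string implied by a recognized word and the phone string produced by a separate phone recognizer. It is a simplified pairwise stand-in, not the Multi-String Alignment described in the paper; the function names and the phone labels are hypothetical and chosen only for illustration.

```python
def phone_edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] holds the distance between the first i-1 phones of ref
    # and the first j phones of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a phone from ref
                            curr[j - 1] + 1,      # insert a phone from hyp
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def phone_disagreement(word_phones, recognizer_phones):
    """Edit distance divided by the longer length; 0.0 means identical."""
    longest = max(len(word_phones), len(recognizer_phones))
    if longest == 0:
        return 0.0
    return phone_edit_distance(word_phones, recognizer_phones) / longest


# Hypothetical example: the word hypothesis implies "k ae t",
# while a phone recognizer outputs "k ah t".
score = phone_disagreement(["k", "ae", "t"], ["k", "ah", "t"])
```

A score like this could serve as one additional feature next to a conventional confidence estimate in a maximum entropy classifier, with larger disagreement suggesting a possible error or OOV region.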

Christopher M. White, Geoffrey Zweig, Lukas Burget, Petr Schwarz, Hynek Hermansky

Automatic Speech Recognition | ICASSP 2008 | Signal Processing | Word Error Rate | Word String Comparisons

Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICASSP
Authors	Christopher M. White, Geoffrey Zweig, Lukas Burget, Petr Schwarz, Hynek Hermansky
