Why word error rate is not a good metric for speech recognizer training for the speech translation task?

Speech translation (ST) is an enabling technology for cross-lingual oral communication. An ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Nowadays, most ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors only at the surface level. It does not consider the contextual and syntactic roles of a word, which are often critical for MT. In end-to-end ST scenarios, whether WER is a good metric for the ASR component of the full ST system remains an open issue that lacks systematic study. In this paper, we report our recent investigation of this issue, focusing on the interactions between ASR and MT in an ST system. We show that BLEU-oriented global optimization of ASR system parameters improves the translation
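
To make the "surface level" point concrete, here is a minimal Python sketch of WER computed as word-level edit distance divided by the reference length. The function name and the example sentences are illustrative assumptions and do not come from the paper. The two hypotheses below receive the same WER, although only the second one drops the negation that a downstream translator would need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (assumptions for illustration only).
reference = "i do not want the red car"
hyp_article = "i do not want a red car"   # "the" -> "a": meaning largely preserved
hyp_negation = "i do want the red car"    # "not" dropped: meaning reversed

wer_article = word_error_rate(reference, hyp_article)
wer_negation = word_error_rate(reference, hyp_negation)
print(wer_article, wer_negation)
```

Both hypotheses contain exactly one word error against a seven-word reference, so WER cannot tell them apart, even though they would likely lead to very different translations.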

Xiaodong He, Li Deng, Alex Acero

Cross-lingual Oral Communication | ICASSP 2011 | Signal Processing | Word Error Rate | Word Errors |

» Robust speech recognition using multiple prior models for speech reconstruction

» Robust speech recognition using dynamic noise adaptation

» Exemplar-based Sparse Representation phone identification features

» The IBM RT07 Evaluation Systems for Speaker Diarization on Lecture Meetings

Post Info

Added	21 Aug 2011
Updated	21 Aug 2011
Type	Journal
Year	2011
Where	ICASSP
Authors	Xiaodong He, Li Deng, Alex Acero
