LREC
2008

On the Use of Web Resources and Natural Language Processing Techniques to Improve Automatic Speech Recognition Systems

Language models used in current automatic speech recognition systems are trained on general-purpose corpora and are therefore poorly suited to transcribing spoken documents that deal with a succession of precise topics, such as long multimedia streams that frequently contain reports and debates. To overcome this problem, this paper shows that Web resources and natural language processing techniques can be used effectively to collect topic-specific corpora automatically from the Internet in order to adapt the baseline language model of an automatic speech recognition system. We detail how to characterize the topic of a segment and how to collect Web pages from which a topic-specific language model can be trained. Finally, we present experiments in which an adapted language model, obtained by combining the topic-specific language model with the general-purpose one, is used to produce new transcriptions. The results show that our topic adaptation technique leads to significant gains in transcription quality.
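The abstract does not say how the topic-specific and general-purpose models are combined. Linear interpolation is one common way to do this, and the sketch below assumes it purely for illustration. The toy corpora, the add-one smoothed unigram models, the function names `train_unigram` and `interpolate`, and the weight `lam` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def train_unigram(tokens, vocab):
    """Estimate an add-one smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(general, topic, lam):
    """Mix two models over the same vocabulary: lam * topic + (1 - lam) * general."""
    return {w: lam * topic[w] + (1 - lam) * general[w] for w in general}


# Toy data standing in for the general-purpose corpus and the collected Web pages.
general_tokens = "the match was played and the vote was held".split()
topic_tokens = "the parliament vote on the budget vote".split()
vocab = sorted(set(general_tokens) | set(topic_tokens))

general_lm = train_unigram(general_tokens, vocab)
topic_lm = train_unigram(topic_tokens, vocab)

# Assumed interpolation weight; in practice it would be tuned on held-out data.
adapted_lm = interpolate(general_lm, topic_lm, lam=0.5)
```

In this toy setup the adapted model gives more probability to words that are frequent in the topic corpus (here `vote`) than the general model does, while remaining a proper distribution over the shared vocabulary. A real system would apply the same idea to n-gram models rather than unigrams.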
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot