ICASSP
2011
IEEE

Leveraging the Web for automatically generating indexable and browsable keywords for speech files

This paper presents a method for generating indexable and browsable keyword metadata from ASR transcripts by leveraging the Web. Search engine queries are built from an ASR transcript and used to retrieve similar text from the Web. The keyword meta information embedded in those pages for search engines is then ranked using a mutual information criterion to derive a keyword set. The proposed method is training-free, allows phrase keyword generation, and can generate words that were not spoken in the ASR transcript, alleviating the impact of ASR out-of-vocabulary words. Subjective evaluations on technical presentations demonstrate a clear preference for this approach. Additionally, an objective measure of keyword generation performance is proposed and shown to be a useful guide for tuning compared to more onerous subjective evaluations.
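The ranking step described in the abstract might look roughly like the sketch below. The abstract does not give the exact mutual information criterion, so this is only an illustration under stated assumptions: it scores each candidate keyword with a pointwise-mutual-information-style ratio between its frequency in the pages retrieved for the transcript and its frequency in a background page collection. The function name `rank_keywords`, the background collection, and the add-one smoothing are all hypothetical choices, not details from the paper.

```python
import math
from collections import Counter


def rank_keywords(retrieved_pages, background_pages, top_n=5):
    """Rank meta keywords by a PMI-style score (illustrative assumption).

    Each argument is a list of pages; each page is a list of the keyword
    strings found in that page's keyword meta information.
    """
    # Count the number of pages in which each keyword appears.
    hit_df = Counter(k for page in retrieved_pages for k in set(page))
    bg_df = Counter(k for page in background_pages for k in set(page))
    n_hit, n_bg = len(retrieved_pages), len(background_pages)

    scores = {}
    for kw, df in hit_df.items():
        p_hit = df / n_hit
        # Add-one smoothing keeps the score finite for keywords that
        # never occur in the background collection (assumed choice).
        p_bg = (bg_df[kw] + 1) / (n_bg + 1)
        scores[kw] = math.log(p_hit / p_bg)

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda kw: (-scores[kw], kw))[:top_n]


# Toy data: keyword lists from pages retrieved for one transcript,
# and from an unrelated background set.
retrieved = [
    ["speech recognition", "keyword extraction"],
    ["speech recognition", "news"],
    ["speech recognition", "keyword extraction"],
]
background = [["news"], ["news", "sports"], ["sports"], ["speech recognition"]]

print(rank_keywords(retrieved, background, top_n=2))
```

Because the candidates are whole keyword strings taken from page metadata, phrases such as "keyword extraction" can be ranked as single units, and a keyword can be returned even if it never appears in the transcript itself.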
Kishan Thambiratnam, Gang Li, Sha Meng, Frank Seide
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors Kishan Thambiratnam, Gang Li, Sha Meng, Frank Seide