Spoken Document Retrieval For TREC-7 At Cambridge University

This paper presents work done at Cambridge University on the TREC-7 Spoken Document Retrieval (SDR) Track. The broadcast news audio was transcribed using a 2-pass gender-dependent HTK speech recogniser which ran at 50 times real time and gave an overall word error rate of 24.8%, the lowest in the track. The Okapi-based retrieval engine used in TREC-6 by the City/Cambridge University collaboration was extended with an improved stop-list, a bad-spelling mapper, a stemmer exceptions list, word-pair information, part-of-speech weighting on query terms and some pre-search statistical expansion. The final system gave an average precision of 0.4817 on the reference transcription and 0.4509 on the automatic transcription, with R-precision of 0.4603 and 0.4330 respectively. The paper also presents results on a new set of 60 queries with assessments for the TREC-6 test document data used for development purposes, and analyses the relationship between recognitio...
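The abstract quotes three figures of merit: word error rate, average precision and R-precision. As a reading aid, the sketch below shows how these are conventionally computed. It is not the authors' scoring code, and it assumes the standard definitions (word-level edit distance for word error rate, non-interpolated average precision), which the abstract itself does not spell out. The function names and the toy data are hypothetical.

```python
from typing import Sequence, Set


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision at each relevant hit, divided by the number of relevant documents.

    Assumes document IDs in `ranking` are unique.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Precision within the top R results, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranking[:r] if doc_id in relevant) / r


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    ranking = ["d3", "d1", "d7", "d2", "d5"]
    relevant = {"d1", "d2", "d9"}
    print(word_error_rate(ref, hyp))
    print(average_precision(ranking, relevant))
    print(r_precision(ranking, relevant))
```

These functions score a single transcript or a single query. Retrieval figures such as those in the abstract are normally averages of the per-query values over the whole query set.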
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1998
Where TREC
Authors Sue E. Johnson, P. Jourlin, G. L. Moore, Karen Sparck Jones, Philip C. Woodland