Sciweavers

ICASSP
2011
IEEE

Using latent topic features to improve binary classification of spoken documents

In many topic identification applications, supervised training labels are only indirectly related to the semantic content of the documents being classified. For example, many topically distinct emails will all be assigned a single broad category label of “spam” or “not-spam”, and a two-class classifier will lack direct knowledge of the underlying topic structure. This paper examines the degradation of topic identification performance on conversational speech when multiple semantic topics are combined into a single broad category. We then develop techniques using document clustering and Latent Dirichlet Allocation (LDA) to exploit the underlying semantic topics; these techniques improve performance over classifiers trained on the single category label by up to 20%.
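The general idea in the abstract, recovering hidden sub-topics inside a broad label and letting the classifier use them, can be illustrated with a minimal sketch. This is not the paper's method: it assumes a toy 2-means clustering with cosine similarity as a stand-in for document clustering, a multinomial naive Bayes classifier, and invented example emails. LDA itself is not implemented here, and all function names (`cluster`, `train_with_latent_topics`, `classify`) are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, iters=10):
    """Deterministic 2-means over a non-empty list of tokenised documents."""
    vecs = [Counter(d) for d in docs]
    # Seed with the first document and the one least similar to it.
    far = min(range(len(vecs)), key=lambda i: cosine(vecs[0], vecs[i]))
    centroids = [Counter(vecs[0]), Counter(vecs[far])]
    assign = [0] * len(vecs)
    for _ in range(iters):
        assign = [max((0, 1), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        for c in (0, 1):
            members = [v for v, a in zip(vecs, assign) if a == c]
            if members:
                centroids[c] = sum(members, Counter())
    return assign

def train_nb(docs, labels):
    """Collect per-label word counts for multinomial naive Bayes."""
    vocab = {w for d in docs for w in d}
    counts, priors = {}, Counter(labels)
    for d, y in zip(docs, labels):
        counts.setdefault(y, Counter()).update(d)
    return vocab, counts, priors

def predict_nb(model, doc):
    """Return the label with the highest add-one-smoothed log score."""
    vocab, counts, priors = model
    total = sum(priors.values())
    def score(y):
        n = sum(counts[y].values())
        s = math.log(priors[y] / total)
        for w in doc:
            if w in vocab:  # ignore words never seen in training
                s += math.log((counts[y][w] + 1) / (n + len(vocab)))
        return s
    return max(sorted(priors), key=score)

def train_with_latent_topics(docs, labels, target="spam"):
    """Split the broad `target` label into two cluster-derived sub-labels."""
    target_docs = [d for d, y in zip(docs, labels) if y == target]
    sub = iter(cluster(target_docs))
    fine = [f"{y}/{next(sub)}" if y == target else y for y in labels]
    return train_nb(docs, fine)

def classify(model, doc):
    """Predict a sub-label, then collapse it back to the broad label."""
    return predict_nb(model, doc).split("/")[0]

# Invented toy data: "spam" hides two unrelated sub-topics.
DOCS = [
    ["cheap", "pills", "pharmacy", "discount"],
    ["pills", "pharmacy", "order", "cheap"],
    ["lottery", "winner", "prize", "claim"],
    ["claim", "prize", "lottery", "cash"],
    ["meeting", "agenda", "project", "monday"],
    ["project", "report", "deadline", "meeting"],
]
LABELS = ["spam", "spam", "spam", "spam", "not-spam", "not-spam"]

model = train_with_latent_topics(DOCS, LABELS)
print(classify(model, ["lottery", "prize"]))
print(classify(model, ["project", "meeting"]))
```

The classifier is trained on three labels (two spam sub-topics plus not-spam) but is still evaluated on the original two-class decision, since `classify` discards the sub-topic suffix. Whether such a split helps on real data depends on the corpus; the toy example only shows the mechanics.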
Jonathan Wintrode
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors Jonathan Wintrode