Sciweavers

MM
2015
ACM

SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents

Topic models such as Latent Dirichlet Allocation (LDA) [3] have been extensively used for characterizing text collections according to the topics discussed in documents. Organizing documents according to topic can be applied to different information access tasks such as document clustering, content-based recommendation or summarization. Spoken documents such as podcasts typically involve more than one speaker (e.g., meetings, interviews, chat shows or news with reporters). This paper presents a work-in-progress based on a variation of LDA that includes in the model the different speakers participating in conversational audio transcripts. Intuitively, each speaker has her own background knowledge which generates different topic and word distributions. We believe that informing a topic model with speaker segmentation (e.g., using existing speaker diarization techniques) may enhance discovery of topics in multi-speaker audio content. Categories and Subject Descriptors H.5.1 [Multimedia In...
Damiano Spina, Johanne R. Trippas, Lawrence Cavedon, Mark Sanderson
Added 14 Apr 2016
Updated 14 Apr 2016
Type Journal
Year 2015
Where MM
Authors Damiano Spina, Johanne R. Trippas, Lawrence Cavedon, Mark Sanderson