Sciweavers

EMNLP
2010

Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

13 years 2 months ago
Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA)) which are solely based on lexical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters and incorporating conversation structure into these models improves the performance significantly.
Shafiq R. Joty, Giuseppe Carenini, Gabriel Murray,
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Where EMNLP
Authors Shafiq R. Joty, Giuseppe Carenini, Gabriel Murray, Raymond T. Ng
Comments (0)