Exploiting conversation structure in unsupervised topic segmentation for emails

Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond T. Ng

Research output: Chapter in Book/Report/Conference proceedingConference contribution

13 Citations (Scopus)

Abstract

This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA)) which are solely based on lexical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters and incorporating conversation structure into these models improves the performance significantly.

Original languageEnglish
Title of host publicationEMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages388-398
Number of pages11
Publication statusPublished - 1 Dec 2010
EventConference on Empirical Methods in Natural Language Processing, EMNLP 2010 - Cambridge, MA, United States
Duration: 9 Oct 201011 Oct 2010

Publication series

NameEMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

OtherConference on Empirical Methods in Natural Language Processing, EMNLP 2010
CountryUnited States
CityCambridge, MA
Period9/10/1011/10/10

    Fingerprint

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Joty, S., Carenini, G., Murray, G., & Ng, R. T. (2010). Exploiting conversation structure in unsupervised topic segmentation for emails. In EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 388-398). (EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference).