Exploiting conversation structure in unsupervised topic segmentation for emails

Shafiq Rayhan Joty, Giuseppe Carenini, Gabriel Murray, Raymond T. Ng

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Citations (Scopus)

Abstract

This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how existing topic segmentation models, namely the Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA), which rely solely on lexical information, can be applied to emails. After identifying where these methods fail and what a suitable model should take into account, we propose two novel extensions of the models that use lexical information and also exploit finer-grained conversation structure in a principled way. Empirical evaluation shows that LCSeg outperforms LDA at segmenting an email thread into topical clusters, and that incorporating conversation structure into these models improves performance significantly.

Original language: English
Title of host publication: EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 388-398
Number of pages: 11
Publication status: Published - 1 Dec 2010
Externally published: Yes
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2010 - Cambridge, MA, United States
Duration: 9 Oct 2010 - 11 Oct 2010

Fingerprint

Electronic mail
Information use

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Rayhan Joty, S., Carenini, G., Murray, G., & Ng, R. T. (2010). Exploiting conversation structure in unsupervised topic segmentation for emails. In EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 388-398).

@inproceedings{d9a7eed77e444e59ba8adb979ee0c0c0,
title = "Exploiting conversation structure in unsupervised topic segmentation for emails",
author = "{Rayhan Joty}, Shafiq and Giuseppe Carenini and Gabriel Murray and Ng, {Raymond T.}",
year = "2010",
month = "12",
day = "1",
language = "English",
isbn = "1932432868",
pages = "388--398",
booktitle = "EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

}

TY  - GEN
T1  - Exploiting conversation structure in unsupervised topic segmentation for emails
AU  - Rayhan Joty, Shafiq
AU  - Carenini, Giuseppe
AU  - Murray, Gabriel
AU  - Ng, Raymond T.
PY  - 2010/12/1
Y1  - 2010/12/1
UR  - http://www.scopus.com/inward/record.url?scp=80053279502&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=80053279502&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:80053279502
SN  - 1932432868
SN  - 9781932432862
SP  - 388
EP  - 398
BT  - EMNLP 2010 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
ER  -