Improving tweet timeline generation by predicting optimal retrieval depth

Maram Hasanain, Tamer Elsayed, Walid Magdy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Tweet Timeline Generation (TTG) systems provide users with informative and concise retrospective summaries of topics as they developed over time. To produce a tweet timeline that summarizes a given topic, a TTG system typically retrieves a list of potentially relevant tweets from which the timeline is then generated. In such a design, the performance of the timeline generation step inevitably depends on that of the retrieval step. In this work, we aim to improve the performance of a given timeline generation system by controlling the depth of the ranked list of retrieved tweets considered in generating the timeline. We propose a supervised approach that predicts the optimal depth of the ranked tweet list for a given topic by combining estimates of list quality computed at different depths. We conducted our experiments on a recent TREC TTG test collection of 243M tweets and 55 topics, using 14 different retrieval models (to retrieve the initial ranked list of tweets) and 3 different TTG models (to generate the final timeline). Our results demonstrate the effectiveness of the proposed approach: it improved TTG performance over a strong baseline in 76% of the cases, 31% of which were statistically significant, with not a single significant degradation observed.
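
The supervised step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate depths, the mean-score quality estimate, and the small ridge regressor are all assumptions chosen for illustration. The abstract states only that estimates of list quality computed at different depths are combined to predict the optimal depth for a topic.

```python
from statistics import mean

# Hypothetical candidate cutoffs at which list quality is estimated.
DEPTHS = (5, 10, 20)


def quality_features(scores, depths=DEPTHS):
    """One quality estimate per depth.

    Assumption: the mean retrieval score of the top-k tweets stands in for
    whatever query performance predictor a real system would use.
    """
    return [mean(scores[:k]) for k in depths]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(features, optimal_depths, lam=1e-6):
    """Fit weights (bias first) mapping per-topic features to optimal depth.

    The tiny ridge term `lam` only keeps the normal equations solvable when
    features are collinear; it is an illustrative choice, not from the paper.
    """
    rows = [[1.0] + list(f) for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, optimal_depths)) for i in range(d)]
    return solve(xtx, xty)


def predict_depth(weights, feature_vector, min_depth, max_depth):
    """Predict a retrieval depth for one topic, clipped to a valid range."""
    raw = weights[0] + sum(w * f for w, f in zip(weights[1:], feature_vector))
    return max(min_depth, min(max_depth, round(raw)))
```

In such a setup, the training labels would be obtained by sweeping the cutoff on training topics and recording the depth that maximizes the TTG metric; the predicted depth for a new topic then truncates the ranked list before it is handed to the timeline generation model.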

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 135-146
Number of pages: 12
Volume: 9460
ISBN (Print): 9783319289397
DOIs: https://doi.org/10.1007/978-3-319-28940-3_11
Publication status: Published - 2015
Event: 11th Asia Information Retrieval Societies Conference, AIRS 2015 - Brisbane, Australia
Duration: 2 Dec 2015 - 4 Dec 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9460
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th Asia Information Retrieval Societies Conference, AIRS 2015
Country: Australia
City: Brisbane
Period: 2/12/15 - 4/12/15

Keywords

  • Dynamic retrieval cutoff
  • Microblogs
  • Query difficulty
  • Query performance prediction
  • Regression
  • Tweet summarization

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Hasanain, M., Elsayed, T., & Magdy, W. (2015). Improving tweet timeline generation by predicting optimal retrieval depth. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9460, pp. 135-146). Springer Verlag. https://doi.org/10.1007/978-3-319-28940-3_11
